Lecture 2: Monte Carlo Simulation


STAT/Q SCI 403: Introduction to Resampling Methods, Spring 2017
Instructor: Yen-Chi Chen

2.1 Monte Carlo Integration

Assume we want to evaluate the following integration:
$$\int_0^1 e^{-x^3}\,dx.$$
What can we do? The function $e^{-x^3}$ does not seem to have a closed-form antiderivative, so we have to use some computer experiment to evaluate this number. The traditional approach is the so-called Riemann integration, where we choose points $x_1,\cdots,x_K$ evenly spread out over the interval $[0,1]$, evaluate $f(x_1),\cdots,f(x_K)$, and finally use
$$\frac{1}{K}\sum_{i=1}^{K} f(x_i)$$
to approximate the integration. When the function is smooth and $K\to\infty$, this numerical integration converges to the actual integration.

Now we introduce an alternative approach to evaluate such an integration. First, we rewrite the integration as
$$\int_0^1 e^{-x^3}\,dx = E\left(e^{-U^3}\right),$$
where $U$ is a uniform random variable over the interval $[0,1]$. Thus, the integration is actually the expected value of the random variable $e^{-U^3}$, which implies that evaluating the integration is the same as estimating this expected value. So we can generate IID random variables $U_1,\cdots,U_K\sim\mathsf{Unif}[0,1]$, compute $W_1=e^{-U_1^3},\cdots,W_K=e^{-U_K^3}$, and finally use
$$\bar{W}_K = \frac{1}{K}\sum_{i=1}^{K} W_i = \frac{1}{K}\sum_{i=1}^{K} e^{-U_i^3}$$
as a numerical evaluation of $\int_0^1 e^{-x^3}\,dx$. By the Law of Large Numbers,
$$\bar{W}_K \overset{P}{\longrightarrow} E(W_i) = E\left(e^{-U_i^3}\right) = \int_0^1 e^{-x^3}\,dx,$$
so this alternative numerical method is statistically consistent.

In the above example, the integration can be written as
$$I = \int f(x)\,p(x)\,dx, \tag{2.1}$$
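As a quick illustration, here is a minimal Python sketch of the two evaluations above; the choice $K = 10{,}000$ and the random seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.exp(-x**3)
K = 10_000

# Riemann-style evaluation: average f over K evenly spaced points in [0, 1].
x_grid = np.linspace(0.0, 1.0, K)
riemann = f(x_grid).mean()

# Monte Carlo evaluation: average f over K IID Unif[0, 1] draws.
U = rng.uniform(0.0, 1.0, size=K)
monte_carlo = f(U).mean()

print(riemann, monte_carlo)  # both approach the true value (about 0.8075)
```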

where $f$ is some function and $p$ is a probability density function. Let $X$ be a random variable with density $p$. Then equation (2.1) equals
$$\int f(x)\,p(x)\,dx = E(f(X)) = I.$$
Namely, the result of this integration is the same as the expected value of the random variable $f(X)$.

The alternative numerical method to evaluate the above integration is to generate IID data points $X_1,\cdots,X_n\sim p$ and then use the sample average
$$\hat{I}_n = \frac{1}{n}\sum_{i=1}^{n} f(X_i).$$
This method, evaluating an integration by simulating random points, is called integration by Monte Carlo Simulation.

An appealing feature of Monte Carlo Simulation is that its statistical theory is rooted in the theory of the sample average: we are using the sample average as an estimator of the expected value. We have already seen that the bias and variance of an estimator are key quantities for evaluating its quality. What will be the bias and variance of our Monte Carlo Simulation estimator?

The bias is simple: we are using the sample average as an estimator of its expected value, so $\mathsf{bias}(\hat{I}_n) = 0$. The variance will then be
$$\mathsf{Var}(\hat{I}_n) = \frac{1}{n}\mathsf{Var}(f(X)) = \frac{1}{n}\Big(E(f^2(X)) - \underbrace{E^2(f(X))}_{I^2}\Big) = \frac{1}{n}\Big(\int f^2(x)\,p(x)\,dx - I^2\Big).$$
Thus, the variance contains two components: $\int f^2(x)\,p(x)\,dx$ and $I^2$. Given a problem of evaluating an integration, the quantity $I$ is fixed. What we can choose is the number of random points $n$ and the sampling distribution $p$!

An important fact is that when we change the sampling distribution $p$, the function $f$ also changes. For instance, in the example of evaluating $\int_0^1 e^{-x^3}\,dx$, we have seen how to use uniform random variables. We can also generate IID $B_1,\cdots,B_K\sim\mathsf{Beta}(2,2)$, i.e., $K$ points from the beta distribution Beta(2,2). Note that the PDF of Beta(2,2) is
$$p_{\mathsf{Beta}(2,2)}(x) = 6x(1-x). \tag{2.2}$$
We can then rewrite
$$\int_0^1 e^{-x^3}\,dx = \int_0^1 \underbrace{\frac{e^{-x^3}}{6x(1-x)}}_{f(x)}\,\underbrace{6x(1-x)}_{p(x)}\,dx = E\left(\frac{e^{-B^3}}{6B(1-B)}\right).$$

What is the effect of using a different sampling distribution $p$? The expectation is always fixed to be $I$, so the second part of the variance remains the same. However, the first part of the variance, $\int f^2(x)\,p(x)\,dx$, depends on how you choose $p$ and the corresponding $f$. Thus, different choices of $p$ lead to different variances of the estimator. We will talk about how to choose an optimal $p$ in Chapter 4, when we discuss the theory of importance sampling.
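Here is a minimal Python sketch of the two sampling schemes above; it estimates $I$ both ways and prints the per-draw sample variances, i.e., empirical versions of the quantity $\mathsf{Var}(f(X))$ that drives $\mathsf{Var}(\hat{I}_n)$:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 100_000

# Scheme 1: p = Unif[0, 1], f(x) = exp(-x^3).
U = rng.uniform(0.0, 1.0, size=K)
W_unif = np.exp(-U**3)

# Scheme 2: p = Beta(2, 2), f(x) = exp(-x^3) / (6x(1 - x)).
B = rng.beta(2.0, 2.0, size=K)
W_beta = np.exp(-B**3) / (6.0 * B * (1.0 - B))

print(W_unif.mean(), W_beta.mean())  # both estimate I
print(W_unif.var(),  W_beta.var())   # per-draw variances differ
```

For this particular integrand the uniform scheme happens to be the better of the two: the ratio $e^{-x^3}/(6x(1-x))$ blows up near $x = 0$ and $x = 1$, so the Beta(2,2) scheme has a much larger and unstable sample variance. The point of the example is only that the variance depends on the choice of $p$.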

2.2 Estimating a Probability via Simulation

Here is an example of evaluating the power of a Z-test. Let $X_1,\cdots,X_{16}$ be a random sample of size 16. Let the null hypothesis and the alternative hypothesis be
$$H_0: X_i \sim N(0,1), \qquad H_a: X_i \sim N(\mu,1),\ \mu\neq 0.$$
Under the significance level $\alpha$, the two-tailed Z-test rejects $H_0$ if $\left|\sqrt{16}\,\bar{X}_{16}\right| \ge z_{1-\alpha/2}$, where $z_t = F^{-1}(t)$ and $F$ is the CDF of the standard normal distribution.

Assume that the true value of $\mu$ is $\mu = 1$. In this case, the null hypothesis is wrong and we should reject the null. However, due to the randomness of sampling, we may not be able to reject the null every time. So a quantity we will be interested in is: what is the probability of rejecting the null under such a $\mu$? In statistics, this probability (the probability that we reject $H_0$) is called the power of a test. Ideally, if $H_0$ is incorrect, we want the power to be as large as possible.

What will the power be when $\mu = 1$? Here is the analytical derivation of the power (generally denoted as $\beta$):
$$\begin{aligned}
\beta &= P(\text{Reject } H_0 \mid \mu = 1)\\
&= P\left(\left|\sqrt{16}\,\bar{X}_{16}\right| \ge z_{1-\alpha/2} \,\Big|\, \mu = 1\right), \qquad \bar{X}_{16}\sim N(\mu, 1/16)\\
&= P\left(\left|4\cdot N(1, 1/16)\right| \ge z_{1-\alpha/2}\right)\\
&= P\left(\left|N(4,1)\right| \ge z_{1-\alpha/2}\right)\\
&= P\left(N(4,1) \ge z_{1-\alpha/2}\right) + P\left(N(4,1) \le -z_{1-\alpha/2}\right)\\
&= P\left(N(0,1) \ge z_{1-\alpha/2} - 4\right) + P\left(N(0,1) \le -4 - z_{1-\alpha/2}\right).
\end{aligned} \tag{2.3}$$
Well... this number does not seem to be an easy one. What should we do in practice to compute the power?

Here is an alternative approach: computing the power using Monte Carlo Simulation. The idea is that we generate $N$ samples, each consisting of 16 IID random variables from $N(1,1)$ (the distribution under the alternative). For each sample, we compute the Z-test statistic $\sqrt{16}\,\left|\bar{X}_{16}\right|$ and check whether we can reject $H_0$ or not (i.e., checking whether this number is greater than or equal to $z_{1-\alpha/2}$). At the end, we use the proportion of samples in which $H_0$ is rejected as an estimate of the power $\beta$. Here is a diagram describing how the steps are carried out:
$$\begin{array}{ccccl}
N(1,1) & \xrightarrow{\text{generates}} & 16 \text{ observations} & \xrightarrow{\text{compute test statistic}} & \sqrt{16}\,\left|\bar{X}_{16}\right| \to \text{Reject } H_0?\ D_1 = \text{Yes}(1)/\text{No}(0)\\
N(1,1) & \xrightarrow{\text{generates}} & 16 \text{ observations} & \xrightarrow{\text{compute test statistic}} & \sqrt{16}\,\left|\bar{X}_{16}\right| \to \text{Reject } H_0?\ D_2 = \text{Yes}(1)/\text{No}(0)\\
\vdots & & \vdots & & \qquad\vdots\\
N(1,1) & \xrightarrow{\text{generates}} & 16 \text{ observations} & \xrightarrow{\text{compute test statistic}} & \sqrt{16}\,\left|\bar{X}_{16}\right| \to \text{Reject } H_0?\ D_N = \text{Yes}(1)/\text{No}(0)
\end{array}$$
Each sample ends up with a number $D_i$ such that $D_i = 1$ if we reject $H_0$ and $D_i = 0$ if we do not reject $H_0$. Because the Monte Carlo Simulation approach uses the proportion of samples in which $H_0$ is rejected to estimate $\beta$, this proportion is
$$\bar{D}_N = \frac{1}{N}\sum_{j=1}^{N} D_j.$$
Is the Monte Carlo Simulation approach a good approach to estimating $\beta$? The answer is yes: it is a good approach, and moreover, we have already learned the statistical theory of such a procedure!
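The following is a minimal Python sketch of this procedure (with $\alpha = 0.05$ chosen for concreteness), together with the analytical power from equation (2.3) as a sanity check:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
alpha, mu, N = 0.05, 1.0, 10_000
z = norm.ppf(1 - alpha / 2)                  # z_{1 - alpha/2}

# N samples, each with 16 IID draws from N(mu, 1).
X = rng.normal(mu, 1.0, size=(N, 16))
T = np.sqrt(16) * np.abs(X.mean(axis=1))     # test statistic sqrt(16)|Xbar_16|
D = (T >= z).astype(float)                   # D_j = 1 if H_0 is rejected

power_mc = D.mean()                          # Monte Carlo estimate of beta

# Analytical power from equation (2.3).
power_exact = norm.sf(z - 4 * mu) + norm.cdf(-4 * mu - z)
print(power_mc, power_exact)                 # both are about 0.98
```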

The estimator $\bar{D}_N$ is just a sample average, and each $D_j$ turns out to be a Bernoulli random variable with parameter $p = P(\text{Reject } H_0 \mid \mu = 1) = \beta$ by equation (2.3). Therefore,
$$\mathsf{bias}\left(\bar{D}_N\right) = E\left(\bar{D}_N\right) - \beta = p - \beta = 0,$$
$$\mathsf{Var}\left(\bar{D}_N\right) = \frac{p(1-p)}{N} = \frac{\beta(1-\beta)}{N}, \qquad \mathsf{MSE}\left(\bar{D}_N\right) = \frac{\beta(1-\beta)}{N}.$$
Thus, the Monte Carlo Simulation method yields a consistent estimator of the power:
$$\bar{D}_N \overset{P}{\longrightarrow} \beta.$$
Although here we study the Monte Carlo Simulation estimator in a special case, this idea easily generalizes to many other situations, as long as the goal is to evaluate certain numbers. In modern statistical analysis, most papers with simulation results use some Monte Carlo Simulation to show the numerical performance of the methods proposed in the paper.

The following two figures present the power $\beta$ as a function of the value of $\mu$ (blue curve) at a fixed significance level $\alpha$. The red curves are the power estimated by Monte Carlo Simulation, using $N = 25$ in the left panel and a larger $N$ in the right panel.

[Figure: two panels, power versus $\mu \in [-2, 2]$; the blue curve is the true power, and the red curves are Monte Carlo estimates with $N = 25$ (left) and a larger $N$ (right).]

The gray line corresponds to the power being equal to $\alpha$. Think about why the power curve (blue curve) hits the gray line at $\mu = 0$.

2.3 Estimating a Distribution via Simulation

Monte Carlo Simulation can also be applied to estimate an unknown distribution, as long as we can generate data from that distribution. In Bayesian analysis, people are often interested in the so-called posterior distribution. Very often we know how to generate points from a posterior distribution, but we cannot write down its closed form. In this situation, what we can do is simulate many points and estimate the distribution using these simulated points. So the task becomes:

given $X_1,\cdots,X_n \sim F$ (or PDF $p$), we want to estimate $F$ (or the PDF $p$).

Estimating the CDF using the EDF. To estimate the CDF, a simple but powerful approach is to use the EDF:
$$\hat{F}_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x).$$
We have already learned a lot about the EDF in the previous chapter.

Estimating the PDF using the histogram. If the goal is to estimate the PDF, then this problem is called density estimation, which is a central topic in statistical research. Here we will focus on perhaps the simplest approach: the histogram. Note that we will have a more in-depth discussion of other approaches in Chapter 8.

For simplicity, we assume that $X_i \in [0,1]$, so $p(x)$ is nonzero only within $[0,1]$. We also assume that $p(x)$ is smooth and $|p'(x)| \le L$ for all $x$ (i.e., the derivative is bounded). The histogram partitions the set $[0,1]$ (this region, the region with nonzero density, is called the support of the density function) into several bins and uses the count in each bin as a density estimate. When we have $M$ bins, this yields a partition:
$$B_1 = \left[0, \frac{1}{M}\right),\ B_2 = \left[\frac{1}{M}, \frac{2}{M}\right),\ \cdots,\ B_{M-1} = \left[\frac{M-2}{M}, \frac{M-1}{M}\right),\ B_M = \left[\frac{M-1}{M}, 1\right].$$
In this case, for a given point $x \in B_\ell$, the density estimator from the histogram is
$$\hat{p}_n(x) = \frac{\text{number of observations within } B_\ell}{n} \times \frac{1}{\text{length of the bin}} = \frac{M}{n}\sum_{i=1}^{n} I(X_i \in B_\ell).$$
The intuition behind this density estimator is that the histogram assigns an equal density value to every point within a bin. So for the bin $B_\ell$ that contains $x$, the proportion of observations within this bin is $\frac{1}{n}\sum_{i=1}^{n} I(X_i \in B_\ell)$, which should be approximately equal to the density estimate times the length of the bin, $\hat{p}_n(x)\times\frac{1}{M}$.
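As a quick illustration, here is a minimal Python sketch of both estimators, using Beta(2,2) draws as toy data with support $[0,1]$ (so the true density $6x(1-x)$ is available for comparison):

```python
import numpy as np

rng = np.random.default_rng(0)
n, M = 1_000, 20                      # sample size and number of bins
X = rng.beta(2.0, 2.0, size=n)        # toy data with support [0, 1]

# EDF: F_hat(x) = (1/n) * sum_i I(X_i <= x)
def edf(x):
    return np.mean(X <= x)

# Histogram density estimator: p_hat(x) = (M/n) * sum_i I(X_i in B_l), x in B_l
counts, _ = np.histogram(X, bins=M, range=(0.0, 1.0))
def hist_density(x):
    l = min(int(x * M), M - 1)        # index of the bin containing x
    return M * counts[l] / n

for x in (0.25, 0.5, 0.75):
    # EDF estimate, histogram estimate, and the true density 6x(1-x)
    print(x, edf(x), hist_density(x), 6 * x * (1 - x))
```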

Now we study the bias of the histogram density estimator:
$$\begin{aligned}
E\left(\hat{p}_n(x)\right) &= M\cdot P(X_i \in B_\ell) = M\int_{(\ell-1)/M}^{\ell/M} p(u)\,du\\
&= M\left(F\left(\frac{\ell}{M}\right) - F\left(\frac{\ell-1}{M}\right)\right) = \frac{F\left(\frac{\ell}{M}\right) - F\left(\frac{\ell-1}{M}\right)}{1/M}\\
&= p(x^*), \qquad x^* \in \left[\frac{\ell-1}{M}, \frac{\ell}{M}\right].
\end{aligned}$$
The last equality follows from the mean value theorem with $F'(x) = p(x)$. By the mean value theorem again, there exists another point $x^{**}$ between $x^*$ and $x$ such that
$$\frac{p(x^*) - p(x)}{x^* - x} = p'(x^{**}).$$
Thus, the bias
$$\mathsf{bias}\left(\hat{p}_n(x)\right) = E\left(\hat{p}_n(x)\right) - p(x) = p(x^*) - p(x) = p'(x^{**})\cdot(x^* - x) \le \left|p'(x^{**})\right|\cdot\left|x^* - x\right| \le \frac{L}{M}. \tag{2.4}$$
Note that in the last inequality we use the fact that both $x^*$ and $x$ are within $B_\ell$, whose total length is $1/M$, so $|x^* - x| \le 1/M$. The analysis of the bias tells us that the more bins we use, the smaller the bias of the histogram. This makes sense because when we have many bins, we have a higher resolution, so we can approximate the fine structure of the density better.

Now we turn to the analysis of the variance:
$$\mathsf{Var}\left(\hat{p}_n(x)\right) = M^2\cdot\frac{1}{n}\mathsf{Var}\left(I(X_i \in B_\ell)\right) = M^2\cdot\frac{P(X_i \in B_\ell)\left(1 - P(X_i \in B_\ell)\right)}{n}.$$
By the derivation for the bias, we know that $P(X_i \in B_\ell) = \frac{p(x^*)}{M}$, so the variance
$$\mathsf{Var}\left(\hat{p}_n(x)\right) = M^2\cdot\frac{\frac{p(x^*)}{M}\left(1 - \frac{p(x^*)}{M}\right)}{n} = M\cdot\frac{p(x^*)}{n} - \frac{p^2(x^*)}{n}. \tag{2.5}$$
The analysis of the variance has an interesting result: the more bins we use, the higher the variance we suffer. Now if we consider the MSE, the pattern is more inspiring. The MSE is
$$\mathsf{MSE}\left(\hat{p}_n(x)\right) = \mathsf{bias}^2\left(\hat{p}_n(x)\right) + \mathsf{Var}\left(\hat{p}_n(x)\right) \le \frac{L^2}{M^2} + M\cdot\frac{p(x^*)}{n} - \frac{p^2(x^*)}{n}. \tag{2.6}$$
An interesting feature of the histogram is that we can choose $M$, the number of bins. When $M$ is too large, the first quantity (bias) will be small while the second quantity (variance) will be large; this case is called undersmoothing. When $M$ is too small, the first quantity (bias) is large but the second quantity (variance) is small; this case is called oversmoothing. To balance the bias and variance, we choose $M$ to minimize the MSE bound, which leads to
$$M_{\mathsf{opt}} = \left(\frac{2nL^2}{p(x^*)}\right)^{1/3}. \tag{2.7}$$
Although in practice the quantities $L$ and $p(x^*)$ are unknown, so we cannot choose the optimal $M_{\mathsf{opt}}$, the rule in equation (2.7) tells us how we should change the number of bins as the sample size increases. The practical rule for selecting $M$ is related to the problem of bandwidth selection, a research topic in statistics.
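To see this bias-variance trade-off numerically, here is a minimal Python sketch (again using Beta(2,2) toy data, an assumption for illustration) that approximates the MSE of $\hat{p}_n(x)$ at $x = 0.5$ by repeated simulation for several values of $M$; the printed MSE first decreases and then increases in $M$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, x0, trials = 1_000, 0.5, 500
p_true = 6 * x0 * (1 - x0)           # true Beta(2, 2) density at x0

for M in (2, 5, 10, 20, 50, 200):
    est = np.empty(trials)
    for t in range(trials):
        X = rng.beta(2.0, 2.0, size=n)
        counts, _ = np.histogram(X, bins=M, range=(0.0, 1.0))
        est[t] = M * counts[min(int(x0 * M), M - 1)] / n  # p_hat(x0)
    print(f"M = {M:3d}   MSE = {np.mean((est - p_true) ** 2):.4f}")
```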