STAT/Q SCI 403: Introduction to Resampling Methods, Spring 2017
Instructor: Yen-Chi Chen

Lecture 2: Monte Carlo Simulation

2.1 Monte Carlo Integration

Assume we want to evaluate the following integral:
$$\int_0^1 e^{-x^3}\,dx.$$
What can we do? The function $e^{-x^3}$ does not seem to have a closed-form antiderivative, so we have to use some computer experiment to evaluate this number. The traditional approach is the so-called Riemann integration: we choose points $x_1,\cdots,x_K$ evenly spread out over the interval $[0,1]$, evaluate $f(x_1),\cdots,f(x_K)$, and finally use
$$\frac{1}{K}\sum_{i=1}^{K} f(x_i)$$
to evaluate the integral. When the function is smooth and $K\to\infty$, this numerical integration converges to the actual integral.

Now we will introduce an alternative approach to evaluate such an integral. First, we rewrite the integral as
$$\int_0^1 e^{-x^3}\,dx = E\left(e^{-U^3}\right),$$
where $U$ is a uniform random variable over the interval $[0,1]$. Thus, the integral is actually the expected value of the random variable $e^{-U^3}$, which implies that evaluating the integral is the same as estimating this expected value. So we can generate IID random variables $U_1,\cdots,U_K\sim \mathrm{Uni}[0,1]$, then compute $W_1=e^{-U_1^3},\cdots,W_K=e^{-U_K^3}$, and finally use
$$\bar W_K = \frac{1}{K}\sum_{i=1}^{K} W_i = \frac{1}{K}\sum_{i=1}^{K} e^{-U_i^3}$$
as a numerical evaluation of $\int_0^1 e^{-x^3}\,dx$. By the Law of Large Numbers,
$$\bar W_K \overset{P}{\longrightarrow} E(W_i) = E\left(e^{-U_i^3}\right) = \int_0^1 e^{-x^3}\,dx,$$
so this alternative numerical method is statistically consistent.

In the above example, the integral can be written as
$$I = \int f(x)p(x)\,dx, \tag{2.1}$$
where $f$ is some function and $p$ is a probability density function. Let $X$ be a random variable with density $p$. Then equation (2.1) equals
$$\int f(x)p(x)\,dx = E(f(X)) = I.$$
Namely, the result of this integral is the same as the expected value of the random variable $f(X)$.

The alternative numerical method to evaluate the above integral is to generate IID $X_1,\cdots,X_n\sim p$, $n$ data points, and then use the sample average
$$\hat I_n = \frac{1}{n}\sum_{i=1}^{n} f(X_i).$$
This method, evaluating the integral by simulating random points, is called integration by Monte Carlo Simulation.

An appealing feature of Monte Carlo Simulation is that its statistical theory is rooted in the theory of the sample average: we are using the sample average as an estimator of the expected value. We have already seen that the bias and variance of an estimator are key quantities for evaluating its quality. What will be the bias and variance of our Monte Carlo Simulation estimator? The bias is simple: we are using the sample average as an estimator of its own expected value, so
$$\mathrm{bias}(\hat I_n) = 0.$$
The variance will then be
$$\mathrm{Var}(\hat I_n) = \frac{\mathrm{Var}(f(X_1))}{n} = \frac{1}{n}\Bigl(E\bigl(f^2(X_1)\bigr) - \underbrace{E^2\bigl(f(X_1)\bigr)}_{I^2}\Bigr) = \frac{1}{n}\left(\int f^2(x)p(x)\,dx - I^2\right).$$
Thus, the variance contains two components: $\int f^2(x)p(x)\,dx$ and $I^2$. Given a problem of evaluating an integral, the quantity $I$ is fixed. What we can choose are the number of random points $n$ and the sampling distribution $p$!

An important fact is that when we change the sampling distribution $p$, the function $f$ will also change. For instance, in the example of evaluating $\int_0^1 e^{-x^3}\,dx$, we have seen how to use uniform random variables to evaluate it. We can also generate IID $B_1,\cdots,B_K\sim \mathrm{Beta}(2,2)$, i.e., $K$ points from the beta distribution $\mathrm{Beta}(2,2)$. Note that the PDF of $\mathrm{Beta}(2,2)$ is
$$p_{\mathrm{Beta}(2,2)}(x) = 6x(1-x). \tag{2.2}$$
We can then rewrite
$$\int_0^1 e^{-x^3}\,dx = \int_0^1 \underbrace{\frac{e^{-x^3}}{6x(1-x)}}_{f(x)}\,\underbrace{6x(1-x)}_{p(x)}\,dx = E\left(\frac{e^{-B^3}}{6B(1-B)}\right).$$
What is the effect of using a different sampling distribution $p$?
The expectation is always fixed to be $I$, so the second part of the variance remains the same. However, the first part of the variance, $\int f^2(x)p(x)\,dx$, depends on how you choose $p$ and the corresponding $f$. Thus, different choices of $p$ lead to different variances of the estimator. We will talk about how to choose an optimal $p$ in Chapter 4, when we discuss the theory of importance sampling.
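The two estimators above are easy to try in code. Below is a minimal Python sketch (the sample size, seed, and variable names are our own choices for illustration) that computes both Monte Carlo estimates of $\int_0^1 e^{-x^3}\,dx$ and compares the empirical variances of the terms $W_i$ under the two sampling distributions:

```python
import math
import random
from statistics import mean, pvariance

rng = random.Random(403)
K = 100_000

# Uniform sampling: W_i = e^{-U_i^3} with U_i ~ Uni[0, 1]
w_unif = [math.exp(-rng.random() ** 3) for _ in range(K)]

# Beta(2, 2) sampling: W_i = e^{-B_i^3} / (6 B_i (1 - B_i)), so that
# f(x) p(x) is the same integrand even though the sampling density changed
w_beta = []
for _ in range(K):
    b = rng.betavariate(2, 2)
    w_beta.append(math.exp(-b ** 3) / (6 * b * (1 - b)))

# Both sample averages estimate the same integral (about 0.8075),
# but the variances of the W_i differ across the two schemes
print(mean(w_unif), mean(w_beta))
print(pvariance(w_unif), pvariance(w_beta))
```

For this particular target the uniform scheme has the smaller variance: the ratio $e^{-x^3}/\bigl(6x(1-x)\bigr)$ blows up near $0$ and $1$ under Beta(2,2) sampling. Choosing $p$ so that this ratio is nearly constant is the guiding idea of importance sampling.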
2.2 Estimating a Probability via Simulation

Here is an example of evaluating the power of a Z-test. Let $X_1,\cdots,X_{16}$ be a size-16 random sample. Let the null hypothesis and the alternative hypothesis be
$$H_0: X_i\sim N(0,1), \qquad H_a: X_i\sim N(\mu,1),\ \mu\neq 0.$$
Under the significance level $\alpha$, the two-tailed Z-test rejects $H_0$ if
$$\left|\sqrt{16}\,\bar X_{16}\right| \ge z_{1-\alpha/2},$$
where $z_t = F^{-1}(t)$ and $F$ is the CDF of the standard normal distribution.

Assume that the true value of $\mu$ is $\mu = 1$. In this case, the null hypothesis is wrong and we should reject the null. However, due to the randomness of sampling, we may not be able to reject the null every time. So a quantity we will be interested in is: what is the probability of rejecting the null under such a $\mu$? In statistics, this probability (the probability that we reject $H_0$) is called the power of a test. Ideally, if $H_0$ is incorrect, we want the power to be as large as possible.

What will the power be when $\mu = 1$? Here is the analytical derivation of the power (generally denoted as $\beta$):
$$\begin{aligned}
\beta &= P(\text{Reject } H_0 \mid \mu=1)\\
&= P\left(\left|\sqrt{16}\,\bar X_{16}\right|\ge z_{1-\alpha/2} \;\middle|\; \mu=1\right), \qquad \bar X_{16}\sim N(\mu, 1/16)\\
&= P\left(\left|4\cdot N(1,1/16)\right|\ge z_{1-\alpha/2}\right)\\
&= P\left(\left|N(4,1)\right|\ge z_{1-\alpha/2}\right)\\
&= P\left(N(4,1)\ge z_{1-\alpha/2}\right) + P\left(N(4,1)\le -z_{1-\alpha/2}\right)\\
&= P\left(N(0,1)\ge z_{1-\alpha/2}-4\right) + P\left(N(0,1)\le -4-z_{1-\alpha/2}\right).
\end{aligned}$$
Well... this number does not seem to be an easy one. What should we do in practice to compute the power?
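Even though the final expression looks unpleasant, it can be evaluated numerically with the standard normal CDF. Here is a small Python sketch (we plug in $\alpha = 0.05$ purely as an example, since the notes leave $\alpha$ generic):

```python
import math
from statistics import NormalDist

def analytic_power(mu, alpha=0.05, n=16):
    # beta = P(N(0,1) >= z_{1-alpha/2} - sqrt(n) mu) + P(N(0,1) <= -sqrt(n) mu - z_{1-alpha/2})
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)
    shift = math.sqrt(n) * mu
    return (1 - std.cdf(z - shift)) + std.cdf(-shift - z)

print(analytic_power(1.0))  # about 0.979 when alpha = 0.05
print(analytic_power(0.0))  # exactly alpha: under mu = 0 we reject with probability alpha
```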
Here is an alternative approach to computing the power: Monte Carlo Simulation. The idea is that we generate $N$ samples, each consisting of 16 IID random variables from $N(1,1)$ (the distribution under the alternative). For each sample, we compute the Z-test statistic $\left|\sqrt{16}\,\bar X_{16}\right|$ and check whether we can reject $H_0$ (i.e., whether this number is greater than or equal to $z_{1-\alpha/2}$). At the end, we use the proportion of samples in which $H_0$ is rejected as an estimate of the power $\beta$. Here is a diagram describing how the steps are carried out:

$N(1,1)$ → generates 16 observations → compute test statistic $|\sqrt{16}\,\bar X_{16}|$ → Reject $H_0$? $D_1 = $ Yes(1)/No(0)
$N(1,1)$ → generates 16 observations → compute test statistic $|\sqrt{16}\,\bar X_{16}|$ → Reject $H_0$? $D_2 = $ Yes(1)/No(0)
⋮
$N(1,1)$ → generates 16 observations → compute test statistic $|\sqrt{16}\,\bar X_{16}|$ → Reject $H_0$? $D_N = $ Yes(1)/No(0)

Each sample ends up with a number $D_i$ such that $D_i = 1$ if we reject $H_0$ and $D_i = 0$ if we do not reject $H_0$. Because the Monte Carlo Simulation approach uses the proportion of rejections to estimate $\beta$, this proportion is
$$\bar D_N = \frac{1}{N}\sum_{j=1}^{N} D_j. \tag{2.3}$$
Is the Monte Carlo Simulation approach a good way to estimate $\beta$? The answer is yes: it is a good approach for estimating $\beta$, and moreover, we have already learned the statistical theory of such a procedure!
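The steps in the diagram can be sketched as follows (a toy Python implementation; $\alpha = 0.05$, $\mu = 1$, and the values of $N$ are our own choices). The second half repeats the whole experiment many times to show the sampling variability of $\bar D_N$ itself, which should match the binomial standard error $\sqrt{\beta(1-\beta)/N}$:

```python
import random
from statistics import NormalDist, mean, pstdev

rng = random.Random(16)
Z = NormalDist().inv_cdf(0.975)  # z_{1 - alpha/2} with alpha = 0.05

def d_bar(N, mu=1.0, n=16):
    # One Monte Carlo power estimate: the fraction of N simulated size-16
    # samples from N(mu, 1) whose Z statistic satisfies |sqrt(16) xbar| >= Z
    hits = 0
    for _ in range(N):
        xbar = sum(rng.gauss(mu, 1.0) for _ in range(n)) / n
        hits += abs(n ** 0.5 * xbar) >= Z
    return hits / N

print(d_bar(5000))  # close to the analytic power (about 0.979 for alpha = 0.05)

# Repeating the experiment: the spread of the estimates is about
# sqrt(beta (1 - beta) / N) with beta ~ 0.979 and N = 200
reps = [d_bar(200) for _ in range(200)]
print(mean(reps), pstdev(reps), (0.979 * (1 - 0.979) / 200) ** 0.5)
```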
The estimator $\bar D_N$ is just a sample average, and each $D_j$ turns out to be a Bernoulli random variable with parameter
$$p = P(\text{Reject } H_0 \mid \mu = 1) = \beta$$
by equation (2.3). Therefore,
$$\mathrm{bias}\left(\bar D_N\right) = E\left(\bar D_N\right) - \beta = p - \beta = 0,$$
$$\mathrm{Var}\left(\bar D_N\right) = \frac{p(1-p)}{N} = \frac{\beta(1-\beta)}{N}, \qquad \mathrm{SE}\left(\bar D_N\right) = \sqrt{\frac{\beta(1-\beta)}{N}}.$$
Thus, the Monte Carlo Simulation method yields a consistent estimator of the power: $\bar D_N \overset{P}{\longrightarrow} \beta$.

Although here we study the Monte Carlo Simulation estimator in a special case, the idea easily generalizes to many other situations, as long as we want to evaluate certain numbers. In modern statistical analysis, most papers with simulation results use some Monte Carlo Simulation to show the numerical performance of the methods proposed in the paper.

The following two figures present the power $\beta$ as a function of the value of $\mu$ (blue curve), with the red curves showing the power estimated by Monte Carlo simulations using $N = 25$ (left panel) and a larger $N$ (right panel).

[Figure: two panels of power versus $\mu$ over $\mu \in [-2, 2]$; blue curve: true power; red curve: Monte Carlo estimate; the estimate based on the larger $N$ tracks the true curve more closely.]

The gray line corresponds to the value of the power being equal to the significance level $\alpha$. Think about why the power curve (blue curve) hits the gray line at $\mu = 0$.

2.3 Estimating a Distribution via Simulation

Monte Carlo Simulation can also be applied to estimate an unknown distribution, as long as we can generate data from that distribution. In Bayesian analysis, people are often interested in the so-called posterior distribution. Very often, we know how to generate points from a posterior distribution, but we cannot write down its closed form. In this situation, what we can do is simulate many points and estimate the distribution using these simulated points. So the task becomes:
given $X_1,\cdots,X_n\sim F$ (or PDF $p$), we want to estimate $F$ (or the PDF $p$).

Estimating the CDF using the EDF. To estimate the CDF, a simple but powerful approach is to use the EDF:
$$\hat F_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x).$$
We have already learned a lot about the EDF in the previous chapter.

Estimating the PDF using the histogram. If the goal is to estimate the PDF, then this problem is called density estimation, which is a central topic in statistical research. Here we will focus on perhaps the simplest approach: the histogram. Note that we will have a more in-depth discussion about other approaches in Chapter 8.

For simplicity, we assume that $X_i\in[0,1]$, so $p(x)$ is non-zero only within $[0,1]$. We also assume that $p(x)$ is smooth and $|p'(x)|\le L$ for all $x$ (i.e., the derivative is bounded). The histogram partitions the set $[0,1]$ (this region, the region with non-zero density, is called the support of a density function) into several bins and uses the count in each bin as a density estimate. When we have $M$ bins, this yields the partition
$$B_1 = \left[0, \frac{1}{M}\right),\quad B_2 = \left[\frac{1}{M}, \frac{2}{M}\right),\quad \cdots,\quad B_{M-1} = \left[\frac{M-2}{M}, \frac{M-1}{M}\right),\quad B_M = \left[\frac{M-1}{M}, 1\right].$$
In this case, for a given point $x\in B_\ell$, the density estimator from the histogram is
$$\hat p_M(x) = \frac{\text{number of observations within } B_\ell}{n}\times\frac{1}{\text{length of the bin}} = \frac{M}{n}\sum_{i=1}^{n} I(X_i\in B_\ell).$$
The intuition of this density estimator is that the histogram assigns an equal density value to every point within the bin. So for the bin $B_\ell$ that contains $x$, the proportion of observations within this bin is $\frac{1}{n}\sum_{i=1}^{n} I(X_i\in B_\ell)$, which should be equal to the density estimate times the length of the bin.

Now we study the bias of the histogram density estimator:
$$E\left(\hat p_M(x)\right) = M\cdot P(X_1\in B_\ell) = M\int_{(\ell-1)/M}^{\ell/M} p(u)\,du = M\left(F\left(\frac{\ell}{M}\right) - F\left(\frac{\ell-1}{M}\right)\right) = \frac{F\left(\frac{\ell}{M}\right) - F\left(\frac{\ell-1}{M}\right)}{1/M} = p(x^*)$$
for some $x^*\in\left[\frac{\ell-1}{M}, \frac{\ell}{M}\right]$. The last equality follows from the mean value theorem with $F'(x) = p(x)$. By the mean value theorem again, there exists another point $x^{**}$ between $x^*$ and $x$ such that
$$\frac{p(x^*) - p(x)}{x^* - x} = p'(x^{**}).$$
Thus, the bias is
$$\mathrm{bias}\left(\hat p_M(x)\right) = E\left(\hat p_M(x)\right) - p(x) = p(x^*) - p(x) = p'(x^{**})\cdot(x^* - x) \le \left|p'(x^{**})\right|\cdot\left|x^* - x\right| \le \frac{L}{M}. \tag{2.4}$$
Note that in the last inequality we use the fact that both $x$ and $x^*$ are within $B_\ell$, whose total length is $1/M$, so $|x^* - x|\le 1/M$. The analysis of the bias tells us that the more bins we use, the smaller the bias of the histogram. This makes sense because when we have many bins, we have a higher resolution, so we can approximate the fine density structure better.

Now we turn to the analysis of the variance:
$$\mathrm{Var}\left(\hat p_M(x)\right) = \frac{M^2}{n^2}\,\mathrm{Var}\left(\sum_{i=1}^{n} I(X_i\in B_\ell)\right) = \frac{M^2}{n}\,P(X_1\in B_\ell)\bigl(1 - P(X_1\in B_\ell)\bigr).$$
By the derivation in the bias analysis, we know that $P(X_1\in B_\ell) = \frac{p(x^*)}{M}$, so the variance is
$$\mathrm{Var}\left(\hat p_M(x)\right) = \frac{M^2}{n}\cdot\frac{p(x^*)}{M}\left(1 - \frac{p(x^*)}{M}\right) = \frac{M\,p(x^*)}{n} - \frac{p^2(x^*)}{n}. \tag{2.5}$$
The analysis of the variance has an interesting implication: the more bins we use, the higher the variance we suffer.

Now if we consider the MSE, the pattern becomes more inspiring. The MSE is
$$\mathrm{MSE}\left(\hat p_M(x)\right) = \mathrm{bias}^2\left(\hat p_M(x)\right) + \mathrm{Var}\left(\hat p_M(x)\right) \le \frac{L^2}{M^2} + \frac{M\,p(x^*)}{n} - \frac{p^2(x^*)}{n}. \tag{2.6}$$
An interesting feature of the histogram is that we can choose $M$, the number of bins. When $M$ is too large, the first quantity (bias) will be small while the second quantity (variance) will be large; this case is called undersmoothing. When $M$ is too small, the first quantity (bias) is large but the second quantity (variance) is small; this case is called oversmoothing. To balance the bias and variance, we choose the $M$ that minimizes the MSE, which leads to
$$M_{\mathrm{opt}} = \left(\frac{n L^2}{p(x^*)}\right)^{1/3}. \tag{2.7}$$
Although in practice the quantities $L$ and $p(x^*)$ are unknown, so we cannot choose the optimal $M_{\mathrm{opt}}$, the rule in equation (2.7) tells us how we should change the number of bins as the sample size grows. Practical rules for selecting $M$ are related to the problem of bandwidth selection, a research topic in statistics.
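The estimators of this section are short to code. Below is a Python sketch (a toy example: we sample from Beta(2,2), whose density $6x(1-x)$ lives on $[0,1]$ and has $|p'(x)| = |6-12x|\le 6$, so $L=6$ works; all names here are our own):

```python
import random

def edf(data, x):
    # Empirical distribution function: fraction of observations <= x
    return sum(xi <= x for xi in data) / len(data)

def hist_density(data, x, M):
    # Histogram density estimator on [0, 1] with M equal-width bins:
    # (M / n) * number of observations in the bin B_l containing x
    l = min(int(x * M), M - 1)
    count = sum(l / M <= xi < (l + 1) / M or (l == M - 1 and xi == 1.0)
                for xi in data)
    return M * count / len(data)

def m_opt(n, L, px):
    # Bin-count rule in the spirit of equation (2.7): grows like n^(1/3)
    return (n * L ** 2 / px) ** (1 / 3)

rng = random.Random(0)
data = [rng.betavariate(2, 2) for _ in range(2000)]  # true density p(x) = 6x(1 - x)

print(edf(data, 0.5))               # should be near F(0.5) = 0.5
print(hist_density(data, 0.5, 20))  # should be near p(0.5) = 1.5
print(m_opt(2000, 6, 1.5))          # suggested bin count at x = 0.5
```

Note that the rule reacts to the sample size exactly as equation (2.7) predicts: multiplying $n$ by 8 doubles the suggested number of bins.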