Estimating the Density of a Conditional Expectation


Samuel G. Steckley, Shane G. Henderson, David Ruppert, Ran Yang, Daniel W. Apley, Jeremy Staum

Abstract

In this paper, we analyze methods for estimating the density of a conditional expectation. We compare an estimator based on a straightforward application of kernel density estimation to a bias-corrected estimator that we propose. We prove convergence results for these estimators and show that the bias-corrected estimator has a superior rate of convergence. In a simulated test case, we show that the bias-corrected estimator performs better in a practical example with a realistic sample size.

1. Introduction

This paper proposes and analyzes improved methods for estimating the density of a conditional expectation. The following example illustrates and motivates the problem. A pharmaceutical production process produces batches of an ingredient. The true sodium content of the ingredient randomly varies across batches, and it is desired to learn the density of the true sodium content. When a sample taken from a batch is subjected to mass spectrometry, it yields an unbiased, noisy measurement of the sodium content in that batch. Due to this noise, the sodium content is measured separately for multiple homogenized samples taken from each batch. The goal here is to learn the density of the true sodium content, based on the noisy spectrometry measurements.

We consider a general framework into which this example fits. Let Z be an unobserved random variable. In the sodium measurement example, Z is the true sodium content in a batch. Let X be an observed random variable that has probabilistic dependence with Z. In the sodium measurement example, X is a measurement of sodium content. Let Y = E(X | Z) be the conditional expectation of X given Z. We seek to estimate the density of Y based on observations of X. In the sodium measurement example, Y = Z, the true sodium content in a batch. In general, however, Y and Z can be different. For example, Z could be a student's standardized test scores and Y could be the conditional expectation of X, the student's income at age 30.

Suppose that observations of X are available in n samples, each sample being associated with a single value of the random variable Z, as in the following statistical model:

    X_ij = Y_i + U_ij,  i = 1, ..., n,  j = 1, ..., m,    (1)
    Y_i = E(X | Z = Z_i),  i = 1, ..., n.    (2)

Here X_ij is the jth observation of X in the ith sample, U_ij = X_ij − Y_i is its observation error relative to Y_i, its conditional expectation given Z_i, and Z_i is the value of the random variable Z associated with the ith sample of observations, X_i1, ..., X_im. Our assumptions are:

1. We have an i.i.d. sample Z_1, ..., Z_n from the distribution of Z.
2. For each i = 1, ..., n, X_i1, ..., X_im is an i.i.d. sample from the conditional distribution of X given Z = Z_i.

3. For any value of Z, the conditional expectation Y = E(X | Z) exists and is finite.
4. The conditional expectation Y has a density.

We do not assume that the distribution of the observation error U = X − Y is independent of Y or Z, or that it is known. In the sodium measurement example, the variance of the observation error U is larger for larger values of the sodium content Y. Significant heteroscedasticity also appears in stochastic simulation input uncertainty analysis [Henderson, 2003], another example that has motivated this work. Substantial bias in density estimation can result from ignoring heteroscedasticity [Staudenmayer et al., 2008].

We estimate the density of the conditional expectation using kernel smoothing [Wand and Jones, 1995]. The standard setting for kernel smoothing for density estimation is as follows. Suppose (Y_i : 1 ≤ i ≤ n) is a sequence of independent random variables with density g. The standard kernel smoothing estimator is

    ĝ(x; h) = (1/(nh)) Σ_{i=1}^{n} K((x − Y_i)/h),    (3)

where the kernel K is typically chosen to be a unimodal probability density function that is symmetric about zero, and h is the bandwidth. The estimator (3) immediately suggests that we can estimate f(x), the density of Y = E(X | Z) evaluated at x, from the model (1)–(2) via

    f̂(x; h, n, m) = (1/(nh)) Σ_{i=1}^{n} K((x − X̄_m(Z_i))/h),    (4)

with X̄_m(Z_i) = Y_i + (1/m) Σ_{j=1}^{m} U_ij. We call this the standard estimator. It was analyzed by Steckley and Henderson [2003]. In their framework, the cost of the experiment from which this estimator is generated is δ₁n + δ₂nm, where δ₁ and δ₂ are the average costs used to generate Z_i and X_ij given Z_i, respectively. Steckley and Henderson [2003] gave results on the convergence rate of the mean squared error of the standard kernel estimator as the experiment cost goes to infinity, including an analysis of asymptotically optimal choices of h, n, and m.

Our paper makes three contributions to estimation of the density of a conditional expectation. First, we extend the results of Steckley and Henderson [2003], who analyzed only the case in which Z is univariate and the conditional expectation Y = E(X | Z) is monotone in Z. In this paper, results are presented that apply more broadly to multivariate Z. Second, we propose and analyze a bias-corrected estimator, which has a better rate of convergence than the standard estimator. Third, we create practical methods for selecting the sample sizes n and m in the experiment design and the bandwidth h in the experiment analysis.

Before addressing these contributions in detail, we explain why this paper is based on kernel smoothing instead of kernel deconvolution [Wand and Jones, 1995]. One reason is that, when using kernel smoothing, it is possible to derive expressions for the asymptotic mean integrated squared error. These expressions provide the foundation for showing that the bias-corrected estimator has a better rate of convergence than the standard estimator, and for the methods to select the sample sizes and the bandwidth. Another reason is that deconvolution is, in a sense, not necessary: in our asymptotic setting in which m → ∞, the standard estimator (4) based on kernel smoothing is consistent. Furthermore, our bias-corrected estimator is an alternative to deconvolution, in the sense that it does something different to reduce the bias caused by observation error. A third reason is that kernel deconvolution can be applied to the problem of estimating the density of a conditional expectation only under more restrictive assumptions than we require when using kernel smoothing.
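To make the standard estimator (4) concrete, the following is a minimal computational sketch (not from the paper; the function name is illustrative, and a Gaussian kernel stands in for the Epanechnikov kernel adopted in Section 3):

```python
import numpy as np

def standard_estimator(X, h, x_grid, kernel=None):
    """Standard estimator (4): kernel smoothing applied to the
    within-sample means Xbar_m(Z_i).

    X is an (n, m) array; row i holds the m internal samples
    X_i1, ..., X_im generated conditional on Z_i."""
    if kernel is None:
        # Gaussian kernel as a stand-in; Section 3 uses Epanechnikov.
        kernel = lambda u: np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    n = X.shape[0]
    xbar = X.mean(axis=1)                       # Xbar_m(Z_i), i = 1, ..., n
    u = (x_grid[:, None] - xbar[None, :]) / h   # (x - Xbar_m(Z_i)) / h
    return kernel(u).sum(axis=1) / (n * h)      # f_hat(x; h, n, m) on the grid
```

For instance, standard_estimator(X, h=0.3, x_grid=np.linspace(-4, 4, 201)) returns the density estimate on a grid, at a cost of one kernel evaluation per pair of grid point and external sample.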

Most kernel deconvolution methods are based on the assumption that the measurement error U is independent of Y, i.e., the measurement errors U_ij have a common distribution for all i = 1, ..., n and j = 1, ..., m [Carroll and Hall, 1988]. Delaigle and Meister [2008] have a kernel deconvolution method that allows the errors to have different distributions, but all the error distributions must be known. Delaigle et al. [2008] have a kernel deconvolution method for the case of a single unknown error distribution. It seems that it would not be practical to identify the error distributions when they differ and are unknown, unless one could make some very strong assumptions. Other deconvolution methods that allow for heteroscedastic errors assume the errors are normal [Staudenmayer et al., 2008, McIntyre and Stefanski, 2011]. The methods we consider do not require the errors to be normal; we merely perform asymptotic analysis using a normal approximation to the distribution of an average of many errors, which can be justified by the central limit theorem.

We analyze the convergence rate of our proposed estimators in the measure of mean squared error (MSE) and also mean integrated squared error (MISE). It is well known that if the density function g is continuous, the standard estimator (3) is consistent in quadratic mean; that is to say, MSE(ĝ(x; h)) converges to zero for all x ∈ R. It is also well known [Prakasa Rao, 1983] that if g is twice continuously differentiable such that g'' is bounded and square integrable, the MISE converges to zero at an optimal rate of n^{−4/5}, where n here is the sample size. We will show that our standard estimator (4) is consistent in quadratic mean and that its MISE converges to zero at an optimal rate of c^{−4/7}, where c is the experiment budget. This is the same rate that Steckley and Henderson [2003] computed for the case in which Z is univariate. We also discuss the convergence of the bias-corrected version of our estimator and show that its MSE optimally converges to zero at a rate of c^{−8/11}. These optimal rates of convergence depend on asymptotically optimal choices of the sample sizes n and m and the bandwidth h.

These questions of optimal rates of convergence and the allocation of an experiment budget c to sample sizes m and n have also been addressed in related but distinct settings. Lee [1998], Lee and Glynn [2003], Gordy and Juneja [2010], and Broadie et al. [2011] studied estimation of the distribution function of a conditional expectation. We believe that density estimation is also important because a density is more easily interpreted visually than a distribution function. Estimation of the distribution is rather different from estimation of the density, because techniques such as kernel smoothing are not necessary. Sun et al. [2011] studied estimation of the variance of a conditional expectation.

Development of a bias-corrected estimator is another primary contribution of the present paper. Jones and Signorini [1997] review bias-correction in kernel density estimation. The bias they address is caused by the kernel smoothing, while we attempt to address the bias due to both kernel smoothing and noisy observations. We implement a method similar to jackknife bias-correction [Efron and Tibshirani, 1993].

Kernel smoothing methods require the selection of the bandwidth h. The performance of kernel smoothing is quite dependent on bandwidth selection, which has received much attention [Wand and Jones, 1995]. Schulman [1998] reviews some modern bandwidth selection methods in the context of local polynomial regression, a type of kernel regression. One such method is the empirical-bias bandwidth selection (EBBS) developed by Ruppert [1997].
In our setting, we must choose the bandwidth h, but given an experiment budget c, we must also choose the number n of samples of Z and the number m of samples of X given each Z. Applying the ideas from EBBS, we develop a data-driven method to select each of these parameters.

The rest of the paper is organized as follows. In Section 2 we formulate estimators for the density of the conditional expectation and present convergence results. In Section 3, we develop a data-based selection method for the bandwidth h and the sample sizes n and m, based on EBBS.

We discuss the reasons for choosing this method and present the algorithm. In Section 4, we then explore the performance of the estimators for a simulated test case and for the sodium measurement example.

2. Estimating the Density of the Conditional Expectation and Convergence Results

First consider the standard estimator (4), where X̄_m(Z_i) is treated as an observation of E(X | Z = Z_i) with measurement error. This standard estimator is motivated by the standard estimator used in kernel density estimation. The measurement error results in additional smoothing beyond that which comes from the kernel smoothing. A similar double smoothing was noted by Staudenmayer and Ruppert [2004], who considered the problem of local polynomial regression in which the covariates are measured with error. The double smoothing increases the bias of our estimator given in (4) as compared with the estimator (3). Specifically, the additional smoothing results in an additional leading term in the bias expansion. This creates an additional leading term in the MSE and MISE expansions given in Theorems 2 and 3 in Section 2.1, where we present convergence results and proofs for the standard estimator. In Section 2.2 we consider a bias-corrected version. We derive asymptotic expressions of the MSE for the estimators and establish an improvement in the optimal rate of convergence.

2.1 Convergence Results: Standard Estimator

In this section we study the error in the estimator f̂(x; h(c), n(c), m(c)) as the experiment budget c goes to infinity. For any fixed c, the number of internal samples m and the number of external samples n must be chosen so that the total cost is c. Note that m(c) and n(c) are thus functions of the experiment budget c. We assume that m(c) → ∞ as c → ∞, so that X̄_{m(c)}(z) → E(X | Z = z) almost surely. Assuming m(c) → ∞,

    δ₁n(c) + δ₂n(c)m(c) ∼ δ₂n(c)m(c).

One can assume, by a selection of units, that δ₂ = 1 without loss of generality. Then m(c) and n(c) must be chosen to satisfy the asymptotic relationship m(c)n(c)/c → 1 as c → ∞. The bandwidth h = h(c) is also a function of c. To keep the notation less cumbersome, the dependence of h, n, and m on c will be suppressed in the calculations.

We will present results concerning the convergence of the estimator as the experiment budget c tends to ∞. We consider the following two error criteria. For all x ∈ R, define the mean squared error (MSE) of the estimator evaluated at x as

    MSE(f̂(x; h, n, m)) = E(f̂(x; h, n, m) − f(x))².

Define the mean integrated squared error (MISE) of the estimator as

    MISE(f̂(·; h, n, m)) = E ∫ (f̂(x; h, n, m) − f(x))² dx.

These error criteria are not without drawbacks (see Devroye and Lugosi [2001]), but their mathematical simplicity is appealing.

Before stating our results, we consider the distribution of the observations (X̄_m(Z_i) : 1 ≤ i ≤ n), and in doing so, we collect some of the assumptions needed for the results. Let N(α₁, α₂) denote a normally distributed random variable with mean α₁ and variance α₂. For two random objects X and Y, define the notation X =_d Y to mean X and Y are equal in distribution. Denote µ(z) = E(X | Z = z) and σ²(z) = var(X | Z = z). Throughout this paper we assume the following:

A1. Conditional on (Z_i : 1 ≤ i ≤ n), X̄_m(Z_i) =_d N(µ(Z_i), σ²(Z_i)/m) for i = 1, ..., n, and (X̄_m(Z_i) : 1 ≤ i ≤ n) are conditionally independent.

This essentially implies that the internal-sample averages X̄_m(Z), conditional on Z, are unbiased and normally distributed. Of course, if the assumptions of one of the many versions of the central limit theorem hold, then for large m this assumption is approximately true.

We now turn to the distribution of the observations (X̄_m(Z_i) : 1 ≤ i ≤ n), which are i.i.d. Under Assumption A1,

    X̄_m(Z_i) =_d Y_i + (S_i/m) Σ_{j=1}^{m} U_ij for i = 1, ..., n,

where

(i) ((Y_1, S_1), ..., (Y_n, S_n)) are i.i.d. with (Y_i, S_i) =_d (µ(Z), σ(Z));
(ii) (U_ij : 1 ≤ i ≤ n, 1 ≤ j ≤ m) are i.i.d. with U_ij =_d N(0, 1).

Let Ū_i = (1/√m) Σ_{j=1}^{m} U_ij, so that for i = 1, ..., n,

    X̄_m(Z_i) =_d Y_i + (S_i/m) Σ_{j=1}^{m} U_ij = Y_i + (S_i/√m) Ū_i.

Note that Ū_i =_d N(0, 1) for i = 1, ..., n, and (Ū_i : 1 ≤ i ≤ n) are i.i.d. Let F_m denote the distribution function of X̄_m(Z_i). Assuming P(S = 0) = 0,

    F_m(x) = P(Y_i + (S_i/√m) Ū_i ≤ x) = P(Ū_i ≤ √m (x − Y_i)/S_i).

The following is also assumed throughout:

A2. For each y ∈ R such that f(y) > 0, the conditional density with respect to Lebesgue measure of the conditional distribution P(σ(Z) ∈ · | µ(Z) = y) exists. Denote this density g(· | y).

Since σ(Z) and µ(Z) are random variables, we know that the regular conditional distribution P(σ(Z) ∈ · | µ(Z) = y) exists for all y ∈ R. This assumption simply requires that for each y ∈ R such that f(y) > 0, P(σ(Z) ∈ · | µ(Z) = y) is absolutely continuous with respect to Lebesgue measure. We believe that when Z is of dimension 2 or greater, there will be many cases in which A2 is satisfied. However, for univariate Z, A2 will rarely hold. By assuming A2 in this paper, we focus on the case in which Z is of dimension 2 or greater. Steckley and Henderson [2003] treat the case in which Z is univariate and µ is monotone. Their results for MISE are very similar to the ones presented in this paper. The proofs are somewhat simpler but require different methods. For the sake of space, we omit those results and proofs and refer the reader to Steckley and Henderson [2003].

Assuming A2,

    F_m(x) = P(Ū_i ≤ √m (x − Y_i)/S_i) = ∫∫ P(Ū_i ≤ √m (x − y)/s) g(s | y) f(y) ds dy,

where g(· | y) can be defined arbitrarily for y ∈ R such that f(y) = 0. Let Φ and φ denote the standard normal cumulative distribution function and density, respectively. In this notation,

    F_m(x) = ∫∫ Φ(√m (x − y)/s) g(s | y) f(y) ds dy = E[ Φ(√m (x − Y)/S) ].

Assuming we can differentiate the RHS and interchange the derivative and expectation, we have that the density f_m of the distribution function F_m exists and is given by

    f_m(x) = ∫∫ (√m/s) φ(√m (x − y)/s) g(s | y) f(y) ds dy.    (5)

A sufficient condition for the interchange is

A3. ∫∫ (1/s) g(s | y) f(y) ds dy < ∞,

which comes from a result given by L'Ecuyer [1990] and L'Ecuyer [1995]; see also Glasserman [1988], and a lemma of Steckley [2005] for the application in the present context.

Returning to the density of the observations X̄_m(Z) given in (5), the change of variable z = √m (x − y) gives

    f_m(x) = ∫∫ (1/s) φ(z/s) g(s | x − z/√m) f(x − z/√m) ds dz.

Suppose f(·) is continuous. For y such that f(y) = 0, suppose that g(· | y) can be defined so that g(s | ·) is continuous for all s ∈ R. We assume the following:

A4. For almost all y ∈ R, g(· | y) is nonnegative;
A5. For almost all y ∈ R, g(s | y) = 0 for s < 0.

Assumptions A4 and A5 are certainly true for y such that f(y) > 0, since in that case g(· | y) is a density for a nonnegative random variable. Under A4, the order of integration can be changed so that

    f_m(x) = ∫∫ (1/s) φ(z/s) g(s | x − z/√m) f(x − z/√m) dz ds.    (6)

It will be useful to think in terms of the joint density of µ(Z) and σ(Z). Let us denote this density by α. Of course,

    α(x, s) = g(s | x) f(x).    (7)

Define for nonnegative integer k,

    α^(k+1)(x, s) = (d/dy) α^(k)(y, s) |_{y=x},    (8)

where α^(0)(x, s) = α(x, s). Also define for nonnegative integer k,

    g^(k+1)(s | x) = (d/dy) g^(k)(s | y) |_{y=x},

where g^(0)(s | x) = g(s | x). For ease of notation we define the following set of assumptions, parameterized by nonnegative integer k, as A6(k):

1. f(·) is k times continuously differentiable;
2. for all s ∈ R, g(s | ·) is k times continuously differentiable;
3. there exists B_f > 0 such that |f^(j)(·)| ≤ B_f for j = 0, 1, ..., k;
4. there exists B_g > 0 such that |g^(j)(· | ·)| ≤ B_g for j = 0, 1, ..., k;
5. there exists B_S > 0 such that σ²(·) ≤ B_S everywhere.

Note that f^(0)(·) and g^(0)(· | ·) are simply f and g, respectively, and when k = 0, Assumptions 1 and 2 imply that f(·) and g(s | ·) are continuous.

The following theorem gives sufficient conditions for the consistency in quadratic mean of the estimator formulated in (4).

Theorem 1. Assume A1–A5 and A6(0). Also assume that
1. K is a bounded probability density;
2. h(c) → 0, m(c) → ∞, and n(c)h(c) → ∞ as c → ∞.
Then for all x ∈ R,

    lim_{c→∞} MSE(f̂(x; h, n, m)) = 0.

A proof is given in the Appendix. We now turn to the asymptotic expressions of the MSE and MISE. More restrictive assumptions are needed to compute these asymptotic expansions. For one thing, it is assumed that the function f(·) and the set of functions {g(s | ·) : s ∈ R} are four times continuously differentiable.

For sequences of real numbers a_n and b_n, we say that a_n = o(b_n) as n → ∞ iff lim_{n→∞} a_n/b_n = 0. For sequences of real numbers a_n and b_n, we say that a_n = O(b_n) as n → ∞ iff there exists C such that |a_n| ≤ C|b_n| for n sufficiently large.

Theorem 2. Assume A1–A5 and A6(4). Also assume
1. K is a bounded probability density function symmetric about zero with finite second moment;
2. h(c) → 0, n(c) → ∞, m(c) → ∞, and n(c)h(c) → ∞ as c → ∞.
Then

    MSE(f̂(x; h, n, m)) = ( (h²/2) f''(x) ∫ u²K(u) du + (1/(2m)) ∫ s² α^(2)(x, s) ds )² + (f(x)/(nh)) ∫ K²(u) du + o( (h² + 1/m)² + 1/(nh) ),    (9)

where α is defined in (7) and (8).

Theorem 3. Assume A1–A5 and A6(4). Also assume
1. f''(·) is ultimately monotone, meaning that there exists a B > 0 such that f'' is monotone on [B, ∞) and monotone on (−∞, −B];

2. f^(k)(·) is integrable for k = 1, 2, 3, 4;
3. K is a bounded probability density function symmetric about zero with finite second moment;
4. h(c) → 0, n(c) → ∞, m(c) → ∞, and n(c)h(c) → ∞ as c → ∞.
Then

    MISE(f̂(·; h, n, m)) = ∫ ( (h²/2) f''(x) ∫ u²K(u) du + (1/(2m)) ∫ s² α^(2)(x, s) ds )² dx + (1/(nh)) ∫ K²(u) du + o( (h² + 1/m)² + 1/(nh) ),    (10)

where α is defined in (7) and (8).

Theorem 3 follows from Theorem 2 provided the o term in (9) is integrable. Proofs of Theorems 2 and 3 are presented in the Appendix.

Compare (10) to the MISE for standard kernel density estimation (e.g., Wand and Jones [1995]),

    MISE(ĝ(·; h)) = (h⁴/4) ( ∫ u²K(u) du )² ∫ g''(x)² dx + (1/(nh)) ∫ K²(u) du + o( h⁴ + 1/(nh) ).    (11)

It is known that the MISE can be decomposed into integrated squared bias and integrated variance. We get similar formulas for the standard kernel density estimator ĝ. Note that the O(1/(nh)) terms in the MISE expansions in (10) and (11) are the same for both estimators. In the proof of Theorem 3 we show that this term is the leading term of the integrated variance. The remaining leading terms in (10) and (11) are those of the integrated squared bias.

For our estimator f̂, the bias itself can be further decomposed. Suppose that the density of an observation X̄_m(Z) exists and is given by f_m(·). Then

    bias(f̂(x; h, n, m)) = (E(f̂(x; h, n, m)) − f_m(x)) + (f_m(x) − f(x)).    (12)

The first component, E(f̂(x; h, n, m)) − f_m(x), is the bias due to kernel smoothing, while the second component is the bias due to measurement error. Both the standard kernel density estimator and our estimator are biased due to the kernel smoothing, and the leading term of this bias for both estimators is O(h²). However, due to measurement error our estimator has an additional bias whose leading term is O(1/m), and this bias also depends on the distribution of the conditional variance function σ²(·) through α.

The asymptotic MISE for our estimator f̂ is

    ∫ ( (h²/2) ∫ u²K(u) du f''(x) + (1/(2m)) ∫ s² α^(2)(x, s) ds )² dx + (1/(nh)) ∫ K²(u) du.    (13)

By choosing h, n, and m to minimize this asymptotic MISE, we can achieve the optimal asymptotic convergence. Define the constant A₁, which arises from the first-order conditions of this minimization and depends only on ∫ β₁²(x) dx, ∫ β₂²(x) dx, and ∫ β₁(x)β₂(x) dx,

where

    β₁(x) = (f''(x)/2) ∫ u²K(u) du and β₂(x) = (1/2) ∫ s² α^(2)(x, s) ds.    (14)

Then the optimal h, n, and m, denoted h*, n*, and m*, are of the form

    h* = C_h c^{−1/7},    (15)
    n* = C_n c^{5/7},    (16)
    m* = C_m c^{2/7},    (17)

where the constants C_h, C_n, and C_m depend on A₁, ∫ β₁(x)β₂(x) dx, ∫ β₂²(x) dx, and ∫ K²(u) du. Substituting h*, n*, and m* into (13) shows that the optimal rate of convergence is of the order c^{−4/7}. In fact, whenever h, n, and m are chosen such that h is of the order c^{−1/7}, n is of the order c^{5/7}, and m is of the order c^{2/7}, the optimal rate of convergence of the MISE is achieved. We note that for the case in which Z is assumed to be univariate, the optimal rate of convergence is also c^{−4/7} [Steckley and Henderson, 2003]. The constants in Equations (15)–(17) are unlikely to be tractable to estimate; the main purpose of the result is to provide the optimal rate of convergence. In standard kernel density estimation, the optimal rate of convergence is c^{−4/5} (Wand and Jones [1995]), while the associated constants are often intractable.

One of the contributions of this paper is to provide the optimal rate of convergence of our estimator given the additional bias due to measurement error. The decrease in the rate of convergence is a consequence of the additional bias. For each of the n observations X̄_m(Z_i), we must use m internal samples to deal with the measurement error bias, and m → ∞ as c → ∞. In the standard kernel density estimation setting, each observation requires only one sample since there is no measurement error. Note that although we phrased the optimal rate of convergence in terms of the MISE, the same applies to the MSE. So the optimal rate of convergence of the MSE for our estimator f̂(x; h, n, m) is c^{−4/7}.

A local kernel estimator can be constructed, based on local kernel density estimation. It allows the bandwidth to be a function of the point at which the density function is being estimated, i.e., the local estimator is constructed by replacing h in Equation (4) with h(x). The MISE convergence rate of the local estimator is the same as that of the standard estimator, but the local estimator can have better performance in practice. Results are available in Steckley [2005].

2.2 A Bias-Corrected Estimator

In this section, we introduce a bias-corrected estimator of the density of the conditional expectation. We motivate the estimator with a discussion of the jackknife bias-corrected estimator; see Efron and Tibshirani [1993] for an introduction. We present some results on the asymptotic bias and variance of the bias-corrected estimate and show that the optimal rate of MSE convergence is faster than for the standard estimator.

The jackknife estimator can be thought of as an extrapolation from one estimate back to another estimate that has nearly zero bias (e.g., Stefanski and Cook [1995]). To understand this interpretation of the jackknife estimator, we turn to an example; a similar example was presented in Stefanski and Cook [1995]. Suppose we want to estimate θ = g(µ), where g is nonlinear and twice continuously differentiable. We are given i.i.d. data {X_1, ..., X_m} drawn from a N(µ, σ²) distribution. We take our estimate, denoted θ̂_m, to be g(X̄_m), where X̄_m is the sample mean of the data.

Under an integrability assumption on the error, we can use a Taylor expansion to show that for an estimate based on any sample size m,

    E(θ̂_m) ≈ θ + β/m.    (18)

We actually know that β = σ² g''(µ)/2, but that is not needed for our discussion. The point is that the bias, E(θ̂_m) − θ, is approximately linear in the inverse sample size. Then if we know β and E(θ̂_m) for some m, by extrapolating along the line given in (18) back to 1/m = 0, we have a nearly unbiased estimate of θ. The remaining bias comes from the lower-order terms in the Taylor expansion of E(θ̂_m). If we have an estimate of E(θ̂_m), all we need is an estimate of E(θ̂_{m'}) for another sample size m' in order to estimate β. For the standard jackknife estimator, E(θ̂_m) is estimated with θ̂_m and E(θ̂_{m−1}) is estimated with θ̂_(·) = Σ_{k=1}^{m} θ̂_(k)/m, where for k = 1, ..., m, θ̂_(k), the leave-one-out estimator, is the estimator based on all the data less X_k. The jackknife bias-corrected estimator θ̃ is then

    θ̃ = θ̂_m − (m − 1)(θ̂_(·) − θ̂_m) = m θ̂_m − (m − 1) θ̂_(·).

For our standard estimator (4), we know from Theorem 2 that

    E(f̂(x; h, n, m)) ≈ f(x) + β₁(x) h² + β₂(x)/m,    (19)

where β₁ and β₂ are defined in Equation (14). Here the bias is approximately linear in the square of the bandwidth (h²) and the inverse of the internal sample size (1/m). Given an estimate of E(f̂(x; h, n, m)) for some h and m, we would like to extrapolate back to 1/m = 0 and h² = 0 on the plane specified in (19). Similar to the typical jackknife estimator, we take the standard estimate f̂(x; h, n, m) as an approximation of E(f̂(x; h, n, m)). To determine β₁ and β₂ and thus extrapolate back to 1/m = 0 and h² = 0, we would need to estimate E(f̂(x; h, n, m)) at two other pairs of (h, m). Alternatively, we can save ourselves a bit of work by choosing only one other pair (h', m') such that (1/m', h'²) lies on the line determined by (0, 0) and (1/m, h²). We could estimate E(f̂(x; h', n, m')) as the average of the leave-one-out estimators, as is done for the typical jackknife estimator. This would require m computations of the density estimator. As a computationally attractive alternative, consider instead taking m' = m/2 and h' = √2 h, and take the estimate f̂(x; √2 h, n, m/2) as an approximation of E(f̂(x; √2 h, n, m/2)). Note that (2/m, 2h²) lies on the line determined by (0, 0) and (1/m, h²). Using the data points f̂(x; h, n, m) and f̂(x; √2 h, n, m/2) and extrapolating back to 1/m = 0 and h² = 0 gives the bias-corrected estimator

    f̃(x; h, n, m) = 2 f̂(x; h, n, m) − f̂(x; √2 h, n, m/2).    (20)

We emphasize that, just like the leave-one-out jackknife estimator, the data can be reused to compute f̂(x; √2 h, n, m/2). That is to say, the estimator f̂(x; √2 h, n, m/2) can be computed with the same data set with which f̂(x; h, n, m) is computed, less half of the internal samples. However, in some cases it would be possible to generate a new data set to estimate f̂(x; √2 h, n, m/2). For the remainder of this section, we consider the asymptotic bias and variance of the bias-corrected estimator given in (20). The results cover both the case where the data is reused in computing f̂(x; √2 h, n, m/2) and the case where a new data set is generated.
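As a sketch of the extrapolation (20), reusing the data as just described (this assumes the hypothetical standard_estimator helper from Section 1; keeping a random half of the internal samples is one of several reasonable conventions):

```python
import numpy as np

def bias_corrected_estimator(X, h, x_grid, rng=None):
    """Bias-corrected estimator (20):
    f_tilde(x; h, n, m) = 2 f_hat(x; h, n, m) - f_hat(x; sqrt(2) h, n, m/2).

    The second term reuses the data, keeping half of the m internal
    samples in each row and inflating the bandwidth by sqrt(2)."""
    rng = np.random.default_rng(rng)
    m = X.shape[1]
    keep = rng.permutation(m)[: m // 2]         # half of the internal samples
    f_full = standard_estimator(X, h, x_grid)
    f_half = standard_estimator(X[:, keep], np.sqrt(2.0) * h, x_grid)
    return 2.0 * f_full - f_half
```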

Based on Equation (20), the bias of the estimate f̃(x; h, n, m) can be expressed as

    bias(f̃(x; h, n, m)) = E(f̃(x; h, n, m)) − f(x)
        = 2[ E(f̂(x; h, n, m)) − f(x) ] − [ E(f̂(x; √2 h, n, m/2)) − f(x) ]
        = 2[ (E(f̂(x; h, n, m)) − f_m(x)) + (f_m(x) − f(x)) ]
          − [ (E(f̂(x; √2 h, n, m/2)) − f_{m/2}(x)) + (f_{m/2}(x) − f(x)) ].    (21)

From Lemma 6 in the Appendix,

    E(f̂(x; h, n, m)) − f_m(x) = (h²/2) f''(x) ∫ u²K(u) du + (h²/(4m)) ∫ s² α^(4)(x, s) ds ∫ u²K(u) du + (h⁴/24) f^(4)(x) ∫ u⁴K(u) du + o(h⁴ + h²/m)

and

    E(f̂(x; √2 h, n, m/2)) − f_{m/2}(x) = h² f''(x) ∫ u²K(u) du + (h²/m) ∫ s² α^(4)(x, s) ds ∫ u²K(u) du + (h⁴/6) f^(4)(x) ∫ u⁴K(u) du + o(h⁴ + h²/m).

From Lemma 5 in the Appendix,

    f_m(x) − f(x) = (1/(2m)) ∫ s² α^(2)(x, s) ds + (1/(8m²)) ∫ s⁴ α^(4)(x, s) ds + o(1/m²)

and

    f_{m/2}(x) − f(x) = (1/m) ∫ s² α^(2)(x, s) ds + (1/(2m²)) ∫ s⁴ α^(4)(x, s) ds + o(1/m²).

Substituting into (21) proves the following theorem: the leading h² and 1/m bias terms cancel in the combination 2(·) − (·), leaving only higher-order terms.

Theorem 4. Assume A1–A5 and A6(6). Also assume
1. K is a bounded probability density function symmetric about zero with finite fourth moment;
2. h → 0 and m → ∞ as c → ∞.
Then

    bias(f̃(x; h, n, m)) = −(h⁴/12) f^(4)(x) ∫ u⁴K(u) du − (h²/(2m)) ∫ s² α^(4)(x, s) ds ∫ u²K(u) du − (1/(4m²)) ∫ s⁴ α^(4)(x, s) ds + o(h⁴ + 1/m²).

As for the variance of f̃(x; h, n, m), note that from the proof of Theorem 2,

    var(f̂(x; h, n, m)) = (f(x)/(nh)) ∫ K²(u) du + o(1/(nh))

and

    var(f̂(x; √2 h, n, m/2)) = (f(x)/(√2 nh)) ∫ K²(u) du + o(1/(nh)).

Also,

    cov(f̂(x; h, n, m), f̂(x; √2 h, n, m/2)) ≤ √( var(f̂(x; h, n, m)) var(f̂(x; √2 h, n, m/2)) ) = 2^{−1/4} (f(x)/(nh)) ∫ K²(u) du + o(1/(nh)).

Then

    var(f̃(x; h, n, m)) = var( 2 f̂(x; h, n, m) − f̂(x; √2 h, n, m/2) )
        = 4 var(f̂(x; h, n, m)) + var(f̂(x; √2 h, n, m/2)) − 4 cov(f̂(x; h, n, m), f̂(x; √2 h, n, m/2))
        ≤ 4 (f(x)/(nh)) ∫ K²(u) du + (f(x)/(√2 nh)) ∫ K²(u) du + 4 · 2^{−1/4} (f(x)/(nh)) ∫ K²(u) du + o(1/(nh))
        = (4 + 1/√2 + 4 · 2^{−1/4}) (f(x)/(nh)) ∫ K²(u) du + o(1/(nh)).    (22)

This shows that var(f̃(x; h, n, m)) is O(1/(nh)). Similarly,

    var(f̃(x; h, n, m)) ≥ (4 + 1/√2 − 4 · 2^{−1/4}) (f(x)/(nh)) ∫ K²(u) du + o(1/(nh)).    (23)

Since 4 + 1/√2 − 4 · 2^{−1/4} ≈ 1.34 > 1, we conclude that the asymptotic variance of f̃(x; h, n, m) is greater than the asymptotic variance of the standard estimator f̂(x; h, n, m). Therefore, it is likely that the actual variance of the bias-corrected estimate is greater than the variance of the standard estimate. This is a common theme for bias-corrected estimates (Efron and Tibshirani [1993]).

The above asymptotic bias and variance results for f̃(x; h, n, m) imply that if h, n, and m are chosen such that m is of the order c^{2/11}, n is of the order c^{9/11}, and h is of the order c^{−1/11}, the optimal rate of convergence of the MSE is obtained, and that optimal rate is c^{−8/11}. Recall that the optimal rate of MSE convergence for the standard estimator f̂(x; h, n, m) was c^{−4/7}. Thus, the bias-correction leads to improved convergence. But as we noted above, the variance is greater for the bias-corrected estimate, and this can adversely affect performance, especially for modest sample sizes.

3. Experiment Design and Bandwidth Selection

In this section, we address the implementation of our estimators for the density of the conditional expectation discussed in Section 2 and study their performance.

Implementation requires the specification of a number of inputs. For the standard kernel density estimator presented in (3), one must choose the kernel K and the bandwidth h. For the estimators of the density of the conditional expectation, including the standard kernel density estimator (4) and the bias-corrected estimator (20), one must choose K and h, as well as the number of external samples n and the number of internal samples m.

We choose K to be the Epanechnikov kernel, K(x) = 0.75(1 − x²)I(|x| < 1). Epanechnikov [1967] showed that this kernel is optimal in terms of minimizing the MISE for the standard kernel density estimator (3); see Wand and Jones [1995]. The rest of this section deals with the choice of the parameters h, n, and m. In Section 3.1 we consider the selection of these parameters for the standard kernel density estimator (4). We present a data-based method to select these parameters based on EBBS, developed by Ruppert [1997]. We present the algorithm and briefly discuss why we chose this method. In Section 3.2, the data-based parameter selection method is applied to the bias-corrected estimator (20).

3.1 Standard Estimator

In Section 2 we saw how to choose the bandwidth h, the number of internal samples m, and the number of external samples n for the standard estimator f̂(x; h, n, m) to obtain optimal convergence: see (15)–(17). However, the expressions for h*, n*, and m* given in (15)–(17) involve unknowns such as f''(x), the second derivative of the target density, and ∫ s² α^(2)(x, s) ds, where α^(2) is defined in (7) and (8) as the second derivative with respect to the first argument of the function α(y, s) = g(s | y)f(y). To implement the estimator f̂(x; h, n, m) in an optimal way, one could attempt to estimate these unknown quantities and plug the estimates into the expressions given in (15)–(17). This type of estimator is known as a plug-in estimator (Wand and Jones [1995]). In fact, it is quite doable to estimate the unknowns f and f'' needed for the plug-in estimator. Other needed estimates, including an estimate of the second derivative of α, appear very difficult to obtain.

To choose the parameters h, n, and m needed to implement the estimator f̂(x; h, n, m), we turn from optimizing the asymptotic MISE to optimizing an approximation of the MISE. Note that the MISE can be decomposed as

    MISE(f̂(·; h, n, m)) = ∫ bias²(f̂(x; h, n, m)) dx + ∫ var(f̂(x; h, n, m)) dx.

It was shown in the proof of Theorem 3 that

    ∫ var(f̂(x; h, n, m)) dx = (1/(nh)) ∫ K²(u) du + o(1/(nh)).

An approximation for the variance component of the MISE is the asymptotic approximation (1/(nh)) ∫ K²(u) du, which is readily available. Also in the proof of Theorem 3, it was shown that

    ∫ bias²(f̂(x; h, n, m)) dx = ∫ ( (h²/2) ∫ u²K(u) du f''(x) + (1/(2m)) ∫ s² α^(2)(x, s) ds )² dx + o( (h² + 1/m)² ).

As explained above, the asymptotic approximation

    ∫ ( (h²/2) ∫ u²K(u) du f''(x) + (1/(2m)) ∫ s² α^(2)(x, s) ds )² dx

is not immediately useful given the unknowns in the approximation. To approximate the bias component of the MISE we will instead build and estimate a model of bias for each x. Squaring the bias and numerically integrating will then provide an empirical model of integrated squared bias. Adding the integrated variance approximation to this gives an empirical model of the MISE, which can then be optimized with respect to h, n, and m.

The idea of building and empirically estimating a model of bias to be used in the selection of an estimator's parameters was introduced in Ruppert [1997]. In that paper, the method, called empirical-bias bandwidth selection (EBBS), was applied to bandwidth selection in local polynomial regression. Schulman [1998] established convergence results for the bandwidth selector in the context of local polynomial regression. Staudenmayer and Ruppert [2004] applied EBBS to local polynomial regression in which the covariates are measured with error.

EBBS uses a model of bias suggested by the asymptotic expression of the expected value of the estimator. In our case, by Lemmas 5 and 6 in the Appendix,

    E(f̂(x; h, n, m)) = f(x) + (h²/2) f''(x) ∫ u²K(u) du + (1/(2m)) ∫ s² α^(2)(x, s) ds + o(h² + 1/m).

The asymptotic expression

    E(f̂(x; h, n, m)) ≈ f(x) + (h²/2) f''(x) ∫ u²K(u) du + (1/(2m)) ∫ s² α^(2)(x, s) ds    (24)

suggests the following model:

    E(f̂(x; h, n, m)) = β₀(x) + β₁(x) h² + β₂(x)/m.    (25)

Here β₀(x) approximately corresponds to f(x), the target density evaluated at x. The bias of f̂(x; h, n, m) is then approximately given by

    β₁(x) h² + β₂(x)/m.    (26)

The EBBS model of bias used in local polynomial regression is a polynomial in h (Ruppert [1997], Staudenmayer and Ruppert [2004]). In our case the model of bias is polynomial in h as well as in 1/m. Lemmas 5 and 6 in the Appendix allow for more terms in the asymptotic expression of E(f̂(x; h, n, m)) given in (24), which would give more terms in model (25). Such a model would be a better approximation of E(f̂(x; h, n, m)) but would require the estimation of additional parameters. In this paper, we use the model (25). Though approximate, notice that the model of bias does capture the fact that as h → 0 and 1/m → 0, the bias tends to zero.

Suppose that we can estimate the model (25). This not only gives us an empirical model of bias that can be used in selecting the needed parameters h, n, and m, but also gives another estimator which will be of some use. Extrapolating the estimated model to h = 1/m = 0 gives an approximately unbiased estimate of f(x). This approximately unbiased estimate of f(x) is of course β̂₀, the estimate of β₀. Based on the discussion of jackknife bias-correction, one can argue that β̂₀ is essentially a jackknife estimate. For more on this see Staudenmayer and Ruppert [2004].

The estimation of the model (25) at a point x₀ for a given experiment budget c is outlined in the following algorithm.

1. Generate a sample of the data using half of the experiment budget. To do this, fix n₀ and m₀ such that n₀m₀ = c/2.

2. Establish a grid of pairs of bandwidths and internal sample sizes given by the Cartesian product (h₁, ..., h_{I_h}) × (m₁, ..., m_{I_m}). The largest internal sample size, m_{I_m}, is equal to m₀, so that only half the experiment budget is used. Ruppert [1997] suggests evenly spacing the bandwidths on a log scale. We follow this suggestion for both the bandwidths and the numbers of internal samples.

3. For each pair in the grid of bandwidths and internal sample sizes, compute the kernel density estimator. This gives the data

    [(h_i, m_j, f̂(x₀; h_i, n₀, m_j)) : i = 1, ..., I_h, j = 1, ..., I_m].

4. Take f̂(x₀; h_i, n₀, m_j) as an approximation of E(f̂(x₀; h_i, n₀, m_j)) for each i and j. Estimate (25) with the data computed in step 3, as sketched below. We use global least squares regression. Note that in the context of local polynomial regression, Ruppert [1997], Schulman [1998], and Staudenmayer and Ruppert [2004] use local least squares to estimate the model. Local least squares may provide a better estimate, but it requires the specification of additional tuning parameters, to be discussed below. We are content with global least squares as it gives good performance for the test cases considered in Section 4.

The estimation procedure above is repeated on an equally spaced grid of x values over the range of the observations X̄_{m_j}(Z_i), j = 1, ..., I_m, i = 1, ..., n₀. Following Ruppert's suggestion (Ruppert [1997]), we smooth the estimates β̂₁(x) and β̂₂(x) over x. The result is an approximation of the bias, β̂₁(x) h² + β̂₂(x)/m, at each x in the grid. Squaring the bias at each x on the grid and numerically integrating gives us an approximation of the bias component of the MISE as a function of h and m. Adding to this the variance component approximation (1/(nh)) ∫ K²(u) du gives an approximation of the MISE as a function of h, n, and m.

To compute the optimal h, n, and m for the experiment budget c, we minimize the approximation of the MISE with respect to h, n, and m given the constraints

1. nm = c;
2. n ≥ n₀ and m ≥ m₀.

The first constraint is simply the budget constraint. The second constraint arises because we have already used half of the experiment budget to generate n₀ external samples with m₀ internal samples each.
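The global least squares fit in step 4 amounts to ordinary regression on the basis (1, h², 1/m); a minimal sketch at a single point x₀, with illustrative names, follows:

```python
import numpy as np

def fit_ebbs_model(h_grid, m_grid, f_hat):
    """Fit model (25), E f_hat(x0; h, n0, m) = b0 + b1 h^2 + b2 / m,
    by global least squares at a single point x0.

    f_hat[i, j] is the estimate computed with bandwidth h_grid[i] and
    internal sample size m_grid[j] (the data from step 3)."""
    H, M = np.meshgrid(h_grid, m_grid, indexing="ij")
    A = np.column_stack([np.ones(H.size), H.ravel() ** 2, 1.0 / M.ravel()])
    coef, *_ = np.linalg.lstsq(A, np.asarray(f_hat).ravel(), rcond=None)
    b0, b1, b2 = coef   # b0 ~ f(x0); bias(h, m) ~ b1 h^2 + b2 / m, as in (26)
    return b0, b1, b2
```

Reading off b0, i.e., extrapolating the fitted model to h = 1/m = 0, gives the approximately unbiased estimate of f(x₀) mentioned above.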

It is implicitly assumed that our approximation of the MISE is appropriate for the experiment budget c. The approximation of the bias was estimated under the constraint that the experiment cost not exceed c/2. As Ruppert [1997] points out, the EBBS bias approximation captures the bias for the given finite sample; asymptotics are used only to suggest a model for the bias. This implies that the bias coefficients β₁ and β₂ in (26) should be different for different sample sizes corresponding to different experiment budgets. We will assume that the change in these coefficients is small enough that the estimates β̂₁ and β̂₂ computed for the budget c/2 are reasonably good estimates of β₁ and β₂ given an experiment budget of c.

In the above algorithm there are a number of tuning parameters that must be selected, including n₀, m₀, h₁, h_{I_h}, m₁, I_h, and I_m. Ideally, we would like to establish values for these tuning parameters that work for most instances of the problem. This could be done with an experiment involving Monte Carlo simulation, as in Ruppert [1997], Schulman [1998], and Staudenmayer and Ruppert [2004]. However, for this paper we simply offer some guidelines and report the values that worked well in the test cases presented below.

For choosing n₀, the initial allocation of external samples given half of the experiment budget, we note that it is asymptotically optimal to set n₀ = b(c/2)^{5/7} for some positive constant b. We found that a constant of b = 1/3 worked well for the cases in which Z is univariate. For Z multivariate, it was better to take more external samples (b = 1). We then took m₀, the initial allocation of internal samples given half the experiment budget, to be (c/2)/n₀.

We must also choose the lower and upper bounds of the bandwidth grid, h₁ and h_{I_h}, respectively. We note that if h_{I_h} is chosen too large, (26) is not a good model for the larger bandwidths. But if h₁ is chosen too small, the variance approximation will not be very good for the smaller bandwidths. We found that h₁ = 0.1 and h_{I_h} = 0.5 worked well. Similar considerations arise in choosing m₁, the lower bound of the internal sample size grid. If m₁ is too small, (26) is not a good model for the smaller numbers of internal samples. We found that m₁ = 0.5 m₀ worked well. Finally, for the number of bandwidths I_h and the number of internal sample sizes I_m, we found that I_h = I_m = 5 was adequate.
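These guidelines might be collected into an initial-design helper along the following lines (a sketch; the default b and grid endpoints reflect the values suggested above, and all names are illustrative):

```python
import numpy as np

def initial_design(c, b=1.0, I_h=5, I_m=5, h_lo=0.1, h_hi=0.5):
    """Initial experiment design for EBBS using half the budget c:
    n0 = b (c/2)^(5/7) external samples, m0 = (c/2)/n0 internal samples,
    and log-spaced grids of bandwidths and internal sample sizes.
    Use b = 1/3 for univariate Z and b = 1 for multivariate Z."""
    n0 = max(2, int(round(b * (c / 2.0) ** (5.0 / 7.0))))
    m0 = max(2, int((c / 2.0) // n0))
    h_grid = np.geomspace(h_lo, h_hi, I_h)               # bandwidths, log-spaced
    m_grid = np.unique(np.geomspace(max(2, m0 // 2), m0, I_m).astype(int))
    return n0, m0, h_grid, m_grid
```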

3.2 Bias-Corrected Estimator

Now we turn to the implementation of the bias-corrected estimator presented in Section 2.2. We use the same data to compute the two estimators on the RHS of (20). We again would like to use an expression for the asymptotic MISE to guide the modeling of the MISE. Recalling the decomposition of the MISE, we thus need asymptotic expressions for the integrated squared bias and the integrated variance. Theorem 4 gives an asymptotic expression for the bias. Let us assume that we can integrate the squared bias, so that we have the asymptotic expression of integrated squared bias

    ∫ ( (h⁴/12) f^(4)(x) ∫ u⁴K(u) du + (h²/(2m)) ∫ s² α^(4)(x, s) ds ∫ u²K(u) du + (1/(4m²)) ∫ s⁴ α^(4)(x, s) ds )² dx.

This suggests that we model the expectation of f̃(x; h, n, m) as

    E(f̃(x; h, n, m)) = β₀(x) + β₁(x) h⁴ + β₂(x) h²/m + β₃(x)/m².

The bias of f̃(x; h, n, m) is then approximately

    β₁(x) h⁴ + β₂(x) h²/m + β₃(x)/m².    (27)

Let us also assume that the upper and lower bounds on the variance given in (22) and (23) can be integrated. Moreover, since we are reusing the data, assume that the covariance of f̂(x; h, n, m) and f̂(x; √2 h, n, m/2) is equal to the approximate upper bound 2^{−1/4} (f(x)/(nh)) ∫ K²(u) du, so that we can approximate the variance component of the MISE with the integrated asymptotic expression from the lower bound (23) of var(f̃(x; h, n, m)). This approximation is

    (4 + 1/√2 − 4 · 2^{−1/4}) (1/(nh)) ∫ K²(u) du.    (28)

We thus have an approximation for the variance component of the MISE, (28), and a model for the bias, (27). The tuning parameter values from the previous section work well here.

4. Numerical Experiments

In this section we examine the performance of the implementations discussed in the previous section on the sodium measurement example and another test case. To assess performance we consider representative plots and the behavior of the estimated MISE.

4.1 Test Case

In this two-dimensional test case, Z = (Z₁, Z₂) has a standard bivariate normal distribution. Conditional on Z, X(Z) is normally distributed with mean Z₁ + Z₂ and a bounded, strictly positive conditional variance σ²(Z) depending on Z₁ and Z₂. Then the random variable E(X | Z) = Z₁ + Z₂ is normally distributed with mean 0 and variance 2. This is a straightforward example in which all the assumptions for Theorem 3 are satisfied. We consider this example mainly to numerically verify that the rate of MISE convergence for the standard estimator is c^{−4/7}, as suggested by Theorem 3.

In Figure 1, the standard density estimator is plotted for two different experiment budgets along with the target density for the first test case. The figure shows that, as expected, the performance of the estimator improves as the experiment budget increases.

We now turn to MISE convergence. For clarity, we no longer suppress the dependence of the various estimators and parameters on the experiment budget c. To estimate mise(c), the MISE at a given experiment budget c, we first replicate the density estimator 50 times: {f̂(·; h(c), n(c), m(c))_k : k = 1, ..., 50}. We define the integrated squared error (ISE) as

    ise(c) = ∫ [f̂(x; h(c), n(c), m(c)) − f(x)]² dx.

For each k = 1, ..., 50, we use numerical integration to compute

    ise_k(c) = ∫ [f̂(x; h(c), n(c), m(c))_k − f(x)]² dx.

Our estimate of mise(c), denoted m̂ise(c), is then

    m̂ise(c) = (1/50) Σ_{k=1}^{50} ise_k(c).
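A sketch of the data generation and the ISE computation for this test case is below. Since the exact conditional variance function used in the paper is not reproduced here, the sigma2 line is a stand-in bounded variance function, and all names are illustrative:

```python
import numpy as np

def simulate_test_case(n, m, rng=None):
    """Simulate (n, m) data for the Section 4.1 test case: Z = (Z1, Z2) is
    standard bivariate normal and, conditional on Z, X(Z) is normal with
    mean Z1 + Z2.  The variance below is a hypothetical bounded stand-in."""
    rng = np.random.default_rng(rng)
    Z = rng.standard_normal((n, 2))
    mu = Z[:, 0] + Z[:, 1]                              # E(X | Z) = Z1 + Z2
    sigma2 = 1.0 + 1.0 / (2.0 + (Z ** 2).sum(axis=1))   # stand-in sigma^2(Z)
    X = mu[:, None] + np.sqrt(sigma2)[:, None] * rng.standard_normal((n, m))
    return X

def ise(f_hat_vals, f_vals, x_grid):
    """Integrated squared error by numerical integration; averaging ise
    over independent replications gives the MISE estimate above."""
    return np.trapz((f_hat_vals - f_vals) ** 2, x_grid)
```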

[Figure 1: The standard kernel density estimator for two different experiment budgets, along with the target density.]

In Figure 2, we plot log(m̂ise(c)) vs. log(c) at four increasing experiment budgets c, together with the least squares regression line, for the standard estimator. The linearity of the plot suggests that over this particular range of experiment budgets c, the estimator's mise(c) has the form mise(c) = V c^γ for some constants V and γ. Suppose that δ̂₁ and δ̂₂ are the estimated intercept and slope of the regression line plotted in the figure. Then δ̂₂ estimates γ and exp(δ̂₁) estimates V. Given that the optimal MISE convergence rate is c^{−4/7}, we expect that, asymptotically, γ = −4/7 ≈ −0.57. The estimated intercept and slope in Figure 2 are −7.5 and −0.6, respectively. So it appears that the estimator performs as expected.

[Figure 2: Plot of log(m̂ise(c)) vs. log(c), with the least squares regression line, for the standard kernel density estimator.]
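The rate check just described reduces to a least squares fit of log m̂ise(c) against log c; a minimal sketch with illustrative names:

```python
import numpy as np

def estimate_rate(budgets, mise_hats):
    """Fit log(mise_hat(c)) = d1 + d2 log(c).  The slope d2 estimates the
    exponent gamma in mise(c) = V c^gamma, and exp(d1) estimates V."""
    log_c = np.log(np.asarray(budgets, dtype=float))
    log_m = np.log(np.asarray(mise_hats, dtype=float))
    d2, d1 = np.polyfit(log_c, log_m, 1)   # np.polyfit returns slope first
    return d1, d2
```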

4.2 Sodium Measurement Example

In this section, we return to the sodium measurement example described in Section 1 and show that the bias-corrected estimator we proposed in Section 2.2 outperforms the standard estimator even when the experiment budget is small. Consider an experiment with m = 5 repeated sodium measurements for each of n = 30 batches of ingredient having true sodium content Z_i, i = 1, ..., n. For the purpose of assessing the performance of the estimators, we use simulation to generate data for this example. In the simulation, the distribution of Z is a three-parameter lognormal distribution, lognormal(µ = 1.544, σ = 0.5, t = 2), which has mean 7.37 and standard deviation 2.88. Note that the parameters µ and σ correspond to the mean and standard deviation of the variable's natural logarithm, while t is the location parameter. For any i and j, we take the measurement error U_ij = X_ij − Z_i to be normally distributed with standard deviation 0.6 + 0.5Z_i.

The bias-corrected estimator significantly outperforms the standard estimator in terms of mean integrated squared error: when the experiment was repeated 50 times, the bias-corrected estimator had markedly smaller estimated MISE than the standard estimator. Figure 3 plots the bias-corrected estimator (solid curve) compared with the standard estimator (dashed curve) for one simulated data set. The bias-corrected estimator is much closer to the true density in the center of the distribution and in the left tail. Although the bias-corrected estimator is less smooth than the standard estimator in the right tail, this is mainly due to the presence of few observations near the extreme quantiles. In this situation, poor density estimation in the tail is a typical problem for kernel density estimators [Wand and Jones, 1995].

[Figure 3: The true density, the standard kernel-smoothing estimator, and the bias-corrected kernel-smoothing estimator in the sodium measurement example; h_ks and h_bcks denote the bandwidths used for the standard and bias-corrected kernel-smoothing estimators, respectively.]

5. Conclusions

We proposed a bias-corrected estimator for the density of a conditional expectation, based on kernel smoothing. We derived results about the convergence rates of this estimator and a standard kernel smoothing estimator; the bias-corrected estimator has a superior convergence rate. Using the asymptotic analysis and EBBS, we created algorithms for choosing the bandwidth and the sample sizes given an experiment budget. When applied to a practical example with moderate sample sizes, the bias-corrected estimator performed better than the standard estimator.

References

M. Broadie, Y. Du, and C. C. Moallemi. Efficient risk estimation via nested sequential simulation. Management Science, 57:1172–1194, 2011.

R. J. Carroll and P. Hall. Optimal rates of convergence for deconvolving a density. Journal of the American Statistical Association, 83(404):1184–1186, 1988.

A. Delaigle and A. Meister. Density estimation with heteroscedastic error. Bernoulli, 14(2):562–579, 2008.

A. Delaigle, P. Hall, and A. Meister. On deconvolution with repeated measurements. Annals of Statistics, 36(2), 2008.

L. Devroye and G. Lugosi. Combinatorial Methods in Density Estimation. Springer, New York, 2001.

B. Efron and R. J. Tibshirani. An Introduction to the Bootstrap. Chapman & Hall, New York, 1993.

V. A. Epanechnikov. Non-parametric estimation of a multivariate probability density. Theory of Probability and its Applications, 14(1):153–158, 1967.

P. Glasserman. Performance continuity and differentiability in Monte Carlo optimization. In M. Abrams, P. Haigh, and J. Comfort, editors, Proceedings of the 1988 Winter Simulation Conference, pages 518–524, Piscataway, NJ, 1988. IEEE.

M. B. Gordy and S. Juneja. Nested simulation in portfolio risk measurement. Management Science, 56, 2010.

S. G. Henderson. Input model uncertainty: why do we care and what should we do about it? In S. E. Chick, P. J. Sánchez, D. J. Morrice, and D. Ferrin, editors, Proceedings of the 2003 Winter Simulation Conference, to appear, Piscataway, NJ, 2003. IEEE.

M. C. Jones and D. F. Signorini. A comparison of higher-order bias kernel density estimators. Journal of the American Statistical Association, 92(439):1063–1073, 1997.

P. L'Ecuyer. A unified view of the IPA, SF, and LR gradient estimation techniques. Management Science, 36, 1990.

P. L'Ecuyer. On the interchange of derivative and expectation for likelihood ratio derivative estimators. Management Science, 41, 1995.

S. H. Lee. Monte Carlo Computation of Conditional Expectation Quantiles. PhD thesis, Stanford University, Stanford, CA, 1998.

S. H. Lee and P. W. Glynn. Computing the distribution function of a conditional expectation via Monte Carlo: discrete conditioning spaces. ACM Transactions on Modeling and Computer Simulation, 13(3):238–258, 2003.

J. McIntyre and L. A. Stefanski. Density estimation with replicate heteroscedastic measurements. Annals of the Institute of Statistical Mathematics, 63:81–99, 2011.

B. L. S. Prakasa Rao. Nonparametric Functional Estimation. Academic Press, New York, 1983.

W. Rudin. Real and Complex Analysis. McGraw-Hill, New York, 1987.

D. Ruppert. Empirical-bias bandwidths for local polynomial nonparametric regression and density estimation. Journal of the American Statistical Association, 92(439):1049–1062, 1997.

A. E. Schulman. A Comparison of Local Bandwidth Selectors for Local Polynomial Regression. PhD thesis, Cornell University, Ithaca, NY, 1998.

J. Staudenmayer and D. Ruppert. Local polynomial regression and simulation-extrapolation. Journal of the Royal Statistical Society, Series B (Statistical Methodology), 66(1):17–30, 2004.

J. Staudenmayer, D. Ruppert, and J. P. Buonaccorsi. Density estimation in the presence of heteroscedastic measurement error. Journal of the American Statistical Association, 103(482):726–736, 2008.

S. G. Steckley. Estimating the Density of a Conditional Expectation. PhD thesis, Cornell University, Ithaca, NY, 2005.

S. G. Steckley and S. G. Henderson. A kernel approach to estimating the density of a conditional expectation. In S. E. Chick, P. J. Sánchez, D. J. Morrice, and D. Ferrin, editors, Proceedings of the 2003 Winter Simulation Conference, Piscataway, NJ, 2003. IEEE.

L. A. Stefanski and J. R. Cook. Simulation-extrapolation: the measurement error jackknife. Journal of the American Statistical Association, 90(432):1247–1256, 1995.

Y. Sun, D. W. Apley, and J. Staum. Efficient nested simulation for estimating the variance of a conditional expectation. Operations Research, 59:998–1007, 2011.

M. P. Wand and M. C. Jones. Kernel Smoothing. Chapman & Hall, London, 1995.

A. Proof of the consistency in quadratic mean

The following lemma is useful in the proof of Theorem 1.

Lemma 1. Assume A1–A4 and A6(0). Also assume that
1. K is nonnegative and integrable;
2. h → 0 and m → ∞ as c → ∞.
Then

    lim_{c→∞} (1/h) ∫ K((x − y)/h) f_m(y) dy = f(x) ∫ K(u) du.

Proof: By (6),

    (1/h) ∫ K((x − y)/h) f_m(y) dy
        = (1/h) ∫ K((x − y)/h) ∫∫ (1/s) φ(z/s) g(s | y − z/√m) f(y − z/√m) dz ds dy
        = ∫ K(u) ∫∫ (1/s) φ(z/s) g(s | x − hu − z/√m) f(x − hu − z/√m) dz ds du.    (29)

We will show that there exists an integrable function f̄ such that, for all c greater than some nonnegative number C,

    K(u) (1/s) φ(z/s) g(s | x − hu − z/√m) f(x − hu − z/√m) ≤ f̄(s, z, u).    (30)

Then, by Lebesgue's dominated convergence theorem,

    lim_{c→∞} (1/h) ∫ K((x − y)/h) f_m(y) dy
        = ∫∫∫ K(u) (1/s) φ(z/s) lim_{c→∞} [ g(s | x − hu − z/√m) f(x − hu − z/√m) ] dz ds du.    (31)

For any given u ∈ R, hu → 0, and for any given z ∈ R, z/√m → 0 as c → ∞. Then, by the continuity of f(·) and g(s | ·),

    lim_{c→∞} g(s | x − hu − z/√m) f(x − hu − z/√m) = g(s | x) f(x) for all s ∈ R.

Thus,

    lim_{c→∞} (1/h) ∫ K((x − y)/h) f_m(y) dy = ∫∫∫ K(u) (1/s) φ(z/s) g(s | x) f(x) dz ds du = f(x) ∫ K(u) du ∫ g(s | x) ds.

The second equality follows from an application of Fubini's theorem for a nonnegative integrand, together with the fact that ∫ (1/s) φ(z/s) dz = 1 for each s > 0. For x ∈ R such that f(x) is nonzero, ∫ g(s | x) ds = 1. The result then follows, once we establish (30). By Assumptions A5 and A6(0),

    K(u) (1/s) φ(z/s) g(s | x − hu − z/√m) f(x − hu − z/√m) ≤ K(u) (1/s) φ(z/s) I(0 < s ≤ √B_S) B_g B_f = f̄(s, z, u),

which is integrable.

Proof of Theorem 1: Let x be arbitrary. Recall the decomposition

    MSE(f̂(x; h, n, m)) = bias²(f̂(x; h, n, m)) + var(f̂(x; h, n, m)), x ∈ R.    (32)

Since (X̄_m(Z_i) : 1 ≤ i ≤ n) are i.i.d. with common probability density f_m,

    E(f̂(x; h, n, m)) = E( (1/(nh)) Σ_{i=1}^{n} K((x − X̄_m(Z_i))/h) ) = (1/h) E( K((x − X̄_m(Z_i))/h) ) = (1/h) ∫ K((x − y)/h) f_m(y) dy.

Since K is a probability density, and thus nonnegative and integrable, all of the assumptions of Lemma 1 hold, and so

    lim_{c→∞} E(f̂(x; h, n, m)) = f(x) ∫ K(u) du = f(x).

It follows that bias²(f̂(x; h, n, m)) tends to zero as c → ∞. Again since (X̄_m(Z_i) : 1 ≤ i ≤ n) are i.i.d. with common probability density f_m,

    var(f̂(x; h, n, m)) = var( (1/(nh)) Σ_{i=1}^{n} K((x − X̄_m(Z_i))/h) )
        = (1/(nh²)) var( K((x − X̄_m(Z_i))/h) )
        = (1/(nh²)) E( K²((x − X̄_m(Z_i))/h) ) − (1/n) [ (1/h) E( K((x − X̄_m(Z_i))/h) ) ]²    (33)
        ≤ (1/(nh²)) E( K²((x − X̄_m(Z_i))/h) )
        = (1/(nh)) (1/h) ∫ K²((x − y)/h) f_m(y) dy.

Since K is a bounded probability density, K² is integrable. Therefore Lemma 1 applies and

    lim_{c→∞} (1/h) ∫ K²((x − y)/h) f_m(y) dy = f(x) ∫ K²(u) du < ∞.

Now, 1/(nh) tends to zero as c → ∞, and so

    lim_{c→∞} var(f̂(x; h, n, m)) = 0.

By the decomposition in (32),

    lim_{c→∞} MSE(f̂(x; h, n, m)) = 0,

and since x was arbitrary, the result follows.

B. Proofs of the asymptotic expressions for MSE and MISE

To compute the rates at which the MSE and MISE converge to zero, we will make use of the decomposition of the bias given in (12). As a reminder, the decomposition is

    bias(f̂(x; h, n, m)) = (E(f̂(x; h, n, m)) − f_m(x)) + (f_m(x) − f(x)).

Computing the rates at which the MSE and MISE converge to zero requires additional smoothness in f and g. Recall that for all x ∈ R and s ∈ R, α(x, s) = g(s | x) f(x), and for nonnegative integer k,

    α^(k+1)(x, s) = (d/dy) α^(k)(y, s) |_{y=x},

where α^(0)(x, s) = α(x, s). Lemmas 3 and 4 presented below are useful for both the MSE result and the MISE result. Both results use Taylor's theorem with integral remainder, presented here as a lemma.

Lemma 2. Let k ≥ 0 be an integer. Assume f is k times continuously differentiable and k + 1 times differentiable. Also assume that f^(k+1) is integrable on (x, x + h). Then

    f(x + h) = f(x) + f'(x) h + (1/2!) f''(x) h² + ··· + (1/k!) f^(k)(x) h^k + (h^{k+1}/k!) ∫₀¹ (1 − t)^k f^(k+1)(x + th) dt.

The proof of Taylor's theorem with integral remainder involves repeated application of integration by parts, which follows from the fundamental theorem of calculus (e.g., Rudin [1987]).

Lemma 3. Assume that for some k ≥ 1,
1. f(·) is k times continuously differentiable, and
2. for all s ∈ R, g(s | ·) is k times continuously differentiable.
Then for all x,

    f_m(x) = Σ_{j=0}^{k−1} ((−1)^j µ_j / (m^{j/2} j!)) ∫ s^j α^(j)(x, s) ds
        + ((−1)^k / (m^{k/2} (k − 1)!)) ∫∫∫₀¹ (1 − t)^{k−1} z^k (1/s) φ(z/s) α^(k)(x − tz/√m, s) dt dz ds,

where µ_j = ∫ z^j φ(z) dz denotes the jth moment of the standard normal density (so the terms with odd j vanish).


More information

How to Find the Derivative of a Function: Calculus 1

How to Find the Derivative of a Function: Calculus 1 Introduction How to Find te Derivative of a Function: Calculus 1 Calculus is not an easy matematics course Te fact tat you ave enrolled in suc a difficult subject indicates tat you are interested in te

More information

f a h f a h h lim lim

f a h f a h h lim lim Te Derivative Te derivative of a function f at a (denoted f a) is f a if tis it exists. An alternative way of defining f a is f a x a fa fa fx fa x a Note tat te tangent line to te grap of f at te point

More information

Inference on a distribution from noisy draws

Inference on a distribution from noisy draws Inference on a distribution fro noisy draws Koen Jocans Martin Weidner Te Institute for Fiscal Studies Departent of Econoics, UCL ceap working paper CWP14/18 INFERENCE ON A DISTRIBUTION FROM NOISY DRAWS

More information

AN OPTIMAL SHRINKAGE FACTOR IN PREDICTION OF ORDERED RANDOM EFFECTS

AN OPTIMAL SHRINKAGE FACTOR IN PREDICTION OF ORDERED RANDOM EFFECTS Statistica Sinica 6 016, 1709-178 doi:http://dx.doi.org/10.5705/ss.0014.0034 AN OPTIMAL SHRINKAGE FACTOR IN PREDICTION OF ORDERED RANDOM EFFECTS Nilabja Guha 1, Anindya Roy, Yaakov Malinovsky and Gauri

More information

Determining Limits of Thermal NDT of Thick Graphite/Epoxy Composites

Determining Limits of Thermal NDT of Thick Graphite/Epoxy Composites ECNDT 006 - We.3.8.1 Deterining Liits of Teral NDT of Tick Grapite/Epoy Coposites Vladiir VAVILOV Institute of Introscopy Tosk Russia Abstract. Te known approac to inspecting tin coposites by using infrared

More information

Financial Econometrics Prof. Massimo Guidolin

Financial Econometrics Prof. Massimo Guidolin CLEFIN A.A. 2010/2011 Financial Econometrics Prof. Massimo Guidolin A Quick Review of Basic Estimation Metods 1. Were te OLS World Ends... Consider two time series 1: = { 1 2 } and 1: = { 1 2 }. At tis

More information

3.4 Worksheet: Proof of the Chain Rule NAME

3.4 Worksheet: Proof of the Chain Rule NAME Mat 1170 3.4 Workseet: Proof of te Cain Rule NAME Te Cain Rule So far we are able to differentiate all types of functions. For example: polynomials, rational, root, and trigonometric functions. We are

More information

LAB #3: ELECTROSTATIC FIELD COMPUTATION

LAB #3: ELECTROSTATIC FIELD COMPUTATION ECE 306 Revised: 1-6-00 LAB #3: ELECTROSTATIC FIELD COMPUTATION Purpose During tis lab you will investigate te ways in wic te electrostatic field can be teoretically predicted. Bot analytic and nuerical

More information

Probability Distributions

Probability Distributions Probability Distributions In Chapter, we ephasized the central role played by probability theory in the solution of pattern recognition probles. We turn now to an exploration of soe particular exaples

More information

Non-Parametric Non-Line-of-Sight Identification 1

Non-Parametric Non-Line-of-Sight Identification 1 Non-Paraetric Non-Line-of-Sight Identification Sinan Gezici, Hisashi Kobayashi and H. Vincent Poor Departent of Electrical Engineering School of Engineering and Applied Science Princeton University, Princeton,

More information

Differential Calculus (The basics) Prepared by Mr. C. Hull

Differential Calculus (The basics) Prepared by Mr. C. Hull Differential Calculus Te basics) A : Limits In tis work on limits, we will deal only wit functions i.e. tose relationsips in wic an input variable ) defines a unique output variable y). Wen we work wit

More information

e-companion ONLY AVAILABLE IN ELECTRONIC FORM

e-companion ONLY AVAILABLE IN ELECTRONIC FORM OPERATIONS RESEARCH doi 10.1287/opre.1070.0427ec pp. ec1 ec5 e-copanion ONLY AVAILABLE IN ELECTRONIC FORM infors 07 INFORMS Electronic Copanion A Learning Approach for Interactive Marketing to a Custoer

More information

Bootstrap prediction intervals for Markov processes

Bootstrap prediction intervals for Markov processes arxiv: arxiv:0000.0000 Bootstrap prediction intervals for Markov processes Li Pan and Dimitris N. Politis Li Pan Department of Matematics University of California San Diego La Jolla, CA 92093-0112, USA

More information

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution

Keywords: Estimator, Bias, Mean-squared error, normality, generalized Pareto distribution Testing approxiate norality of an estiator using the estiated MSE and bias with an application to the shape paraeter of the generalized Pareto distribution J. Martin van Zyl Abstract In this work the norality

More information

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon

Model Fitting. CURM Background Material, Fall 2014 Dr. Doreen De Leon Model Fitting CURM Background Material, Fall 014 Dr. Doreen De Leon 1 Introduction Given a set of data points, we often want to fit a selected odel or type to the data (e.g., we suspect an exponential

More information

Bloom Features. Kwabena Boahen Bioengineering Department Stanford University Stanford CA, USA

Bloom Features. Kwabena Boahen Bioengineering Department Stanford University Stanford CA, USA 2015 International Conference on Coputational Science and Coputational Intelligence Bloo Features Asok Cutkosky Coputer Science Departent Stanford University Stanford CA, USA asokc@stanford.edu Kwabena

More information

Regularized Regression

Regularized Regression Regularized Regression David M. Blei Columbia University December 5, 205 Modern regression problems are ig dimensional, wic means tat te number of covariates p is large. In practice statisticians regularize

More information

Numerical Solution for Non-Stationary Heat Equation in Cooling of Computer Radiator System

Numerical Solution for Non-Stationary Heat Equation in Cooling of Computer Radiator System (JZS) Journal of Zankoy Sulaiani, 9, 1(1) Part A (97-1) A119 Nuerical Solution for Non-Stationary Heat Equation in Cooling of Coputer Radiator Syste Aree A. Maad*, Faraidun K. Haa Sal**, and Najadin W.

More information

3.3 Variational Characterization of Singular Values

3.3 Variational Characterization of Singular Values 3.3. Variational Characterization of Singular Values 61 3.3 Variational Characterization of Singular Values Since the singular values are square roots of the eigenvalues of the Heritian atrices A A and

More information

The Verlet Algorithm for Molecular Dynamics Simulations

The Verlet Algorithm for Molecular Dynamics Simulations Cemistry 380.37 Fall 2015 Dr. Jean M. Standard November 9, 2015 Te Verlet Algoritm for Molecular Dynamics Simulations Equations of motion For a many-body system consisting of N particles, Newton's classical

More information

Consider a function f we ll specify which assumptions we need to make about it in a minute. Let us reformulate the integral. 1 f(x) dx.

Consider a function f we ll specify which assumptions we need to make about it in a minute. Let us reformulate the integral. 1 f(x) dx. Capter 2 Integrals as sums and derivatives as differences We now switc to te simplest metods for integrating or differentiating a function from its function samples. A careful study of Taylor expansions

More information

LECTURE 14 NUMERICAL INTEGRATION. Find

LECTURE 14 NUMERICAL INTEGRATION. Find LECTURE 14 NUMERCAL NTEGRATON Find b a fxdx or b a vx ux fx ydy dx Often integration is required. However te form of fx may be suc tat analytical integration would be very difficult or impossible. Use

More information

Basic Nonparametric Estimation Spring 2002

Basic Nonparametric Estimation Spring 2002 Basic Nonparametric Estimation Spring 2002 Te following topics are covered today: Basic Nonparametric Regression. Tere are four books tat you can find reference: Silverman986, Wand and Jones995, Hardle990,

More information

2.8 The Derivative as a Function

2.8 The Derivative as a Function .8 Te Derivative as a Function Typically, we can find te derivative of a function f at many points of its domain: Definition. Suppose tat f is a function wic is differentiable at every point of an open

More information

IEOR 165 Lecture 10 Distribution Estimation

IEOR 165 Lecture 10 Distribution Estimation IEOR 165 Lecture 10 Distribution Estimation 1 Motivating Problem Consider a situation were we ave iid data x i from some unknown distribution. One problem of interest is estimating te distribution tat

More information

Estimating Parameters for a Gaussian pdf

Estimating Parameters for a Gaussian pdf Pattern Recognition and achine Learning Jaes L. Crowley ENSIAG 3 IS First Seester 00/0 Lesson 5 7 Noveber 00 Contents Estiating Paraeters for a Gaussian pdf Notation... The Pattern Recognition Proble...3

More information

lecture 26: Richardson extrapolation

lecture 26: Richardson extrapolation 43 lecture 26: Ricardson extrapolation 35 Ricardson extrapolation, Romberg integration Trougout numerical analysis, one encounters procedures tat apply some simple approximation (eg, linear interpolation)

More information

EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS

EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Statistica Sinica 24 2014, 395-414 doi:ttp://dx.doi.org/10.5705/ss.2012.064 EFFICIENCY OF MODEL-ASSISTED REGRESSION ESTIMATORS IN SAMPLE SURVEYS Jun Sao 1,2 and Seng Wang 3 1 East Cina Normal University,

More information

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels

Extension of CSRSM for the Parametric Study of the Face Stability of Pressurized Tunnels Extension of CSRSM for the Paraetric Study of the Face Stability of Pressurized Tunnels Guilhe Mollon 1, Daniel Dias 2, and Abdul-Haid Soubra 3, M.ASCE 1 LGCIE, INSA Lyon, Université de Lyon, Doaine scientifique

More information

Lecture XVII. Abstract We introduce the concept of directional derivative of a scalar function and discuss its relation with the gradient operator.

Lecture XVII. Abstract We introduce the concept of directional derivative of a scalar function and discuss its relation with the gradient operator. Lecture XVII Abstract We introduce te concept of directional derivative of a scalar function and discuss its relation wit te gradient operator. Directional derivative and gradient Te directional derivative

More information

A Note on the Applied Use of MDL Approximations

A Note on the Applied Use of MDL Approximations A Note on the Applied Use of MDL Approxiations Daniel J. Navarro Departent of Psychology Ohio State University Abstract An applied proble is discussed in which two nested psychological odels of retention

More information

Passivity based control of magnetic levitation systems: theory and experiments Λ

Passivity based control of magnetic levitation systems: theory and experiments Λ Passivity based control of agnetic levitation systes: teory and experients Λ Hugo Rodriguez a, Roeo Ortega ay and Houria Siguerdidjane b alaboratoire des Signaux et Systées bservice d Autoatique Supelec

More information

Differentiation in higher dimensions

Differentiation in higher dimensions Capter 2 Differentiation in iger dimensions 2.1 Te Total Derivative Recall tat if f : R R is a 1-variable function, and a R, we say tat f is differentiable at x = a if and only if te ratio f(a+) f(a) tends

More information

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis

Soft Computing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Soft Coputing Techniques Help Assign Weights to Different Factors in Vulnerability Analysis Beverly Rivera 1,2, Irbis Gallegos 1, and Vladik Kreinovich 2 1 Regional Cyber and Energy Security Center RCES

More information

Bootstrap confidence intervals in nonparametric regression without an additive model

Bootstrap confidence intervals in nonparametric regression without an additive model Bootstrap confidence intervals in nonparametric regression witout an additive model Dimitris N. Politis Abstract Te problem of confidence interval construction in nonparametric regression via te bootstrap

More information

COS 424: Interacting with Data. Written Exercises

COS 424: Interacting with Data. Written Exercises COS 424: Interacting with Data Hoework #4 Spring 2007 Regression Due: Wednesday, April 18 Written Exercises See the course website for iportant inforation about collaboration and late policies, as well

More information

Biostatistics Department Technical Report

Biostatistics Department Technical Report Biostatistics Departent Technical Report BST006-00 Estiation of Prevalence by Pool Screening With Equal Sized Pools and a egative Binoial Sapling Model Charles R. Katholi, Ph.D. Eeritus Professor Departent

More information

MAT 145. Type of Calculator Used TI-89 Titanium 100 points Score 100 possible points

MAT 145. Type of Calculator Used TI-89 Titanium 100 points Score 100 possible points MAT 15 Test #2 Name Solution Guide Type of Calculator Used TI-89 Titanium 100 points Score 100 possible points Use te grap of a function sown ere as you respond to questions 1 to 8. 1. lim f (x) 0 2. lim

More information

Boosting Kernel Density Estimates: a Bias Reduction. Technique?

Boosting Kernel Density Estimates: a Bias Reduction. Technique? Boosting Kernel Density Estimates: a Bias Reduction Tecnique? Marco Di Marzio Dipartimento di Metodi Quantitativi e Teoria Economica, Università di Cieti-Pescara, Viale Pindaro 42, 65127 Pescara, Italy

More information

Boosting with log-loss

Boosting with log-loss Boosting with log-loss Marco Cusuano-Towner Septeber 2, 202 The proble Suppose we have data exaples {x i, y i ) i =... } for a two-class proble with y i {, }. Let F x) be the predictor function with the

More information

Calculus I Practice Exam 1A

Calculus I Practice Exam 1A Calculus I Practice Exam A Calculus I Practice Exam A Tis practice exam empasizes conceptual connections and understanding to a greater degree tan te exams tat are usually administered in introductory

More information

The total error in numerical differentiation

The total error in numerical differentiation AMS 147 Computational Metods and Applications Lecture 08 Copyrigt by Hongyun Wang, UCSC Recap: Loss of accuracy due to numerical cancellation A B 3, 3 ~10 16 In calculating te difference between A and

More information

1 The concept of limits (p.217 p.229, p.242 p.249, p.255 p.256) 1.1 Limits Consider the function determined by the formula 3. x since at this point

1 The concept of limits (p.217 p.229, p.242 p.249, p.255 p.256) 1.1 Limits Consider the function determined by the formula 3. x since at this point MA00 Capter 6 Calculus and Basic Linear Algebra I Limits, Continuity and Differentiability Te concept of its (p.7 p.9, p.4 p.49, p.55 p.56). Limits Consider te function determined by te formula f Note

More information

Kernel Density Estimation

Kernel Density Estimation Kernel Density Estimation Univariate Density Estimation Suppose tat we ave a random sample of data X 1,..., X n from an unknown continuous distribution wit probability density function (pdf) f(x) and cumulative

More information

arxiv: v2 [stat.me] 28 Aug 2016

arxiv: v2 [stat.me] 28 Aug 2016 arxiv:509.04704v [stat.me] 8 Aug 06 Central liit teores for network driven sapling Xiao Li Scool of Mateatical Sciences Peking University Karl Roe Departent of Statistics University of Wisconsin-Madison

More information

INFINITE ORDER CROSS-VALIDATED LOCAL POLYNOMIAL REGRESSION. 1. Introduction

INFINITE ORDER CROSS-VALIDATED LOCAL POLYNOMIAL REGRESSION. 1. Introduction INFINITE ORDER CROSS-VALIDATED LOCAL POLYNOMIAL REGRESSION PETER G. HALL AND JEFFREY S. RACINE Abstract. Many practical problems require nonparametric estimates of regression functions, and local polynomial

More information

Continuity and Differentiability Worksheet

Continuity and Differentiability Worksheet Continuity and Differentiability Workseet (Be sure tat you can also do te grapical eercises from te tet- Tese were not included below! Typical problems are like problems -3, p. 6; -3, p. 7; 33-34, p. 7;

More information

MVT and Rolle s Theorem

MVT and Rolle s Theorem AP Calculus CHAPTER 4 WORKSHEET APPLICATIONS OF DIFFERENTIATION MVT and Rolle s Teorem Name Seat # Date UNLESS INDICATED, DO NOT USE YOUR CALCULATOR FOR ANY OF THESE QUESTIONS In problems 1 and, state

More information

NADARAYA WATSON ESTIMATE JAN 10, 2006: version 2. Y ik ( x i

NADARAYA WATSON ESTIMATE JAN 10, 2006: version 2. Y ik ( x i NADARAYA WATSON ESTIMATE JAN 0, 2006: version 2 DATA: (x i, Y i, i =,..., n. ESTIMATE E(Y x = m(x by n i= ˆm (x = Y ik ( x i x n i= K ( x i x EXAMPLES OF K: K(u = I{ u c} (uniform or box kernel K(u = u

More information

Applications of the van Trees inequality to non-parametric estimation.

Applications of the van Trees inequality to non-parametric estimation. Brno-06, Lecture 2, 16.05.06 D/Stat/Brno-06/2.tex www.mast.queensu.ca/ blevit/ Applications of te van Trees inequality to non-parametric estimation. Regular non-parametric problems. As an example of suc

More information

New Distribution Theory for the Estimation of Structural Break Point in Mean

New Distribution Theory for the Estimation of Structural Break Point in Mean New Distribution Teory for te Estimation of Structural Break Point in Mean Liang Jiang Singapore Management University Xiaou Wang Te Cinese University of Hong Kong Jun Yu Singapore Management University

More information

THE IDEA OF DIFFERENTIABILITY FOR FUNCTIONS OF SEVERAL VARIABLES Math 225

THE IDEA OF DIFFERENTIABILITY FOR FUNCTIONS OF SEVERAL VARIABLES Math 225 THE IDEA OF DIFFERENTIABILITY FOR FUNCTIONS OF SEVERAL VARIABLES Mat 225 As we ave seen, te definition of derivative for a Mat 111 function g : R R and for acurveγ : R E n are te same, except for interpretation:

More information

LIMITATIONS OF EULER S METHOD FOR NUMERICAL INTEGRATION

LIMITATIONS OF EULER S METHOD FOR NUMERICAL INTEGRATION LIMITATIONS OF EULER S METHOD FOR NUMERICAL INTEGRATION LAURA EVANS.. Introduction Not all differential equations can be explicitly solved for y. Tis can be problematic if we need to know te value of y

More information

Lesson 6: The Derivative

Lesson 6: The Derivative Lesson 6: Te Derivative Def. A difference quotient for a function as te form f(x + ) f(x) (x + ) x f(x + x) f(x) (x + x) x f(a + ) f(a) (a + ) a Notice tat a difference quotient always as te form of cange

More information

Lab 6 Derivatives and Mutant Bacteria

Lab 6 Derivatives and Mutant Bacteria Lab 6 Derivatives and Mutant Bacteria Date: September 27, 20 Assignment Due Date: October 4, 20 Goal: In tis lab you will furter explore te concept of a derivative using R. You will use your knowledge

More information

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013).

The proofs of Theorem 1-3 are along the lines of Wied and Galeano (2013). A Appendix: Proofs The proofs of Theore 1-3 are along the lines of Wied and Galeano (2013) Proof of Theore 1 Let D[d 1, d 2 ] be the space of càdlàg functions on the interval [d 1, d 2 ] equipped with

More information

Exam 1 Review Solutions

Exam 1 Review Solutions Exam Review Solutions Please also review te old quizzes, and be sure tat you understand te omework problems. General notes: () Always give an algebraic reason for your answer (graps are not sufficient),

More information

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices

13.2 Fully Polynomial Randomized Approximation Scheme for Permanent of Random 0-1 Matrices CS71 Randoness & Coputation Spring 018 Instructor: Alistair Sinclair Lecture 13: February 7 Disclaier: These notes have not been subjected to the usual scrutiny accorded to foral publications. They ay

More information

A MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES

A MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES A MONTE CARLO ANALYSIS OF THE EFFECTS OF COVARIANCE ON PROPAGATED UNCERTAINTIES Ronald Ainswort Hart Scientific, American Fork UT, USA ABSTRACT Reports of calibration typically provide total combined uncertainties

More information

Preface. Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed.

Preface. Here are a couple of warnings to my students who may be here to get a copy of what happened on a day that you missed. Preface Here are my online notes for my course tat I teac ere at Lamar University. Despite te fact tat tese are my class notes, tey sould be accessible to anyone wanting to learn or needing a refreser

More information

Homotopy analysis of 1D unsteady, nonlinear groundwater flow through porous media

Homotopy analysis of 1D unsteady, nonlinear groundwater flow through porous media Hootopy analysis of D unsteady, nonlinear groundwater flow troug porous edia Autor Song, Hao, Tao, Longbin Publised 7 Journal Title Journal of Coastal Researc Copyrigt Stateent 7 CERF. Te attaced file

More information

Function Composition and Chain Rules

Function Composition and Chain Rules Function Composition and s James K. Peterson Department of Biological Sciences and Department of Matematical Sciences Clemson University Marc 8, 2017 Outline 1 Function Composition and Continuity 2 Function

More information

Cubic Functions: Local Analysis

Cubic Functions: Local Analysis Cubic function cubing coefficient Capter 13 Cubic Functions: Local Analysis Input-Output Pairs, 378 Normalized Input-Output Rule, 380 Local I-O Rule Near, 382 Local Grap Near, 384 Types of Local Graps

More information

Combining functions: algebraic methods

Combining functions: algebraic methods Combining functions: algebraic metods Functions can be added, subtracted, multiplied, divided, and raised to a power, just like numbers or algebra expressions. If f(x) = x 2 and g(x) = x + 2, clearly f(x)

More information

NUCLEAR THERMAL-HYDRAULIC FUNDAMENTALS

NUCLEAR THERMAL-HYDRAULIC FUNDAMENTALS NUCLEAR THERMAL-HYDRAULIC FUNDAMENTALS Dr. J. Micael Doster Departent of Nuclear Engineering Nort Carolina State University Raleig, NC Copyrigted POER CYCLES Te analysis of Terodynaic Cycles is based alost

More information

INFORMATION THEORETIC STRUCTURE LEARNING WITH CONFIDENCE

INFORMATION THEORETIC STRUCTURE LEARNING WITH CONFIDENCE INFORMATION THEORETIC STRUCTURE LEARNING WITH CONFIDENCE Kevin R. Moon Yale University Departent of Genetics New Haven, Connecticut, U.S.A Morteza Nosad, Salie Yasaei Seke, Alfred O. Hero III University

More information

Block designs and statistics

Block designs and statistics Bloc designs and statistics Notes for Math 447 May 3, 2011 The ain paraeters of a bloc design are nuber of varieties v, bloc size, nuber of blocs b. A design is built on a set of v eleents. Each eleent

More information

Polynomial Interpolation

Polynomial Interpolation Capter 4 Polynomial Interpolation In tis capter, we consider te important problem of approximatinga function fx, wose values at a set of distinct points x, x, x,, x n are known, by a polynomial P x suc

More information

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES

TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES TEST OF HOMOGENEITY OF PARALLEL SAMPLES FROM LOGNORMAL POPULATIONS WITH UNEQUAL VARIANCES S. E. Ahed, R. J. Tokins and A. I. Volodin Departent of Matheatics and Statistics University of Regina Regina,

More information

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis

E0 370 Statistical Learning Theory Lecture 6 (Aug 30, 2011) Margin Analysis E0 370 tatistical Learning Theory Lecture 6 (Aug 30, 20) Margin Analysis Lecturer: hivani Agarwal cribe: Narasihan R Introduction In the last few lectures we have seen how to obtain high confidence bounds

More information

The research of the rst author was supported in part by an Information Technology

The research of the rst author was supported in part by an Information Technology Tecnical Report 95-376 Absorbing Boundary Conditions for te Scrodinger Equation Toas Fevens Hong Jiang February 16, 1995 Te researc of te rst autor was supported in part by an Inforation Tecnology Researc

More information

Stochastic Subgradient Methods

Stochastic Subgradient Methods Stochastic Subgradient Methods Lingjie Weng Yutian Chen Bren School of Inforation and Coputer Science University of California, Irvine {wengl, yutianc}@ics.uci.edu Abstract Stochastic subgradient ethods

More information

A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION

A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION A eshsize boosting algorith in kernel density estiation A MESHSIZE BOOSTING ALGORITHM IN KERNEL DENSITY ESTIMATION C.C. Ishiekwene, S.M. Ogbonwan and J.E. Osewenkhae Departent of Matheatics, University

More information