SUPPLEMENTARY MATERIAL FOR PUBLICATION ONLINE

B Technical details

B.1 Variance of $\hat f_{\rm dec}$ in the supersmooth case

We derive the variance in the case where $\phi_K(t) = (1-t^2)^\kappa\, I(-1 \le t \le 1)$, i.e. we take $K$ as in (A3), and assume that $\phi_U(t) = a(t)\,\exp(-|t|^\alpha)$, where $\alpha > 0$ and $a$ denotes a symmetric, real-valued function satisfying $a(0) = 1$ and
\[ a(t) \sim \xi\, t^{\alpha_1} \quad\text{as } t \to \infty, \tag{B.1} \]
with $-\infty < \alpha_1 < \infty$ and $\xi > 0$. That is, we put $\gamma = 1$, $\alpha_2 = \alpha_1$ and $d = d_1$ in (3.2). The more general setting at (3.2) obtains similarly. Assume too that the distribution of which $f_W$ is the density has finite variance:
\[ \int w^2 f_W(w)\, dw < \infty. \tag{B.2} \]
Recall the definition of $K_U$ at (2.2). In this notation, it is well known that the asymptotic variance of $\hat f_{\rm dec}$ is given by
\[ {\rm Var}\big\{\hat f_{\rm dec}(x)\big\} = n^{-1}\, (K_U^2 * f_W)(x) - n^{-1}\, \big\{E\,\hat f_{\rm dec}(x)\big\}^2. \]

Theorem 3. If (B.1)-(B.2) hold then, for each $x$,
\[ (K_U^2 * f_W)(x) \sim \Big\{ \frac{2^\kappa\, \xi^{-1}\, \Gamma(\kappa+1)}{2\pi\, \alpha^{\kappa+1}}\, h^{(\kappa+1)\alpha+\alpha_1}\, \exp\big(h^{-\alpha}\big) \Big\}^2 \int \{\cos(x-y)\}^2\, f_W(y)\, dy \quad\text{as } h \to 0. \tag{B.3} \]

Proof. It is notationally convenient to put $b = a^{-1}$. Note that
\[ 2\pi\, K_U(x) = \int_{-1}^{1} \cos(tx)\, (1-t^2)^\kappa\, b(t/h)\, \exp\big(|t/h|^\alpha\big)\, dt = h \int_{-1/h}^{1/h} \cos(hux)\, \{1-(hu)^2\}^\kappa\, b(u)\, \exp\big(|u|^\alpha\big)\, du = h\, I_1(x; h), \tag{B.4} \]
say. Let $\epsilon = \epsilon(h)$ decrease to zero slowly as $h \to 0$, at a rate that we shall address shortly, and observe that, by (B.1),
\[ \Big| I_1(x; h) - \int_{(1-\epsilon)/h \le |u| \le 1/h} \cos(hux)\, \{1-(hu)^2\}^\kappa\, b(u)\, \exp\big(|u|^\alpha\big)\, du \Big| \le C_1 \int_{|u| \le (1-\epsilon)/h} \{1-(hu)^2\}^\kappa\, (1+|u|)^{-\alpha_1}\, \exp\big(|u|^\alpha\big)\, du \le C_2 \int_{|u| \le (1-\epsilon)/h} (1+|u|)^{-\alpha_1}\, \exp\big(|u|^\alpha\big)\, du \le C_3\, h^{\alpha_1-1}\, \exp\big[\{(1-\epsilon)/h\}^\alpha\big], \tag{B.5} \]
where, here and below, $C_1, C_2, \ldots$ denote generic positive constants not depending on $h$ or $x$. Note too that
\[ \Big| \int_{(1-\epsilon)/h \le |u| \le 1/h} \cos(hux)\, \{1-(hu)^2\}^\kappa\, b(u)\, \exp\big(|u|^\alpha\big)\, du - \cos(x) \int_{(1-\epsilon)/h \le |u| \le 1/h} \{1-(hu)^2\}^\kappa\, b(u)\, \exp\big(|u|^\alpha\big)\, du \Big| \le C_3\, \epsilon\, |x| \int_{(1-\epsilon)/h \le |u| \le 1/h} \{1-(hu)^2\}^\kappa\, (1+|u|)^{-\alpha_1}\, \exp\big(|u|^\alpha\big)\, du. \tag{B.6} \]
Since $\alpha > 0$ then, writing $o_u(1)$ for a generic function of $u$ that satisfies $\sup_{0 \le u \le \epsilon/h^\alpha} |o_u(1)| = o(1)$, and taking $\epsilon$ to decrease to zero so slowly that $\epsilon/h^\alpha \to \infty$, we have:
\[ \int_{(1-\epsilon)/h}^{1/h} \{1-(hu)^2\}^\kappa\, b(u)\, \exp\big(u^\alpha\big)\, du \tag{B.7} \]
\[ = \int_0^{\epsilon/h} \{2hv-(hv)^2\}^\kappa\, b\big(h^{-1}-v\big)\, \exp\big\{\big(h^{-1}-v\big)^\alpha\big\}\, dv = (2h)^\kappa \int_0^{\epsilon/h} v^\kappa\, \big(1-\tfrac{1}{2}\, hv\big)^\kappa\, b\big(h^{-1}-v\big)\, \exp\big\{\big(h^{-1}-v\big)^\alpha\big\}\, dv \]
\[ \sim \xi^{-1}\, (2h)^\kappa\, h^{\alpha_1} \int_0^{\epsilon/h} v^\kappa\, \exp\big\{\big(h^{-1}-v\big)^\alpha\big\}\, dv = \xi^{-1}\, (2h)^\kappa\, h^{\alpha_1} \int_0^{\epsilon/h} v^\kappa\, \exp\big\{h^{-\alpha}(1-hv)^\alpha\big\}\, dv \]
\[ = 2^\kappa\, \xi^{-1}\, h^{(\kappa+1)\alpha+\alpha_1-1} \int_0^{\epsilon/h^\alpha} u^\kappa\, \exp\big\{h^{-\alpha}\big(1-h^\alpha u\big)^\alpha\big\}\, du = 2^\kappa\, \xi^{-1}\, h^{(\kappa+1)\alpha+\alpha_1-1} \int_0^{\epsilon/h^\alpha} u^\kappa\, \exp\big\{h^{-\alpha}\big(1-\alpha h^\alpha u\big) + o_u(1)\big\}\, du \]
\[ = 2^\kappa\, \xi^{-1}\, h^{(\kappa+1)\alpha+\alpha_1-1}\, \exp\big(h^{-\alpha}\big) \int_0^{\epsilon/h^\alpha} u^\kappa\, \exp\{-\alpha u + o_u(1)\}\, du \sim 2^\kappa\, \xi^{-1}\, h^{(\kappa+1)\alpha+\alpha_1-1}\, \exp\big(h^{-\alpha}\big) \int_0^{\infty} u^\kappa\, \exp(-\alpha u)\, du \tag{B.8} \]
\[ = \frac{2^\kappa\, \xi^{-1}\, \Gamma(\kappa+1)}{\alpha^{\kappa+1}}\, h^{(\kappa+1)\alpha+\alpha_1-1}\, \exp\big(h^{-\alpha}\big). \tag{B.9} \]
Combining (B.6) and (B.9) we deduce that
\[ \int_{(1-\epsilon)/h \le |u| \le 1/h} \cos(hux)\, \{1-(hu)^2\}^\kappa\, b(u)\, \exp\big(|u|^\alpha\big)\, du = \{\cos(x) + o_{\rm unif}(1)\}\, \frac{2^\kappa\, \xi^{-1}\, \Gamma(\kappa+1)}{\alpha^{\kappa+1}}\, h^{(\kappa+1)\alpha+\alpha_1-1}\, \exp\big(h^{-\alpha}\big) + O_{\rm unif}(1)\, \epsilon\, |x|\, h^{(\kappa+1)\alpha+\alpha_1-1}\, \exp\big(h^{-\alpha}\big), \tag{B.10} \]
where, here and below, $o_{\rm unif}(1)$ and $O_{\rm unif}(1)$ denote quantities that depend on $x$ and equal $o(1)$ and $O(1)$, respectively, uniformly in $-\infty < x < \infty$, as $h \to 0$. Together, (B.5) and (B.10) imply that
\[ I_1(x; h) = \Big\{\cos(x)\, \frac{2^\kappa\, \xi^{-1}\, \Gamma(\kappa+1)}{\alpha^{\kappa+1}} + o_{\rm unif}(1) + O_{\rm unif}(1)\, \epsilon\, |x|\Big\}\, h^{(\kappa+1)\alpha+\alpha_1-1}\, \exp\big(h^{-\alpha}\big) + O_{\rm unif}(1)\, h^{\alpha_1-1}\, \exp\big[\{(1-\epsilon)/h\}^\alpha\big]. \tag{B.11} \]
Together, (B.4) and (B.11) imply that, if $\epsilon$ decreases to zero sufficiently slowly as $h \to 0$,
\[ (K_U^2 * f_W)(x) = \int K_U(x-y)^2\, f_W(y)\, dy = \frac{h^2}{(2\pi)^2} \int I_1(x-y; h)^2\, f_W(y)\, dy = \Big\{\frac{2^\kappa\, \xi^{-1}\, \Gamma(\kappa+1)}{2\pi\, \alpha^{\kappa+1}}\, h^{(\kappa+1)\alpha+\alpha_1}\, \exp\big(h^{-\alpha}\big)\Big\}^2 \int \{\cos(x-y)\}^2\, f_W(y)\, dy + o\Big[\big\{h^{(\kappa+1)\alpha+\alpha_1}\, \exp\big(h^{-\alpha}\big)\big\}^2\Big]. \tag{B.12} \]
(Here we have used (B.2).) This result is equivalent to (B.3).
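The passage from (B.8) to (B.9) rests on the identity $\int_0^\infty u^\kappa e^{-\alpha u}\, du = \Gamma(\kappa+1)/\alpha^{\kappa+1}$. A quick numerical sanity check of this step (the values $\kappa = 3$ and $\alpha = 0.5$ are arbitrary illustrative choices, not taken from the paper):

```python
import math

def tail_integral(kappa, alpha, upper=200.0, n=200000):
    # Midpoint rule for the truncated integral of u^kappa * exp(-alpha*u) on [0, upper];
    # the tail beyond `upper` is negligible for these parameter values.
    du = upper / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * du
        total += u**kappa * math.exp(-alpha * u) * du
    return total

kappa, alpha = 3, 0.5
numeric = tail_integral(kappa, alpha)
closed = math.gamma(kappa + 1) / alpha**(kappa + 1)  # Gamma(4)/0.5^4 = 96
print(numeric, closed)
```

The two printed values agree to several decimal places, confirming the constant appearing in (B.9).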
B.2 Main steps in derivation of convergence rate of $\hat f_{\rm rat}$ (see (3.13)) in the case $n^{-1/2} = O(\eta)$

Theorem 4 below demonstrates that, under the assumption that $\hat\theta = \theta_1 + O_p(n^{-1/2})$, the asymptotic bias of $\hat f_{\rm rat}(x\,|\,\hat\theta)$ equals that of $\hat f_{\rm rat}(x\,|\,\theta_1)$, and the error about the mean of $\hat f_{\rm rat}(x\,|\,\hat\theta)$ equals that of $\hat f_{\rm dec}(x)$ plus negligible terms. Write $E_{f_X}$ for expectation when the distribution of the data $W_j$ has density $f_U * f_X$; and, for any random variable $R$ with finite mean, let $(1-E_{f_X})\, R$ denote $R - E_{f_X}(R)$.

Theorem 4. If (3.1), (3.6) and (B1)-(B6) in section 3.5 hold, then, for each $\epsilon > 0$,
\[ \hat f_{\rm rat}(x\,|\,\hat\theta) - (1-E_{f_X})\, \hat f_{\rm dec}(x) - E_{f_X}\big\{\hat f_{\rm rat}(x\,|\,\theta_1)\big\} = O_p\big\{ n^{\epsilon}\, \big(n h^{2\alpha+1}\big)^{-1/2} + n^{-1/2} + h^{r-s+1}\big\}. \tag{B.13} \]
Moreover, the remainder term on the right-hand side is of the stated order uniformly in densities $f_X$ for which (3.1), (B1)-(B6) in section 3.5, and (3.6) hold, in the sense that
\[ \lim_{C \to \infty}\, \limsup_{n \to \infty}\, \sup_{f_X \in \mathcal F}\, P_{f_X}\Big[ \big|\hat f_{\rm rat}(x\,|\,\hat\theta) - (1-E_{f_X})\, \hat f_{\rm dec}(x) - E_{f_X}\big\{\hat f_{\rm rat}(x\,|\,\theta_1)\big\}\big| > C\, \big\{ n^{\epsilon}\, \big(n h^{2\alpha+1}\big)^{-1/2} + n^{-1/2} + h^{r-s+1}\big\}\Big] = 0. \tag{B.14} \]

Our next result describes the bias of $\hat f_{\rm rat}(\cdot\,|\,\theta_1)$. Let $s(\psi)$ be as in (3.15).

Theorem 5. If (3.1) and (B1)-(B6) in section 3.5 hold, then
\[ \sup_{\theta \in \Theta}\, \big\{h^{\beta}\, s(\psi)\big\}^{-1}\, \big| E_{f_X}\big\{\hat f_{\rm rat}(x\,|\,\theta_1)\big\} - f_X(x) \big| = O(1) \quad\text{as } n \to \infty. \tag{B.15} \]

Let ${\rm Var}_{f_X}$ denote the variance operator when the data $W_j$ have density $f_U * f_X$. If (3.1), (B1), (B2) and (B3) hold, then standard deconvolution results imply that
\[ \sup_{f_X \in \mathcal F}\, {\rm Var}_{f_X}\big\{\hat f_{\rm dec}(x)\big\} = O\big\{\big(n h^{2\alpha+1}\big)^{-1}\big\}, \tag{B.16} \]
where the order is exact when $f_W(x)$ is nonzero. Hence, the remainder on the right-hand side of (B.13) is negligibly small relative to the term $(1-E_{f_X})\, \hat f_{\rm dec}(x)$ on the left-hand side, for any $x$ such that $f_W(x)$ is nonzero. Combining (B.14), (B.15) and (B.16) we deduce that, under the conditions of Theorem 5, if the term in $h^{r-s+1}$ in (B.14) can be ignored then
\[ \lim_{C \to \infty}\, \limsup_{n \to \infty}\, \sup_{\theta \in \Theta}\, P_{f_X}\Big[ \big|\hat f_{\rm rat}(x) - f_X(x)\big| > C\, \big\{\big(n h^{2\alpha+1}\big)^{-1/2} + h^{\beta}\, s(\psi)\big\}\Big] = 0. \tag{B.17} \]
Under the assumptions for Theorem 5, if $\psi$ represents a sequence of functions, and we take $h \asymp \min\{1, (n\eta^2)^{-1/(2\alpha+2\beta+1)}\}$, then the minimiser, with respect to $h$, of the term within braces (i.e. the coefficient of $C$) on the left-hand side of (B.17) is of size $(\eta^{2\alpha+1}/n^{\beta})^{1/(2\alpha+2\beta+1)}$ in the case $n^{-1/2} = O(\eta)$, in which instance $h$ is asymptotic to a constant multiple of $(n\eta^2)^{-1/(2\alpha+2\beta+1)}$. The term in $h^{r-s+1}$ in (B.14) can be ignored if $h^{r-s+1} = O\{(\eta^{2\alpha+1}/n^{\beta})^{1/(2\alpha+2\beta+1)}\}$, or equivalently, since $h$ is asymptotic to a constant multiple of $(n\eta^2)^{-1/(2\alpha+2\beta+1)}$, if
\[ n^{-1} = O\big( \eta^{(2\alpha+2r-2s+3)/(r-s+1-\beta)}\big) \tag{B.18} \]
(provided that $r > s - 1 + \beta$). For example, if $\eta = O(n^{-t})$ where $t \in (0, \frac{1}{2})$ and $2t\alpha + \beta(1-2t) \le (r-s+1)\, t$, then (B.18) holds, in which case $h^{r-s+1}$ can be ignored in (B.14) and therefore property (3.13) follows from (B.17).

B.3 Proof of Theorem 4

Observe that
\[ 2\pi\, f(x\,|\,\theta)\, L_x(w\,|\,\theta, h) = f(x\,|\,\theta) \int \frac{e^{-itw}}{\phi_U(t)}\, \phi_{K_x(\cdot|\theta,h)}(t)\, dt = f(x\,|\,\theta) \int \frac{e^{-itw}}{\phi_U(t)} \int e^{it(x-hu)}\, \frac{K(u)}{f(x-hu\,|\,\theta)}\, du\, dt = \int \frac{e^{it(x-w)}}{\phi_U(t)} \int e^{-ihtu}\, \frac{f(x\,|\,\theta)\, K(u)}{f(x-hu\,|\,\theta)}\, du\, dt \]
\[ = \sum_{k=0}^{r} \frac{h^k}{k!} \int \frac{e^{it(x-w)}}{\phi_U(t)}\, g_k(x\,|\,\theta) \int e^{-ihtu}\, u^k K(u)\, du\, dt + h^{r+1} \int \frac{e^{it(x-w)}}{\phi_U(t)} \int e^{-ihtu}\, \omega(hu, x\,|\,\theta)\, u^{r+1} K(u)\, du\, dt. \tag{B.19} \]
Since $\gamma_{h,x}$ has $s$ derivatives then $s$ integrations by parts give:
\[ \int e^{-ihtu}\, \omega(hu, x\,|\,\theta)\, u^{r+1} K(u)\, du = (iht)^{-s} \int e^{-ihtu}\, \gamma_{h,x}^{(s)}(u\,|\,\theta)\, du. \]
Therefore,
\[ \Big| \int \frac{e^{it(x-w)}}{\phi_U(t)} \int e^{-ihtu}\, \omega(hu, x\,|\,\theta)\, u^{r+1} K(u)\, du\, dt \Big| \le \int_{|t| \le 1} \frac{dt}{|\phi_U(t)|} \int \big|\gamma_{h,x}(u\,|\,\theta)\big|\, du + \int_{|t| > 1} \frac{dt}{|ht|^s\, |\phi_U(t)|} \int \big|\gamma_{h,x}^{(s)}(u\,|\,\theta)\big|\, du. \]
Combining this bound with (B.19), and noting (B6), we deduce that:
\[ \sup_{-\infty < w < \infty}\, \sup_{\theta \in \Theta}\, \Big| f(x\,|\,\theta)\, L_x(w\,|\,\theta, h) - \sum_{k=0}^{r} \frac{h^k\, g_k(x\,|\,\theta)}{2\pi k!} \int \frac{e^{it(x-w)}}{\phi_U(t)} \Big\{\int e^{-ihtu}\, u^k K(u)\, du\Big\}\, dt \Big| \le B_4\, h^{r+1} \Big\{ \int_{|t| \le 1} \frac{dt}{|\phi_U(t)|} + \int_{|t| > 1} \frac{dt}{|ht|^s\, |\phi_U(t)|} \Big\}, \tag{B.20} \]
uniformly in $0 < h \le C_1$, where $B_4 > 0$ is a constant. Property (3.1) implies that the first integral on the right-hand side of (B.20) is finite, and (3.1) and (B6) imply that the second integral is bounded above by
\[ \int_{|t| > 1} \frac{C\, (1+|t|)^{\alpha}}{|ht|^s}\, dt = O\big(h^{-s}\big). \]
These results, (B.20) and the property $\int e^{-ihtu}\, u^k K(u)\, du = (-i)^k\, \phi_K^{(k)}(-ht)$, entail:
\[ \sup_{-\infty < w < \infty}\, \sup_{\theta \in \Theta}\, \Big| f(x\,|\,\theta)\, L_x(w\,|\,\theta, h) - \sum_{k=0}^{r} \frac{(-ih)^k\, g_k(x\,|\,\theta)}{2\pi k!} \int \frac{e^{it(w-x)}}{\phi_U(t)}\, \phi_K^{(k)}(ht)\, dt \Big| = O\big(h^{r-s+1}\big). \tag{B.21} \]
Note that (B.21), and (B.22) and (B.24) below, involve only the densities $f(\cdot\,|\,\theta)$ and $f_U$, which we take to be fixed; these formulae do not involve $f_X$, which can vary with $n$. Let $s_k = (-1)^{k/2}$ if $k$ is even, and $s_k = (-1)^{(k-1)/2}$ if $k$ is odd. Since the distribution of $U$ is symmetric then $\phi_U$ is symmetric, and so, defining
\[ \psi_k(u, h) = \begin{cases} \int \cos(tu)\, \phi_U(t)^{-1}\, \phi_K^{(k)}(ht)\, dt & \text{if } k \text{ is even},\\ \int \sin(tu)\, \phi_U(t)^{-1}\, \phi_K^{(k)}(ht)\, dt & \text{if } k \text{ is odd}, \end{cases} \]
we have:
\[ \sup_{-\infty < w < \infty}\, \sup_{\theta \in \Theta}\, \Big| f(x\,|\,\theta)\, L_x(w\,|\,\theta, h) - \frac{1}{2\pi} \sum_{k=0}^{r} \frac{s_k\, h^k}{k!}\, g_k(x\,|\,\theta)\, \psi_k(w-x, h) \Big| = O\big(h^{r-s+1}\big). \tag{B.22} \]
Hence, defining
\[ \Psi_k(x) = \frac{1}{n} \sum_{j=1}^{n} \psi_k(W_j - x, h), \tag{B.23} \]
we have: for a constant $B_5 > 0$, not depending on $h$, $n$ or the data $W_1, \ldots, W_n$,
\[ \sup_{\theta \in \Theta}\, \Big| \hat f_{\rm rat}(x\,|\,\theta) - \frac{1}{2\pi} \sum_{k=0}^{r} \frac{s_k\, h^k}{k!}\, g_k(x\,|\,\theta)\, \Psi_k(x) \Big| \le B_5\, h^{r-s+1}, \tag{B.24} \]
where the right-hand side denotes a deterministic quantity of the stated size. Define
\[ L_k(u) = \int \exp(itu)\, \phi_U(t/h)^{-1}\, \phi_K^{(k)}(t)\, dt, \]
and note that, by (B2),
\[ \sup_{-\infty < u < \infty} |L_k(u)| \le B_6\, h^{-\alpha} \int (1+|t|)^{\alpha}\, \big|\phi_K^{(k)}(t)\big|\, dt \le B_7\, h^{-\alpha}, \tag{B.25} \]
where $B_6, B_7 > 0$ are constants not depending on $h$ or $n$. Using (3.1), (B1), (B2) and (B.25) we deduce that, for each integer $m \ge 2$,
\[ h^m\, E\big\{|\psi_k(W-x, h)|^m\big\} \le E\big[\big|L_k\{(W-x)/h\}\big|^m\big] \le \|L_k\|_\infty^{m-2}\, E\big[\big|L_k\{(W-x)/h\}\big|^2\big] \]
\[ = h\, \|L_k\|_\infty^{m-2} \int |L_k(u)|^2\, f_W(x+hu)\, du \le B_8\, h\, \big(B_7\, h^{-\alpha}\big)^{m-2} \int |L_k(u)|^2\, du = B_9(m)\, h^{1-(m-2)\alpha}\, 2\pi \int \Big|\frac{\phi_K^{(k)}(t)}{\phi_U(t/h)}\Big|^2\, dt \le B_{10}(m)\, h^{1-(m-2)\alpha} \int \big\{(1+|t/h|)^{\alpha}\, \big|\phi_K^{(k)}(t)\big|\big\}^2\, dt = O\big(h^{1-m\alpha}\big), \tag{B.26} \]
where the bounds apply uniformly in $x \in I$, and $B_8 = \|f_W\|_\infty$, $B_9(m) = B_7^{m-2}\, B_8$, and $B_l(m)$ or $B_l$, for $l \ge 1$, denote constants not depending on $h$ or $n$. (The second identity in the string at (B.26) follows using Parseval's identity. Note that if (B1) holds then $f_X(x)$ is bounded uniformly in $x$ and $n$, from which it follows that the same is true for $f_W(x)$.) Therefore,
\[ E\big\{|\psi_k(W-x, h)|^m\big\} = O\big(h^{1-m(\alpha+1)}\big). \tag{B.27} \]
Let $1 \le j \le r$, and recall the definition of $\Psi_k$ at (B.23). Replacing $m$ by either 2 or $2m$ in the bound at (B.27), we have for each $m \ge 1$, using Rosenthal's inequality (see e.g. Hall and Heyde (1980), p. 23):
\[ E\big|(1-E)\, \Psi_j(x)\big|^{2m} \le B_{11}(m)\, \Big( n^{-m}\, \big[E\big\{\psi_j(W-x, h)^2\big\}\big]^m + n^{-(2m-1)}\, E\big\{|\psi_j(W-x, h)|^{2m}\big\}\Big) \le B_{12}(m)\, \Big\{ \big(n h^{2\alpha+1}\big)^{-m} + h^{-\alpha}\, \big(n h^{\alpha+1}\big)^{-(2m-1)}\Big\} \le B_{13}(m)\, \big(n h^{2\alpha+1}\big)^{-m}, \]
uniformly in $x \in I$, where the last inequality is valid whenever $nh \ge 1$. Hence, if $I(n)$ is a set of at most $n^B$ values $x \in I$, then, for each $\epsilon > 0$,
\[ P\Big\{ \sup_{x \in I(n)} \big|(1-E)\, \Psi_j(x)\big| > n^{\epsilon}\, \big(n h^{2\alpha+1}\big)^{-1/2}\Big\} \]
\[ \le n^B\, \sup_{x \in I(n)} P\Big\{ \big|(1-E)\, \Psi_j(x)\big| > n^{\epsilon}\, \big(n h^{2\alpha+1}\big)^{-1/2}\Big\} \le n^B\, \big\{ n^{\epsilon}\, \big(n h^{2\alpha+1}\big)^{-1/2}\big\}^{-2m}\, \sup_{x \in I(n)} E\big|(1-E)\, \Psi_j(x)\big|^{2m} \le B_{14}(m)\, n^{B-2m\epsilon}. \tag{B.28} \]
Observe too that if $1 \le k \le r$ and $x, x' \in I$, then, using (3.1), (B2) and (B3),
\[ \big|\Psi_k(x) - \Psi_k(x')\big| \le \frac{1}{n} \sum_{j=1}^{n} \big|\psi_k(W_j - x, h) - \psi_k(W_j - x', h)\big| \le C_{11}\, |x - x'| \int |t|\, (1+|t|)^{\alpha}\, \big|\phi_K^{(k)}(ht)\big|\, dt \le B_{15}\, |x - x'|\, h^{-(\alpha+2)} \le B_{16}\, |x - x'|\, n^{(\alpha+2)/(2\alpha+1)}. \]
The last inequality follows from the fact that, by (B3), $n h^{2\alpha+1}$ is bounded away from zero. Therefore, if $I(n)$ represents a grid in $I$ with edge width $n^{-1-(\alpha+2)/(2\alpha+1)}$, and if for each $x \in I$ we define $x'$ to be the point in $I(n)$ nearest to $x$, then
\[ \big|\Psi_k(x) - \Psi_k(x')\big| \le B_{17}\, n^{-1-(\alpha+2)/(2\alpha+1)}\, n^{(\alpha+2)/(2\alpha+1)} = B_{17}\, n^{-1}, \]
and therefore
\[ \big|(1-E)\, \big\{\Psi_k(x) - \Psi_k(x')\big\}\big| \le 2\, B_{17}\, n^{-1}. \]
Hence the following version of (B.28), with $I(n)$ there replaced by $I$, holds: for each $\epsilon > 0$,
\[ P\Big\{ \sup_{x \in I} \big|(1-E)\, \Psi_j(x)\big| > n^{\epsilon}\, \big(n h^{2\alpha+1}\big)^{-1/2}\Big\} \to 0. \tag{B.29} \]
Combining (B.24) and (B.29) we deduce that, for each $\epsilon > 0$,
\[ \sup_{\theta \in \Theta}\, \Big| \hat f_{\rm rat}(x\,|\,\theta) - \frac{1}{2\pi}\, g_0(x\,|\,\theta)\, \Psi_0(x) - \frac{1}{2\pi} \sum_{k=1}^{r} \frac{s_k\, h^k}{k!}\, g_k(x\,|\,\theta)\, E\{\Psi_k(x)\} \Big| = O_p\big\{ n^{\epsilon}\, \big(n h^{2\alpha+1}\big)^{-1/2} + h^{r-s+1}\big\}. \tag{B.30} \]
Note too that
\[ E\Big\{\int \frac{e^{it(W-x)}}{\phi_U(t)}\, \phi_K^{(k)}(ht)\, dt\Big\} = \int e^{-itx}\, \frac{\phi_X(t)\, \phi_U(t)}{\phi_U(t)}\, \phi_K^{(k)}(ht)\, dt \]
\[ = \int e^{-itx}\, \phi_X(t)\, \phi_K^{(k)}(ht)\, dt = 2\pi\, i^k \int f_X(x-hu)\, u^k K(u)\, du, \]
which implies that
\[ E\big\{\Psi_k(x)\big\} = 2\pi\, s_k \int f_X(x-hu)\, u^k K(u)\, du. \tag{B.31} \]
Furthermore, $g_0(x\,|\,\theta) \equiv 1$. Hence, by (B.30),
\[ \sup_{\theta \in \Theta}\, \Big| \hat f_{\rm rat}(x\,|\,\theta) - \frac{1}{2\pi}\, \Psi_0(x) - \sum_{k=1}^{r} \frac{h^k}{k!}\, g_k(x\,|\,\theta) \int f_X(x-hu)\, u^k K(u)\, du \Big| = O_p\big\{ n^{\epsilon}\, \big(n h^{2\alpha+1}\big)^{-1/2} + h^{r-s+1}\big\}, \tag{B.32} \]
where $\epsilon > 0$ is arbitrary.

Assumption (B5) implies that $g_k(x\,|\,\theta)$, and its first derivative with respect to $\theta$, are bounded uniformly in $x \in I$ and $\theta \in \Theta$. Using this result, (B5) and (3.6) we deduce that:
\[ \max_{1 \le k \le r}\, \big| g_k(x\,|\,\hat\theta) - g_k(x\,|\,\theta_1)\big| = O_p\big(n^{-1/2}\big). \]
Conditions (B1) and (B2) imply that $\int f_X(x-hu)\, u^k K(u)\, du$ is bounded uniformly in $x$ and $h$, for $k = 0, \ldots, r$. Combining these results with (B.32), and recalling that $g_0(x\,|\,\theta_1) \equiv 1$, we deduce that, for each $\epsilon > 0$,
\[ \hat f_{\rm rat}(x\,|\,\hat\theta) - (1-E)\, \frac{1}{2\pi}\, \Psi_0(x) - \sum_{k=0}^{r} \frac{h^k}{k!}\, g_k(x\,|\,\theta_1) \int f_X(x-hu)\, u^k K(u)\, du = O_p\big\{ n^{\epsilon}\, \big(n h^{2\alpha+1}\big)^{-1/2} + n^{-1/2} + h^{r-s+1}\big\}. \tag{B.33} \]
Result (B.22) implies that
\[ f(x\,|\,\theta_1)\, E\big\{L_x(W\,|\,\theta_1, h)\big\} - \sum_{k=0}^{r} \frac{h^k}{k!}\, g_k(x\,|\,\theta_1) \int f_X(x-hu)\, u^k K(u)\, du = O\big(h^{r-s+1}\big). \tag{B.34} \]
Since $(2\pi)^{-1}\, \Psi_0 = \hat f_{\rm dec}$, where $\hat f_{\rm dec}$ is as defined at (2.6), and
\[ E\big\{\hat f_{\rm rat}(x\,|\,\theta_1)\big\} = f(x\,|\,\theta_1)\, E\big\{L_x(W\,|\,\theta_1, h)\big\}, \]
where $\hat f_{\rm rat}(x\,|\,\theta)$ is as defined at (2.3), then (B.33) and (B.34) together imply (B.13). The results mentioned in the second and third sentences of this paragraph, together with (B.24), (B.29), (B.31) and (B.34), similarly imply (B.14).

B.4 Proof of Theorem 5

Noting that $f_X = f(\cdot\,|\,\theta_1) + \psi$, and that, by construction, $E\{L_x(W\,|\,\theta_1, h)\} = E\{K_x(W\,|\,\theta_1, h)\}$, we have:
\[ E\big\{L_x(W\,|\,\theta_1, h)\big\} = \int K_x(u\,|\,\theta_1, h)\, \Big\{\frac{1}{2\pi} \int e^{-itu}\, \phi_X(t)\, dt\Big\}\, du = \frac{1}{h} \int K\Big(\frac{x-u}{h}\Big)\, \frac{f_X(u)}{f(u\,|\,\theta_1)}\, du = \int K(u)\, \frac{f(x-hu\,|\,\theta_1) + \psi(x-hu)}{f(x-hu\,|\,\theta_1)}\, du = 1 + \int K(u)\, \frac{\psi(x-hu)}{f(x-hu\,|\,\theta_1)}\, du. \]
Therefore, in notation introduced in section 3.5,
\[ E\big\{\hat f_{\rm rat}(x\,|\,\theta_1)\big\} - f_X(x) = \int K(u)\, \frac{f(x\,|\,\theta_1)}{f(x-hu\,|\,\theta_1)}\, \psi(x-hu)\, du - \psi(x) = \int K(u)\, \{\psi(x-hu) - \psi(x)\}\, du + \sum_{k=1}^{r} \frac{h^k}{k!}\, g_k(x\,|\,\theta_1) \int u^k K(u)\, \psi(x-hu)\, du + h^{r+1} \int \gamma_{h,x}(u\,|\,\theta_1)\, \psi(x-hu)\, du \tag{B.35} \]
\[ = O\big\{h^{\beta}\, s(\psi)\big\}, \tag{B.36} \]
uniformly in $\theta \in \Theta$ and $x \in I$. This is equivalent to (B.15). The last identity in (B.36) follows on using the moment condition $\int u^j K(u)\, du = 0$, for $j = 1, \ldots, \beta-1$.
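Section C below discusses implementation for Laplace measurement errors. As a bridge, here is a minimal numerical sketch of the deconvolution estimator $\hat f_{\rm dec}$ used throughout, with $\phi_K(t) = (1-t^2)^\kappa I(-1 \le t \le 1)$ and Laplace $\phi_U(t) = (1+\lambda^2 t^2)^{-1}$; the sample size, $\lambda$, $h$ and $\kappa = 3$ are illustrative assumptions, not values taken from the paper:

```python
import math, random

def fdec(x, data, h, lam, kappa=3, ngrid=2000):
    # hat f_dec(x) = (2*pi*n)^{-1} sum_j int e^{-it(x-W_j)} phi_K(h t)/phi_U(t) dt,
    # with phi_K(t) = (1-t^2)^kappa on [-1,1] and Laplace phi_U(t) = 1/(1+lam^2 t^2),
    # so 1/phi_U(t) = 1 + lam^2 t^2.  Midpoint rule on t in (0, 1/h]; by symmetry of
    # phi_K and phi_U the integrand reduces to a cosine, and a factor 2 covers t < 0.
    upper = 1.0 / h
    dt = upper / ngrid
    total = 0.0
    for i in range(ngrid):
        t = (i + 0.5) * dt
        ratio = (1.0 - (h * t) ** 2) ** kappa * (1.0 + lam**2 * t**2)
        total += 2.0 * ratio * sum(math.cos(t * (x - w)) for w in data) * dt
    return total / (2 * math.pi * len(data))

rng = random.Random(0)
lam = 0.3
# Contaminated data W = X + U with X ~ N(0,1), U ~ Laplace(0, lam)
data = [rng.gauss(0.0, 1.0) + (rng.expovariate(1 / lam) - rng.expovariate(1 / lam))
        for _ in range(300)]
print(round(fdec(0.0, data, h=0.4, lam=lam), 3))  # estimate of f_X at the N(0,1) mode
```

With these (arbitrary) settings the estimate at the mode is deliberately oversmoothed by the finite bandwidth, so it sits somewhat below the true peak value $(2\pi)^{-1/2} \approx 0.399$.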
C Details of implementation

C.1 Using a ridge when computing $\hat f_{\rm rat}$

If $U$ has a Laplace distribution with scale parameter $\lambda$, so that $\phi_U(t) = (1+\lambda^2 t^2)^{-1}$, then if $K$ and $f(\cdot\,|\,\theta)$ are twice differentiable, we can write
\[ L_x(u\,|\,\theta, h) = \frac{1}{2\pi} \int e^{-itu}\, (1+\lambda^2 t^2)\, \phi_{K_x}(t\,|\,\theta, h)\, dt = K_x(u\,|\,\theta, h) - \lambda^2\, K_x''(u\,|\,\theta, h). \]
More generally, if $U$ has the distribution of an $N$-fold Laplace convolution, then $L_x$ is a linear combination of derivatives of $K_x$, of the form
\[ K_x^{(k)}(W_j\,|\,\theta, h) = \frac{\partial^k}{\partial u^k}\, \frac{K\{(x-u)/h\}}{h\, f(u\,|\,\theta)} \bigg|_{u=W_j} = \sum_{l=1}^{k+1} \frac{g_l(W_j - x, h, \theta)}{f^l(W_j\,|\,\theta)}, \]
for some functions $g_l$ which are sums and products of positive powers of $K\{(x-W_j)/h\}$, $f(W_j\,|\,\theta)$ and their derivatives. In practice, the denominators $f^l(W_j\,|\,\theta)$ are often too close to zero for some $W_j$'s, which makes the estimator work rather poorly. To avoid this problem, we can use a ridge parameter in the denominators. We tried several approaches to ridging in the particular Laplace error case, and found that the following approach performed well in practice: replace $f^l(\cdot\,|\,\theta)$ by $\max\{f^l(\cdot\,|\,\theta), \delta\}$, where $\delta > 0$ is a ridge parameter. In our numerical work, we used this approach with $\delta = 0.4\, \|f(\cdot\,|\,\hat\theta)\|_\infty$.

C.2 Details of SIMEX bandwidth for $\hat f_{\rm rat}$

It follows from our theoretical results that, under regularity conditions, the asymptotic mean integrated squared error (AMISE) of $\hat f_{\rm rat}$ is equal to
\[ {\rm AMISE}\big\{\hat f_{\rm rat}(\cdot\,|\,\theta)\big\} = \tfrac{1}{4}\, h^4\, \mu_2^2(K)\, R\big\{r''\, f(\cdot\,|\,\theta)\big\} + (2\pi n h)^{-1} \int |\phi_K(t)|^2\, |\phi_U(t/h)|^{-2}\, dt, \]
where $\mu_2(K) = \int x^2 K(x)\, dx$, $r = f_X/f(\cdot\,|\,\theta)$, and we used the notation $R(f) = \int f^2$. We suggest choosing $h$ for $\hat f_{\rm rat}$ by minimising a SIMEX estimator of the AMISE;
see Cook and Stefanski (1994) and Stefanski and Cook (1995) for an introduction to SIMEX. The idea of SIMEX methods is that, in some way, the relation between data from $f_W$ and $f_X$ can be mimicked by that between data from $f_{W^{(1)}} \equiv f_W * f_U$ and $f_{W^{(2)}} \equiv f_{W^{(1)}} * f_U$. Now, quantities related to $f_W$ are easy to estimate from the data, since we have a sample of $W_i$'s and can generate data from $f_{W^{(1)}}$ and $f_{W^{(2)}}$. This can be exploited to estimate unknown quantities related to $f_X$.

Let $h_0$, $h_1$ and $h_2$ be the bandwidths that minimise, respectively, ${\rm AMISE}\{\hat f_{\rm rat}(\cdot\,|\,\theta)\}$, ${\rm AMISE}\{\hat f_{{\rm rat},W}(\cdot\,|\,\theta)\}$ and ${\rm AMISE}\{\hat f_{{\rm rat},W^{(1)}}(\cdot\,|\,\theta)\}$, where $\hat f_{{\rm rat},W}$ and $\hat f_{{\rm rat},W^{(1)}}$ denote the ratio estimators of, respectively, $f_W$ and $f_{W^{(1)}}$, computed from a sample of size $n$ having density, respectively, $f_{W^{(1)}}$ and $f_{W^{(2)}}$. Then, extending the SIMEX-based bandwidth of Delaigle and Hall (2008) to our problem, $h_0/h_1$ can be mimicked by $h_1/h_2$, and hence $h_0$ can be approximated by $\hat h_1^2/\hat h_2$, where $\hat h_j$ is an estimator of $h_j$, for $j = 1, 2$. To estimate $h_j$, construct a sample $W_1^{(1)}, \ldots, W_n^{(1)}$ from $f_{W^{(1)}}$ and a sample $W_1^{(2)}, \ldots, W_n^{(2)}$ from $f_{W^{(2)}}$, by taking $W_i^{(1)} = W_i + U_i^{(1)}$ and $W_i^{(2)} = W_i^{(1)} + U_i^{(2)}$, where, for $j = 1, 2$, $U_1^{(j)}, \ldots, U_n^{(j)}$ is a sample of independent observations generated from $f_U$, independently of the $W_i$'s. Let $f_{W^{(0)}} \equiv f_W$, $f^{(0)}(\cdot\,|\,\theta) \equiv f(\cdot\,|\,\theta) * f_U$, $f^{(1)}(\cdot\,|\,\theta) = f^{(0)}(\cdot\,|\,\theta) * f_U$, $f^{(2)}(\cdot\,|\,\theta) = f^{(1)}(\cdot\,|\,\theta) * f_U$, and, for $j = 0, 1, 2$, let $r_{(j)} = f_{W^{(j)}}/f^{(j)}(\cdot\,|\,\theta)$. By definition, we have, for $j = 0, 1$,
\[ h_{j+1} = \mathop{\rm argmin}_h\, \Big[ \frac{h^4\, \mu_2^2(K)}{4}\, R\big\{r_{(j)}''\, f^{(j)}(\cdot\,|\,\theta)\big\} + \frac{1}{2\pi n h} \int \Big|\frac{\phi_K(t)}{\phi_U(t/h)}\Big|^2\, dt \Big], \]
and to estimate $h_{j+1}$ it suffices to estimate $r_{(j)}''(x)$ and $f^{(j)}(\cdot\,|\,\theta)$.
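The construction of the two SIMEX samples and the extrapolation step $\hat h_0 \approx \hat h_1^2/\hat h_2$ can be sketched as follows (a minimal illustration: the error density is taken to be Laplace, and the two bandwidths fed to the extrapolation are placeholder values standing in for minimisers of the estimated AMISE):

```python
import random

def laplace_noise(scale, n, rng):
    # Draw n Laplace(0, scale) variates as the difference of two exponentials.
    return [rng.expovariate(1 / scale) - rng.expovariate(1 / scale) for _ in range(n)]

def simex_samples(w, scale, rng):
    # W^(1)_i = W_i + U^(1)_i and W^(2)_i = W^(1)_i + U^(2)_i, with U^(j) drawn from f_U
    # independently of the W_i's.
    w1 = [wi + ui for wi, ui in zip(w, laplace_noise(scale, len(w), rng))]
    w2 = [wi + ui for wi, ui in zip(w1, laplace_noise(scale, len(w), rng))]
    return w1, w2

def extrapolate_bandwidth(h1, h2):
    # SIMEX step: h0/h1 is mimicked by h1/h2, so h0 is approximated by h1^2/h2.
    return h1 ** 2 / h2

rng = random.Random(1)
w = [rng.gauss(0.0, 1.0) for _ in range(200)]          # stand-in contaminated sample
w1, w2 = simex_samples(w, scale=0.4, rng=rng)
print(len(w1), len(w2), extrapolate_bandwidth(0.30, 0.36))
```

In the full procedure, `extrapolate_bandwidth` would be fed the minimisers $\hat h_1$ and $\hat h_2$ of the AMISE formula above, estimated from the $W^{(1)}$ and $W^{(2)}$ samples respectively.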
We estimate the latter by $f^{(j)}(\cdot\,|\,\hat\theta)$, and to estimate the former, we take $\hat r_{(j)}(x) = \check f_{W^{(j)}}(x)/f^{(j)}(x\,|\,\hat\theta)$, where $\check f_{W^{(0)}}$ and $\check f_{W^{(1)}}$ denote standard error-free kernel estimators of $f_W$ and $f_{W^{(1)}}$, computed from the data $W_1, \ldots, W_n$ and $W_1^{(1)}, \ldots, W_n^{(1)}$, respectively, and using, for example, a normal reference bandwidth for estimating second derivatives of densities; see Sheather and Jones (1991) and the references therein. In practice, to compute
$R\big\{r_{(j)}''\, f^{(j)}(\cdot\,|\,\theta)\big\}$ we truncate the integral to the interval $[W_{0.05}^{(j)}, W_{0.95}^{(j)}]$, where $W_\alpha^{(0)}$ and $W_\alpha^{(1)}$ denote the $100\alpha$th percentile of, respectively, the $W_i$'s and the $W_i^{(1)}$'s. Again, here, when computing
\[ \hat r_{(j)}''(x) = \frac{\check f_{W^{(j)}}''(x)}{f^{(j)}(x\,|\,\hat\theta)} - \frac{2\, \check f_{W^{(j)}}'(x)\, f^{(j)\prime}(x\,|\,\hat\theta)}{f^{(j)}(x\,|\,\hat\theta)^2} - \frac{\check f_{W^{(j)}}(x)\, f^{(j)\prime\prime}(x\,|\,\hat\theta)}{f^{(j)}(x\,|\,\hat\theta)^2} + \frac{2\, \check f_{W^{(j)}}(x)\, \big\{f^{(j)\prime}(x\,|\,\hat\theta)\big\}^2}{f^{(j)}(x\,|\,\hat\theta)^3}, \]
we need to use a ridge at each denominator. We implemented this procedure in the Laplace case only, and we used the same ridge as the one described in section C.1. Note that the estimated bandwidth depends on the SIMEX samples $W_1^{(j)}, \ldots, W_n^{(j)}$. To reduce the effect of the random sampling step, as in Delaigle and Hall (2008) we generate $B$ sets of SIMEX samples of size $n$ (we took $B = 30$), and take $\hat h_{j+1}$ to minimise the average of the corresponding $B$ estimated AMISE values.

C.3 NR bandwidth for $\hat f_{\rm dec}$ with the sinc kernel

The mean integrated squared error (MISE) of $\hat f_{\rm dec}$, computed with the sinc kernel, is given by
\[ {\rm MISE}(h) = \frac{1}{2\pi n} \int_{-1/h}^{1/h} |\phi_U(t)|^{-2}\, dt - \frac{n+1}{2\pi n} \int_{-1/h}^{1/h} |\phi_X(t)|^2\, dt + \int f_X^2, \tag{C.1} \]
and to find the bandwidth that minimises the MISE, we need to find the roots of
\[ {\rm MISE}'(h) = -\frac{1}{\pi n h^2}\, |\phi_U(1/h)|^{-2} + \frac{n+1}{n\pi h^2}\, |\phi_W(1/h)|^2\, |\phi_U(1/h)|^{-2}. \]
Equivalently, this bandwidth is a solution of $-1 + (n+1)\, |\phi_W(1/h)|^2 = 0$. The NR rule assumes that $X \sim N(\hat\mu, \hat\sigma^2)$, where $\hat\mu = \bar W$ and $\hat\sigma^2 = \widehat{\rm Var}(W) - {\rm Var}(U)$, with $\bar W$ and $\widehat{\rm Var}(W)$ denoting the empirical mean and variance of the $W_i$'s. This amounts to estimating $\phi_W(1/h)$ by $\hat\phi_W(1/h) = \phi_U(1/h)\, \exp(i\hat\mu/h)\, \exp(-0.5\, \hat\sigma^2/h^2)$. Then the bandwidth is estimated by $\hat h_{\rm NR}$, the solution of $-1 + (n+1)\, |\hat\phi_W(1/h)|^2 = 0$.
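With Laplace($\lambda$) errors, for instance, $\phi_U(t) = (1+\lambda^2 t^2)^{-1}$, and the equation $-1 + (n+1)\,|\hat\phi_W(1/h)|^2 = 0$ can be solved by bisection, since $|\hat\phi_W(1/h)|$ is increasing in $h$. A sketch (the sample moments, $n$ and $\lambda$ are illustrative stand-ins for values estimated from data):

```python
import math

def nr_criterion(h, n, sig2_hat, lam):
    # -1 + (n+1)|phi_hat_W(1/h)|^2, with phi_hat_W(t) = phi_U(t) e^{i*mu*t - sig2*t^2/2};
    # the factor e^{i*mu/h} has modulus 1, so mu_hat drops out of the modulus.
    phi_u = 1.0 / (1.0 + lam**2 / h**2)            # Laplace phi_U evaluated at t = 1/h
    mod2 = (phi_u * math.exp(-0.5 * sig2_hat / h**2)) ** 2
    return -1.0 + (n + 1) * mod2

def nr_bandwidth(n, sig2_hat, lam, lo=1e-3, hi=10.0, iters=200):
    # Bisection: the criterion is negative for tiny h (|phi_hat_W| -> 0) and positive
    # for large h, and is monotone in between.
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if nr_criterion(mid, n, sig2_hat, lam) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

h = nr_bandwidth(n=200, sig2_hat=1.0, lam=0.3)
print(round(h, 4))
```

The returned $h$ makes the criterion vanish to within floating-point precision; smaller error scales $\lambda$ or larger $n$ yield smaller bandwidths, as expected.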
D Additional simulation results

Tables D.1 and D.3 report the integrated squared bias (ISB), computed from 1000 samples, of the estimators $\hat f_{\rm dec}$, $\hat f_{\rm bco}$ and $\hat f_{\rm wgt}$ for the various cases considered in our simulations. These tables show that, as could be expected, $\hat f_{\rm bco}$ is less biased than $\hat f_{\rm dec}$. $\hat f_{\rm wgt}$ also benefits from a bias reduction in the Laplace error case, and in the normal error case for densities (i) and (ii). However, for the other densities in the normal error case, the ISB of $\hat f_{\rm wgt}$ is slightly larger than that of $\hat f_{\rm dec}$. This is because $\theta$ is more difficult to estimate in the normal error case than in the Laplace error case. As a result, in the normal error case, the parametric component of $\hat f_{\rm wgt}$ is less good, and the weight $\hat w$ of $\hat f_{\rm wgt}$, which also depends on $\theta$, is smaller. See Table D.2, where we show the median and interquartile range of $\hat w$ when estimating densities (i) to (iii).

Table D.1: $10^3 \times$ ISB of 1000 estimators of densities (i) to (iii) in the Laplace and normal error cases, when NSR = 10% or 25% and n = 200 or 700, using the estimators $\hat f_{\rm dec}$, $\hat f_{\rm bco}(\cdot\,;\hat\theta_{\rm MD})$ and $\hat f_{\rm wgt}(\cdot\,;\hat\theta_{\rm ML})$.

                                        Laplace                    |                 Normal
                              (i)        (ii)        (iii)         |      (i)        (ii)        (iii)
  n                         200   700  200   700   200   700       |   200   700   200   700   200   700
  NSR = 10%
  f_dec                    4.66  2.22  2.41  1.27  4.94  7.57      |  6.27  3.48  2.97  1.62  6.44  3.7
  f_bco(.; theta_MD)       0.26  0.6   1.12  0.31  2.36  5.49      |  0.47  0.3   0.78  0.9   2.3   0.6
  f_wgt(.; theta_ML)       0.47  0.15  0.24  0.1   0.77  6.59      |  3.4   1.66  1.43  0.91  6.73  3.94
  NSR = 25%
  f_dec                    7.57  4.26  3.27  2.16  2.36  4.3       | 15.5  11.2   5.33  4.16 12.9   9.35
  f_bco(.; theta_MD)       0.84  0.19  2.19  1.25  1.1   2.42      |  8.9   2.12  2.96  1.35  7.37  3.2
  f_wgt(.; theta_ML)       0.83  0.25  0.25  0.14  0.19  2.74      |  7.44  5.8   2.16  1.87 12.6   9.68
Table D.2: Median (IQR) of 1000 values of $\hat w$ (computed with $\hat\theta_{\rm ML}$) when estimating densities (i) to (iii) in the Laplace and normal error cases, with NSR = 10% or 25% and n = 200 or 700.

                             (i)                      (ii)                     (iii)
  n                     200         700         200         700          200         700
  Lap, NSR = 10%    0.7 (0.7)   0.74 (0.7)   0.71 (0.9)  0.73 (0.7)   0.68 (0.9)  0.72 (0.1)
  Lap, NSR = 25%    0.73 (0.7)  0.76 (0.5)   0.73 (0.9)  0.78 (0.7)   0.71 (0.8)  0.73 (0.8)
  Norm, NSR = 10%   0.29 (0.9)  0.28 (0.12)  0.29 (0.9)  0.28 (0.9)   0.14 (0.7)  0.5 (0.2)
  Norm, NSR = 25%   0.33 (0.1)  0.33 (0.9)   0.3 (0.9)   0.33 (0.12)  0.24 (0.13) 0.17 (0.9)

Table D.3: $10^3 \times$ ISB of 1000 estimators of densities (iv) to (vi) in the normal error case, when NSR = 10% or 25% and n = 200 or 700, using the estimators $\hat f_{\rm dec}$, $\hat f_{\rm bco}(\cdot\,;\hat\theta_{\rm MD})$ and $\hat f_{\rm wgt}(\cdot\,;\hat\theta_{\rm ML})$.

                                       NSR = 10%                   |               NSR = 25%
                             (iv)        (v)         (vi)          |     (iv)        (v)         (vi)
  n                        200   700   200   700   200   700       |   200   700   200   700   200   700
  f_dec                   2.77  1.84   2.1   1.2   3.69  2.28      |  4.85  4.6   4.86  2.96  6.5   4.49
  f_bco(.; theta_MD)      1.65  1.12   0.27  0.8   2.31  0.77      |  2.61  2.2   1.27  0.27  4.73  2.85
  f_wgt(.; theta_ML)      2.92  1.89   2.12  1.6   4.4   2.42      |  5.5   4.15  4.72  2.92  6.57  4.79

References

Cook, J.R. and Stefanski, L.A. (1994). Simulation-extrapolation estimation in parametric measurement error models. J. Amer. Statist. Assoc., 89, 1314-1328.

Hall, P. and Heyde, C. (1980). Martingale Limit Theory and its Application. Academic Press.

Sheather, S.J. and Jones, M.C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. J. Royal Statist. Soc. Ser. B, 53, 683-690.

Stefanski, L. and Cook, J.R. (1995). Simulation-extrapolation: The measurement error jackknife. J. Amer. Statist. Assoc., 90, 1247-1256.