SUPPLEMENTARY MATERIAL FOR PUBLICATION ONLINE 1


B Technical details

B.1 Variance of $\hat f_{\rm dec}$ in the supersmooth case

We derive the variance in the case where $\phi_K(t) = (1 - t^2)^\kappa\, I(-1 \le t \le 1)$, i.e. we take $K$ as in (A3), and assume that
$$\phi_U(t) = a(t)\, \exp(-|t|^\alpha), \eqno(B.1)$$
where $\alpha > 0$ and $a$ denotes a symmetric, real-valued function satisfying $a(0) = 1$ and $a(t) \sim \xi\,|t|^{\alpha_1}$ as $|t| \to \infty$, with $-\infty < \alpha_1 < \infty$ and $\xi > 0$. That is, we put $\gamma = 1$, $\alpha_2 = \alpha_1$ and $d = d_1$ in (3.2); the more general setting at (3.2) obtains similarly. Assume too that the distribution of which $f_W$ is the density has finite variance:
$$\int w^2 f_W(w)\,dw < \infty. \eqno(B.2)$$
Recall the definition of $K_U$ at (2.2). In this notation, it is well known that the asymptotic variance of $\hat f_{\rm dec}$ is given by
$$\mathrm{Var}\{\hat f_{\rm dec}(x)\} = n^{-1}\, (K_U^2 * f_W)(x) - n^{-1}\, [E\{\hat f_{\rm dec}(x)\}]^2.$$

Theorem 3. If (B.1)-(B.2) hold then, for each $x$, as $h \to 0$,
$$(K_U^2 * f_W)(x) \sim \bigg\{ \frac{2^\kappa\, \xi^{-1}\, \Gamma(\kappa+1)}{2\pi\, \alpha^{\kappa+1}}\, h^{(\kappa+1)\alpha + \alpha_1}\, \exp(h^{-\alpha}) \bigg\}^2 \int \{\cos(x-y)\}^2\, f_W(y)\,dy. \eqno(B.3)$$

Proof. It is notationally convenient to put $b = a^{-1}$. Note that
$$2\pi\, K_U(x) = \int_{-1}^{1} \cos(tx)\, (1-t^2)^\kappa\, b(t/h)\, \exp(|t/h|^\alpha)\,dt = h \int_{-1/h}^{1/h} \cos(hux)\, \{1-(hu)^2\}^\kappa\, b(u)\, \exp(|u|^\alpha)\,du = h\, I_1(x; h), \eqno(B.4)$$

say. Let $\epsilon = \epsilon(h)$ decrease to zero slowly as $h \to 0$, at a rate that we shall address shortly, and observe that, by (B.1),
$$\bigg| I_1(x;h) - \int_{(1-\epsilon)/h \le |u| \le 1/h} \cos(hux)\, \{1-(hu)^2\}^\kappa\, b(u)\, \exp(|u|^\alpha)\,du \bigg| \le C_1 \int_{|u| \le (1-\epsilon)/h} \{1-(hu)^2\}^\kappa\, (1+|u|)^{-\alpha_1}\, \exp(|u|^\alpha)\,du \le C_2 \int_{|u| \le (1-\epsilon)/h} (1+|u|)^{-\alpha_1}\, \exp(|u|^\alpha)\,du \le C_3\, h^{\alpha_1}\, \exp[\{(1-\epsilon)/h\}^\alpha], \eqno(B.5)$$
where, here and below, $C_1, C_2, \ldots$ denote generic positive constants not depending on $h$ or $x$. Note too that
$$\bigg| \int_{(1-\epsilon)/h \le |u| \le 1/h} \cos(hux)\, \{1-(hu)^2\}^\kappa\, b(u)\, \exp(|u|^\alpha)\,du - \cos(x) \int_{(1-\epsilon)/h \le |u| \le 1/h} \{1-(hu)^2\}^\kappa\, b(u)\, \exp(|u|^\alpha)\,du \bigg| \le C_3\, \epsilon\, |x| \int_{(1-\epsilon)/h \le |u| \le 1/h} \{1-(hu)^2\}^\kappa\, (1+|u|)^{-\alpha_1}\, \exp(|u|^\alpha)\,du. \eqno(B.6)$$
Since $\alpha > 0$ then, writing $o_u(1)$ for a generic function of $u$ that satisfies $\sup_{0 \le u \le \epsilon/h^\alpha} |o_u(1)| = o(1)$, and taking $\epsilon$ to decrease to zero so slowly that $\epsilon/h^\alpha \to \infty$, we have:
$$\int_{(1-\epsilon)/h}^{1/h} \{1-(hu)^2\}^\kappa\, b(u)\, \exp(u^\alpha)\,du \eqno(B.7)$$
$$= \int_0^{\epsilon/h} \{2hv - (hv)^2\}^\kappa\, b(h^{-1}-v)\, \exp\{(h^{-1}-v)^\alpha\}\,dv = (2h)^\kappa \int_0^{\epsilon/h} v^\kappa\, \big(1 - \tfrac12 hv\big)^\kappa\, b(h^{-1}-v)\, \exp\{(h^{-1}-v)^\alpha\}\,dv \sim \xi^{-1}\, (2h)^\kappa\, h^{\alpha_1} \int_0^{\epsilon/h} v^\kappa\, \exp\{(h^{-1}-v)^\alpha\}\,dv = \xi^{-1}\, (2h)^\kappa\, h^{\alpha_1} \int_0^{\epsilon/h} v^\kappa\, \exp\{h^{-\alpha}(1-hv)^\alpha\}\,dv$$

$$= 2^\kappa\, \xi^{-1}\, h^{(\kappa+1)\alpha + \alpha_1 - 1} \int_0^{\epsilon/h^\alpha} u^\kappa\, \exp\{h^{-\alpha}(1-h^\alpha u)^\alpha\}\,du = 2^\kappa\, \xi^{-1}\, h^{(\kappa+1)\alpha + \alpha_1 - 1} \int_0^{\epsilon/h^\alpha} u^\kappa\, \exp\{h^{-\alpha}(1-\alpha h^\alpha u) + o_u(1)\}\,du = 2^\kappa\, \xi^{-1}\, h^{(\kappa+1)\alpha + \alpha_1 - 1}\, \exp(h^{-\alpha}) \int_0^{\epsilon/h^\alpha} u^\kappa\, \exp\{-\alpha u + o_u(1)\}\,du \sim 2^\kappa\, \xi^{-1}\, h^{(\kappa+1)\alpha + \alpha_1 - 1}\, \exp(h^{-\alpha}) \int_0^\infty u^\kappa\, \exp(-\alpha u)\,du \eqno(B.8)$$
$$= \frac{2^\kappa\, \xi^{-1}\, \Gamma(\kappa+1)}{\alpha^{\kappa+1}}\, h^{(\kappa+1)\alpha + \alpha_1 - 1}\, \exp(h^{-\alpha}). \eqno(B.9)$$
Combining (B.6) and (B.9) we deduce that
$$\bigg| \int_{(1-\epsilon)/h \le |u| \le 1/h} \cos(hux)\, \{1-(hu)^2\}^\kappa\, b(u)\, \exp(|u|^\alpha)\,du - \{\cos(x) + o_{\rm unif}(1)\}\, \frac{2^\kappa\, \xi^{-1}\, \Gamma(\kappa+1)}{\alpha^{\kappa+1}}\, h^{(\kappa+1)\alpha + \alpha_1 - 1}\, \exp(h^{-\alpha}) \bigg| \le C_4\, \epsilon\, |x|\, h^{(\kappa+1)\alpha + \alpha_1 - 1}\, \exp(h^{-\alpha}), \eqno(B.10)$$
where, here and below, $o_{\rm unif}(1)$ and $O_{\rm unif}(1)$ denote quantities that depend on $x$ and equal $o(1)$ and $O(1)$, respectively, uniformly in $-\infty < x < \infty$, as $h \to 0$. Together, (B.5) and (B.10) imply that
$$I_1(x;h) = \cos(x)\, \frac{2^\kappa\, \xi^{-1}\, \Gamma(\kappa+1)}{\alpha^{\kappa+1}}\, h^{(\kappa+1)\alpha + \alpha_1 - 1}\, \exp(h^{-\alpha}) + \{o_{\rm unif}(1) + O_{\rm unif}(1)\, \epsilon\, |x|\}\, h^{(\kappa+1)\alpha + \alpha_1 - 1}\, \exp(h^{-\alpha}) + O_{\rm unif}(1)\, h^{\alpha_1}\, \exp[\{(1-\epsilon)/h\}^\alpha]. \eqno(B.11)$$
Together, (B.4) and (B.11) imply that, if $\epsilon$ decreases to zero sufficiently slowly as $h \to 0$,
$$(K_U^2 * f_W)(x) = \int K_U(x-y)^2\, f_W(y)\,dy = \frac{h^2}{(2\pi)^2} \int I_1(x-y; h)^2\, f_W(y)\,dy = \bigg\{ \frac{2^\kappa\, \xi^{-1}\, \Gamma(\kappa+1)}{2\pi\, \alpha^{\kappa+1}}\, h^{(\kappa+1)\alpha + \alpha_1}\, \exp(h^{-\alpha}) \bigg\}^2 \int \{\cos(x-y)\}^2\, f_W(y)\,dy + o\Big[ \big\{ h^{(\kappa+1)\alpha + \alpha_1}\, \exp(h^{-\alpha}) \big\}^2 \Big]. \eqno(B.12)$$
(Here we have used (B.2).) This result is equivalent to (B.3).
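The limiting step at (B.8)-(B.9) rests on the identity $\int_0^\infty u^\kappa e^{-\alpha u}\,du = \Gamma(\kappa+1)/\alpha^{\kappa+1}$, which can be checked numerically. The sketch below is illustrative only; the values of $\kappa$ and $\alpha$, the truncation point and the grid size are arbitrary numerical conveniences, not quantities from the paper.

```python
import math

def gamma_integral(kappa, alpha, upper=200.0, m=200000):
    """Trapezoidal approximation of int_0^upper u^kappa * exp(-alpha*u) du.

    The integrand is negligible beyond u = upper for the values used here,
    so this approximates the integral over (0, infinity).
    """
    du = upper / m
    total = 0.0
    for i in range(m + 1):
        u = i * du
        w = 0.5 if i in (0, m) else 1.0  # trapezoidal end-point weights
        total += w * u ** kappa * math.exp(-alpha * u)
    return total * du

kappa, alpha = 3, 2.0
numeric = gamma_integral(kappa, alpha)
closed_form = math.gamma(kappa + 1) / alpha ** (kappa + 1)  # Gamma(4)/2^4 = 0.375
```

For $\kappa = 3$ and $\alpha = 2$ both sides equal $6/16 = 0.375$, in agreement with the constant appearing in (B.9).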

B.2 Main steps in the derivation of the convergence rate of $\hat f_{\rm rat}$ (see (3.13)) in the case $n^{-1/2} = O(\eta)$

Theorem 4 below demonstrates that, under the assumption that $\hat\theta = \theta_1 + O_p(n^{-1/2})$, the asymptotic bias of $\hat f_{\rm rat}(x\,|\,\hat\theta)$ equals that of $\hat f_{\rm rat}(x\,|\,\theta_1)$, and the error about the mean of $\hat f_{\rm rat}(x\,|\,\hat\theta)$ equals that of $\hat f_{\rm dec}(x)$ plus negligible terms. Write $E_{f_X}$ for expectation when the distribution of the data $W_j$ has density $f_U * f_X$; and, for any random variable $R$ with finite mean, let $(1 - E_{f_X})\, R$ denote $R - E_{f_X}(R)$.

Theorem 4. If (3.1), (3.6) and (B1)-(B6) in section 3.5 hold then, for each $\epsilon > 0$,
$$\hat f_{\rm rat}(x\,|\,\hat\theta) - (1 - E_{f_X})\, \hat f_{\rm dec}(x) - E_{f_X}\{\hat f_{\rm rat}(x\,|\,\theta_1)\} = O_p\big\{ n^\epsilon\, (nh^{2\alpha-1})^{-1/2} + n^{-1/2} + h^{r-s+1} \big\}. \eqno(B.13)$$
Moreover, the remainder term on the right-hand side is of the stated order uniformly in densities $f_X$ for which (3.1), (B1)-(B6) in section 3.5, and (3.6) hold, in the sense that
$$\lim_{C\to\infty}\, \limsup_{n\to\infty}\, \sup_{f_X \in \mathcal F} P_{f_X}\Big[ \big| \hat f_{\rm rat}(x\,|\,\hat\theta) - (1 - E_{f_X})\, \hat f_{\rm dec}(x) - E_{f_X}\{\hat f_{\rm rat}(x\,|\,\theta_1)\} \big| > C\, \big\{ n^\epsilon\, (nh^{2\alpha-1})^{-1/2} + n^{-1/2} + h^{r-s+1} \big\} \Big] = 0. \eqno(B.14)$$

Our next result describes the bias of $\hat f_{\rm rat}(\cdot\,|\,\theta_1)$. Let $s(\psi)$ be as in (3.15).

Theorem 5. If (3.1) and (B1)-(B6) in section 3.5 hold then
$$\sup_{\theta \in \Theta}\, \{h^\beta\, s(\psi)\}^{-1}\, \big| E_{f_X}\{\hat f_{\rm rat}(x\,|\,\theta_1)\} - f_X(x) \big| = O(1) \quad \text{as } n \to \infty. \eqno(B.15)$$

Let $\mathrm{Var}_{f_X}$ denote the variance operator when the data $W_j$ have density $f_U * f_X$. If (3.1), (B1), (B2) and (B3) hold, then standard deconvolution results imply that
$$\sup_{f_X \in \mathcal F} \mathrm{Var}_{f_X}\{\hat f_{\rm dec}(x)\} = O\big\{ (nh^{2\alpha+1})^{-1} \big\}, \eqno(B.16)$$

where the order is exact when $f_W(x)$ is nonzero. Hence the remainder on the right-hand side of (B.13) is negligibly small relative to the term $(1 - E_{f_X})\, \hat f_{\rm dec}(x)$ on the left-hand side, for any $x$ such that $f_W(x)$ is nonzero. Combining (B.14), (B.15) and (B.16) we deduce that, under the conditions of Theorem 5, if the term in $h^{r-s+1}$ in (B.14) can be ignored then
$$\lim_{C\to\infty}\, \limsup_{n\to\infty}\, \sup_{\theta \in \Theta} P_{f_X}\Big[ \big| \hat f_{\rm rat}(x) - f_X(x) \big| > C\, \big\{ (nh^{2\alpha+1})^{-1/2} + h^\beta\, s(\psi) \big\} \Big] = 0. \eqno(B.17)$$
Under the assumptions of Theorem 5, if $\psi$ represents a sequence of functions, and we take $h \asymp \min\{1, (n\eta^2)^{-1/(2\alpha+2\beta+1)}\}$, then the minimiser, with respect to $h$, of the term within braces (i.e. the coefficient of $C$) on the left-hand side of (B.17) is of size $(\eta^{2\alpha+1}/n^\beta)^{1/(2\alpha+2\beta+1)}$ in the case $n^{-1/2} = O(\eta)$, in which instance $h$ is asymptotic to a constant multiple of $(n\eta^2)^{-1/(2\alpha+2\beta+1)}$. The term in $h^{r-s+1}$ in (B.14) can be ignored if $h^{r-s+1} = O\{(\eta^{2\alpha+1}/n^\beta)^{1/(2\alpha+2\beta+1)}\}$ or, equivalently, since $h$ is asymptotic to a constant multiple of $(n\eta^2)^{-1/(2\alpha+2\beta+1)}$, if
$$n^{-1} = O\big( \eta^{(2\alpha+2r-2s+3)/(r-s+1-\beta)} \big) \eqno(B.18)$$
(provided that $r > s - 1 + \beta$). For example, if $\eta = O(n^{-t})$, where $t \in (0, \tfrac12)$ and $2t\alpha + \beta \le (1-2t)(r-s+1) - t$, then (B.18) holds, in which case $h^{r-s+1}$ can be ignored in (B.14) and therefore property (3.13) follows from (B.17).

B.3 Proof of Theorem 4

Observe that
$$2\pi\, f(x\,|\,\theta)\, L_x(w\,|\,\theta, h) = f(x\,|\,\theta) \int \frac{e^{-itw}\, \phi_{K_x(\cdot|\theta,h)}(t)}{\phi_U(t)}\,dt = f(x\,|\,\theta) \int \frac{e^{-itw}}{\phi_U(t)}\,dt \int e^{it(x-hu)}\, \frac{K(u)}{f(x-hu\,|\,\theta)}\,du = \int \frac{e^{it(x-w)}}{\phi_U(t)}\,dt \int e^{-ihtu}\, \frac{f(x\,|\,\theta)\, K(u)}{f(x-hu\,|\,\theta)}\,du$$

$$= \sum_{k=0}^{r} \frac{h^k}{k!}\, g_k(x\,|\,\theta) \int \frac{e^{it(x-w)}}{\phi_U(t)}\,dt \int e^{-ihtu}\, u^k K(u)\,du + h^{r+1} \int \frac{e^{it(x-w)}}{\phi_U(t)}\,dt \int e^{-ihtu}\, \omega(hu, x\,|\,\theta)\, u^{r+1} K(u)\,du. \eqno(B.19)$$
Since $\gamma_{h,x}$ has $s$ derivatives, $s$ integrations by parts give
$$\int e^{-ihtu}\, \omega(hu, x\,|\,\theta)\, u^{r+1} K(u)\,du = (iht)^{-s} \int e^{-ihtu}\, \gamma_{h,x}^{(s)}(u\,|\,\theta)\,du.$$
Therefore,
$$\bigg| \int \frac{e^{it(x-w)}}{\phi_U(t)}\,dt \int e^{-ihtu}\, \omega(hu, x\,|\,\theta)\, u^{r+1} K(u)\,du \bigg| \le \int_{|t| \le 1} \frac{dt}{|\phi_U(t)|} \int |\gamma_{h,x}(u\,|\,\theta)|\,du + \int_{|t| > 1} \frac{dt}{|ht|^s\, |\phi_U(t)|} \int |\gamma_{h,x}^{(s)}(u\,|\,\theta)|\,du.$$
Combining this bound with (B.19), and noting (B6), we deduce that
$$\sup_{-\infty<w<\infty}\, \sup_{\theta\in\Theta} \bigg| f(x\,|\,\theta)\, L_x(w\,|\,\theta,h) - \sum_{k=0}^{r} \frac{h^k\, g_k(x\,|\,\theta)}{2\pi k!} \int \frac{e^{it(x-w)}}{\phi_U(t)}\,dt \int e^{-ihtu}\, u^k K(u)\,du \bigg| \le B_4\, h^{r+1} \bigg\{ \int_{|t| \le 1} \frac{dt}{|\phi_U(t)|} + \int_{|t| > 1} \frac{dt}{|ht|^s\, |\phi_U(t)|} \bigg\}, \eqno(B.20)$$
uniformly in $0 < h \le C_1$, where $B_4 > 0$ is a constant. Property (3.1) implies that the first integral on the right-hand side of (B.20) is finite, and (3.1) and (B6) imply that the second integral is bounded above by
$$C \int_{|t| > 1} \frac{(1+|t|)^\alpha}{|ht|^s}\,dt = O(h^{-s}).$$
These results, (B.20) and the property $\int e^{-ihtu}\, u^k K(u)\,du = (-i)^k\, \phi_K^{(k)}(-ht)$, entail:
$$\sup_{-\infty<w<\infty}\, \sup_{\theta\in\Theta} \bigg| f(x\,|\,\theta)\, L_x(w\,|\,\theta,h) - \sum_{k=0}^{r} \frac{(-ih)^k\, g_k(x\,|\,\theta)}{2\pi k!} \int \frac{e^{it(w-x)}}{\phi_U(t)}\, \phi_K^{(k)}(ht)\,dt \bigg| = O(h^{r-s+1}). \eqno(B.21)$$

Note that (B.21), and (B.22) and (B.24) below, involve only the densities $f(\cdot\,|\,\theta)$ and $f_U$, which we take to be fixed; these formulae do not involve $f_X$, which can vary with $n$. Let $s_k = (-1)^{k/2}$ if $k$ is even, and $s_k = (-1)^{(k-1)/2}$ if $k$ is odd. Since the distribution of $U$ is symmetric then $\phi_U$ is symmetric, and so, defining
$$\psi_k(u, h) = \begin{cases} \int \cos(tu)\, \phi_U(t)^{-1}\, \phi_K^{(k)}(ht)\,dt & \text{if } k \text{ is even,} \\ \int \sin(tu)\, \phi_U(t)^{-1}\, \phi_K^{(k)}(ht)\,dt & \text{if } k \text{ is odd,} \end{cases}$$
we have:
$$\sup_{-\infty<w<\infty}\, \sup_{\theta\in\Theta} \bigg| f(x\,|\,\theta)\, L_x(w\,|\,\theta,h) - \frac{1}{2\pi} \sum_{k=0}^{r} \frac{s_k\, h^k}{k!}\, g_k(x\,|\,\theta)\, \psi_k(w-x, h) \bigg| = O(h^{r-s+1}). \eqno(B.22)$$
Hence, defining
$$\Psi_k(x) = \frac1n \sum_{j=1}^{n} \psi_k(W_j - x, h), \eqno(B.23)$$
we have, for a constant $B_5 > 0$ not depending on $h$, $n$ or the data $W_1, \ldots, W_n$,
$$\sup_{\theta\in\Theta} \bigg| \hat f_{\rm rat}(x\,|\,\theta) - \frac{1}{2\pi} \sum_{k=0}^{r} \frac{s_k\, h^k}{k!}\, g_k(x\,|\,\theta)\, \Psi_k(x) \bigg| \le B_5\, h^{r-s+1}, \eqno(B.24)$$
where the right-hand side denotes a deterministic quantity of the stated size. Define
$$L_k(u) = \int e^{itu}\, \frac{\phi_K^{(k)}(t)}{\phi_U(t/h)}\,dt$$
and note that, by (B2),
$$\sup_{-\infty<u<\infty} |L_k(u)| \le B_6\, h^{-\alpha} \int (1+|t|)^\alpha\, |\phi_K^{(k)}(t)|\,dt \le B_7\, h^{-\alpha}, \eqno(B.25)$$
where $B_6, B_7 > 0$ are constants not depending on $h$ or $n$. Using (3.1), (B1), (B2) and (B.25) we deduce that, for each integer $m \ge 2$,
$$h^m\, E\{|\psi_k(W-x, h)|^m\} = E\big[ \big| L_k\{(W-x)/h\} \big|^m \big] \le \|L_k\|_\infty^{m-2}\, E\big[ \big| L_k\{(W-x)/h\} \big|^2 \big]$$

$$= h\, \|L_k\|_\infty^{m-2} \int |L_k(u)|^2\, f_W(x+hu)\,du \le B_8\, h\, (B_7\, h^{-\alpha})^{m-2} \int |L_k(u)|^2\,du = B_9(m)\, h^{1-(m-2)\alpha}\, 2\pi \int \bigg| \frac{\phi_K^{(k)}(t)}{\phi_U(t/h)} \bigg|^2 dt \le B_{10}(m)\, h^{1-(m-2)\alpha} \int \big\{ (1+|t/h|)^\alpha\, |\phi_K^{(k)}(t)| \big\}^2\,dt = O(h^{1-m\alpha}), \eqno(B.26)$$
where the bounds apply uniformly in $x \in I$, and $B_8 = \|f_W\|_\infty$, $B_9(m) = B_7^{m-2}\, B_8$, and $B_l(m)$ or $B_l$, for $l \ge 1$, denote constants not depending on $h$ or $n$. (The second identity in the string at (B.26) follows using Parseval's identity. Note that if (B1) holds then $f_X(x)$ is bounded uniformly in $x$ and $n$, from which it follows that the same is true of $f_W(x)$.) Therefore,
$$E\{|\psi_k(W-x, h)|^m\} = O\big( h^{1-m(\alpha+1)} \big). \eqno(B.27)$$
Let $1 \le j \le r$, and recall the definition of $\Psi_k$ at (B.23). Replacing $m$ by either 2 or $2m$ in the bound at (B.27), we have for each $m \ge 1$, using Rosenthal's inequality (see e.g. Hall and Heyde (1980), p. 23):
$$E\big| (1-E)\, \Psi_j(x) \big|^{2m} \le B_{11}(m) \Big( n^{-m}\, \big[ E\{\psi_j(W-x, h)^2\} \big]^m + n^{-(2m-1)}\, E\{|\psi_j(W-x, h)|^{2m}\} \Big) \le B_{12}(m)\, \big\{ (nh^{2\alpha+1})^{-m} + h^{-\alpha}\, (nh^{\alpha+1})^{-(2m-1)} \big\} \le B_{13}(m)\, (nh^{2\alpha+1})^{-m},$$
uniformly in $x \in I$, where the last inequality is valid whenever $nh \ge 1$. Hence, if $I(n)$ is a set of at most $n^B$ values $x \in I$ then, for each $\epsilon > 0$,
$$\sum_{x \in I(n)} P\big\{ \big| (1-E)\, \Psi_j(x) \big| > n^\epsilon\, (nh^{2\alpha+1})^{-1/2} \big\}$$

$$\le n^B \sup_{x \in I(n)} P\big\{ \big| (1-E)\, \Psi_j(x) \big| > n^\epsilon\, (nh^{2\alpha+1})^{-1/2} \big\} \le n^B\, \big\{ n^\epsilon\, (nh^{2\alpha+1})^{-1/2} \big\}^{-2m} \sup_{x \in I(n)} E\big| (1-E)\, \Psi_j(x) \big|^{2m} \le B_{14}(m)\, n^{B - 2m\epsilon}. \eqno(B.28)$$
Observe too that if $1 \le k \le r$ and $x, x' \in I$ then, using (3.1), (B2) and (B3),
$$\big| \Psi_k(x) - \Psi_k(x') \big| \le \frac1n \sum_{j=1}^{n} \big| \psi_k(W_j - x, h) - \psi_k(W_j - x', h) \big| \le C_{11}\, |x - x'| \int |t|\, (1+|t|)^\alpha\, |\phi_K^{(k)}(ht)|\,dt \le B_{15}\, |x - x'|\, h^{-(\alpha+2)} \le B_{16}\, |x - x'|\, n^{(\alpha+2)/(2\alpha+1)}.$$
The last inequality follows from the fact that, by (B3), $nh^{2\alpha+1}$ is bounded away from zero. Therefore, if $I(n)$ represents a grid in $I$ with edge width $n^{-1-(\alpha+2)/(2\alpha+1)}$, and if for each $x \in I$ we define $x'$ to be the point in $I(n)$ nearest to $x$, then
$$\big| \Psi_k(x) - \Psi_k(x') \big| \le B_{17}\, n^{-1-(\alpha+2)/(2\alpha+1)}\, n^{(\alpha+2)/(2\alpha+1)} = B_{17}\, n^{-1},$$
and therefore $\big| (1-E)\{\Psi_k(x) - \Psi_k(x')\} \big| \le 2\, B_{17}\, n^{-1}$. Hence the following version of (B.28), with $I(n)$ there replaced by $I$, holds: for each $\epsilon > 0$,
$$\sup_{x \in I} P\big\{ \big| (1-E)\, \Psi_j(x) \big| > n^\epsilon\, (nh^{2\alpha+1})^{-1/2} \big\} \to 0. \eqno(B.29)$$
Combining (B.24) and (B.29) we deduce that, for each $\epsilon > 0$,
$$\sup_{\theta\in\Theta} \bigg| \hat f_{\rm rat}(x\,|\,\theta) - g_0(x\,|\,\theta)\, \Psi_0(x) - \frac{1}{2\pi} \sum_{k=1}^{r} \frac{s_k\, h^k}{k!}\, g_k(x\,|\,\theta)\, E\{\Psi_k(x)\} \bigg| = O_p\big\{ n^\epsilon\, (nh^{2\alpha-1})^{-1/2} + h^{r-s+1} \big\}. \eqno(B.30)$$
Note too that
$$E\bigg\{ \int \frac{e^{it(W-x)}}{\phi_U(t)}\, \phi_K^{(k)}(ht)\,dt \bigg\} = \int e^{-itx}\, \frac{\phi_X(t)\, \phi_U(t)}{\phi_U(t)}\, \phi_K^{(k)}(ht)\,dt$$

$$= \int e^{-itx}\, \phi_X(t)\, \phi_K^{(k)}(ht)\,dt = 2\pi\, i^k \int f_X(x-hu)\, u^k K(u)\,du,$$
which implies that
$$E\{\Psi_k(x)\} = 2\pi\, s_k \int f_X(x-hu)\, u^k K(u)\,du. \eqno(B.31)$$
Furthermore, $g_0(x\,|\,\theta) \equiv 1$. Hence, by (B.30),
$$\sup_{\theta\in\Theta} \bigg| \hat f_{\rm rat}(x\,|\,\theta) - \Psi_0(x) - \sum_{k=1}^{r} \frac{h^k}{k!}\, g_k(x\,|\,\theta) \int f_X(x-hu)\, u^k K(u)\,du \bigg| = O_p\big\{ n^\epsilon\, (nh^{2\alpha-1})^{-1/2} + h^{r-s+1} \big\}, \eqno(B.32)$$
where $\epsilon > 0$ is arbitrary.

Assumption (B5) implies that $g_k(x\,|\,\theta)$, and its first derivative with respect to $\theta$, are bounded uniformly in $x \in I$ and $\theta \in \Theta$. Using this result, (B5) and (3.6) we deduce that
$$\max_{1 \le k \le r}\, \big| g_k(x\,|\,\hat\theta) - g_k(x\,|\,\theta_1) \big| = O_p(n^{-1/2}).$$
Conditions (B1) and (B2) imply that $\int f_X(x-hu)\, u^k K(u)\,du$ is bounded uniformly in $x$ and $h$, for $k = 0, \ldots, r$. Combining these results with (B.32), and recalling that $g_0(x\,|\,\theta_1) \equiv 1$, we deduce that, for each $\epsilon > 0$,
$$\hat f_{\rm rat}(x\,|\,\hat\theta) - (1-E)\, \Psi_0(x) - \sum_{k=0}^{r} \frac{h^k}{k!}\, g_k(x\,|\,\theta_1) \int f_X(x-hu)\, u^k K(u)\,du = O_p\big\{ n^\epsilon\, (nh^{2\alpha-1})^{-1/2} + n^{-1/2} + h^{r-s+1} \big\}. \eqno(B.33)$$
Result (B.22) implies that
$$\bigg| f(x\,|\,\theta_1)\, E\{L_x(W\,|\,\theta_1, h)\} - \sum_{k=0}^{r} \frac{h^k}{k!}\, g_k(x\,|\,\theta_1) \int f_X(x-hu)\, u^k K(u)\,du \bigg| = O(h^{r-s+1}). \eqno(B.34)$$
Since $\Psi_0 = \hat f_{\rm dec}$, where $\hat f_{\rm dec}$ is as defined at (2.6), and
$$E\{\hat f_{\rm rat}(x\,|\,\theta_1)\} = f(x\,|\,\theta_1)\, E\{L_x(W\,|\,\theta_1, h)\},$$

where $\hat f_{\rm rat}(x\,|\,\theta)$ is as defined at (2.3), then (B.33) and (B.34) together imply (B.13). The results mentioned in the second and third sentences of this paragraph, together with (B.24), (B.29), (B.31) and (B.34), similarly imply (B.14).

B.4 Proof of Theorem 5

Noting that $f_X = f(\cdot\,|\,\theta_1) + \psi$, and that, by construction, $E\{L_x(W\,|\,\theta_1, h)\} = E\{K_x(W\,|\,\theta_1, h)\}$, we have:
$$E\{L_x(W\,|\,\theta_1, h)\} = \int K_x(u\,|\,\theta_1, h)\, \frac{1}{2\pi} \int e^{-itu}\, \phi_X(t)\,dt\,du = \int \frac1h\, K\Big(\frac{x-u}{h}\Big)\, \frac{f_X(u)}{f(u\,|\,\theta_1)}\,du = \int K(u)\, \frac{f(x-hu\,|\,\theta_1) + \psi(x-hu)}{f(x-hu\,|\,\theta_1)}\,du = 1 + \int K(u)\, \frac{\psi(x-hu)}{f(x-hu\,|\,\theta_1)}\,du.$$
Therefore, in notation introduced in section 3.5,
$$E\{\hat f_{\rm rat}(x\,|\,\theta_1)\} - f_X(x) = \int K(u)\, \frac{f(x\,|\,\theta_1)}{f(x-hu\,|\,\theta_1)}\, \psi(x-hu)\,du - \psi(x) = \int K(u)\, \{\psi(x-hu) - \psi(x)\}\,du + h^{r+1} \int \gamma_{h,x}(u\,|\,\theta_1)\, \psi(x-hu)\,du + \sum_{k=1}^{r} \frac{h^k}{k!}\, g_k(x\,|\,\theta_1) \int u^k K(u)\, \psi(x-hu)\,du \eqno(B.35)$$
$$= O\big\{ h^\beta\, s(\psi) \big\}, \eqno(B.36)$$
uniformly in $\theta \in \Theta$ and $x \in I$. This is equivalent to (B.15). The last identity in (B.36) follows on using the moment condition $\int u^j K(u)\,du = 0$, for $j = 1, \ldots, \beta - 1$.
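The order of the first term in (B.35) can be illustrated numerically: for a second-order kernel (so that the moment condition holds with $\beta = 2$), $\int K(u)\{\psi(x-hu)-\psi(x)\}\,du$ behaves like $\tfrac12 h^2 \mu_2(K)\, \psi''(x)$. The sketch below uses the Epanechnikov kernel and $\psi = \cos$ purely as illustrative stand-ins; neither choice comes from the paper.

```python
import math

def epanechnikov(u):
    # Second-order kernel: integrates to 1, first moment 0, mu_2(K) = 0.2.
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def smoothing_bias(psi, x, h, m=4000):
    """Trapezoidal approximation of int K(u) {psi(x - h*u) - psi(x)} du."""
    du = 2.0 / m
    total = 0.0
    for i in range(m + 1):
        u = -1.0 + i * du
        w = 0.5 if i in (0, m) else 1.0
        total += w * epanechnikov(u) * (psi(x - h * u) - psi(x))
    return total * du

x, h = 0.3, 0.1
ratio = smoothing_bias(math.cos, x, h) / h ** 2
leading_term = 0.5 * 0.2 * (-math.cos(x))  # (1/2) mu_2(K) psi''(x) for psi = cos
```

As $h$ decreases, `ratio` approaches `leading_term`, in line with the $O(h^\beta)$ behaviour used in (B.36).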

C Details of implementation

C.1 Using a ridge when computing $\hat f_{\rm rat}$

If $U$ has a Laplace distribution with scale parameter $\lambda$, so that $\phi_U(t) = (1 + \lambda^2 t^2)^{-1}$, then, if $K$ and $f(\cdot\,|\,\theta)$ are twice differentiable, we can write
$$L_x(u\,|\,\theta, h) = \frac{1}{2\pi} \int e^{-itu}\, (1 + \lambda^2 t^2)\, \phi_{K_x}(t\,|\,\theta, h)\,dt = K_x(u\,|\,\theta, h) - \lambda^2\, K_x''(u\,|\,\theta, h).$$
More generally, if $U$ has the distribution of an $N$-fold Laplace convolution, then $L_x$ is a linear combination of derivatives of $K_x$, of the form
$$K_x^{(k)}(W_j\,|\,\theta, h) = \frac{\partial^k}{\partial u^k}\, \frac{K\{(x-u)/h\}}{h\, f(u\,|\,\theta)} \bigg|_{u=W_j} = \sum_{l=1}^{k+1} \frac{g_l(W_j, x, h, \theta)}{f^l(W_j\,|\,\theta)},$$
for some functions $g_l$ which are sums and products of positive powers of $K\{(x - W_j)/h\}$, $f(W_j\,|\,\theta)$ and their derivatives. In practice, the denominators $f^l(W_j\,|\,\theta)$ are often too close to zero for some of the $W_j$'s, which makes the estimator perform rather poorly. To avoid this problem, we can use a ridge parameter in the denominators. We tried several approaches to ridging in the particular Laplace error case, and found that the following approach performed well in practice: replace $f^l(\cdot\,|\,\theta)$ by $\max\{f^l(\cdot\,|\,\theta), \delta\}$, where $\delta > 0$ is a ridge parameter. In our numerical work, we used this approach with $\delta = .4\, f(\cdot\,|\,\theta)$.

C.2 Details of the SIMEX bandwidth for $\hat f_{\rm rat}$

It follows from our theoretical results that, under regularity conditions, the asymptotic mean integrated squared error (AMISE) of $\hat f_{\rm rat}$ is equal to
$$\mathrm{AMISE}\{\hat f_{\rm rat}(\cdot\,|\,\theta)\} = \frac{h^4\, \mu_2^2(K)\, R\{r'' f(\cdot\,|\,\theta)\}}{4} + \frac{1}{2\pi n h} \int \frac{|\phi_K(t)|^2}{|\phi_U(t/h)|^2}\,dt,$$
where $\mu_2(K) = \int x^2 K(x)\,dx$, $r = f_X / f(\cdot\,|\,\theta)$, and we use the notation $R(f) = \int f^2$. We suggest choosing $h$ for $\hat f_{\rm rat}$ by minimising a SIMEX estimator of the AMISE;

see Cook and Stefanski (1994) and Stefanski and Cook (1995) for an introduction to SIMEX. The idea of SIMEX methods is that, in some way, the relation between data from $f_W$ and $f_X$ can be mimicked by that between data from $f_{W^{(1)}} \equiv f_W * f_U$ and $f_W$, and by that between data from $f_{W^{(2)}} \equiv f_{W^{(1)}} * f_U$ and $f_{W^{(1)}}$. Now, quantities related to $f_W$ are easy to estimate from the data, since we have a sample of $W_i$'s and can generate data from $f_{W^{(1)}}$ and $f_{W^{(2)}}$. This can be exploited to estimate unknown quantities related to $f_X$.

Let $h_0$, $h_1$ and $h_2$ be the bandwidths that minimise, respectively, $\mathrm{AMISE}\{\hat f_{\rm rat}(\cdot\,|\,\theta)\}$, $\mathrm{AMISE}\{\hat f_{{\rm rat},W}(\cdot\,|\,\theta)\}$ and $\mathrm{AMISE}\{\hat f_{{\rm rat},W^{(1)}}(\cdot\,|\,\theta)\}$, where $\hat f_{{\rm rat},W}$ and $\hat f_{{\rm rat},W^{(1)}}$ denote the ratio estimators of, respectively, $f_W$ and $f_{W^{(1)}}$, computed from a sample of size $n$ having density, respectively, $f_{W^{(1)}}$ and $f_{W^{(2)}}$. Then, extending the SIMEX-based bandwidth of Delaigle and Hall (2008) to our problem, $h_0/h_1$ can be mimicked by $h_1/h_2$, and hence $h_0$ can be approximated by $\hat h_1^2/\hat h_2$, where $\hat h_j$ is an estimator of $h_j$, for $j = 1, 2$. To estimate $h_j$, construct a sample $W_1^{(1)}, \ldots, W_n^{(1)}$ from $f_{W^{(1)}}$ and a sample $W_1^{(2)}, \ldots, W_n^{(2)}$ from $f_{W^{(2)}}$, by taking $W_i^{(1)} = W_i + U_i^{(1)}$ and $W_i^{(2)} = W_i^{(1)} + U_i^{(2)}$, where, for $j = 1, 2$, $U_1^{(j)}, \ldots, U_n^{(j)}$ is a sample of independent observations generated from $f_U$, independently of the $W_i$'s. Let $f_{W^{(0)}} \equiv f_W$, $f^{(0)}(\cdot\,|\,\theta) \equiv f(\cdot\,|\,\theta) * f_U$, $f^{(1)}(\cdot\,|\,\theta) = f^{(0)}(\cdot\,|\,\theta) * f_U$, $f^{(2)}(\cdot\,|\,\theta) = f^{(1)}(\cdot\,|\,\theta) * f_U$ and, for $j = 0, 1, 2$, let $r_{(j)} = f_{W^{(j)}} / f^{(j)}(\cdot\,|\,\theta)$. By definition we have, for $j = 0, 1$,
$$h_{j+1} = \mathop{\rm argmin}_h \bigg[ \frac{h^4\, \mu_2^2(K)}{4}\, R\big\{ r_{(j)}''\, f^{(j)}(\cdot\,|\,\theta) \big\} + \frac{1}{2\pi n h} \int \frac{|\phi_K(t)|^2}{|\phi_U(t/h)|^2}\,dt \bigg],$$
and to estimate $h_{j+1}$ it suffices to estimate $r_{(j)}''(x)$ and $f^{(j)}(\cdot\,|\,\theta)$.
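The SIMEX sample-construction step and the bandwidth ratio rule just described can be sketched as follows. The Laplace error generator and the scale `lam` are illustrative assumptions for the sketch; any generator for $f_U$ would serve.

```python
import random

def laplace_error(lam):
    # Laplace(0, lam) variate via random sign times an Exponential(mean lam).
    return random.choice((-1.0, 1.0)) * random.expovariate(1.0 / lam)

def simex_samples(W, lam):
    """Build the SIMEX samples: W_i^(1) = W_i + U_i^(1), W_i^(2) = W_i^(1) + U_i^(2)."""
    W1 = [w + laplace_error(lam) for w in W]
    W2 = [w1 + laplace_error(lam) for w1 in W1]
    return W1, W2

def simex_bandwidth(h1_hat, h2_hat):
    # h0/h1 is mimicked by h1/h2, so h0 is approximated by h1^2 / h2.
    return h1_hat ** 2 / h2_hat
```

In the full procedure, $\hat h_1$ and $\hat h_2$ would be obtained by minimising the estimated AMISE over $h$ for the samples from $f_{W^{(1)}}$ and $f_{W^{(2)}}$; only the sampling step and the ratio rule are shown here.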
We estimate the latter by $f^{(j)}(\cdot\,|\,\hat\theta)$ and, to estimate the former, we take $\hat r_{(j)}''(x) = \{\check f_{W^{(j)}}(x)/f^{(j)}(x\,|\,\hat\theta)\}''$, where $\check f_{W^{(0)}}$ and $\check f_{W^{(1)}}$ denote standard error-free kernel estimators of $f_W$ and $f_{W^{(1)}}$, computed from the data $W_1, \ldots, W_n$ and $W_1^{(1)}, \ldots, W_n^{(1)}$, respectively, and using, for example, a normal reference bandwidth for estimating second derivatives of densities; see Sheather and Jones (1991) and the references therein. In practice, to compute

$R\{\hat r_{(j)}''\, f^{(j)}(\cdot\,|\,\hat\theta)\}$ we truncate the integral to the interval $[W_{0.05}^{(j)}, W_{0.95}^{(j)}]$, where $W_\alpha^{(0)}$ and $W_\alpha^{(1)}$ denote the $100\alpha$th percentile of, respectively, the $W_i$'s and the $W_i^{(1)}$'s. Again, here, when computing
$$\hat r_{(j)}''(x) = \frac{\check f_{W^{(j)}}''(x)}{f^{(j)}(x\,|\,\hat\theta)} - \frac{2\, \check f_{W^{(j)}}'(x)\, f^{(j)\prime}(x\,|\,\hat\theta)}{f^{(j)2}(x\,|\,\hat\theta)} - \frac{\check f_{W^{(j)}}(x)\, f^{(j)\prime\prime}(x\,|\,\hat\theta)}{f^{(j)2}(x\,|\,\hat\theta)} + \frac{2\, \check f_{W^{(j)}}(x)\, \{f^{(j)\prime}(x\,|\,\hat\theta)\}^2}{f^{(j)3}(x\,|\,\hat\theta)},$$
we need to use a ridge in each denominator. We implemented this procedure in the Laplace case only, and we used the same ridge as the one described in section C.1. Note that the estimated bandwidth depends on the SIMEX samples $W_1^{(j)}, \ldots, W_n^{(j)}$. To reduce the effect of this random sampling step, as in Delaigle and Hall (2008) we generate $B$ sets of SIMEX samples of size $n$ (we took $B = 3$), and take $\hat h_{j+1}$ to minimise the average of the corresponding $B$ estimated AMISE values.

C.3 NR bandwidth for $\hat f_{\rm dec}$ with the sinc kernel

The mean integrated squared error (MISE) of $\hat f_{\rm dec}$, computed with the sinc kernel, is given by
$$\mathrm{MISE}(h) = \frac{1}{2\pi n} \int_{-1/h}^{1/h} |\phi_U(t)|^{-2}\,dt - \frac{n+1}{2\pi n} \int_{-1/h}^{1/h} |\phi_X(t)|^2\,dt + \int f_X^2, \eqno(C.1)$$
and to find the bandwidth that minimises the MISE we need to find the roots of
$$\mathrm{MISE}'(h) = -\frac{1}{\pi n h^2}\, |\phi_U(1/h)|^{-2} + \frac{n+1}{\pi n h^2}\, \frac{|\phi_W(1/h)|^2}{|\phi_U(1/h)|^2}.$$
Equivalently, this bandwidth is a solution of $-1 + (n+1)\, |\phi_W(1/h)|^2 = 0$. The NR rule assumes that $X \sim N(\hat\mu, \hat\sigma^2)$, where $\hat\mu = \bar W$ and $\hat\sigma^2 = \widehat{\mathrm{Var}}(W) - \mathrm{Var}(U)$, with $\bar W$ and $\widehat{\mathrm{Var}}(W)$ denoting the empirical mean and variance of the $W_i$'s. This amounts to estimating $\phi_W(1/h)$ by $\hat\phi_W(1/h) = \phi_U(1/h)\, \exp(i\hat\mu/h)\, \exp(-0.5\, \hat\sigma^2/h^2)$. Then the bandwidth is estimated by $\hat h_{\rm NR}$, the solution of $-1 + (n+1)\, |\hat\phi_W(1/h)|^2 = 0$.
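For Laplace errors this NR equation reduces to a one-dimensional root-finding problem, since $|\hat\phi_W(1/h)|^2 = \phi_U(1/h)^2 \exp(-\hat\sigma^2/h^2)$. The sketch below solves $-1 + (n+1)\,|\hat\phi_W(1/h)|^2 = 0$ by bisection; the Laplace error model and the bracketing interval are assumptions of the sketch and should be checked in applications.

```python
import math

def h_nr_laplace(W, lam, var_u):
    """NR bandwidth for f_dec with the sinc kernel, assuming Laplace errors
    with phi_U(t) = (1 + lam^2 t^2)^(-1).

    Solves g(h) = -1 + (n+1) * |phi_W_hat(1/h)|^2 = 0 by bisection.
    """
    n = len(W)
    mean = sum(W) / n
    var_w = sum((w - mean) ** 2 for w in W) / n
    sigma2 = max(var_w - var_u, 1e-8)  # hat sigma^2 = Var_hat(W) - Var(U), floored

    def g(h):
        phi_u = 1.0 / (1.0 + (lam / h) ** 2)  # phi_U(1/h)
        return (n + 1) * phi_u ** 2 * math.exp(-sigma2 / h ** 2) - 1.0

    lo, hi = 1e-3, 10.0  # assumed bracket: g(lo) < 0 < g(hi) for moderate n
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Since $g$ is continuous and increasing in $h$ here, the bisection converges to the unique root within the bracket.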

D Additional simulation results

Tables D.1 and D.3 report the integrated squared bias (ISB), computed from the simulated samples, of the estimators $\hat f_{\rm dec}$, $\hat f_{\rm bco}$ and $\hat f_{\rm wgt}$ for the various cases considered in our simulations. These tables show that, as could be expected, $\hat f_{\rm bco}$ is less biased than $\hat f_{\rm dec}$. $\hat f_{\rm wgt}$ also benefits from a bias reduction in the Laplace error case, and in the normal error case for densities (i) and (ii). However, for the other densities in the normal error case, the ISB of $\hat f_{\rm wgt}$ is slightly larger than that of $\hat f_{\rm dec}$. This is because $\theta$ is more difficult to estimate in the normal error case than in the Laplace error case. As a result, in the normal error case the parametric component of $\hat f_{\rm wgt}$ is of poorer quality, and the weight $\hat w$ of $\hat f_{\rm wgt}$, which also depends on $\hat\theta$, is smaller. See Table D.2, where we show the median and interquartile range of $\hat w$ when estimating densities (i) to (iii).

Table D.1: $10^3 \times$ ISB of the estimators $\hat f_{\rm dec}$, $\hat f_{\rm bco}(\cdot\,;\hat\theta_{\rm MD})$ and $\hat f_{\rm wgt}(\cdot\,;\hat\theta_{\rm ML})$ of densities (i) to (iii) in the Laplace and normal error cases, when NSR = 10% or 25% and n = 2 or 7.

                                  Laplace                                  Normal
                  Density (i)  Density (ii)  Density (iii)  Density (i)  Density (ii)  Density (iii)
    n              2     7      2     7       2     7        2     7      2     7       2     7
    NSR = 10%
    f_dec         4.66  2.22   2.41  1.27    4.94  7.57     6.27  3.48   2.97  1.62    6.44  3.7
    f_bco         .26   .6     1.12  .31     2.36  5.49     .47   .3     .78   .9      2.3   .6
    f_wgt         .47   .15    .24   .1      .77   6.59     3.4   1.66   1.43  .91     6.73  3.94
    NSR = 25%
    f_dec         7.57  4.26   3.27  2.16    2.36  4.3      15.5  11.2   5.33  4.16    12.9  9.35
    f_bco         .84   .19    2.19  1.25    1.1   2.42     8.9   2.12   2.96  1.35    7.37  3.2
    f_wgt         .83   .25    .25   .14     .19   2.74     7.44  5.8    2.16  1.87    12.6  9.68

Table D.2: Median (IQR) of the simulated values of $\hat w$ (computed with $\hat\theta = \hat\theta_{\rm ML}$) when estimating densities (i) to (iii) in the Laplace and normal error cases, with NSR = 10% or 25% and n = 2 or 7.

                        Density (i)          Density (ii)         Density (iii)
    n                   2         7          2         7          2         7
    Lap, NSR = 10%      .7 (.7)   .74 (.7)   .71 (.9)  .73 (.7)   .68 (.9)  .72 (.1)
    Lap, NSR = 25%      .73 (.7)  .76 (.5)   .73 (.9)  .78 (.7)   .71 (.8)  .73 (.8)
    Norm, NSR = 10%     .29 (.9)  .28 (.12)  .29 (.9)  .28 (.9)   .14 (.7)  .5 (.2)
    Norm, NSR = 25%     .33 (.1)  .33 (.9)   .3 (.9)   .33 (.12)  .24 (.13) .17 (.9)

Table D.3: $10^3 \times$ ISB of the estimators $\hat f_{\rm dec}$, $\hat f_{\rm bco}(\cdot\,;\hat\theta_{\rm MD})$ and $\hat f_{\rm wgt}(\cdot\,;\hat\theta_{\rm ML})$ of densities (iv) to (vi) in the normal error case, when NSR = 10% or 25% and n = 2 or 7.

                            NSR = 10%                                NSR = 25%
              Density (iv)  Density (v)  Density (vi)  Density (iv)  Density (v)  Density (vi)
    n          2     7       2     7      2     7       2     7       2     7      2     7
    f_dec     2.77  1.84    2.1   1.2    3.69  2.28    4.85  4.6     4.86  2.96   6.5   4.49
    f_bco     1.65  1.12    .27   .8     2.31  .77     2.61  2.2     1.27  .27    4.73  2.85
    f_wgt     2.92  1.89    2.12  1.6    4.4   2.42    5.5   4.15    4.72  2.92   6.57  4.79

References

Cook, J.R. and Stefanski, L.A. (1994). Simulation-extrapolation estimation in parametric measurement error models. J. Amer. Statist. Assoc., 89, 1314-1328.

Delaigle, A. and Hall, P. (2008). Using SIMEX for smoothing-parameter choice in errors-in-variables problems. J. Amer. Statist. Assoc., 103, 280-287.

Hall, P. and Heyde, C. (1980). Martingale Limit Theory and its Application. Academic Press.

Sheather, S.J. and Jones, M.C. (1991). A reliable data-based bandwidth selection method for kernel density estimation. J. Roy. Statist. Soc. Ser. B, 53, 683-690.

Stefanski, L. and Cook, J.R. (1995). Simulation-extrapolation: the measurement error jackknife. J. Amer. Statist. Assoc., 90, 1247-1256.