I N S T I T U T D E S T A T I S T I Q U E B I O S T A T I S T I Q U E E T S C I E N C E S A C T U A R I E L L E S (I S B A)

Size: px
Start display at page:

Download "I N S T I T U T D E S T A T I S T I Q U E B I O S T A T I S T I Q U E E T S C I E N C E S A C T U A R I E L L E S (I S B A)"

Transcription

1 I N S T I T U T D E S T A T I S T I Q U E B I O S T A T I S T I Q U E E T S C I E N C E S A C T U A R I E L L E S I S B A UNIVERSITÉ CATHOLIQUE DE LOUVAIN D I S C U S S I O N P A P E R 20/4 SEMIPARAMETRIC M-ESTIMATION WITH NON-SMOOTH CRITERION FUNCTIONS DELSOL, L. and I. VAN KEILEGOM

2 Semiparametric M-Estimation with Non-Smooth Criterion Functions Laurent Delsol Ingrid Van Keilegom December 8, 20 Abstract We are interested in the estimation of a parameter θ that maximizes a certain criterion function depending on an unknown, possibly infinite dimensional nuisance parameter h. A common estimation procedure consists in maximizing the corresponding empirical criterion, in which the nuisance parameter is replaced by a nonparametric estimator. In the literature, this research topic, commonly referred to as semiparametric M-estimation, has received a lot of attention in the case where the criterion function M satisfies certain smoothness properties. In certain applications, these smoothness conditions are howeveot satisfied. The aim of this paper is therefore to extend the existing theory on semiparametric M-estimation problems, in order to coveon-smooth M-estimation problems as well. In particular, we develop high-level conditions under which the proposed M-estimator is consistent and has an asymptotic limit. We also check these conditions in detail for a specific example of a semiparametric M-estimation problem, which comes from the area of classification with missing data, and which cannot be dealt with using the existing results in the literature. Key Words: Asymptotic distribution; Classification; Empirical processes; M-estimation; Missing data; Nonstandard asymptotics; Nuisance parameter; Semiparametric regression. MAPMO, Université d Orléans, Bâtiment de mathématiques, Rue de Chartres B.P. 6759, Orléans cedex 2, France, address: laurent.delsol@univ-orleans.fr Most of the research leading to this paper was carried out while L. Delsol was a postdoctoral fellow at the Université catholique de Louvain, financed by the European Research Council under the European Community s Seventh Framework Programme FP7/ / ERC Grant agreement No Institute of Statistics, Université catholique de Louvain, Voie du Roman Pays 20, B 348 Louvain-la- Neuve, Belgium. address: ingrid.vankeilegom@uclouvain.be. Research supported by IAP research network grant nr. P6/03 of the Belgian government Belgian Science Policy, by the European Research Council under the European Community s Seventh Framework Programme FP7/ / ERC Grant agreement No , and by the contract Projet d Actions de Recherche Concertées ARC /6-039 of the Communauté française de Belgique, granted by the Académie universitaire Louvain.

3 Introduction Consider the estimation of a parameter of interest θ 0 that maximizes a criterion function Mθ, h 0, where h 0 is the true value of an unknown, possibly infinite dimensional nuisance parameter h. A common estimation procedure consists in maximizing an empirical criterion function M n θ, ĥ, where M n is an estimator of the unknown function M and ĥ is a nonparametric estimator of the unknown nuisance parameter h 0. In the literature, this research topic, commonly referred to as semiparametric M-estimation, has received a lot of attention in the case where the criterion function M satisfies certain smoothness properties. In certain applications, these smoothness conditions are howeveot satisfied. The aim of this paper is therefore to extend the existing theory on semiparametric M-estimation problems, in order to coveon-smooth M-estimation problems as well. In particular, we develop high-level conditions under which the proposed M-estimator is consistent and has an asymptotic limit. We also check these conditions in detail for a specific example of a semiparametric M-estimation problem, which comes from the area of classification with missing data, and which cannot be dealt with using the existing results in the literature. In the literature it is often assumed that Mθ, h can be written as Mθ, h = E [ mz, θ, hz, θ ], where m is a known function and h is allowed to depend on θ and on a random vector Z taking values in some space F. A common estimation procedure consists then in maximizing the corresponding empirical criterion: M n θ, ĥ = n n mz i, θ, ĥz i, θ, 2 i= with respect to θ, where the random vectors Z,..., Z n have the same distribution as Z. We assume in the remainder of this paper that for all θ the functions m, θ, ĥ, θ and m, θ, h 0, θ are measurable. When the function mz, θ, hz, θ is differentiable with respect to θ and when Mθ, h 0 is concave in θ, then the M-estimation problem can be reduced to a Z-estimation problem, by solving the equation M n θ, ĥ/ θ = 0 or by minimizing the norm of M n θ, ĥ/ θ if a solution would not exist. A general result on semiparametric Z-estimators can be found in Chen, Linton and Van Keilegom In that paper high-level conditions are given under which the estimator of θ 0 is weakly consistent and asymptotically normal. The criterion function m/ θ is not required to be 2

4 smooth in θ nor in h. See also Van der Vaart and Wellner 2007 for high-level conditions for the stochastic equicontinuity in semiparametric Z-estimation problems. For specific examples of semiparametric Z-estimation problems we refer among others to Chen and Fan 2006, Linton, Sperlich and Van Keilegom 2008, Escanciano, Jacho-Chavez and Lewbel 200, 20, Mammen, Rothe and Schienle 20 and the references therein. On the other hand, when either mz, θ, hz, θ is not differentiable with respect to θ, or when Mθ, h 0 has more than one local maximum, then the M-estimation problem can not be reduced to a Z-estimation problem, and we need to use other procedures. In the parametric case where no infinite dimensional nuisance parameter is present, we refer to Kim and Pollard 990 for a general result on parametric M-estimators that have n /3 -rate of convergence, and to Van der Vaart and Wellner 996 for a result on both estimators that converge at n /2 -rate in the smooth case, and at a rate slower than n /2 foon-smooth functions. See also Groeneboom and Wellner 993, Groeneboom, Jongbloed and Wellner 200, Goldenshluger and Zeevi 2004, Mohammadi and Van de Geer 2005 and Radchenko 2008, among others for important contributions on results for specific parametric M-estimation problems with slower than n /2 -rate of convergence. The problem becomes more difficult when the model is semiparametric. Basically two main approaches can be considered in that case. In the first approach M n θ, h is maximized jointly with respect to θ and h, and then the criterion function is modified in order to obtain an estimator of θ 0 converging at n /2 -rate. The second approach, which we will follow, consists in maximizing M n θ, ĥ with respect to θ, where ĥ is a preliminary estimator of h 0. When the m-function is in some sense smooth usually differentiable or Lipschitz continuous in L p -norm several contributions on both approaches can be found in the literature. See e.g. Van der Vaart and Wellner 996, Van de Geer 2000, Ma and Kosorok 2005, Kosorok 2008, Kristensen and Salanié 200 and Ichimura and Lee 200. In these cases, the estimator of θ 0 is n /2 -consistent, even when the nuisance parameter is estimated at slower rate. This rate is obtained thanks to the regularity of the criterion function m. However, in numerous situations we are faced to semiparametric M-estimation problems, where the function m does not satisfy the smoothness property that makes the estimator of θ 0 n /2 -consistent. Examples can be found in classification problems with variables missing at random see Section 6, or in the selection of the most informative interval for an additional variable in regression. This context has, to the best of our knowledge, not been considered so far in the literature. It is substantially more difficult than 3

5 the smooth case. This can be understood e.g. from the fact that it leads to non-standard asymptotics and to estimators of θ 0 that are not n /2 -consistent and that converge to nonnormal limits. The results that we will obtain allow to show that in the case where m is not smooth e.g. when m includes an indicator function, we can obtain the same rate of convergence and sometimes even the same asymptotic distribution as in the case where the nuisance parameter would be known, even when the nuisance parameter is estimated at slower rate. The approach followed in this paper consists in developing high-level or primitive conditions, under which the estimator of θ 0 satisfies certain asymptotic properties. As explained before, non-smooth semiparametric M-estimation problems form an unsolved open problem in the literature. We aim at filling this gap in the literature by combining results foon-smooth parametric M-estimation problems with smooth semiparametric M-estimation problems. However, as will be seen later, the problem requires much more than simply combining ideas from these two domains. In fact, delicate mathematical derivations will be required to cope with estimators of the nuisance parameters inside non-smooth criterion functions. The paper is organized as follows. In the next section we introduce some notations and give the formal definition of the M-estimator. In Section 3 we show under which conditions the estimator of θ 0 is weakly consistent. Section 4 deals with the development of the rate of convergence of the estimator, whereas in Section 5 we state the asymptotic distribution of the estimator. In Section 6 a particular example of a non-smooth semiparametric M- estimation problem is considered, for which we check the conditions of the asymptotic results in detail. Finally, the Appendix contains the proofs of the asymptotic results. 2 Notations and definitions Throughout the paper we assume that the data Z,..., Z n are identically distributed random vectors. In many applications the vector Z i i =,..., n will consist of a random vector Y i representing a response and a random vector X i representing a vector of explanatory variables. The set Θ denotes a compact parameter set usually but not necessarily of finite dimension with non empty interior and H denotes an infinite dimensional parameter set. Suppose there exists a non-random measurable real-valued function M : Θ H IR, such that θ 0 = argmax θ Θ Mθ, h 0, θ, 4

6 and suppose θ 0 is unique and belongs to the interior of Θ. Let θ 0 and h 0 H be the true unknown finite and infinite dimensional parameters. We allow that the functions h H depend on the parameters θ and the vector Z, but footational convenience we will often suppress this dependence when no confusion is possible. For instance, we often use the following abbreviated notations : θ, h θ, h, θ, θ, h 0 θ, h 0, θ, and θ 0, h 0 θ 0, h 0, θ 0. The sets Θ and H are supposed to be metric spaces. Their metrics are denoted by d and d H respectively. Since the nuisance parameter is allowed to depend on θ we implicitly define d H h, h 0 uniformly over θ, i.e. d H h, h 0 := sup θ Θ d H h., θ, h 0., θ for some metric d H. Suppose there exists a random real-valued function M n : Θ H IR depending on the data Z,..., Z n, such that M n θ, h 0 is an approximation of Mθ, h 0 the precise conditions on M n will be given in the next sections. In many applications we have that Mθ, h = E[mZ, θ, h] and M n θ, h = n n i= mz i, θ, h, where m is a measurable real-valued function such that θ 0 = argmax θ Θ E[mZ, θ, h 0 ]. However, the conditions on M n do not impose this particular structure and allow for more general situations as well. Suppose that for each θ there is an initial nonparametric estimator ĥ, θ for h 0, θ. This nonparametric estimator depends on the particular model, and can be based on e.g. kernels, splines oeural networks. Again footational ease we let θ, ĥ θ, ĥ, θ. We estimate θ 0 by any θ Θ that approximately solves the sample maximization problem: max M nθ, ĥ. 3 θ Θ In the set of conditions given in the next sections we will formalize what we mean with approximate solution. 3 Consistency We focus in this section on the development of sufficient conditions under which the estimator θ is weakly consistent. This consistency will be used as a preliminary step for the subsequent sections, where we will deal with the rate of convergence and the asymptotic distribution of the estimator. In the remainder of the paper the notations P and E will be used to denote outer probabilities and outer expectations, to take into account potential measurability issues see e.g. Van der Vaart and Wellner 996 for a deeper discussion. 5

7 Consider the following assumptions: A θ Θ and M n θ, ĥ M nθ 0, ĥ + o P. A2 For all ɛ > 0 there exists a δɛ > 0 such that dθ, θ 0 > ɛ implies Mθ 0, h 0 Mθ, h 0 > δɛ. A3 Pĥ H as n and d Hĥ, h 0 P 0. A4 sup θ Θ, h H M n θ, h M n θ 0, h Mθ, h + Mθ 0, h + M n θ, h M n θ 0, h + Mθ, h Mθ 0, h = o P. A5 lim dh h,h 0 0 sup θ Θ Mθ, h Mθ, h 0 = 0. Below we illustrate and interpret these conditions, and we give some remarks, extensions, sufficient conditions, etc., that are useful for verifying these conditions in practice. Remark i In assumption A4, we take the supremum with respect to h over the whole family H. However, it is enough to assume the same type of convergence only for h = ĥ or for {h : d H h, h 0 } δ n for some δ n 0 such that d H ĥ, h 0δn Assumption A5 could be changed in the same way. = o P. ii Assumption A4 is closely related to the compactness of the sets Θ and H and is automatically fulfilled when the following standard assumption holds: sup M n θ, h Mθ, h = o P. θ Θ, h H This last condition holds when the family F = {m., θ, h, θ Θ, h H} is Glivenko-Cantelli see for instance Van der Vaart and Wellner 996 and Mθ, h = E[mZ, θ, h]. iii We do not require here that M is the mean of the random function mz,.,.. We only require that assumption A4 holds. smoothness assumptions on M n. assumptions A2 and A5. Moreover, we do not impose any We only require that the function M satisfies 6

8 iv In this section we do not assume that θ belongs to an Euclidean space. It is possible that θ and h belong to general metric spaces. Theoretically, it is also possible to consider semimetric spaces, however, this only allows to get the consistency without rate of convergence with respect to the corresponding semimetrics. v The assumption that the variables Z i are independent is not necessary here. Our result could be used even for dependent data as soon as it is possible to fulfill assumptions A, A3 and A4. vi Assumption A is trivially fulfilled when M n θ, ĥ sup θ Θ M n θ, ĥ + o P, which allows to deal with approximations of the value that actually maximizes θ M n θ, ĥ. Theorem Under assumptions A-A5 we have that The proof is given in the Appendix. d θ, θ 0 P 0. 4 Rate of convergence In the previous section we have shown the consistency of general M-estimators. We are now interested in going one step further and give their convergence rates. In this section, the consistency of our estimators is used as a preliminary assumption. Of course, Theorem can be used to obtain this consistency. We introduce the following assumptions: B d θ, θ 0 P 0 and v n d H ĥ, h 0 = O P for some sequence v n. B2 For all δ > 0, there exist α < 2, K > 0, δ 0 > 0 and n 0 N such that for all n n 0 there exists a function Φ n for which δ Φ n δ/δ α is decreasing on 0, δ 0 ] and for all δ δ 0, E sup dθ,θ 0 δ,d H h,h 0 δ vn M n θ, h M n θ 0, h Mθ, h + Mθ 0, h K Φ nδ. n B3 There exist a constant C > 0, a sequence, and variables W n = O P r n and β n = o P, such that for all θ Θ satisfying dθ, θ 0 δ 0 : Mθ, ĥ Mθ 0, ĥ W ndθ, θ 0 Cdθ, θ β n dθ, θ

9 B4 M n θ, ĥ M nθ 0, ĥ + O P r 2 n and rnφ 2 n rn n. Under the above assumptions, we will prove that the estimator θ is rn -consistent. Hence, the sequence plays an important role in the above assumptions and should be chosen in the sharpest possible way. Before stating and proving this result, we first discuss the above assumptions in more detail. Remark 2 i Assumption B is a high-level assumption. Many asymptotic results allow to get such conditions on both the M-estimator θ and the nuisance estimator ĥ. In general the rate of convergence of the nuisance estimator is slower than the best convergence rate of the M-estimator. We are interested in studying cases where the convergence rate of the M-estimator is not affected by the fact that we need to estimate the nuisance parameter. ii Assumption B2 is also a high-level assumption. Assume that for any z the function θ, h mz, θ, hz, θ mz, θ 0, hz, θ 0 is uniformly bounded on an open neighborhood of θ 0, h 0, i.e. on {θ, h : dθ, θ 0 δ 0, d H h, h 0 δ } for some δ 0, δ > 0. Let us consider the class F δ,δ = {m., θ, h, θ m., θ 0, h, θ 0 : dθ, θ 0 δ, d H h, h 0 δ } for any δ δ 0 and denote its envelope by M δ,δ. For any δ, we have δ vn F δ,δ, as for instance where N [ ] sup δ δ 0 0 δ fo large enough. Then, under entropy conditions on + log N [ ] ɛ M δ,δ L2 P, F δ,δ, L 2 Pdɛ < + 4 denotes the bracketing number see e.g. Van der Vaart and Wellner 996 for the definition, there exists a positive constant K not depending on δ such that for all δ δ 0, [ ] E [M 2 E δ,δ ] sup M n θ, h M n θ 0, h Mθ, h + Mθ 0, h K dθ,θ 0 δ,d H h,h 0 δ n see Theorems 2.4. and in Van der Vaart and Wellner 996. Hence, in this case, the last part of B2 holds if Φ n δ can be chosen such that K 0, δ δ 0 : E [M 2 δ,δ ] K 0 Φ n δ. 5 8

10 The function Φ n δ is closely related to the smoothness of the functions θ mz, θ, hz, θ. When these functions are Lipschitz respectively Hölder of order γ uniformly over z and h, it is possible to take Φ n δ = δ respectively Φ n δ = δ γ. In other situations, for instance when m contains an indicator function involving θ, such regularity assumptions may fail but it is possible to state B2 with Φ n δ = δ see Section 6. iii The way Φ n δ decreases when δ tends to zero has a crucial impact on the convergence rate through the condition rnφ 2 n rn n. When Φ n δ = δ, this last condition is equivalent to n and we may obtain n convergence rates. However, when Φ n δ = δ γ with γ <, this condition is equivalent to rn 2 γ n and hence only n 22 γ rates may be considered. In the case of non continuous criterion functions involving e.g. indicator functions a n 3 convergence rate may be obtained, analogously to the case of parametric M-estimation, see e.g. Kim and Pollard 990 and Van der Vaart and Wellner 996. iv Assumption B4 is automatically fulfilled under the following classical assumption: M n θ, ĥ sup M n θ, ĥ + O P r 2 n. θ Θ As in the previous section, this allows to consider approximations of the value that actually maximizes the empirical criterion. v Assumption B3 is automatically fulfilled when the following conditions hold: a Θ R k for some k, and dθ, θ 2 = θ θ 2, where is the Euclidean norm. b There exists δ 2 > 0 such that for any h satisfying d H h, h 0 δ 2, the function θ Mθ, h is twice continuously differentiable on an open neighborhood of θ 0. Hereafter, Γθ 0, h and Λθ 0, h denote respectively its gradient and Hessian matrix for θ = θ 0. Moreover, lim θ θ 0 0 sup θ θ 0 2 Mθ, h Mθ0, h Γθ 0, hθ θ 0 d Hh,h0 δ 2 2 θ θ 0 T Λθ 0, hθ θ 0 = 0. c Γθ 0, ĥ = O P r n and Γθ 0, h 0 = 0. d Λθ 0, h 0 is negative definite, and h Λθ 0, h is continuous in h 0 lim dh h,h 0 0 sup u R k, u = Λθ 0, h Λθ 0, h 0 u = 0. i.e. 9

11 Now denote the greatest eigenvalue of Λθ 0, h 0 by λ m. When d H ĥ, h 0 δ 2, assumptions a-d above imply that Mθ, ĥ Mθ 0, ĥ = Γθ 0, ĥ, γ θ + 2 γ θ T Λθ 0, h 0 γ θ + γ θ 2 o P + o γ θ 2 Γθ 0, ĥ γ θ + λ m 2 γ θ 2 + γ θ 2 o P + o γ θ 2, where γ θ = θ θ 0 and where the notation o γ θ 2 means lim γθ 0 o γ θ 2 γ θ = 0. By 2 taking δ 0 such that θ θ 0 δ 0, the last term above is bounded by λm θ θ 4 0 2, and hence B3 holds with W n = Γθ 0, ĥ and C = λm 4. vi Finally, it is possible to modify slightly the proof of the following theorem by considering the following extensions of assumptions B3 and B4: B3 There exist η 0 > 0, and two positive and non-decreasing functions Ψ and Ψ 2 on 0, η 0 ] such that for all θ satisfying dθ, θ 0 η 0 : Mθ, ĥ Mθ 0, ĥ W nψ dθ, θ 0 + o P Ψ 2 dθ, θ 0. Moreover, there exist β 2 > α, β < β 2, δ 0 > 0 such that δ Ψ δδ β non-increasing and δ Ψ 2 δδ β 2 Ψ rn W n = O P Ψ 2 rn for some sequence. B4 M n θ, ĥ M nθ 0, ĥ + O pψ 2 r n is non-decreasing on 0, δ 0 ], and such that and Φ n rn nψ 2 rn. It is possible to consider the case where β = β 2 if we assume that Ψ r n W n = o p Ψ 2 r n. We are now ready to state the rate of convergence of the estimator θ. The proof of this result can be found in the Appendix. is Theorem 2 Under assumptions B-B4 we have that d θ, θ 0 = O P. 5 Asymptotic distribution In the previous section, we have shown that d θ, θ 0 = O P. Our aim is now to study the asymptotic distribution of θ θ 0. We will assume throughout this section 0

12 that Θ is equipped with the Euclidean norm. We start with introducing a number of notations. For any θ Θ and h H, let B n θ, h = M n θ, h M n θ 0, h and Bθ, h = Mθ, h Mθ 0, h, and define for any δ > 0. Also, let M δ. sup m., θ, h 0 m., θ 0, h 0 θ θ 0 δ M δ = {m., θ, h 0 m., θ 0, h 0 : θ θ 0 δ}. Finally, for any p N, for any f : Θ IR and for any γ = γ,..., γ p Θ p denote f γ = fγ,..., fγ p T. We introduce the following assumptions: C θ θ 0 = O P and v n d H ĥ, h 0 = O P for some sequences and v n. C2 θ 0 belongs to the interior of Θ and Θ E,, where E is a finite dimensional Euclidean space i.e. E = R k for some k. C3 For all δ 2, δ 3 > 0, sup θ θ 0 δ 2 rn, d Hh,h 0 δ 3 vn C4 For all K, η > 0, r 2 n B n θ, h Bθ, h B n θ, h 0 + Bθ, h 0 + B n θ, h + B n θ, h 0 + Bθ, h + Bθ, h 0 = o P. rn 4 [ ] [ ] n E M 2 K = O and r4 n rn n E M 2 K {r 2 n M Krn >ηn} = o. rn C5 For all K > 0 and for any η n 0, r 4 [ n sup γ γ 2 <η n, γ γ 2 K n E m Z, θ 0 + γ, h 0 m Z, θ 0 + γ ] 2 2, h 0 = o. C6 For all z F, the function θ mz, θ, h 0 z, θ and almost all paths of the process θ mz, θ, ĥθ, z are uniformly over θ bounded on compact sets. C7 There exist β n = o P, a random and linear function W n : E IR, and a deterministic and bilinear function V : E E IR such that for all θ Θ, Bθ, ĥ = W nγ θ + V γ θ, γ θ + β n γ θ 2 + o γ θ 2

13 and Bθ, h 0 = V γ θ, γ θ + o γ θ 2, where γ θ = θ θ 0 and the notation o γ θ 2 means lim γθ 0 o γ θ 2 γ θ 2 = 0. Moreover, for any compact set K in E, τ, δ > 0, sup γ E, δ δ, γ δ W nγ V γ, γ V γ = O δ τ P, γ and sup <. γ,γ K, δ δ, δ τ γ γ δ C8 For all K > 0, there exists n 0 N such that for all n n 0, M n θ, ĥ sup θ θ 0 K rn M n θ, ĥ + o P r 2 n. C9 There exists a deterministic continuous function Λ and a zero-mean Gaussian process G defined on E such that for all p N and for all γ = γ,..., γ p E p, W nγ + r 2 nb n θ 0 +., h 0 γ L Λ γ + G γ. Moreover, Gγ = Gγ a.s. implies that γ = γ, and P limsup γ + Λ γ + G γ < sup γ E Λ γ + G γ =. C0 There exists a δ 0 > 0 such that 0 sup δ δ 0 log N [ ] ɛ M δ P,2, M δ, L 2 P dɛ < +. We will show below that θ θ 0 converges to the unique maximizer of the process γ Λγ + Gγ, where Λ and G are defined in C9. However, let us first discuss the above assumptions. Remark 3 i The first part of assumption C can be obtained from Theorem 2. If in addition we assume that assumption B2 holds with Φ n Φ not depending on n and continuous, and if we take + such that rnφr 2 n = n, then assumptions C4 and C5 are implied by the following ones: there exists a δ 4 > 0 such that for all δ δ 4, E M 2 δ KΦ2 δ for some K > 0, E [Mδ 2 lim {M δ >ηδ 2 Φ 2 δ}] δ 0 Φ 2 δ 2 = 0

14 for all η > 0, and lim lim sup ɛ 0 δ 0 γ γ 2 ɛ, γ γ 2 K E[mZ, θ 0 + γ δ, h 0 mz, θ 0 + γ 2 δ, h 0 ] 2 Φ 2 δ = 0 for all K > 0, using the same arguments as in the proof on Theorem in Van der Vaart and Wellner 996. ii Note that assumptions C4-C6 and C0 are the same as in the parametric case where h 0 is known see Theorem in Van der Vaart and Wellner 996. iii Assumption C6 ensures that for any compact K E the processes γ r 2 nb n θ 0 + γ, ĥ and γ r2 nb n θ 0 + γ, h 0 + W n γ take values in l K assumption C7 is also used for the second process. This assumption is not very restrictive. Moreover, because we deal with asymptotic results, we actually only require the latter properties fo > n K where n K only depends on K. It follows directly from C and C2 that there exists n K such that θ 0 + K Θ fo > n K. Hence it is only necessary to state C6 on the compact set Θ. iv Assumption C3 is automatically fulfilled under the following slightly more restrictive but common assumption: for all δ 2, δ 3 > 0, sup θ θ 0 δ 2 rn,d Hh,h 0 δ 3 vn B n θ, h Bθ, h B n θ, h 0 + Bθ, h 0 = o P r 2 n. The latter condition holds whenever the following one is fulfilled: there exists a function f and a constant δ 0 > 0 such that for all δ 2, δ 3 < δ 0, and E [ sup θ θ 0 δ 2 rn,d Hh,h 0 δ 3 vn rnf 2 δ2, δ 3 = o n, v n ] B n θ, h Bθ, h B n θ, h 0 + Bθ, h 0 f δ 2, δ 3 v n. n This last bound may be obtained using arguments that are similar to those discussed in Remark 2ii. v Now assume that assumptions a-d from Remark 2v hold. Following the same ideas as in this remark it is easy to show that C7 is fulfilled with E = R k, W n γ = Γθ 0, ĥ, γ and V γ, γ = 2 γt Λθ 0, h 0 γ whenever sup u R k, u = Λθ 0, h 0 u < 3

15 +. Moreover, in that case, if ĥ is computed from a dataset independent of Z,..., Z n, it is sufficient for C9 to assume the weak convergence of each term W nγ and r 2 nb n θ 0 +., h 0 γ separately. The convergence of the second term can be obtained as in the parametric case see Theorem in Van der Vaart and Wellner 996. Note also that if Γθ 0, ĥ W in distribution, the marginals of the process γ Γθ 0, ĥ, γ tend in distribution to the marginals of γ W, γ. Furthermore, if = n, it is common to assume that Γθ 0, ĥ = n n i= U i,n + o P n /2, where the variables U i,n are independent and centered. The convergence then follows from Lindeberg s condition. vi Assumption C8 allows to consider estimators θ that are approximations of the value that actually maximizes the map θ M n θ, ĥ. vii Let K be an arbitrary compact subset in E. Assumption C9 is used to derive the weak convergence in the l K sense of the process γ W n γ + r 2 nb n θ 0 + γrn, h 0 from the fact that it is asymptotically tight. If sup γ K,γ 0 W n γ γ = o P, we are in the same situation as in the parametric case and we obtain the convergence of the marginals whenever r 4 {[ n lim n n E m Z, θ 0 + γ, h 0 m Z, θ 0 + γ ] 2 } 2, h 0 = E[Gγ Gγ 2 2 ] for all γ, γ 2, by noting that the remaining term is a sum of an array of random variables that fulfill Lindeberg s condition see Van der Vaart and Wellner 996 p The last assumption on the process γ Λ γ + G γ is used to ensure almost all sample paths have a supremum which is only related to their behaviour on compact sets. The dominant term of the deterministic part Γ is usually a negative definite quadratic form and hence exponential inequalities could lead to such result. viii Finally, assumption C0 is exactly the one used in the parametric case in Van der Vaart and Wellner 996, where weaker conditions and alternatives based on covering numbers are also discussed see Theorems 2..22, and This assumption is used to show that γ rnb 2 n θ 0 + γrn, h 0 is asymptotically tight. We are now ready to state the main result of the paper about the asymptotic distribution of θ θ 0. As before, we refer to the Appendix for the proof. Theorem 3 If assumptions C-C0 hold, then for all K > 0 the process γ r 2 nb n θ 0 + γ, ĥ converges weakly to γ Λγ + Gγ in l K with K = {γ E : 4

16 γ K}. Moreover, for any such K almost all paths of the limiting process have a unique maximizer γ 0 on K. Assume now that γ 0 is measurable. Then, the random sequence θ θ 0 converges in distribution to γ 0. 6 Example : Classification with missing data In this section we illustrate the theory, and in particular the verification of the assumptions, by means of an example coming from the area of classification with missing data. Consider i.i.d. data X i = X i, X i2 i =,..., n having the same distribution as X = X, X 2. We suppose that these data come in reality from two underlying populations. Let Y i be j if observation i belongs to population j j = 0,, and let Y be the population indicator for the vector X. Based on these data, we wish to establish a classification rule foew observations, for which it will be unknown to which population they belong. The classification consists in regressing X 2 on X via a parametric regression function f θ, and choosing θ by minimizing the criterion P Y =, X 2 f θ X + P Y = 0, X 2 < f θ X. 6 Let θ 0 be the value of θ that maximizes 6 with respect to all θ Θ, where Θ is a compact subset of IR k, whose interior contains θ 0. We suppose now that some of the Y i s are missing. Let i respectively be if Y i respectively Y is observed, and 0 otherwise. Hence our data consist of i.i.d. vectors Z i = X i, Y i i, i i =,..., n. We assume that the missing at random mechanism holds true, in the sense that Note that 6 equals E [ I = p 0 X Hence, it is natural to define mz, θ, p = P = X, X 2, Y = P = X := p 0 X. { }] IY =, X 2 f θ X + IY = 0, X 2 < f θ X. I = px { } IY =, X 2 f θ X + IY = 0, X 2 < f θ X, 7 where the nuisance function p belongs to a space P to be defined later, and where Z = X, Y,. Also, let n Mθ, p = E[mZ, θ, p] and M n θ, p = n mz i, θ, p. 5 i=

17 Finally, define the estimator θ of θ 0 by θ = argmax θ Θ M n θ, p, where for any x, px = n i= k h x X i n j= k hx X j I i =, where k is a density function with support [, ], k h u = ku/h/h and h = h n is an appropriate bandwidth sequence. We will now check the conditions of Theorems, 2 and 3. Suppose dθ, θ 0 is the Euclidean distance. Let P be the space of functions p : R X IR that are continuously differentiable, and for which sup x R X px M <, sup x R X p x M and inf x R X px > η/2, where η = inf x R X p 0 x is supposed to be strictly positive, and where R X is the support of X, which is supposed to be a compact subspace of IR. We equip the space P with the supremum norm : d P p, p 2 = sup x R X p x p 2 x for any p, p 2. First of all, A is verified by construction of the estimator θ. Condition A2 is an identifiability condition, needed to ensure that θ 0 is unique, whereas A3 holds true provided the functions p 0 and k are continuously differentiable. Next, for A4 is suffices by Remark ii to show that the class F = {m, θ, p : θ Θ, p P} is Glivenko-Cantelli. For this we will show that for all ɛ > 0, the bracketing number N [ ] ɛ, F, L P is finite see Van der Vaart and Wellner 996, Theorem First, note that it follows from Corollary in Van der Vaart and Wellner 996 that N [ ] ɛ, P, L P expkɛ. In a similar way we can show that N [ ] ɛ, {f θ : θ Θ}, L P expkɛ, provided x f θ x is continuously differentiable and the derivatives are uniformly bounded over θ. From there it can be easily shown that the class T = {x, x 2 Ix 2 f θ x : θ Θ} satisfies N [ ] ɛ, T, L P expkɛ, provided sup x,x 2 f X2 X x 2 x <. By combining the brackets for P and T we get that N [ ] ɛ, F, L P expkɛ < for some K <. Finally, condition A5 is straightforward, and hence the weak consistency of θ follows. Next, we verify the B-conditions. Condition B holds with v n = K[nh /2 log n /2 + h]. For B2, it suffices by Remark 2ii to show that 4 and 5 hold true. Equation 5 holds true for Φ n δ = δ /2. Indeed the envelope M δ,δ of the class F δ,δ can be taken 6

18 equal to footational simplicity we suppose throughout that θ is one-dimensional M δ,δ Z = 2 η If θ 0 X Aδ X 2 f θ0 X + Aδ, where A = sup θ,x θ f θx, which we suppose to exist and to be finite. Hence, 5 is easily seen to hold provided X is absolutely continuous and sup x f X x <. For 4 note that M δδ 2 L 2 P = 4 ] [F η E 2 X2 X f θ0 X + Aδ X F X2 X f θ0 X Aδ X 8Aδ η 2 inf f X2 X x 2 x =: K 2 δ, which we suppose to be strictly positive, where inf is the infimum over all x, x 2 such that x 2 f θ0 x Aδ. It follows that N [ ] ɛ M δδ L2 P, F δδ, L 2 P is bounded above by N [ ] K ɛδ /2, F δδ, L 2 P. We will first construct brackets for the set G := {f θ : θ Θ, θ θ 0 δ}. Note that f θ can be written as f θ = [ δ f θ f θ0 ]δ +f θ0. If we assume that x θ f θx is twice continuously differentiable in x for all θ, it follows from Corollary in Van der Vaart and Wellner 996 that r ɛ := N [ ] ɛ 2, D, L 2 P expkε with D = { δ f θ f θ0 : θ Θ, θ θ 0 δ}. Let d L d U,..., d L r ɛ d U r ɛ be the ɛ 2 -brackets for D. It then easily follows that N [ ] ɛ 2 δ, G, L 2 P = r ɛ, and that the ɛ 2 δ-brackets for G are given by gj L := d L j δ +f θ0 d U j δ +f θ0 =: gj U. Moreover, s ɛ := N [ ] ɛ, P, L P expkɛ. Let h L h U,..., h L s ɛ h U s ɛ be the ɛ-brackets for P defined in such a way that for k s ɛ, inf t RX h L k η. We now claim that N [ ] ɛ M δδ L2 P, F δδ, L 2 P r ɛ s ɛ expkɛ. 8 Indeed, define for j r ɛ and k s ɛ, f L jkz = I = { [ ] h U k X IY = IX 2 gj U X IX 2 f θ0 X [ ]} +IY = 0 IX 2 < gj L X IX 2 < f θ0 X, 7

19 and define in a similar way the upper bracket fjk U Z. Then, fjkz U fjkz L 2 2 [ 4E h U k X ] 2 [ P h L k X X2 gj U X X P X 2 f θ0 X X ] + P X 2 gj L X X P X 2 f θ0 X X + 4 P η E X 2 2 gj U X X P X 2 gj L X X 6δ η 2 sup h U k x h L k x 2 sup x + 4 η 2 sup x,x 2 f X2 X x 2 x E Cɛ 2 δ, f X2 X x 2 x E x,x 2 [ ] gj U X gj L X [ ] d U j X + d L j X for some 0 < C <. Moreover, for each function in the class F δδ there exists a bracket [fjk L, f jk U ] to which it belongs. This shows 8. It now follows that sup δ δ 0 which shows 4 and hence B2. with and 0 + log N [ ] ɛ M δδ L2 P, F δδ, L 2 P dɛ <, For B3 we check conditions b-d of Remark 2v. It is easily seen that b holds Γθ 0, p = E Λθ 0, p = E [ p0 X { } 2P Y = X, X 2 f X2 X px f θ0 X ] θ f θ 0 X, [ p0 X { }{ 2 2P Y = X, X 2 f X px 2 X f θ0 X θ f θ 0 X }] +f X2 X f θ0 X 2 θ f 2 θ 0 X, and provided the derivatives in Λθ 0, p all exist. Next, by assuming that Γθ 0, p 0 = 0 and that Λθ 0, p 0 is negative, and by noting that Γθ 0, p = O P rn if satisfies [n /2 + h + nh log n] = O, it follows that c and d are also valid. It remains to check condition B4, which easily holds provided = On /3. The two conditions on and the fact that should be chosen as large as possible, are reconcilable provided nh 3 = O and nh 3/2 log n 3/2 = O. Note that it is possible to weaken the first condition to nh 6 = O if we assume that p 0 is twice continuously differentiable. Note however that the rate v n of p would then be Onh /2 log n /2 + h 2, which is faster 8

20 than the rate rn = Kn /3 of θ provided nh 3. Hence, the latter case is of lower level of complexity than the case where p 0 is only once differentiable, and we therefore do not consider it further. To conclude, we have that θ θ 0 = O P n /3. Finally, we check the conditions needed for establishing the asymptotic distribution of θ. Condition C follows from Theorem 2 and condition B, whereas C2 is immediately satisfied. For C3 a similar proof as for condition B2 can be given, which we omit for reasons of brevity. For C4 and C5, first note that the function Φ n δ = Kδ /2 in condition B2 is independent of n and continuous. Hence, C4 and C5 hold provided the three conditions stated in Remark 3i are verified. For the first one, we have that M δ satisfies M δ Z 2 f η I θ0 X Aδ X 2 f θ0 X + Aδ. Hence, E Mδ 2 Kδ for some K <. In a similar way the second and third condition can be proved, from which C4 and C5 follow. Next, C6 is obviously satisfied since for fixed p and z, our function mz,, p consists of indicator functions. Next, following Remarks 2v and 3v, condition C7 follows provided Λθ 0, p 0 <. By construction of the estimator θ, condition C8 holds true. For C9, first note that W n γ = Γθ 0, pγ = o P provided nh 3 = o and nh 3/2 log n 3/2 = o, using what has been shown already for B3. Next, rnb 2 n θ 0 + γ, p 0 9 [ = rn 2 M n θ 0 + γ, p 0 M n θ 0, p 0 Mθ 0 + γ ], p 0 + Mθ 0, p Λθ 0, p 0 γ 2 + o, since Γθ 0, p 0 = 0. Hence, Λγ = 2 Λθ 0, p 0 γ 2. The first terms on the right hand side of 9 are exactly the same as in the parametric case. Hence we can follow the same steps and get the convergence of the marginals using Lindeberg condition and Remark 3 vii under some regularity assumptions on f X2 X and θ f θ.finally, condition C0 can be proved in a similar way as B2. The asymptotic distribution of θ θ 0 now follows from Theorem 3. Appendix: Proofs In this Appendix we give the proofs of the asymptotic results, namely we prove the consistency, the rate of convergence and the asymptotic distribution of our M-estimator θ. 9

21 Proof of Theorem. Our aim is to show that Mθ 0, h 0 M θ, h 0 = o P. Indeed, the result we want to obtain is a direct consequence of and assumption A2. It is easy to show that assumptions A3 and A4 imply that M n θ, ĥ M nθ 0, ĥ M θ, ĥ + Mθ 0, ĥ + M n θ, ĥ M nθ 0, ĥ + M θ, ĥ Mθ 0, ĥ = o P, 2 since θ belongs by construction to Θ. Consider the following decomposition: Mθ 0, h 0 M θ, h 0 = M θ, ĥ M θ, h 0 + Mθ 0, h 0 Mθ 0, ĥ + Mθ 0, ĥ M θ, ĥ M n θ 0, ĥ M n θ, ĥ + 2 sup Mθ, h 0 Mθ, ĥ θ Θ + M n θ, ĥ M θ, ĥ M nθ 0, ĥ + Mθ 0, ĥ. This, together with 2 leads to the following inequality: Mθ 0, h 0 M θ, h 0 + o P M n θ 0, ĥ M n θ, ĥ + o P + 4 sup Mθ, h 0 Mθ, ĥ + o P. θ Θ Now, the quantity + o P on the left hand side in the above inequality is positive on a set A n whose outer probability tends to one when n tends to infinity. On A n, a reformulation of the previous inequality gives: Mθ 0, h 0 M θ, h 0 3 M n θ 0, ĥ M n θ, ĥ + o P Assumptions A3 and A5 imply that and assumption A gives that It now follows directly from 3-5 that + 4 sup Mθ, h 0 Mθ, ĥ + o P + o P. θ Θ sup Mθ, h 0 Mθ, ĥ = o P, 4 θ Θ M n θ 0, ĥ M n θ, ĥ o P. 5 0 Mθ 0, h 0 M θ, h 0 o P. 20

22 Proof of Theorem 2. We borrow ideas given in Van der Vaart and Wellner 996 for the case of parametric M-estimators. Let ξ n be the O P rn 2 -quantity involved in assumption B4. We introduce the sets } S j,n = {θ : 2 j < dθ, θ 0 2 j, and observe that Θ\{θ 0 } = + j= S j,n. Our aim is to prove that for any ɛ > 0 there exists τ ɛ > 0 such that P d θ, θ 0 > τ ɛ < ɛ 6 fo sufficiently large. From now on we work with an arbitrary fixed positive value of ɛ. For any δ, δ, M, K, K > 0, we obtain the following bound using assumption B4: P d θ, θ 0 > 2 M P sup [M n θ, ĥ M nθ 0, ĥ] Kr 2 n, A n j M, 2 j θ S δr j,n n +P 2d θ, θ 0 δ + P rn ξ 2 n > K + P r n W n > K +P β n > C + P d H 2 ĥ, h 0 > δ, 7 v n where A n = { W n K, β n C, dhĥ, h 2 0 δ v n }. Indeed, we can write P d θ, θ 0 > 2 M, 2d θ, θ 0 < δ, rn ξ 2 n K, A n θ Sj,n, rn ξ 2 n K, A n j M,2 j δ P j M,2 j δ P j M,2 j δ P sup [M n θ, ĥ M nθ 0, ĥ] ξ n, rn ξ 2 n K, A n θ S j,n sup [M n θ, ĥ M nθ 0, ĥ] Kr2 n, A n. θ S j,n Assumption B implies that for all δ > 0 there exists n ɛ such that P 2d θ, θ 0 δ < ɛ 6 8 fo larger than n ɛ. Then, by definition of ξ n and W n and because of B, there exist three positive constants δ, K ɛ and K ɛ such that P rn ξ 2 n > K ɛ < ɛ r 6, P n W n > K ɛ < ɛ 6, P β n > C < ɛ 2 6, and P d H ĥ, h 0 > δ < ɛ v n 6 2 9

23 fo larger than some n N. We fix δ < δ 0 and suppose that n maxn 0, n, n ɛ to get that assumptions B2 and B3 are fulfilled on all S j,n such that 2 j δ. Now, it follows directly from assumption B3 that for each fixed j such that 2 j δ one has for all θ S n,j : M n θ, ĥ M nθ 0, ĥ Mθ, ĥ Mθ 0, ĥ + sup dθ,θ 0 2j rn M n θ, ĥ M nθ 0, ĥ Mθ, ĥ + Mθ 0, ĥ W n 2j C β n 22j 2 + sup M rn 2 n θ, ĥ M nθ 0, ĥ Mθ, ĥ + Mθ 0, ĥ. dθ,θ 0 2j rn Consequently, we obtain the following inequality: P sup [M n θ, ĥ M nθ 0, ĥ] K ɛrn 2, A n θ S j,n P sup M n θ, h M n θ 0, h Mθ, h + Mθ 0, h dθ,θ 0 2j rn, d Hĥ,h 0 δ vn 22j 2 C rn 2 2 K ɛ2 2 j K ɛ 2 2 2j. Now, there exists M ɛ such that for all j M ɛ one gets C 2 K ɛ2 2 j K ɛ 2 2 2j C 4. Consequently, if M M ɛ, using assumption B2 and Chebychev s inequality we have that j M, 2 j δ P j M, 2 j δ P 4Kr2 n C n 4Kr2 n C n 6K C { } sup [M n θ, ĥ M nθ 0, ĥ] K ɛrn 2 A n θ S j,n sup M n θ, ĥ M nθ 0, ĥ Mθ, ĥ + Mθ 0, j M, 2 j δ j M, 2 j δ 2 jα 2. j M dθ,θ 0 2j rn, d Hĥ,h 0 δ vn Φ n 2j 2 2j 2 2 jα Φ n 2 2j 2 Finally, since α < 2, the series j M 2jα 2 converges and hence there exists M ɛ M ɛ such that 6K C 2 jα 2 ɛ 6. j M ɛ 22 C22j 2 ĥ 4rn 2

24 This finishes the proof showing 6 with τ ɛ = 2 M ɛ. Proof of Theorem 3. The proof is somewhat similar to the proof of the asymptotic distribution of parametric M-estimators Theorem in Van der Vaart and Wellner 996. The main difficulty lies in the presence of an estimated nuisance parameter. The first step consists in showing the weak convergence of the process γ r 2 nb n θ 0 + γ, ĥ. This is shown in Lemma 4 given below. The remainder of the proof is based on the same arguments as those used to state the Argmax theorem in Van der Vaart and Wellner 996. First note that E is a σ-compact metric space since E = i=k i with K i = {γ E : γ a i } for any positive sequence a i i N tending to infinity. Then deduce from assumption C9 together with Lemmas 5 and 6 given below, that almost all paths of the limiting process γ Λγ + Gγ attain their supremum at an unique point γ 0, following similar ideas to what is done in the parametric case see Theorem in Van der Vaart and Wellner 996. Assume now that γ 0 is measurable. The weak convergence of θ θ 0 to γ 0 is equivalent to the next statement Portmanteau s theorem : limsup n P r n θ θ 0 C P γ 0 C, for every closed set C. Let C be an arbitrary closed subset of E and fix ɛ > 0. The random variable γ 0 is tight because it takes values in E, which is σ-compact. Combining this tightness and the first part of C, it is possible to find K ɛ > 0 and hence a compact set K ɛ := {γ : γ K ɛ } such that P γ 0 / K ɛ ɛ 2, and P θ θ 0 / K ɛ ɛ 2. 0 It follows easily from 0 that limsup n P r n θ θ 0 C P θ θ 0 C K ɛ, γ 0 K ɛ + limsup n P {r n θ θ 0 / K ɛ } {γ 0 / K ɛ } P θ θ 0 C K ɛ, γ 0 K ɛ + ɛ. Now using Lemma 4 and assumption C8 we obtain limsup n P θ θ 0 C K ɛ, γ 0 K ɛ limsup n P sup rnb 2 n θ 0 + γ, γ C K ɛ r ĥ sup rnb 2 n θ 0 + γ, n γ K ɛ r ĥ + o P, γ 0 K ɛ n P sup Λ + Gγ supλ + Gγ, γ 0 K ɛ, 2 γ C K ɛ γ K ɛ 23

25 by Slutsky s lemma and Portmanteau s theorem. On the other hand, for every open set G containing γ 0, we have: Λ + Gγ 0 > sup γ G c K ɛ Λ + Gγ. This together with 2 leads to limsup n P θ θ 0 C K ɛ, γ 0 K ɛ P γ 0 C. 3 Consequently, it follows from that for all ɛ > 0, limsup n P r n θ θ 0 C P γ 0 C + ɛ. 4 Since the right hand side of 4 holds for all ɛ > 0, it also holds for ɛ = 0. The result now follows from Portmanteau s theorem. We end this section with three lemmas that were needed in the proof of Theorem 3. Lemma 4 For all K > 0, let K = {γ E : γ K} be a compact subset of E. Then, under the assumptions of Theorem 3, for any such K, the process γ r 2 nb n θ 0 + γ, ĥ converges weakly to the process γ Λγ + Gγ in l K. Moreover, almost all paths of the limiting process are continuous uniformly on every compact K with respect to. Proof. The weak convergence of the process γ rnb 2 n θ 0 + γ, ĥ in l K follows directly from Slutsky s theorem and Lemmas 5 and 6. On the other hand, makes K totally bounded since it is compact and γ r 2 nb n θ 0 + γ, h 0 + W n γ is asymptotically uniformly -equicontinuous in probability, asymptotically tight, and it converges weakly to γ Λγ + Gγ in l K see proof of Lemma 6. Thus almost all paths of the limiting process are uniformly -continuous on K see Theorem.5.7 in Van der Vaart and Wellner 996. Moreover, because E may be covered by a countable sequence of such compact sets, almost all paths of the limiting process are -continuous on E. Lemma 5 Let K = {γ E : γ K}. Then, under the assumptions of Theorem 3, for all γ K, there exist ξ 0,n, ξ,n, ξ 2,n, such that sup γ K ξ j,n = o P, j = 0,, 2, and rnb 2 n θ 0 + γ, r ĥ + ξ 0,n = [r 2nB n θ 0 + γ ], h 0 + W n γ + ξ,n + ξ 2,n. n 24

26 Proof. Let us introduce the following notations : with θ = θ 0 + γ/. B n θ, h Bθ, h B n θ, h 0 + Bθ, h 0 α 0,n γ = rn 2 + B n θ, h + B n θ, h 0 + Bθ, h + Bθ, h 0, s n,h γ = sign [B n θ 0 + γ ], h, r [ n s h γ = sign B θ 0 + γ ], h, Because the compact K is bounded and θ 0 belongs to the interior of Θ, there exists n K such that for all n n K and for all γ K, the quantity θ 0 + γ γ K entails that This can be reformulated as is in Θ. Then, for all B n θ 0 + γ, r ĥ n = B n θ 0 + γ, h 0 + B θ 0 + γ, r ĥ B θ 0 + γ, h 0 n r n +α 0,n γ rn 2 + B n θ 0 + γ, r ĥ Bn + θ 0 + γ, h 0 5 n r n + B θ 0 + γ B, r ĥ + θ 0 + γ, h 0. n rnb 2 n θ 0 + γ, r ĥ α 0,n γs γ n,ĥ n = rnb 2 n θ 0 + γ, h 0 + α 0,n γs n,h0 γ rnb θ γ, h 0 α 0,n γs h0 γ + rnb 2 θ 0 + γ, r ĥ + α 0,n γsĥγ n + α 0,n γ. 6 Then use assumptions C and C7 to get r 2 n [ B θ 0 + γ, r ĥ B θ 0 + γ ], h 0 n γ = W n γ + β n γ 2 + rno 2 2 := W n γ + α,n γ. 7 r 2 n Combining 6 and 7 we obtain rnb 2 n θ 0 + γ, r ĥ + ξ 0,n γ n = [r 2nB n θ 0 + γ ], h 0 + W n γ + ξ,n γ + ξ 2,n γ, 8 25

27 with ξ 0,n γ = α 0,n γs n, ĥ γ, ξ,n γ = α 0,n γs n,h0 γ, [ γ ξ 2,n γ = α 0,n γ + V γ, γ + rno 2 2 rn 2 sĥ + s h0 γ ] + W n γ + α,n γ sĥ s n,h0 γ + α,n γ + ξ,n γ. It can be easily shown that sup γ K ξ j,n γ = o P for j = 0,, 2 using assumptions C3 and C7. Lemma 6 Let K = {γ E : γ K}. Then, under the assumptions of Theorem 3, the process γ r 2 nb n θ 0 + γ, h 0 + W n γ is asymptotically tight, asymptotically uniformly equicontinuous with respect to on K, and it converges weakly to the process γ Λγ + Gγ in l K. Proof. The main idea of this proof consists in writing the process T n : γ r 2 nb n θ 0 + γ, h 0 + W n γ as the sum of two processes T,n : γ r 2 nb n θ 0 + γ, h 0 Bθ 0 + γ, h 0 and T 2,n : γ r 2 nbθ 0 + γ, h 0 + W n γ and studying separately the properties of T,n and T 2,n. However, in some specific cases it could be possible to state the weak convergence of T n without this decomposition. Let us first note that assumption C7 implies that for n sufficiently large only depending on K so that θ 0 + K Θ, the processes T,n and T 2,n take values in l K. The process T,n does not depend on the estimation of the nuisance parameter. Hence, following exactly the same ideas as in the parametric case we get from assumptions C4, C5 and C0 the asymptotic uniform equicontinuity of T,n with respect to on K as a sub-product of the proof of Theorem 2..9 in Van der Vaart and Wellner 996. On the other hand, fo large enough, θ + γ Θ see the proof of Lemma 5. Assume now that n is large enough and use assumption C7 to conclude that for all 0 < δ δ, sup T 2,n γ T 2,n γ γ,γ K, γ γ δ γ = sup W n γ γ + V γ, γ V γ, γ + rn 2 2 γ 2 o + o γ,γ K, γ γ δ δ τ sup := δ τ α n + b n, γ E, δ δ, γ δ W nγ + sup δ τ γ,γ E, δ δ, γ γ δ 26 r 2 n r 2 n V γ, γ V γ, γ + b δ τ n 9

28 where b n sup γ,γ K rno 2 γ 2 + o γ 2 0 as n tends to infinity, and α rn 2 rn 2 n = O P uniformly over δ δ. Let ɛ and η be arbitrary positive constants. It is clear that, for any 0 < δ δ and any positive constant K, 9 leads to limsup n P sup T 2,n γ T 2,n γ > ɛ γ,γ K, γ γ δ limsup n P δ τ α n + b n > ɛ, α n K, b n < ɛ + limsup 2 n P α n > K limsup n P δ τ > ɛ + limsup 2K n P α n > K. Finally choose K η such that the last term is smaller than η, and take δ δ ɛ 2K η τ. It then follows that T 2,n is asymptotically uniformly equicontinuous in probability with respect to on K. Hence, the same is also true for the process T n, since it is the sum of two such processes. The asymptotic tightness and hence the weak convergence of T n to Λ + G in l K now follows from Theorems.5.7 and.5.4 in Van der Vaart and Wellner 996, together with assumption C9 and the fact that K is totally bounded with respect to the -norm since it is compact. Moreover, using Addendum.5.8 in the same book, almost all paths of the limiting process on K are uniformly continuous with respect to. References Chen, X. and Fan, Y Estimation of copula-based semiparametric time series models. Journal of Econometrics, 30, Chen, X., Linton, O. and Van Keilegom, I Estimation of semiparametric models when the criterion function is not smooth. Econometrica, 7, Escanciano, J., Jacho-Chavez, D. and Lewbel, A Identification and estimation of semiparametric two step models. Unpublished manuscript. Escanciano, J., Jacho-Chavez, D. and Lewbel, A. 20. Uniform convergence for semiparametric two step estimators and tests. Unpublished manuscript. Goldenshluger, A. and Zeevi, A The Hough transform estimator. Annals of Statistics, 32, Groeneboom, P., Jongbloed, G. and Wellner, J.A Estimation of a convex function: characterizations and asymptotic theory. Annals of Statistics, 29, Groeneboom, P. and Wellner, J.A Information bounds and nonparametric maximum likelihood estimation. Birkhäuser, Basel. 27

29 Ichimura, H. and Lee, S Characterization of the asymptotic distribution of semiparametric M-estimators. Journal of Econometrics, 59, Kim, J. and Pollard, D Cube root asymptotics. Annals of Statistics, 8, Kosorok, M.R Introduction to empirical processes and semiparametric inference. Springer, New York. Kristensen, D. and Salanié 200. Higher order improvements for approximate estimators. Unpublished manuscript. Linton, O., Sperlich, S. and Van Keilegom, I Estimation of a semiparametric transformation model. Annals of Statistics, 36, Ma, S. and Kosorok, M.R Robust semiparametric M-estimation and the weighted bootstrap. Journal of Multivariate Analysis, 96, Mammen, E., Rothe, C. and Schienle, M. 20. Semiparametric estimation with generated covariates. Unpublished manuscript. Mohammadi, L. and Van de Geer, S Asymptotics in empirical risk minimization. Journal of Machine Learning Research, 6, Radchenko, P Mixed-rates asymptotics. Annals of Statistics, 36, Van de Geer, S.A Empirical processes in M-estimation. Cambridge University Press, New York. Van der Vaart, A.W. and Wellner, J.A Weak convergence and empirical processes: with applications in statistics. Springer, New York. Van der Vaart, A.W. and Wellner, J.A Empirical processes indexed by estimated functions. IMS Lecture Notes-Monograph Series, 55,

Empirical Processes: General Weak Convergence Theory

Empirical Processes: General Weak Convergence Theory Empirical Processes: General Weak Convergence Theory Moulinath Banerjee May 18, 2010 1 Extended Weak Convergence The lack of measurability of the empirical process with respect to the sigma-field generated

More information

Closest Moment Estimation under General Conditions

Closest Moment Estimation under General Conditions Closest Moment Estimation under General Conditions Chirok Han and Robert de Jong January 28, 2002 Abstract This paper considers Closest Moment (CM) estimation with a general distance function, and avoids

More information

Closest Moment Estimation under General Conditions

Closest Moment Estimation under General Conditions Closest Moment Estimation under General Conditions Chirok Han Victoria University of Wellington New Zealand Robert de Jong Ohio State University U.S.A October, 2003 Abstract This paper considers Closest

More information

D I S C U S S I O N P A P E R

D I S C U S S I O N P A P E R I N S T I T U T D E S T A T I S T I Q U E B I O S T A T I S T I Q U E E T S C I E N C E S A C T U A R I E L L E S ( I S B A ) UNIVERSITÉ CATHOLIQUE DE LOUVAIN D I S C U S S I O N P A P E R 2014/06 Adaptive

More information

Estimation in semiparametric models with missing data

Estimation in semiparametric models with missing data Ann Inst Stat Math (2013) 65:785 805 DOI 10.1007/s10463-012-0393-6 Estimation in semiparametric models with missing data Song Xi Chen Ingrid Van Keilegom Received: 7 May 2012 / Revised: 22 October 2012

More information

The properties of L p -GMM estimators

The properties of L p -GMM estimators The properties of L p -GMM estimators Robert de Jong and Chirok Han Michigan State University February 2000 Abstract This paper considers Generalized Method of Moment-type estimators for which a criterion

More information

Estimation of the Bivariate and Marginal Distributions with Censored Data

Estimation of the Bivariate and Marginal Distributions with Censored Data Estimation of the Bivariate and Marginal Distributions with Censored Data Michael Akritas and Ingrid Van Keilegom Penn State University and Eindhoven University of Technology May 22, 2 Abstract Two new

More information

Maximum score estimation of preference parameters for a binary choice model under uncertainty

Maximum score estimation of preference parameters for a binary choice model under uncertainty Maximum score estimation of preference parameters for a binary choice model under uncertainty Le-Yu Chen Sokbae Lee Myung Jae Sung The Institute for Fiscal Studies Department of Economics, UCL cemmap working

More information

DEPARTMENT MATHEMATIK ARBEITSBEREICH MATHEMATISCHE STATISTIK UND STOCHASTISCHE PROZESSE

DEPARTMENT MATHEMATIK ARBEITSBEREICH MATHEMATISCHE STATISTIK UND STOCHASTISCHE PROZESSE Estimating the error distribution in nonparametric multiple regression with applications to model testing Natalie Neumeyer & Ingrid Van Keilegom Preprint No. 2008-01 July 2008 DEPARTMENT MATHEMATIK ARBEITSBEREICH

More information

Statistical Properties of Numerical Derivatives

Statistical Properties of Numerical Derivatives Statistical Properties of Numerical Derivatives Han Hong, Aprajit Mahajan, and Denis Nekipelov Stanford University and UC Berkeley November 2010 1 / 63 Motivation Introduction Many models have objective

More information

Chapter 4: Asymptotic Properties of the MLE

Chapter 4: Asymptotic Properties of the MLE Chapter 4: Asymptotic Properties of the MLE Daniel O. Scharfstein 09/19/13 1 / 1 Maximum Likelihood Maximum likelihood is the most powerful tool for estimation. In this part of the course, we will consider

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 02: Overview Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

Concentration behavior of the penalized least squares estimator

Concentration behavior of the penalized least squares estimator Concentration behavior of the penalized least squares estimator Penalized least squares behavior arxiv:1511.08698v2 [math.st] 19 Oct 2016 Alan Muro and Sara van de Geer {muro,geer}@stat.math.ethz.ch Seminar

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results

Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results Introduction to Empirical Processes and Semiparametric Inference Lecture 12: Glivenko-Cantelli and Donsker Results Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics

More information

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation

Statistics 612: L p spaces, metrics on spaces of probabilites, and connections to estimation Statistics 62: L p spaces, metrics on spaces of probabilites, and connections to estimation Moulinath Banerjee December 6, 2006 L p spaces and Hilbert spaces We first formally define L p spaces. Consider

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations

Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Introduction to Empirical Processes and Semiparametric Inference Lecture 13: Entropy Calculations Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations Research

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence

Introduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence Introduction to Empirical Processes and Semiparametric Inference Lecture 08: Stochastic Convergence Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and Operations

More information

Generated Covariates in Nonparametric Estimation: A Short Review.

Generated Covariates in Nonparametric Estimation: A Short Review. Generated Covariates in Nonparametric Estimation: A Short Review. Enno Mammen, Christoph Rothe, and Melanie Schienle Abstract In many applications, covariates are not observed but have to be estimated

More information

Verifying Regularity Conditions for Logit-Normal GLMM

Verifying Regularity Conditions for Logit-Normal GLMM Verifying Regularity Conditions for Logit-Normal GLMM Yun Ju Sung Charles J. Geyer January 10, 2006 In this note we verify the conditions of the theorems in Sung and Geyer (submitted) for the Logit-Normal

More information

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS SOME CONVERSE LIMIT THEOREMS OR EXCHANGEABLE BOOTSTRAPS Jon A. Wellner University of Washington The bootstrap Glivenko-Cantelli and bootstrap Donsker theorems of Giné and Zinn (990) contain both necessary

More information

Product-limit estimators of the survival function with left or right censored data

Product-limit estimators of the survival function with left or right censored data Product-limit estimators of the survival function with left or right censored data 1 CREST-ENSAI Campus de Ker-Lann Rue Blaise Pascal - BP 37203 35172 Bruz cedex, France (e-mail: patilea@ensai.fr) 2 Institut

More information

Estimation of a semiparametric transformation model in the presence of endogeneity

Estimation of a semiparametric transformation model in the presence of endogeneity TSE 654 May 2016 Estimation of a semiparametric transformation model in the presence of endogeneity Anne Vanhems and Ingrid Van Keilegom Estimation of a semiparametric transformation model in the presence

More information

Learning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013

Learning Theory. Ingo Steinwart University of Stuttgart. September 4, 2013 Learning Theory Ingo Steinwart University of Stuttgart September 4, 2013 Ingo Steinwart University of Stuttgart () Learning Theory September 4, 2013 1 / 62 Basics Informal Introduction Informal Description

More information

arxiv:submit/ [math.st] 6 May 2011

arxiv:submit/ [math.st] 6 May 2011 A Continuous Mapping Theorem for the Smallest Argmax Functional arxiv:submit/0243372 [math.st] 6 May 2011 Emilio Seijo and Bodhisattva Sen Columbia University Abstract This paper introduces a version of

More information

Efficiency of Profile/Partial Likelihood in the Cox Model

Efficiency of Profile/Partial Likelihood in the Cox Model Efficiency of Profile/Partial Likelihood in the Cox Model Yuichi Hirose School of Mathematics, Statistics and Operations Research, Victoria University of Wellington, New Zealand Summary. This paper shows

More information

Efficient Estimation in Convex Single Index Models 1

Efficient Estimation in Convex Single Index Models 1 1/28 Efficient Estimation in Convex Single Index Models 1 Rohit Patra University of Florida http://arxiv.org/abs/1708.00145 1 Joint work with Arun K. Kuchibhotla (UPenn) and Bodhisattva Sen (Columbia)

More information

NONLINEAR LEAST-SQUARES ESTIMATION 1. INTRODUCTION

NONLINEAR LEAST-SQUARES ESTIMATION 1. INTRODUCTION NONLINEAR LEAST-SQUARES ESTIMATION DAVID POLLARD AND PETER RADCHENKO ABSTRACT. The paper uses empirical process techniques to study the asymptotics of the least-squares estimator for the fitting of a nonlinear

More information

Some SDEs with distributional drift Part I : General calculus. Flandoli, Franco; Russo, Francesco; Wolf, Jochen

Some SDEs with distributional drift Part I : General calculus. Flandoli, Franco; Russo, Francesco; Wolf, Jochen Title Author(s) Some SDEs with distributional drift Part I : General calculus Flandoli, Franco; Russo, Francesco; Wolf, Jochen Citation Osaka Journal of Mathematics. 4() P.493-P.54 Issue Date 3-6 Text

More information

Classical regularity conditions

Classical regularity conditions Chapter 3 Classical regularity conditions Preliminary draft. Please do not distribute. The results from classical asymptotic theory typically require assumptions of pointwise differentiability of a criterion

More information

Local semiconvexity of Kantorovich potentials on non-compact manifolds

Local semiconvexity of Kantorovich potentials on non-compact manifolds Local semiconvexity of Kantorovich potentials on non-compact manifolds Alessio Figalli, Nicola Gigli Abstract We prove that any Kantorovich potential for the cost function c = d / on a Riemannian manifold

More information

Quantile Processes for Semi and Nonparametric Regression

Quantile Processes for Semi and Nonparametric Regression Quantile Processes for Semi and Nonparametric Regression Shih-Kang Chao Department of Statistics Purdue University IMS-APRM 2016 A joint work with Stanislav Volgushev and Guang Cheng Quantile Response

More information

The sample complexity of agnostic learning with deterministic labels

The sample complexity of agnostic learning with deterministic labels The sample complexity of agnostic learning with deterministic labels Shai Ben-David Cheriton School of Computer Science University of Waterloo Waterloo, ON, N2L 3G CANADA shai@uwaterloo.ca Ruth Urner College

More information

EXTENDING THE SCOPE OF EMPIRICAL LIKELIHOOD

EXTENDING THE SCOPE OF EMPIRICAL LIKELIHOOD The Annals of Statistics 2009, Vol. 37, No. 3, 1079 1111 DOI: 10.1214/07-AOS555 Institute of Mathematical Statistics, 2009 EXTENDING THE SCOPE OF EMPIRICAL LIKELIHOOD BY NILS LID HJORT, IAN W. MCKEAGUE

More information

The Influence Function of Semiparametric Estimators

The Influence Function of Semiparametric Estimators The Influence Function of Semiparametric Estimators Hidehiko Ichimura University of Tokyo Whitney K. Newey MIT July 2015 Revised January 2017 Abstract There are many economic parameters that depend on

More information

Nonparametric Identification and Estimation of a Transformation Model

Nonparametric Identification and Estimation of a Transformation Model Nonparametric and of a Transformation Model Hidehiko Ichimura and Sokbae Lee University of Tokyo and Seoul National University 15 February, 2012 Outline 1. The Model and Motivation 2. 3. Consistency 4.

More information

A Local Generalized Method of Moments Estimator

A Local Generalized Method of Moments Estimator A Local Generalized Method of Moments Estimator Arthur Lewbel Boston College June 2006 Abstract A local Generalized Method of Moments Estimator is proposed for nonparametrically estimating unknown functions

More information

Chapter One. The Calderón-Zygmund Theory I: Ellipticity

Chapter One. The Calderón-Zygmund Theory I: Ellipticity Chapter One The Calderón-Zygmund Theory I: Ellipticity Our story begins with a classical situation: convolution with homogeneous, Calderón- Zygmund ( kernels on R n. Let S n 1 R n denote the unit sphere

More information

2 A Model, Harmonic Map, Problem

2 A Model, Harmonic Map, Problem ELLIPTIC SYSTEMS JOHN E. HUTCHINSON Department of Mathematics School of Mathematical Sciences, A.N.U. 1 Introduction Elliptic equations model the behaviour of scalar quantities u, such as temperature or

More information

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications.

is a Borel subset of S Θ for each c R (Bertsekas and Shreve, 1978, Proposition 7.36) This always holds in practical applications. Stat 811 Lecture Notes The Wald Consistency Theorem Charles J. Geyer April 9, 01 1 Analyticity Assumptions Let { f θ : θ Θ } be a family of subprobability densities 1 with respect to a measure µ on a measurable

More information

Large Deviations for Weakly Dependent Sequences: The Gärtner-Ellis Theorem

Large Deviations for Weakly Dependent Sequences: The Gärtner-Ellis Theorem Chapter 34 Large Deviations for Weakly Dependent Sequences: The Gärtner-Ellis Theorem This chapter proves the Gärtner-Ellis theorem, establishing an LDP for not-too-dependent processes taking values in

More information

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach

Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach Goodness-of-Fit Tests for Time Series Models: A Score-Marked Empirical Process Approach By Shiqing Ling Department of Mathematics Hong Kong University of Science and Technology Let {y t : t = 0, ±1, ±2,

More information

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability...

Functional Analysis. Franck Sueur Metric spaces Definitions Completeness Compactness Separability... Functional Analysis Franck Sueur 2018-2019 Contents 1 Metric spaces 1 1.1 Definitions........................................ 1 1.2 Completeness...................................... 3 1.3 Compactness......................................

More information

Econometrica Supplementary Material

Econometrica Supplementary Material Econometrica Supplementary Material SUPPLEMENT TO USING INSTRUMENTAL VARIABLES FOR INFERENCE ABOUT POLICY RELEVANT TREATMENT PARAMETERS Econometrica, Vol. 86, No. 5, September 2018, 1589 1619 MAGNE MOGSTAD

More information

Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data

Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data Supplementary Material for Nonparametric Operator-Regularized Covariance Function Estimation for Functional Data Raymond K. W. Wong Department of Statistics, Texas A&M University Xiaoke Zhang Department

More information

NETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA

NETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA Statistica Sinica 213): Supplement NETWORK-REGULARIZED HIGH-DIMENSIONAL COX REGRESSION FOR ANALYSIS OF GENOMIC DATA Hokeun Sun 1, Wei Lin 2, Rui Feng 2 and Hongzhe Li 2 1 Columbia University and 2 University

More information

large number of i.i.d. observations from P. For concreteness, suppose

large number of i.i.d. observations from P. For concreteness, suppose 1 Subsampling Suppose X i, i = 1,..., n is an i.i.d. sequence of random variables with distribution P. Let θ(p ) be some real-valued parameter of interest, and let ˆθ n = ˆθ n (X 1,..., X n ) be some estimate

More information

Introductory Analysis I Fall 2014 Homework #9 Due: Wednesday, November 19

Introductory Analysis I Fall 2014 Homework #9 Due: Wednesday, November 19 Introductory Analysis I Fall 204 Homework #9 Due: Wednesday, November 9 Here is an easy one, to serve as warmup Assume M is a compact metric space and N is a metric space Assume that f n : M N for each

More information

M- and Z- theorems; GMM and Empirical Likelihood Wellner; 5/13/98, 1/26/07, 5/08/09, 6/14/2010

M- and Z- theorems; GMM and Empirical Likelihood Wellner; 5/13/98, 1/26/07, 5/08/09, 6/14/2010 M- and Z- theorems; GMM and Empirical Likelihood Wellner; 5/13/98, 1/26/07, 5/08/09, 6/14/2010 Z-theorems: Notation and Context Suppose that Θ R k, and that Ψ n : Θ R k, random maps Ψ : Θ R k, deterministic

More information

Part III. 10 Topological Space Basics. Topological Spaces

Part III. 10 Topological Space Basics. Topological Spaces Part III 10 Topological Space Basics Topological Spaces Using the metric space results above as motivation we will axiomatize the notion of being an open set to more general settings. Definition 10.1.

More information

Bootstrap of residual processes in regression: to smooth or not to smooth?

Bootstrap of residual processes in regression: to smooth or not to smooth? Bootstrap of residual processes in regression: to smooth or not to smooth? arxiv:1712.02685v1 [math.st] 7 Dec 2017 Natalie Neumeyer Ingrid Van Keilegom December 8, 2017 Abstract In this paper we consider

More information

Delta Theorem in the Age of High Dimensions

Delta Theorem in the Age of High Dimensions Delta Theorem in the Age of High Dimensions Mehmet Caner Department of Economics Ohio State University December 15, 2016 Abstract We provide a new version of delta theorem, that takes into account of high

More information

Convergence of generalized entropy minimizers in sequences of convex problems

Convergence of generalized entropy minimizers in sequences of convex problems Proceedings IEEE ISIT 206, Barcelona, Spain, 2609 263 Convergence of generalized entropy minimizers in sequences of convex problems Imre Csiszár A Rényi Institute of Mathematics Hungarian Academy of Sciences

More information

University of California San Diego and Stanford University and

University of California San Diego and Stanford University and First International Workshop on Functional and Operatorial Statistics. Toulouse, June 19-21, 2008 K-sample Subsampling Dimitris N. olitis andjoseph.romano University of California San Diego and Stanford

More information

Model Selection and Geometry

Model Selection and Geometry Model Selection and Geometry Pascal Massart Université Paris-Sud, Orsay Leipzig, February Purpose of the talk! Concentration of measure plays a fundamental role in the theory of model selection! Model

More information

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008

Gaussian processes. Chuong B. Do (updated by Honglak Lee) November 22, 2008 Gaussian processes Chuong B Do (updated by Honglak Lee) November 22, 2008 Many of the classical machine learning algorithms that we talked about during the first half of this course fit the following pattern:

More information

Additive Isotonic Regression

Additive Isotonic Regression Additive Isotonic Regression Enno Mammen and Kyusang Yu 11. July 2006 INTRODUCTION: We have i.i.d. random vectors (Y 1, X 1 ),..., (Y n, X n ) with X i = (X1 i,..., X d i ) and we consider the additive

More information

INFERENCE ON SETS IN FINANCE

INFERENCE ON SETS IN FINANCE INFERENCE ON SETS IN FINANCE VICTOR CHERNOZHUKOV EMRE KOCATULUM KONRAD MENZEL Abstract. In this paper we introduce various set inference problems as they appear in finance and propose practical and powerful

More information

SUPPLEMENTARY MATERIAL TO IRONING WITHOUT CONTROL

SUPPLEMENTARY MATERIAL TO IRONING WITHOUT CONTROL SUPPLEMENTARY MATERIAL TO IRONING WITHOUT CONTROL JUUSO TOIKKA This document contains omitted proofs and additional results for the manuscript Ironing without Control. Appendix A contains the proofs for

More information

Goodness-of-fit tests for the cure rate in a mixture cure model

Goodness-of-fit tests for the cure rate in a mixture cure model Biometrika (217), 13, 1, pp. 1 7 Printed in Great Britain Advance Access publication on 31 July 216 Goodness-of-fit tests for the cure rate in a mixture cure model BY U.U. MÜLLER Department of Statistics,

More information

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model.

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model. Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model By Michael Levine Purdue University Technical Report #14-03 Department of

More information

Multiscale Adaptive Inference on Conditional Moment Inequalities

Multiscale Adaptive Inference on Conditional Moment Inequalities Multiscale Adaptive Inference on Conditional Moment Inequalities Timothy B. Armstrong 1 Hock Peng Chan 2 1 Yale University 2 National University of Singapore June 2013 Conditional moment inequality models

More information

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix

Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract

More information

An elementary proof of the weak convergence of empirical processes

An elementary proof of the weak convergence of empirical processes An elementary proof of the weak convergence of empirical processes Dragan Radulović Department of Mathematics, Florida Atlantic University Marten Wegkamp Department of Mathematics & Department of Statistical

More information

Stochastic Equicontinuity in Nonlinear Time Series Models. Andreas Hagemann University of Notre Dame. June 10, 2013

Stochastic Equicontinuity in Nonlinear Time Series Models. Andreas Hagemann University of Notre Dame. June 10, 2013 Stochastic Equicontinuity in Nonlinear Time Series Models Andreas Hagemann University of Notre Dame June 10, 2013 In this paper I provide simple and easily verifiable conditions under which a strong form

More information

SDS : Theoretical Statistics

SDS : Theoretical Statistics SDS 384 11: Theoretical Statistics Lecture 1: Introduction Purnamrita Sarkar Department of Statistics and Data Science The University of Texas at Austin https://psarkar.github.io/teaching Manegerial Stuff

More information

On threshold-based classification rules By Leila Mohammadi and Sara van de Geer. Mathematical Institute, University of Leiden

On threshold-based classification rules By Leila Mohammadi and Sara van de Geer. Mathematical Institute, University of Leiden On threshold-based classification rules By Leila Mohammadi and Sara van de Geer Mathematical Institute, University of Leiden Abstract. Suppose we have n i.i.d. copies {(X i, Y i ), i = 1,..., n} of an

More information

Chapter 3. Point Estimation. 3.1 Introduction

Chapter 3. Point Estimation. 3.1 Introduction Chapter 3 Point Estimation Let (Ω, A, P θ ), P θ P = {P θ θ Θ}be probability space, X 1, X 2,..., X n : (Ω, A) (IR k, B k ) random variables (X, B X ) sample space γ : Θ IR k measurable function, i.e.

More information

Convergence Rate of Nonlinear Switched Systems

Convergence Rate of Nonlinear Switched Systems Convergence Rate of Nonlinear Switched Systems Philippe JOUAN and Saïd NACIRI arxiv:1511.01737v1 [math.oc] 5 Nov 2015 January 23, 2018 Abstract This paper is concerned with the convergence rate of the

More information

REAL VARIABLES: PROBLEM SET 1. = x limsup E k

REAL VARIABLES: PROBLEM SET 1. = x limsup E k REAL VARIABLES: PROBLEM SET 1 BEN ELDER 1. Problem 1.1a First let s prove that limsup E k consists of those points which belong to infinitely many E k. From equation 1.1: limsup E k = E k For limsup E

More information

Estimation for two-phase designs: semiparametric models and Z theorems

Estimation for two-phase designs: semiparametric models and Z theorems Estimation for two-phase designs:semiparametric models and Z theorems p. 1/27 Estimation for two-phase designs: semiparametric models and Z theorems Jon A. Wellner University of Washington Estimation for

More information

Theoretical Statistics. Lecture 17.

Theoretical Statistics. Lecture 17. Theoretical Statistics. Lecture 17. Peter Bartlett 1. Asymptotic normality of Z-estimators: classical conditions. 2. Asymptotic equicontinuity. 1 Recall: Delta method Theorem: Supposeφ : R k R m is differentiable

More information

Solution of the 7 th Homework

Solution of the 7 th Homework Solution of the 7 th Homework Sangchul Lee December 3, 2014 1 Preliminary In this section we deal with some facts that are relevant to our problems but can be coped with only previous materials. 1.1 Maximum

More information

Working Paper No Maximum score type estimators

Working Paper No Maximum score type estimators Warsaw School of Economics Institute of Econometrics Department of Applied Econometrics Department of Applied Econometrics Working Papers Warsaw School of Economics Al. iepodleglosci 64 02-554 Warszawa,

More information

Lecture 6: September 19

Lecture 6: September 19 36-755: Advanced Statistical Theory I Fall 2016 Lecture 6: September 19 Lecturer: Alessandro Rinaldo Scribe: YJ Choe Note: LaTeX template courtesy of UC Berkeley EECS dept. Disclaimer: These notes have

More information

Inverse Statistical Learning

Inverse Statistical Learning Inverse Statistical Learning Minimax theory, adaptation and algorithm avec (par ordre d apparition) C. Marteau, M. Chichignoud, C. Brunet and S. Souchet Dijon, le 15 janvier 2014 Inverse Statistical Learning

More information

Statistical inference on Lévy processes

Statistical inference on Lévy processes Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline

More information

Statistics 581 Revision of Section 4.4: Consistency of Maximum Likelihood Estimates Wellner; 11/30/2001

Statistics 581 Revision of Section 4.4: Consistency of Maximum Likelihood Estimates Wellner; 11/30/2001 Statistics 581 Revision of Section 4.4: Consistency of Maximum Likelihood Estimates Wellner; 11/30/2001 Some Uniform Strong Laws of Large Numbers Suppose that: A. X, X 1,...,X n are i.i.d. P on the measurable

More information

The deterministic Lasso

The deterministic Lasso The deterministic Lasso Sara van de Geer Seminar für Statistik, ETH Zürich Abstract We study high-dimensional generalized linear models and empirical risk minimization using the Lasso An oracle inequality

More information

Introduction and Preliminaries

Introduction and Preliminaries Chapter 1 Introduction and Preliminaries This chapter serves two purposes. The first purpose is to prepare the readers for the more systematic development in later chapters of methods of real analysis

More information

On John type ellipsoids

On John type ellipsoids On John type ellipsoids B. Klartag Tel Aviv University Abstract Given an arbitrary convex symmetric body K R n, we construct a natural and non-trivial continuous map u K which associates ellipsoids to

More information

Metric Spaces and Topology

Metric Spaces and Topology Chapter 2 Metric Spaces and Topology From an engineering perspective, the most important way to construct a topology on a set is to define the topology in terms of a metric on the set. This approach underlies

More information

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term;

min f(x). (2.1) Objectives consisting of a smooth convex term plus a nonconvex regularization term; Chapter 2 Gradient Methods The gradient method forms the foundation of all of the schemes studied in this book. We will provide several complementary perspectives on this algorithm that highlight the many

More information

Some Background Material

Some Background Material Chapter 1 Some Background Material In the first chapter, we present a quick review of elementary - but important - material as a way of dipping our toes in the water. This chapter also introduces important

More information

Minimax lower bounds I

Minimax lower bounds I Minimax lower bounds I Kyoung Hee Kim Sungshin University 1 Preliminaries 2 General strategy 3 Le Cam, 1973 4 Assouad, 1983 5 Appendix Setting Family of probability measures {P θ : θ Θ} on a sigma field

More information

Cross-fitting and fast remainder rates for semiparametric estimation

Cross-fitting and fast remainder rates for semiparametric estimation Cross-fitting and fast remainder rates for semiparametric estimation Whitney K. Newey James M. Robins The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP41/17 Cross-Fitting

More information

Distance between multinomial and multivariate normal models

Distance between multinomial and multivariate normal models Chapter 9 Distance between multinomial and multivariate normal models SECTION 1 introduces Andrew Carter s recursive procedure for bounding the Le Cam distance between a multinomialmodeland its approximating

More information

DISCUSSION PAPER 2016/43

DISCUSSION PAPER 2016/43 I N S T I T U T D E S T A T I S T I Q U E B I O S T A T I S T I Q U E E T S C I E N C E S A C T U A R I E L L E S ( I S B A ) DISCUSSION PAPER 2016/43 Bounds on Kendall s Tau for Zero-Inflated Continuous

More information

HOPF S DECOMPOSITION AND RECURRENT SEMIGROUPS. Josef Teichmann

HOPF S DECOMPOSITION AND RECURRENT SEMIGROUPS. Josef Teichmann HOPF S DECOMPOSITION AND RECURRENT SEMIGROUPS Josef Teichmann Abstract. Some results of ergodic theory are generalized in the setting of Banach lattices, namely Hopf s maximal ergodic inequality and the

More information

Cubic regularization of Newton s method for convex problems with constraints

Cubic regularization of Newton s method for convex problems with constraints CORE DISCUSSION PAPER 006/39 Cubic regularization of Newton s method for convex problems with constraints Yu. Nesterov March 31, 006 Abstract In this paper we derive efficiency estimates of the regularized

More information

Estimation and Inference of Quantile Regression. for Survival Data under Biased Sampling

Estimation and Inference of Quantile Regression. for Survival Data under Biased Sampling Estimation and Inference of Quantile Regression for Survival Data under Biased Sampling Supplementary Materials: Proofs of the Main Results S1 Verification of the weight function v i (t) for the lengthbiased

More information

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract

More information

Invariant HPD credible sets and MAP estimators

Invariant HPD credible sets and MAP estimators Bayesian Analysis (007), Number 4, pp. 681 69 Invariant HPD credible sets and MAP estimators Pierre Druilhet and Jean-Michel Marin Abstract. MAP estimators and HPD credible sets are often criticized in

More information

Probability and Measure

Probability and Measure Part II Year 2018 2017 2016 2015 2014 2013 2012 2011 2010 2009 2008 2007 2006 2005 2018 84 Paper 4, Section II 26J Let (X, A) be a measurable space. Let T : X X be a measurable map, and µ a probability

More information

EC9A0: Pre-sessional Advanced Mathematics Course. Lecture Notes: Unconstrained Optimisation By Pablo F. Beker 1

EC9A0: Pre-sessional Advanced Mathematics Course. Lecture Notes: Unconstrained Optimisation By Pablo F. Beker 1 EC9A0: Pre-sessional Advanced Mathematics Course Lecture Notes: Unconstrained Optimisation By Pablo F. Beker 1 1 Infimum and Supremum Definition 1. Fix a set Y R. A number α R is an upper bound of Y if

More information

LECTURE 15: COMPLETENESS AND CONVEXITY

LECTURE 15: COMPLETENESS AND CONVEXITY LECTURE 15: COMPLETENESS AND CONVEXITY 1. The Hopf-Rinow Theorem Recall that a Riemannian manifold (M, g) is called geodesically complete if the maximal defining interval of any geodesic is R. On the other

More information

Laplace s Equation. Chapter Mean Value Formulas

Laplace s Equation. Chapter Mean Value Formulas Chapter 1 Laplace s Equation Let be an open set in R n. A function u C 2 () is called harmonic in if it satisfies Laplace s equation n (1.1) u := D ii u = 0 in. i=1 A function u C 2 () is called subharmonic

More information

Likelihood Based Inference for Monotone Response Models

Likelihood Based Inference for Monotone Response Models Likelihood Based Inference for Monotone Response Models Moulinath Banerjee University of Michigan September 5, 25 Abstract The behavior of maximum likelihood estimates (MLE s) the likelihood ratio statistic

More information

An exponential family of distributions is a parametric statistical model having densities with respect to some positive measure λ of the form.

An exponential family of distributions is a parametric statistical model having densities with respect to some positive measure λ of the form. Stat 8112 Lecture Notes Asymptotics of Exponential Families Charles J. Geyer January 23, 2013 1 Exponential Families An exponential family of distributions is a parametric statistical model having densities

More information

Supplementary Appendix: Difference-in-Differences with. Multiple Time Periods and an Application on the Minimum. Wage and Employment

Supplementary Appendix: Difference-in-Differences with. Multiple Time Periods and an Application on the Minimum. Wage and Employment Supplementary Appendix: Difference-in-Differences with Multiple Time Periods and an Application on the Minimum Wage and mployment Brantly Callaway Pedro H. C. Sant Anna August 3, 208 This supplementary

More information