A semiparametric single-index estimator for a class of estimating equation models

arXiv:1608.04244v2 [math.ST] 26 Apr 2017

Maria Hristache (CREST, Ensai, email: maria.hristache@ensai.fr), Weiyu Li (corresponding author; CREST, Ensai, email: liweiyu84@gmail.com), Valentin Patilea (CREST, Ensai, email: valentin.patilea@ensai.fr). Valentin Patilea gratefully acknowledges support from the research program New Challenges for New Data of Genes, LCL and Fondation de Risque.

Abstract

We propose a two-step pseudo-maximum likelihood procedure for semiparametric single-index regression models where the conditional variance is a known function of the regression and an additional parameter. The Poisson single-index regression with multiplicative unobserved heterogeneity is an example of such models. Our procedure is based on linear exponential densities with nuisance parameter. The pseudo-likelihood criterion we use contains a nonparametric estimate of the index regression and therefore a rule for choosing the smoothing parameter is needed. We propose an automatic and natural rule based on the joint maximization of the pseudo-likelihood with respect to the index parameter and the smoothing parameter. We derive the asymptotic properties of the semiparametric estimator of the index parameter and the asymptotic behavior of our optimal smoothing parameter. The finite sample performances of our methodology are analyzed using simulated and real data.

Keywords: semiparametric pseudo-maximum likelihood, single-index model, linear exponential densities, bandwidth selection.

1 Introduction

In this paper we consider semiparametric models defined by conditional mean and conditional variance estimating equations. Models defined by estimating equations for the first and second order conditional moments are widely used in applications. See, for instance, Ziegler (2011) for a recent reference. Here we consider a model that extends the framework considered by Cui, Härdle and Zhu (2011). To provide some insight on the type of models we study, consider the following semiparametric extension of the classical Poisson regression model with unobserved heterogeneity: the observed variables are $(Y, Z^\top)^\top$, where $Y$ denotes the count variable and $Z$ is the vector of $d$ explanatory variables. Let $r(t;\theta) = E(Y \mid Z^\top\theta = t)$. We assume that there exists $\theta_0 \in \mathbb{R}^d$ such that $E(Y \mid Z) = E(Y \mid Z^\top\theta_0) = r(Z^\top\theta_0;\theta_0)$. The parameter $\theta_0$ and the function $r(\cdot)$ are unknown. Given $Z$ and an unobserved error term $\varepsilon$, the variable $Y$ has a Poisson law of mean $r(Z^\top\theta_0;\theta_0)\,\varepsilon$. If $E(\varepsilon \mid Z) = 1$ and $\operatorname{Var}(\varepsilon \mid Z) = \sigma^2$, then

$$\operatorname{Var}(Y \mid Z) = \operatorname{Var}\big(E(Y \mid Z,\varepsilon) \mid Z\big) + E\big(\operatorname{Var}(Y \mid Z,\varepsilon) \mid Z\big) = r(Z^\top\theta_0;\theta_0)\,\big[1 + \sigma^2 r(Z^\top\theta_0;\theta_0)\big]. \tag{1.1}$$

This model is a semiparametric single-index regression model (e.g., Powell, Stock and Stoker 1989, Ichimura 1993, Härdle, Hall and Ichimura 1993, Sherman 1994b) where a second order conditional moment is specified as a nonlinear function of the conditional mean and an additional unknown parameter. This extends the framework of Cui, Härdle and Zhu (2011) where the conditional variance of the response is proportional to a given function of the conditional mean.

Our first contribution is to propose a new semiparametric estimation procedure for single-index regression which incorporates the additional information on the conditional variance of $Y$. For this we extend the quasi-generalized pseudo maximum likelihood method introduced by Gouriéroux, Monfort and Trognon (1984a, 1984b) to a semiparametric framework. More precisely, we propose to estimate $\theta_0$ and the function $r(\cdot)$ through a two-step pseudo-maximum likelihood (PML) procedure based on linear exponential families with nuisance parameter densities.
Such densities are parameterized by the mean $r$ and a nuisance parameter that can be recovered from the variance. Although we use a likelihood type criterion, no conditional distribution assumption on $Y$ given $Z$ is required for deriving the asymptotic results.

As an example of application of our procedure consider the case where $Y$ is a count variable. First, write the Poisson likelihood where the function $r(\cdot)$ is replaced by a kernel estimator and maximize this likelihood with respect to $\theta$ to obtain a semiparametric PML estimator of $\theta_0$. Use this estimate and the variance formula (1.1) to deduce a consistent moment estimator of $\sigma^2$. In a second step, estimate $\theta_0$ through a semiparametric Negative Binomial PML where $r$ is again replaced by a kernel estimator and the variance parameter of the Negative Binomial is set equal to the estimate of $\sigma^2$. Finally, given the second step estimate of $\theta_0$, build a kernel estimator for the regression $r(\cdot)$. For simplicity, we use a Nadaraya-Watson estimator to estimate $r(\cdot)$. Other smoothers like local polynomials could be used at the expense of more intricate technical arguments.

The occurrence of a nonparametric estimator in a pseudo-likelihood criterion requires a rule for the smoothing parameter. While the semiparametric index regression literature contains a large number of contributions on how to estimate an index, there are far fewer results and practical solutions on the choice of the smoothing parameter. Even if the smoothing parameter does not influence the asymptotic variance of a semiparametric estimator of $\theta_0$, in practice the estimate of $\theta_0$ and of the regression function may be sensitive to the choice of the smoothing parameter. Another contribution of this paper is to propose an automatic and natural choice of the smoothing parameter used to define the semiparametric estimator. For this, we extend the approach introduced by Härdle, Hall and Ichimura (1993) (see also Xia and Li (1999), Xia, Tong and Li (1999) and Delecroix, Hristache and Patilea (2006)). The idea is to maximize the pseudo-likelihood simultaneously in $\theta$ and the smoothing parameter, that is, the bandwidth of the kernel estimator. The bandwidth is allowed to belong to a large range between $n^{-1/4}$ and $n^{-1/8}$. In some sense, this approach considers the bandwidth an auxiliary parameter for which the pseudo-likelihood may provide an estimate. Using a suitable decomposition of the pseudo-log-likelihood we show that such a joint maximization is asymptotically equivalent to separate maximization of a purely parametric (nonlinear) term with respect to $\theta$ and minimization of a weighted (mean-squared) cross-validation function with respect to the bandwidth. The weights of this cross-validation function are given by the second order derivatives of the pseudo-log-likelihood with respect to $r$. We show that the rate of our optimal bandwidth is $n^{-1/5}$, as expected for twice differentiable regression functions.

The paper is organized as follows.
In Section 2 we introduce a class of semiparametric PML estimators based on linear exponential densities with nuisance parameter and we provide a natural bandwidth choice. Moreover, we present the general methodology used for the asymptotics. Section 3 contains the asymptotic results. A bound for the variance of our semiparametric PML estimators is also derived. In Section 4 we use the semiparametric PML estimators to define a two-step procedure that can be applied in single-index regression models where an additional variance condition like (1.1) is specified. Section 5.1 examines the finite-sample properties of our procedure via Monte Carlo simulations. We compare the performances of a two-step generalized least-squares with those of a Negative Binomial PML in a Poisson single-index regression model with multiplicative unobserved heterogeneity. Even if the two procedures considered lead to asymptotically equivalent estimates, the latter procedure seems preferable in finite samples. An application to real data on the frequency of recreational trips (see Cameron and Trivedi 2013, page 246) is also provided. Section 6 concludes the paper. The technical proofs are postponed to the Appendix.

2 Semiparametric PML with nuisance parameter

Consider that the observations $(Y_1, Z_1^\top)^\top, \ldots, (Y_n, Z_n^\top)^\top$ are independent copies of the random vector $(Y, Z^\top)^\top \in \mathbb{R} \times \mathbb{R}^d$. Assume that there exists $\theta_0 \in \mathbb{R}^d$, unique up to a scale normalization factor, such that the single-index model (SIM) condition

$$E(Y \mid Z) = E(Y \mid Z^\top\theta_0) = r(Z^\top\theta_0;\theta_0) \tag{2.1}$$

holds. In this paper, we focus on single-index models where the conditional second order moment of $Y$ given $Z$ is a known function of $E[Y \mid Z]$ and of a nuisance parameter. To be more precise, in the model we consider,

$$\operatorname{Var}(Y \mid Z) = g\big(E(Y \mid Z), \alpha_0\big) = g\big(r(Z^\top\theta_0;\theta_0), \alpha_0\big), \tag{2.2}$$

for some real value $\alpha_0$. The function $g(\cdot,\cdot)$ is known and, for each $r$, the map $\alpha \mapsto g(r,\alpha)$ is one-to-one. Our framework is slightly more general than the one considered by Cui, Härdle and Zhu (2011), where the conditional variance of $Y$ given $Z$ is a given function of the conditional mean of $Y$ given $Z$ multiplied by an unknown constant.

To estimate the parameter of interest $\theta_0$ in a model like (2.1)-(2.2), we propose a semiparametric PML procedure based on linear exponential families with nuisance parameter. The density used to build the pseudo-likelihood is taken with mean and variance equal to $r$ and $g(r,\alpha)$, respectively. In this section we suppose that an estimator of the nuisance parameter is given. In Section 4 we show how to build such an estimator using a preliminary estimate of $\theta_0$ and condition (2.2).

2.1 Linear exponential families with nuisance parameter

Gouriéroux, Monfort and Trognon (1984a) introduced a class of densities, with respect to a given measure $\mu$, called linear exponential family with nuisance parameter (LEFN) and defined as

$$l(y \mid r, \alpha) = \exp\big[B(r,\alpha) + C(r,\alpha)\,y + D(y,\alpha)\big],$$

where $\alpha$ is the nuisance parameter. Since the dominating measure $\mu$ need not be Lebesgue measure, the law defined by $l$ is not necessarily continuous. The functions $B(\cdot,\cdot)$ and $C(\cdot,\cdot)$ are such that the expectation of the corresponding law is $r$ while the variance is $[\partial_r C(r,\alpha)]^{-1}$. ($\partial_r$ denotes the derivative with respect to the argument $r$.) Recall that for any given $\alpha$, the following identity holds: $\partial_r B(r,\alpha) + \partial_r C(r,\alpha)\, r \equiv 0$. If $\alpha$ is fixed, a LEFN becomes a linear exponential family (LEF) of densities. Gouriéroux, Monfort and Trognon (1984a, 1984b) used LEFN densities to define a two-step PML procedure in nonlinear regression models where a specification of the conditional variance is given.
Herein, we extend their approach to a semiparametric framework. In the case of the SIM defined by equation (1.1), the conditional variance is given by $g(r,\alpha) = r(1+\alpha r)$ with $r > 0$ and $\alpha > 0$. In this case take $B(r,\alpha) = -\frac{1}{\alpha}\ln(1+\alpha r)$ and $C(r,\alpha) = \ln\frac{r}{1+\alpha r}$, which define a Negative Binomial distribution of mean $r$ and variance $r(1+\alpha r)$. Note that the limit case $\alpha = 0$ corresponds to a Poisson distribution. As another example, consider $g(r,\alpha) = r^2/\alpha$ with $r > 0$ and $\alpha > 0$. Now, take the LEFN density given by $B(r,\alpha) = -\alpha\ln r$ and $C(r,\alpha) = -\alpha/r$, which is the density of a gamma law of mean $r$ and variance $r^2/\alpha$.
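The two LEFN examples of this subsection can be checked numerically. The following is a minimal sketch, not part of the original paper: the function names and the finite-difference step are illustrative assumptions. It verifies, for the Negative Binomial and gamma choices of $B$ and $C$, the identity $\partial_r B + r\,\partial_r C = 0$ and the variance formula $[\partial_r C]^{-1}$.

```python
import math

# B and C functions of the two LEFN examples (names are hypothetical).
def nb_B(r, alpha):
    return -math.log1p(alpha * r) / alpha

def nb_C(r, alpha):
    return math.log(r / (1.0 + alpha * r))

def gamma_B(r, alpha):
    return -alpha * math.log(r)

def gamma_C(r, alpha):
    return -alpha / r

def d_dr(fn, r, alpha, eps=1e-6):
    """Central finite-difference approximation of the derivative in r."""
    return (fn(r + eps, alpha) - fn(r - eps, alpha)) / (2.0 * eps)

def lefn_check(B, C, r, alpha):
    """Return (dB/dr + r * dC/dr, 1 / (dC/dr)).

    The first entry should be close to 0 (mean equal to r) and the
    second entry is the variance implied by the LEFN density.
    """
    dB = d_dr(B, r, alpha)
    dC = d_dr(C, r, alpha)
    return dB + r * dC, 1.0 / dC
```

For instance, `lefn_check(nb_B, nb_C, 2.0, 0.5)` should return a residual near zero and a variance near $r(1+\alpha r) = 4$, while `lefn_check(gamma_B, gamma_C, 2.0, 3.0)` should return a variance near $r^2/\alpha = 4/3$.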

2.2 The semiparametric estimator

In order to define our semiparametric PML estimator in the presence of a nuisance parameter let us introduce some notation: given $\{c_n\}$, a sequence of numbers growing slowly to infinity (e.g., $c_n = \ln n$), let $H_n = \{h : c_n n^{-1/4} \le h \le c_n^{-1} n^{-1/8}\}$ be the range from which the optimal bandwidth will be chosen. Define the set $\Theta_n = \{\theta : \|\theta - \theta_0\| \le d_n\}$, $n \ge 1$, with $\{d_n\}$ some sequence decreasing to zero. Let $\alpha^*$ be some real value of the nuisance parameter. Typically, $\alpha^* = \alpha_0$ if the conditional variance formula (2.2) is correctly specified. Otherwise, $\alpha^*$ is some pseudo-true value of the nuisance parameter. Suppose that a sequence $\{\tilde\alpha_n\}$ such that $\tilde\alpha_n \to \alpha^*$, in probability, is given. Set [1] $\psi(y, r; \alpha) = \ln l(y \mid r, \alpha)$ with $l(y \mid r, \alpha)$ the LEFN density of expectation $r$ and nuisance parameter $\alpha$. Define the semiparametric PML estimator in the presence of a nuisance parameter and the optimal bandwidth as

$$(\hat\theta, \hat h) = \arg\max_{\theta \in \Theta_n,\, h \in H_n} \frac{1}{n}\sum_{i=1}^{n} \psi\big(Y_i, \hat r^{\,i}_h(Z_i^\top\theta;\theta); \tilde\alpha_n\big)\,\tau_n(Z_i), \tag{2.3}$$

where

$$\hat r^{\,i}_h(t;\theta) = \frac{\frac{1}{n-1}\sum_{j \ne i} Y_j K_h(t - Z_j^\top\theta)}{\frac{1}{n-1}\sum_{j \ne i} K_h(t - Z_j^\top\theta)} =: \frac{\hat\gamma^{\,i}_h(t;\theta)}{\hat f^{\,i}_h(t;\theta)}$$

denotes the leave-one-out version of the Nadaraya-Watson estimator of the regression function

$$r(t;\theta) = E(Y \mid Z^\top\theta = t) =: \frac{\gamma(t;\theta)}{f(t;\theta)},$$

with $f(\cdot;\theta)$ the density of $Z^\top\theta$. The function $K(\cdot)$ is a second order kernel function and $K_h(\cdot)$ stands for $K(\cdot/h)/h$, where $h$ is the bandwidth. $\tau_n(\cdot)$ denotes a trimming function. If the sequence $\tilde\alpha_n$ is constant or $\psi$ does not depend on $\alpha$, equation (2.3) defines a semiparametric PML based on a LEF density.

A trimming is designed to keep the density estimator $\hat f^{\,i}_h$ away from zero in computations and it is usually required for analyzing the asymptotic properties of the nonparametric regression estimator and of the optimal bandwidth. The practical purpose of a trimming recommends a data-driven device like $I_{\{z:\, \hat f^{\,i}_h(z^\top\theta;\theta) \ge c\}}(\cdot)$, with some fixed $c > 0$. Herein, $I_A(\cdot)$ denotes the indicator function of the set $A$. However, to ensure consistency with such a trimming, one should require in addition that

$$\theta_0 = \arg\max_{\theta} E\big[\psi\big(Y, r(Z^\top\theta;\theta)\big)\, I_{\{z:\, f(z^\top\theta;\theta) \ge c\}}(Z)\big].$$

[1] Herein, we focus on $\psi(y, r; \alpha) = \ln l(y \mid r, \alpha)$ where $l(y \mid r, \alpha) = \exp[B(r,\alpha) + C(r,\alpha)\,y + D(y,\alpha)]$ is a LEFN density. However, other functions $\psi(y, r; \alpha)$ having the required properties can be considered (see Appendix A).

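Meanwhile, a trimming like $I_{\{z:\, f(z^\top\theta_0;\theta_0) \ge c\}}(\cdot)$ is easier to handle in theory. Here, we consider

$$\tau_n(\cdot) = I_{\{z:\, \hat f^{\,i}_{h_n}(z^\top\theta_n;\theta_n) \ge c\}}(\cdot) \tag{2.4}$$

with $\theta_n \in \Theta_n$, $n \ge 1$, a sequence with limit $\theta_0$ and $h_n$, $n \ge 1$, a sequence of preliminary bandwidths such that $n^{\varepsilon} h_n \to 0$ and $n^{1/2-\varepsilon} h_n \to \infty$ for some $0 < \varepsilon < 1/2$. The trimming procedure we propose represents an appealing compromise between the theory and the applications. On one hand, it is easy to implement. On the other hand, we show below that, in a certain sense, our trimming is asymptotically equivalent to the fixed trimming $I_{\{z:\, f(z^\top\theta_0;\theta_0) \ge c\}}(\cdot)$ and this fact greatly simplifies the proofs. We prove this equivalence under two types of assumptions: either (i) $Z$ is bounded and $\theta_n - \theta_0 = o(1)$, or (ii) $E[\exp(\lambda\|Z\|)] < \infty$, for some $\lambda > 0$, and $\theta_n - \theta_0 = o(1/\ln n)$.

To be more precise, define $A = \{z : f(z^\top\theta_0;\theta_0) \ge c\} \subset \mathbb{R}^d$ and $A^{\delta} = \{z : |f(z^\top\theta_0;\theta_0) - c| \le \delta\}$, $\delta > 0$. By a little algebra, for all $\theta \in \Theta_n$, $h$ and $i$,

$$\big| I_{\{z:\, \hat f^{\,i}_h(z^\top\theta;\theta) \ge c\}}(Z_i) - I_A(Z_i) \big| \le I_{A^{\delta}}(Z_i) + I_{(\delta,\infty)}(G_n),$$

where

$$G_n = \max_{1 \le i \le n}\ \sup_{\theta \in \Theta_n,\, h} \big| \hat f^{\,i}_h(Z_i^\top\theta;\theta) - f(Z_i^\top\theta_0;\theta_0) \big|.$$

Let

$$\hat S(\theta, h; \alpha, A) = \frac{1}{n}\sum_{i=1}^{n} \psi\big(Y_i, \hat r^{\,i}_h(Z_i^\top\theta;\theta); \alpha\big)\, I_A(Z_i)$$

with $A = A$ or $A^{\delta}$. Without loss of generality, consider that $\psi(\cdot,\cdot;\cdot) \le 0$. Since $\psi$ is the logarithm of a LEFN density, for any given $y$ and $\alpha$, the map $r \mapsto \psi(y, r; \alpha)$ attains its maximum at $r = y$; thus, up to a translation with a function depending only on $y$ and $\alpha$, we may consider $\psi \le 0$. In this case we have

$$\Big| \frac{1}{n}\sum_{i=1}^{n} \psi\big(Y_i, \hat r^{\,i}_h(Z_i^\top\theta;\theta); \alpha\big)\, I_{\{z:\, \hat f^{\,i}_{h_n}(z^\top\theta_n;\theta_n) \ge c\}}(Z_i) - \hat S(\theta, h; \alpha, A) \Big| \le \big|\hat S(\theta, h; \alpha, A^{\delta})\big| + I_{(\delta,\infty)}(G_n)\, \frac{1}{n}\sum_{i=1}^{n} \big|\psi\big(Y_i, \hat r^{\,i}_h(Z_i^\top\theta;\theta); \alpha\big)\big|. \tag{2.5}$$

We show that $\hat S(\theta, h; \alpha, A^{\delta}) = o_P\big(\hat S(\theta, h; \alpha, A)\big)$, uniformly over $\Theta_n \times H_n$ and uniformly in $\alpha$, provided that $\delta \to 0$ and $P(f(Z^\top\theta_0;\theta_0) = c) = 0$. On the other hand, we prove that $P(G_n > \delta) \to 0$, provided that $\delta \to 0$ slowly enough and $h_n \to 0$ faster than $n^{-\varepsilon}$ and slower than $n^{-(1/2-\varepsilon)}$, for some $0 < \varepsilon < 1/2$. See Lemma B.2 in the appendix; in that lemma we distinguish two types of assumptions depending on whether $Z$ is bounded or not. Deduce that $(\hat\theta, \hat h)$ is asymptotically equivalent to the maximizer of $\hat S(\theta, h; \tilde\alpha_n, A)$ over $\Theta_n \times H_n$. Therefore, hereafter, we simply write $\hat S(\theta, h; \alpha)$ instead of $\hat S(\theta, h; \alpha, A)$ and we consider

$$(\hat\theta, \hat h) = \arg\max_{\theta \in \Theta_n,\, h \in H_n} \hat S(\theta, h; \tilde\alpha_n). \tag{2.6}$$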
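The criterion of this subsection can be sketched in code. The following is a minimal illustration under stated assumptions, not the authors' implementation: it uses a Gaussian kernel, the Poisson LEF pseudo-log-likelihood $\psi(y,r) = y\ln r - r$ (the term $D(y)$ is dropped since it does not depend on $\theta$ or $h$), and, as a simplification, a data-driven trimming evaluated at the current $(\theta, h)$ rather than at the preliminary pair $(\theta_n, h_n)$ of (2.4). All function names and the default threshold `c` are hypothetical.

```python
import numpy as np

def loo_nadaraya_watson(index, y, h):
    """Leave-one-out Nadaraya-Watson regression and density estimates.

    index : array of shape (n,), the values Z_i' theta
    y     : array of shape (n,), the responses
    h     : bandwidth (Gaussian kernel assumed for illustration)
    """
    n = index.shape[0]
    u = (index[:, None] - index[None, :]) / h
    k = np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * h)  # K_h(t - Z_j' theta)
    np.fill_diagonal(k, 0.0)                              # leave observation i out
    f_hat = k.sum(axis=1) / (n - 1)
    gamma_hat = (k @ y) / (n - 1)
    r_hat = np.divide(gamma_hat, f_hat,
                      out=np.zeros_like(gamma_hat), where=f_hat > 0)
    return r_hat, f_hat

def poisson_criterion(theta, h, z, y, c=0.01):
    """Trimmed Poisson pseudo-log-likelihood, averaged over all n points."""
    index = z @ theta
    r_hat, f_hat = loo_nadaraya_watson(index, y, h)
    # Simplified trimming; r_hat > 0 is an extra guard for the logarithm.
    keep = (f_hat >= c) & (r_hat > 0)
    psi = y[keep] * np.log(r_hat[keep]) - r_hat[keep]
    return psi.sum() / y.shape[0]
```

In practice one would maximize `poisson_criterion` jointly over $\theta$ (with its first component fixed for identifiability) and over a grid of bandwidths; the choice of optimizer is left open here.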

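2.3 Methodology

The semiparametric pseudo-log-likelihood $\hat S(\theta, h; \alpha)$ can be split into a purely parametric (nonlinear) part $S(\theta; \alpha)$, a purely nonparametric one $T(h; \alpha)$ and a remainder term $R(\theta, h; \alpha)$, where

$$S(\theta; \alpha) = \frac{1}{n}\sum_{i=1}^{n} \big[\psi\big(Y_i, r(Z_i^\top\theta;\theta); \alpha\big) - \psi\big(Y_i, r(Z_i^\top\theta_0;\theta_0); \alpha\big)\big]\, I_A(Z_i), \tag{2.7}$$

$$T(h; \alpha) = \frac{1}{n}\sum_{i=1}^{n} \psi\big(Y_i, \hat r^{\,i}_h(Z_i^\top\theta_0;\theta_0); \alpha\big)\, I_A(Z_i),$$

$$R(\theta, h; \alpha) = \frac{1}{n}\sum_{i=1}^{n} \big[\psi\big(Y_i, \hat r^{\,i}_h(Z_i^\top\theta;\theta); \alpha\big) - \psi\big(Y_i, r(Z_i^\top\theta;\theta); \alpha\big)\big]\, I_A(Z_i) - \frac{1}{n}\sum_{i=1}^{n} \big[\psi\big(Y_i, \hat r^{\,i}_h(Z_i^\top\theta_0;\theta_0); \alpha\big) - \psi\big(Y_i, r(Z_i^\top\theta_0;\theta_0); \alpha\big)\big]\, I_A(Z_i)$$

(see Härdle, Hall and Ichimura (1993) for a slightly different splitting). Given this decomposition, the simultaneous optimization of $\hat S(\theta, h; \alpha)$ is asymptotically equivalent to separately maximizing $S(\theta; \alpha)$ with respect to $\theta$ and $T(h; \alpha)$ with respect to $h$, provided that $R(\theta, h; \alpha)$ is sufficiently small.

A key ingredient for proving that $R(\theta, h; \tilde\alpha_n)$ is negligible with respect to $S(\theta; \tilde\alpha_n)$ and $T(h; \tilde\alpha_n)$, uniformly in $(\theta, h) \in \Theta_n \times H_n$ and for any $\{\tilde\alpha_n\}$, is represented by the orthogonality conditions

$$E\big[\partial_2\psi\big(Y, r(Z^\top\theta_0;\theta_0); \alpha\big) \mid Z\big] = 0 \tag{2.8}$$

and

$$E\big[\partial_\theta\partial_2\psi\big(Y, r(Z^\top\theta_0;\theta_0); \alpha\big) \mid Z^\top\theta_0\big] = 0, \tag{2.9}$$

that must hold for any $\alpha$, where $\partial_2$ denotes the derivative with respect to the second argument of $\psi(\cdot,\cdot;\cdot)$ and $\partial_\theta$ is the derivative with respect to all occurrences of $\theta$, that is, given $y$, $z$ and $\alpha$,

$$\partial_\theta\partial_2\psi\big(y, r(z^\top\theta_0;\theta_0); \alpha\big) = \frac{\partial}{\partial\theta}\,\partial_2\psi\big(y, r(z^\top\theta;\theta); \alpha\big)\Big|_{\theta=\theta_0}$$

(see also Sherman (1994b) and Delecroix, Hristache and Patilea (2006) for similar conditions). If $\psi(y, r; \alpha) = \ln l(y \mid r, \alpha) = B(r,\alpha) + C(r,\alpha)\,y + D(y,\alpha)$, with $\partial_r B(r,\alpha) + \partial_r C(r,\alpha)\,r \equiv 0$, then $\partial_2\psi(y, r; \alpha) = \partial_r C(r,\alpha)\,(y - r)$ and thus (2.8) is a consequence of the SIM condition (2.1). To check the second orthogonality condition note that

$$E\big[\partial^2_{22}\psi\big(Y, r(Z^\top\theta_0;\theta_0); \alpha\big) \mid Z\big] = E\big[\partial^2_{22}\psi\big(Y, r(Z^\top\theta_0;\theta_0); \alpha\big) \mid Z^\top\theta_0\big]$$

and

$$E\big[\partial_\theta r(Z^\top\theta_0;\theta_0) \mid Z^\top\theta_0\big] = E\big[r'(Z^\top\theta_0;\theta_0)\,\big(Z - E[Z \mid Z^\top\theta_0]\big) \mid Z^\top\theta_0\big],$$

where $r'(\cdot;\theta_0)$ is the derivative of $r(\cdot;\theta_0)$; the right-hand side vanishes because $r'(Z^\top\theta_0;\theta_0)$ is a function of $Z^\top\theta_0$. The last identity is always true under the SIM condition (e.g., Newey 1994, page 1358). Let us point out that conditions (2.8)-(2.9) hold even if the variance condition (2.2) is misspecified.

Since $R(\theta, h; \tilde\alpha_n)$ is negligible with respect to $S(\theta; \tilde\alpha_n)$ and $T(h; \tilde\alpha_n)$ does not contain the parameter of interest, the asymptotic distribution of $\hat\theta$ will be obtained by standard arguments used for M-estimators in the presence of nuisance parameters applied to the objective function $S(\theta; \tilde\alpha_n)$. We deduce that $\hat\theta$ behaves as follows: (i) if the SIM condition (2.1) holds and $\tilde\alpha_n - \alpha^* = o_P(1)$, for some $\alpha^*$, then $\hat\theta$ is asymptotically normal; (ii) if the SIM condition holds, the conditional variance (2.2) is correctly specified and $\tilde\alpha_n - \alpha_0 = o_P(1)$, then $\hat\theta$ is asymptotically normal and it has the lowest variance among the semiparametric PML estimators based on LEF densities. In any case, the asymptotic distribution of $\sqrt{n}(\hat\theta - \theta_0)$ does not depend on the choice of $\tilde\alpha_n$.

Let us point out that in our framework we only impose $\tilde\alpha_n$ convergent in probability without asking a rate of convergence $O_P(1/\sqrt{n})$, as it is usually supposed for M-estimation in the presence of nuisance parameters. This is because the usual orthogonality condition $E[\partial_\alpha\partial_\theta\psi(Y, r(Z^\top\theta_0;\theta_0); \alpha)] = 0$ is true for any $\alpha$, provided that $\psi(y, r; \alpha) = \ln l(y \mid r, \alpha)$ with $l(y \mid r, \alpha)$ a LEFN density. Indeed, we have

$$E\big[\partial_\alpha\partial_\theta\psi\big(Y, r(Z^\top\theta_0;\theta_0); \alpha\big)\big] = E\big[\partial_\alpha\partial_r\psi\big(Y, r(Z^\top\theta_0;\theta_0); \alpha\big)\,\partial_\theta r(Z^\top\theta_0;\theta_0)\big]$$
$$= E\big[E\big\{\partial_\alpha\partial_r B\big(r(Z^\top\theta_0;\theta_0); \alpha\big) + \partial_\alpha\partial_r C\big(r(Z^\top\theta_0;\theta_0); \alpha\big)\,Y \mid Z\big\}\,\partial_\theta r(Z^\top\theta_0;\theta_0)\big] = 0$$

because $E(Y \mid Z) = r(Z^\top\theta_0;\theta_0)$ and $\partial_\alpha\partial_r B(r,\alpha) + \partial_\alpha\partial_r C(r,\alpha)\,r \equiv 0$, for any $\alpha$.

For the bandwidth $\hat h$ we obtain an asymptotic equivalence with a theoretical optimal bandwidth maximizing $T(h; \alpha^*)$ (equivalently, minimizing the cross-validation function $-T(h; \alpha^*)$), that is, we prove that the ratio of the two bandwidths converges to one, in probability. Remark that $T(h; \alpha)$ is a kind of $\psi$-CV (cross-validation) function. It can be shown that, up to constant additive terms, $T(h; \alpha)$ is asymptotically equivalent to a weighted (mean-squared) CV function.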
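The cross-validation reading of $T(h; \alpha)$ can be illustrated in the least-squares case $\psi(y, r) = -(y - r)^2$. The sketch below is only an illustration under assumptions not in the paper: the index values are treated as known, the kernel is Gaussian, the candidate bandwidths come from a user-supplied grid, and the function names are hypothetical.

```python
import numpy as np

def loo_nw(index, y, h):
    """Leave-one-out Nadaraya-Watson fit at each observed index value."""
    u = (index[:, None] - index[None, :]) / h
    k = np.exp(-0.5 * u**2)      # kernel normalising constants cancel in the ratio
    np.fill_diagonal(k, 0.0)     # leave observation i out
    return (k @ y) / k.sum(axis=1)

def cv_bandwidth(index, y, grid):
    """Maximise T(h) = -mean((Y_i - r_hat_h^i)^2) over a bandwidth grid.

    Returns the selected bandwidth and the list of T(h) values.
    """
    scores = [-np.mean((y - loo_nw(index, y, h)) ** 2) for h in grid]
    return float(grid[int(np.argmax(scores))]), scores
```

Maximizing this $T(h)$ is the same as minimizing the usual leave-one-out mean-squared prediction error; for a general LEFN $\psi$ the squared residuals would be replaced by $-\psi(Y_i, \hat r^{\,i}_h; \alpha)$.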
When $\psi(y, r; \alpha) = -(y - r)^2$, the function $-T(h; \alpha)$ is the usual CV function that one would use for choosing the bandwidth for the Nadaraya-Watson estimator of $E(Y \mid Z^\top\theta_0)$. By extension of classical results for nonparametric regression, it can be proved that the rate of the theoretical optimal bandwidth is $n^{-1/5}$ (see Lemma B.3 in Appendix B; see also Härdle, Hall and Ichimura (1993) for the case $\psi(y, r; \alpha) = -(y - r)^2$). Deduce that $\hat h$ is also of order $n^{-1/5}$.

2.4 Extensions

Given the model conditions (2.1)-(2.2), the idea is to choose a LEFN density with mean $r$ and variance $g(r,\alpha)$ and to construct a semiparametric PML estimator given a preliminary estimate of the nuisance parameter $\alpha_0$. However, it may happen that no such LEFN density exists or that one prefers another type of LEFN densities. Then, the idea is to reparametrize the conditional variance of $Y$ given $Z$. More precisely, we may consider

$$l(y \mid r, \eta) = \exp\big[B(r,\eta) + C(r,\eta)\,y + D(y,\eta)\big],$$

where $\eta$ stands for the nuisance parameter. Let $\Sigma = \Sigma(r,\eta)$ denote the variance of the law given by this density. Assume that for any given $r$, the map $\eta \mapsto \Sigma(r,\eta)$ is one-to-one. In this case, in order to provide a LEFN density with variance $g(r,\alpha)$ it suffices to consider $l(y \mid r, \eta)$ with $\eta = \Sigma^{-1}(r, g(r,\alpha))$. For instance, if $g(r,\alpha) = r(1 + \alpha r^2)$, one may use a Negative Binomial density of mean $r$ and nuisance parameter $\alpha r$. Another solution is to consider a normal density of mean $r$ where the variance equal to $r(1 + \alpha r^2)$ plays the role of the nuisance parameter. In this case, given an estimate of $r(1 + \alpha r^2)$, our semiparametric PML becomes a semiparametric generalized least-squares (GLS) procedure.

Note that this example of function $g(r,\alpha)$ leads us to the situation where the nuisance parameter is replaced by a nuisance function of $r$ and some additional parameters. At the expense of more complicated writings, our methodology can be extended to take into account the case of a nuisance function. More precisely, consider a more general pseudo-log-likelihood function $\psi(y, r; \Psi(r, g(r,\alpha)))$ where $\Psi(\cdot,\cdot)$ is a given real-valued function and $\alpha$ is the nuisance parameter. See also Gouriéroux, Monfort and Trognon (1984a). To define $(\hat\theta, \hat h)$, one replaces $\tilde\alpha_n$ by $\Psi\big(\hat r_{h_n}(Z_i^\top\theta_n;\theta_n),\, g(\hat r_{h_n}(Z_i^\top\theta_n;\theta_n), \tilde\alpha_n)\big)$ in equation (2.3), where $(\theta_n, \tilde\alpha_n) \to (\theta_0, \alpha^*)$, in probability, for some $\alpha^*$, and $\hat r_h(\cdot;\theta)$ is a Nadaraya-Watson estimator of the regression $r(\cdot;\theta)$. The same type of decomposition of the pseudo-log-likelihood criterion can be used, into a purely parametric part (function of $\theta$)

$$\frac{1}{n}\sum_{i=1}^{n} \Big[\psi\big(Y_i, r(Z_i^\top\theta;\theta); \Psi\big(r(Z_i^\top\theta_0;\theta_0), g(r(Z_i^\top\theta_0;\theta_0), \alpha^*)\big)\big) - \psi\big(Y_i, r(Z_i^\top\theta_0;\theta_0); \Psi\big(r(Z_i^\top\theta_0;\theta_0), g(r(Z_i^\top\theta_0;\theta_0), \alpha^*)\big)\big)\Big]\, I_A(Z_i),$$

a purely nonparametric part (function of $h$)

$$T(h; \alpha^*) = \frac{1}{n}\sum_{i=1}^{n} \psi\big(Y_i, \hat r^{\,i}_h(Z_i^\top\theta_0;\theta_0); \Psi\big(r(Z_i^\top\theta_0;\theta_0), g(r(Z_i^\top\theta_0;\theta_0), \alpha^*)\big)\big)\, I_A(Z_i)$$

and a negligible remainder (function of $\theta$ and $h$). For brevity, the details of this more general case are omitted. However, we sketch a quick argument that applies for the semiparametric GLS.
Consider the semiparametric GLS [2] criterion

$$\hat S(\theta, h; \theta_n, \tilde\alpha_n, h_n) = -\frac{1}{n}\sum_{i=1}^{n} g\big(\hat r_{h_n}(Z_i^\top\theta_n;\theta_n); \tilde\alpha_n\big)^{-1} \big[Y_i - \hat r^{\,i}_h(Z_i^\top\theta;\theta)\big]^2 I_A(Z_i)$$

with $(\theta_n, \tilde\alpha_n) \to (\theta_0, \alpha^*)$, in probability, and $h_n$, $n \ge 1$, a sequence of bandwidths. Assume that

$$\max_{1 \le i \le n} \big| g\big(\hat r_{h_n}(Z_i^\top\theta_n;\theta_n); \tilde\alpha_n\big) - g\big(r(Z_i^\top\theta_0;\theta_0); \alpha^*\big) \big|\, I_A(Z_i) = o_P(1) \tag{2.10}$$

and $g(r(z^\top\theta_0;\theta_0); \alpha^*)\, I_A(z)$ stays away from zero. Then the GLS criterion $\hat S(\theta, h; \theta_n, \tilde\alpha_n, h_n)$ is asymptotically equivalent to the infeasible GLS criterion

$$-\frac{1}{n}\sum_{i=1}^{n} \big[Y_i - \hat r^{\,i}_h(Z_i^\top\theta;\theta)\big]^2 g\big(r(Z_i^\top\theta_0;\theta_0); \alpha^*\big)^{-1} I_A(Z_i),$$

that is, we can decompose the two criteria in such a way that, up to negligible remainders, they have exactly the same purely parametric and purely nonparametric parts. Finally, we apply the methodology [3] described in the previous subsection with $\psi(y, r; \alpha) = -(y - r)^2$ and the trimming $I_A(Z_i)$ multiplied by $g(r(Z_i^\top\theta_0;\theta_0); \alpha^*)^{-1}$. In order to ensure condition (2.10), it suffices to suppose that the map $(r, \alpha) \mapsto g(r; \alpha)$ satisfies a Lipschitz condition and that $h_n$ is such that

$$\max_{1 \le i \le n} \big| \hat r_{h_n}(Z_i^\top\theta_0;\theta_0) - r(Z_i^\top\theta_0;\theta_0) \big|\, I_A(Z_i) = o_P(1)$$

and $\max_{1 \le i \le n} \|\partial_\theta \hat r_{h_n}(Z_i^\top\theta;\theta)\|\, I_A(Z_i)$ is bounded in probability, uniformly with respect to $\theta$ in $o_P(1)$ neighborhoods of $\theta_0$. For instance, a bandwidth of order $n^{-1/5}$ satisfies these conditions (see Andrews (1995); see also Delecroix, Hristache and Patilea (2006)).

Other possible extensions of the framework we consider are to allow a multi-index regression and/or multivariate dependent variables. For instance, the SIM condition can be replaced by the multi-index condition $E(Y \mid Z) = E(Y \mid Z^\top\theta_0^1, \ldots, Z^\top\theta_0^p)$ with $p$ smaller than the dimension of $Z$, while the second order moment condition remains $\operatorname{Var}(Y \mid Z) = g(E(Y \mid Z), \alpha_0)$. On the other hand, for multivariate dependent variables one may consider PML estimation based on the multivariate normal or multivariate generalizations of Poisson, Negative Binomial distributions (Johnson, Kotz and Balakrishnan 1997). The decomposition of the pseudo-log-likelihood in $S$, $T$ and $R$ as above can still be used for these cases but the detailed analysis of these extensions will be considered elsewhere.

[2] This semiparametric generalized least-squares procedure is a particular case for Picone and Butler (2000). However, they do not provide a bandwidth rule.

[3] Notice that the trimming function $z \mapsto I_A(z)$ with $A = \{z : f(z^\top\theta_0;\theta_0) \ge c\}$ can be written as a function of $z^\top\theta_0$. In view of our proofs, it becomes obvious that the methodology described in the previous subsection remains valid if $I_A(Z_i)$ is multiplied by a function depending only on $Z_i^\top\theta_0$.

3 Asymptotic results

In this section we obtain the asymptotic distribution for $\hat\theta$ and the corresponding estimator of the regression function $r(t;\theta) = E[Y \mid Z^\top\theta = t]$ as well as the asymptotic behavior of $\hat h$, with $(\hat\theta, \hat h)$ defined in (2.3). A consistent estimator for the asymptotic variance matrix of $\hat\theta$ is proposed. Moreover, a lower bound for the asymptotic variance matrix of $\hat\theta$ is derived.

For the identifiability of the parameter of interest $\theta_0$, hereafter fix its first component, that is $\theta_0 = (1, \tilde\theta_0^\top)^\top$, $\tilde\theta_0 \in \mathbb{R}^{d-1}$. Therefore, we shall implicitly identify a vector $\theta = (1, \tilde\theta^\top)^\top$ with its last $d-1$ components and redefine the symbol $\partial_\theta$ as being the vector of the first order partial derivatives with respect to the last $d-1$ components of $\theta$.

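Let $v(t;\theta) = \operatorname{Var}(Y \mid Z^\top\theta = t)$. If the SIM assumption and variance condition (2.2) hold, then $v(Z^\top\theta_0;\theta_0) = g(r(Z^\top\theta_0;\theta_0), \alpha_0)$. For a given $\theta$, let $r'(\cdot;\theta)$ and $r''(\cdot;\theta)$ denote the first and second order derivatives of the function $r(\cdot;\theta)$. Similarly, $f'(\cdot;\theta)$ is the derivative of $f(\cdot;\theta)$. Define [4]

$$C_1 = -\frac{K_1^2}{4}\, E\Big\{ \frac{1}{2}\,\partial_r C\big(r(Z^\top\theta_0;\theta_0); \alpha^*\big) \Big[ r''(Z^\top\theta_0;\theta_0) + 2\, r'(Z^\top\theta_0;\theta_0)\, \frac{f'(Z^\top\theta_0;\theta_0)}{f(Z^\top\theta_0;\theta_0)} \Big]^2 I_A(Z) \Big\}, \tag{3.1}$$

$$C_2 = -K_2\, E\Big\{ \frac{1}{2}\,\partial_r C\big(r(Z^\top\theta_0;\theta_0); \alpha^*\big)\, \frac{1}{f(Z^\top\theta_0;\theta_0)}\, v(Z^\top\theta_0;\theta_0)\, I_A(Z) \Big\},$$

with $K_1 = \int u^2 K(u)\,du$, $K_2 = \int K^2(u)\,du$, and consider

$$h_n^{opt} = \arg\max_h \big\{ C_1 h^4 + C_2\, n^{-1} h^{-1} \big\} = \big(C_2 / (4 C_1)\big)^{1/5}\, n^{-1/5}.$$

Define the $(d-1) \times (d-1)$ matrices

$$I = E\Big\{ \big[\partial_r C\big(r(Z^\top\theta_0;\theta_0); \alpha^*\big)\big]^2\, v(Z^\top\theta_0;\theta_0)\, \partial_\theta r(Z^\top\theta_0;\theta_0)\, \partial_\theta r(Z^\top\theta_0;\theta_0)^\top I_A(Z) \Big\},$$

$$J = E\Big[ \partial_r C\big(r(Z^\top\theta_0;\theta_0); \alpha^*\big)\, \partial_\theta r(Z^\top\theta_0;\theta_0)\, \partial_\theta r(Z^\top\theta_0;\theta_0)^\top I_A(Z) \Big].$$

Note that $I = J$ if the variance condition (2.2) holds and $\alpha^* = \alpha_0$. Now, we deduce the asymptotic normality of the semiparametric PML estimator $\hat\theta$ in the presence of a nuisance parameter. Moreover, we obtain the rate of decay to zero of the optimal bandwidth $\hat h$. The proof of the following result is given in the Appendix.

Theorem 3.1 Suppose that the assumptions in Appendix A hold. Define the set $\Theta_n = \{\theta : \|\theta - \theta_0\| \le d_n\}$, $n \ge 1$, with $d_n \ln n \to 0$ and $\tilde\alpha_n$, $n \ge 1$, such that $\tilde\alpha_n - \alpha^* = o_P(1)$. Fix $c > 0$. If $(\hat\theta, \hat h)$ is defined as in (2.3)-(2.4), then $\hat h / h_n^{opt} \to 1$, in probability, and

$$\sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{D} N\big(0, J^{-1} I J^{-1}\big).$$

If $Z$ is bounded, the same conclusion remains true for any sequence $d_n \to 0$.

[4] Note that $\partial^2_{22}\psi(y, r) = \partial^2_{rr} C(r,\alpha)\,(y - r) - \partial_r C(r,\alpha)$. Thus, $-\partial_r C$ can be replaced by $\partial^2_{22}\psi$ in the definition of the constants $C_1$ and $C_2$.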
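The closed form of $h_n^{opt}$ follows from the first order condition $4 C_1 h^3 - C_2 n^{-1} h^{-2} = 0$. The sketch below, with hypothetical function names and made-up constants used purely for illustration, evaluates the formula and lets one check that it maximizes $C_1 h^4 + C_2 n^{-1} h^{-1}$ when both constants are negative, which is the case when $\partial_r C > 0$.

```python
def bandwidth_objective(h, c1, c2, n):
    """Leading terms of the nonparametric part: C1 * h^4 + C2 / (n * h)."""
    return c1 * h**4 + c2 / (n * h)

def optimal_bandwidth(c1, c2, n):
    """Closed-form maximiser (C2 / (4 * C1))^(1/5) * n^(-1/5).

    Both constants are assumed negative so that the objective is concave
    in the relevant range and the stationary point is a maximum.
    """
    if c1 >= 0 or c2 >= 0:
        raise ValueError("c1 and c2 are expected to be negative")
    return (c2 / (4.0 * c1)) ** 0.2 * n ** (-0.2)
```

For example, with the illustrative values `c1 = -2.0`, `c2 = -0.5` and `n = 1000`, the objective at `optimal_bandwidth(c1, c2, n)` should exceed its value at bandwidths 5% smaller or larger, and doubling `n` should shrink the bandwidth by the factor $2^{-1/5}$.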

I applicatios J 1 IJ 1 is ukow ad therefore it has to be cosistetly estimated. o this ed, we propose a usual sadwich estimator of the asymptotic variace J 1 IJ 1 e.g., Ichimura 1993. Let f h ; θ deote the kerel estimator for the desity of Z θ. Defie I = 1 J = 1 [ r C rĥ Zi r C rĥ Z i ] 2 θ; θ ; α [Y i rĥ Zi θ rĥ θ; θ ; α θ rĥ Zi Z i ] 2 θ; θ θ; θ θ rĥ Zi θ; θ I{z: f ĥ z θ; θ c} Z i θ; θ θ rĥ Zi θ; θ I{z: f ĥ z θ; θ c} Z i. Propositio 3.2 Suppose that the coditios of heorem 3.1 hold. J 1 IJ 1, i probability. he, J 1 I J 1 Proof. he argumets are quite stadard e.g., Ichimura 1993, sectio 7. O oe had, the covergece i probability of θ ad α ad, o the other had, the covergece i probability of rĥ z θ; θ ad θ rĥ z θ; θ, uiformly over θ i eighborhoods shrikig to θ 0 ad uiformly over z A e.g., Adrews 1995, Delecroix, Hristache ad Patilea 2006 imply I I ad J J, i probability. heorem 3.1 shows, i particular, that θ is asymptotically equivalet to the semiparametric PML based o the LEF pseudo-log-likelihood ψ y, r; α = l f y, r α. As i the parametric case, we ca deduce a lower boud for the asymptotic variace J 1 IJ 1 with respect to semiparametric PML based o LEF desities. his boud is achieved by θ if the SIM assumptio ad the variace coditio 2.2 hold ad α = α 0. he proof of the followig propositio is idetical to the proof of Property 5 of Gouriéroux, Mofort ad rogo 1984a, page 687 ad thus it will be skipped. Propositio 3.3 he set of asymptotic variace matrices of the semiparametric PML estimators based o liear expoetial families has a lower boud equal to K, where { [v ] K 1 = E Z 1 θ 0 ; θ 0 θ r Z θ 0 ; θ 0 θ r } Z θ 0 ; θ 0 IA Z. Cocerig the oparametric part, we have the followig result o theasymptotic distributio of the oparametric estimator of the regressio. he proof is omitted see Härdle ad Stoker 1989. Propositio 3.4 Assume that the coditios of heorem 3.1 are fulfilled. 
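Then, for any $t$ such that $f(t;\theta_0) > 0$,

$$\sqrt{n\hat h}\,\Big( \hat r_{\hat h}(t;\theta_0) - r(t;\theta_0) - \hat h^2 \beta(t) \Big) \xrightarrow{D} N\Big(0,\; K_2\, v(t;\theta_0)\, f(t;\theta_0)^{-1}\Big),$$

where $\beta(t) = (K_1/2)\,\big[\, r''(t;\theta_0) + 2\, r'(t;\theta_0)\, f'(t;\theta_0)\, f(t;\theta_0)^{-1} \big]$.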
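For readers who want to experiment with the nonparametric part, the Nadaraya-Watson estimator of $r(t;\theta)$ for a fixed index direction can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: it uses the quartic kernel that the paper employs in its empirical section, it omits the leave-one-out and trimming devices, and the function names are hypothetical.

```python
import numpy as np

def quartic_kernel(u):
    """Quartic (biweight) kernel K(u) = 15/16 (1 - u^2)^2 on [-1, 1]."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 1.0, 15.0 / 16.0 * (1.0 - u ** 2) ** 2, 0.0)

def nadaraya_watson(t, index, y, h):
    """Kernel regression of y on the scalar index Z'theta, evaluated at t.

    Returns NaN at evaluation points with no observation within distance h.
    """
    t = np.atleast_1d(np.asarray(t, dtype=float))
    index = np.asarray(index, dtype=float)
    y = np.asarray(y, dtype=float)
    w = quartic_kernel((t[:, None] - index[None, :]) / h)  # shape (m, n)
    denom = w.sum(axis=1)
    num = w @ y
    out = np.full(t.shape, np.nan)
    ok = denom > 0
    out[ok] = num[ok] / denom[ok]
    return out

# Tiny usage example: two observations placed symmetrically around t = 0.
est = nadaraya_watson([0.0, 10.0], [-0.5, 0.5], [1.0, 3.0], h=1.0)
print(est)  # first entry is the average of 1 and 3; second is NaN
```

In the single-index setting, `index` would be the vector of values $Z_i^\top\theta$ for the current candidate $\theta$.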

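Note that, for any $z$ such that $f(z^\top\theta_0;\theta_0) > 0$,

$$\sqrt{n\hat h}\,\Big( \hat r_{\hat h}(z^\top\hat\theta;\hat\theta) - r(z^\top\theta_0;\theta_0) - \hat h^2 \beta(z^\top\theta_0) \Big) \xrightarrow{D} N\Big(0,\; K_2\, v(z^\top\theta_0;\theta_0)\, f(z^\top\theta_0;\theta_0)^{-1}\Big).$$

Indeed, use the results of Andrews (1995) to deduce that $\partial_\theta \hat r_{\hat h}(z^\top\theta;\theta) \to \partial_\theta r(z^\top\theta;\theta)$, in probability, uniformly over neighborhoods of $\theta_0$ where $f(z^\top\theta;\theta)$ stays away from zero. Therefore, we can write

$$\begin{aligned}
\hat r_{\hat h}(z^\top\hat\theta;\hat\theta) - r(z^\top\theta_0;\theta_0)
&= \big[\hat r_{\hat h}(z^\top\hat\theta;\hat\theta) - \hat r_{\hat h}(z^\top\theta_0;\theta_0)\big] + \big[\hat r_{\hat h}(z^\top\theta_0;\theta_0) - r(z^\top\theta_0;\theta_0)\big] \\
&= \partial_\theta \hat r_{\hat h}(z^\top\theta_0;\theta_0)^\top (\hat\theta - \theta_0) + o_P\big(\|\hat\theta - \theta_0\|\big) + \big[\hat r_{\hat h}(z^\top\theta_0;\theta_0) - r(z^\top\theta_0;\theta_0)\big] \\
&= O_P\big(\|\hat\theta - \theta_0\|\big) + \big[\hat r_{\hat h}(z^\top\theta_0;\theta_0) - r(z^\top\theta_0;\theta_0)\big]
\end{aligned}$$

and obtain the asymptotic normality of $\hat r_{\hat h}(z^\top\hat\theta;\hat\theta)$ as a consequence of the consistency of $\hat\theta$ and the asymptotic behavior of the Nadaraya-Watson estimator.

4 Two-step semiparametric PML

Here, we consider a two-step semiparametric PML procedure that can be applied in semiparametric single-index regression models when a conditional variance condition like

$$\mathrm{Var}(Y \mid Z) = g\big(E(Y \mid Z), \alpha_0\big) = g\big(r(Z^\top\theta_0;\theta_0), \alpha_0\big) \qquad (4.2)$$

is specified. Assume that this conditional variance condition is correctly specified. At the end of this section we also discuss the misspecification case.

First, we have to build a sequence $\{\theta_n\}$ with limit $\theta_0$. Moreover, in the case of unbounded covariates, $\theta_n$ should approach $\theta_0$ faster than $1/\ln n$. For this purpose, we maximize with respect to $\theta$ a pseudo-likelihood based on a LEF density $l(y \mid r)$. We use a fixed trimming $I_B(\cdot)$ with $B$ a subset of $\mathbb{R}^d$ such that, for any $\theta$ and any $z \in B$, we have $f(z^\top\theta;\theta) \ge c > 0$. To ensure consistency for such a PML estimator, we have to check that

$$\theta_0 = \arg\max_\theta E\big[ \ln l\big(Y \mid r(Z^\top\theta;\theta)\big)\, I_B(Z) \big], \qquad (4.3)$$

and $\theta_0$ is unique with this property. Recall that the SIM condition specifies $\theta_0$ as the unique vector satisfying $E[Y \mid Z] = E[Y \mid Z^\top\theta_0]$. On the other hand, if $\ln l(y \mid r) = B(r) + C(r)\,y + D(y)$, then $B(m) + C(m)\,r \le B(r) + C(r)\,r$ (cf. Property 4, Gouriéroux, Monfort and Trognon 1984a, page 684). Deduce that for any $z$,

$$\theta_0 = \arg\max_\theta E\big[ \ln l\big(Y \mid r(z^\top\theta;\theta)\big) \big]$$

and $\theta_0$ is the unique maximizer. Hence, condition (4.3) holds for any set $B$. This leads us to the following definition of a preliminary estimator.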

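STEP 1 (preliminary step). Consider a sequence of bandwidths $h_n$, $n \ge 1$, such that $n^{\varepsilon} h_n \to 0$ and $n^{1/2-\varepsilon} h_n \to \infty$ for some $0 < \varepsilon < 1/2$. Moreover, let $l(y \mid r)$ be a LEF density. Define

$$\theta_n = \arg\max_\theta \frac{1}{n}\sum_{i=1}^n \ln l\big(Y_i \mid \hat r_{h_n}(Z_i^\top\theta;\theta)\big)\, I_B(Z_i).$$

Delecroix, Hristache and Patilea (2006) showed that, under the regularity conditions required by Theorem 3.1, we have $\theta_n - \theta_0 = o_P(1/\ln n)$. Using the preliminary estimate $\theta_n$ and the variance condition (4.2) we can build $\hat\alpha_n$, $n \ge 1$, such that $\hat\alpha_n \to \alpha_0$, in probability (see the end of this section).

Let $l(y \mid r, \alpha)$ denote a LEFN density with mean $r$ and variance $g(r, \alpha)$. Consider a sequence $c_n$ (e.g., $c_n = \ln n$) and define $H_n = \{ h : c_n n^{-1/4} \le h \le c_n^{-1} n^{-1/8} \}$. Moreover, consider $\Theta_n = \{\theta : \|\theta - \theta_0\| \le d_n\}$, $n \ge 1$, with $\{d_n\}$ as in Theorem 3.1. Fix some small $c > 0$.

STEP 2. Define

$$(\hat\theta, \hat h) = \arg\max_{\theta \in \Theta_n,\, h \in H_n} \frac{1}{n}\sum_{i=1}^n \ln l\big(Y_i \mid \hat r^{\,i}_h(Z_i^\top\theta;\theta); \hat\alpha_n\big)\, I_{\{z:\, \hat f^{\,i}_{h_n}(z^\top\theta_n;\theta_n) \ge c\}}(Z_i),$$

with $\theta_n$ and $h_n$ from Step 1.

The following result is a direct consequence of Theorem 3.1.

Corollary 4.1 Suppose that the assumptions of Theorem 3.1 hold. If $\hat\theta$ and $\hat h$ are obtained as in Step 2 above, then

$$\sqrt{n}\,(\hat\theta - \theta_0) \xrightarrow{D} N(0, K),$$

with

$$K^{-1} = E\left\{ \big[v(Z^\top\theta_0;\theta_0)\big]^{-1}\, \partial_\theta r(Z^\top\theta_0;\theta_0)\, \partial_\theta r(Z^\top\theta_0;\theta_0)^\top I_A(Z) \right\}.$$

Moreover,

$$\frac{\hat h}{\big(C_2/(4C_1)\big)^{1/5}\, n^{-1/5}} \to 1, \quad \text{in probability},$$

where $C_1$ and $C_2$ are defined as in (3.1) with $\alpha^* = \alpha_0$.

Remark 1. Let us point out that simultaneous optimization of the semiparametric criterion in Step 1 with respect to $\theta$, $\alpha$ and $h$, or with respect to $\theta$ and $\alpha$ for a given $h$, is not recommended, even if the conditional variance $\mathrm{Var}(Y \mid Z)$ is correctly specified. Indeed, if the true conditional distribution of $Y$ given $Z$ is not the one given by the LEFN density $l(y \mid r, \alpha) = \exp \psi(y, r; \alpha)$, joint optimization with respect to $\theta$ and $\alpha$ leads, in general, to an inconsistent estimate of $\alpha_0$. This failure is well-known in the parametric case where $r$ is a known function; see the comments of Cameron and Trivedi (2013, pages 84-85). In view of decomposition (2.7) we deduce that this fact also happens in the semiparametric framework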

where $r$ has to be estimated. In this case the matrices $I$ and $J$ defined in section 3 are no longer equal and thus the asymptotic variance of the one-step semiparametric estimator of $\theta$ obtained by simultaneous maximization of the criterion in Step 1 with respect to $\theta$, $\alpha$ does not achieve the bound $K$. However, when the SIM condition holds and the true conditional law of $Y$ is given by the LEFN density $l = \exp \psi$, our two-step estimator $\hat\theta$ and the semiparametric MLE of $\theta_0$ obtained by simultaneous optimization with respect to $\theta$, $\alpha$ are asymptotically equivalent.

Remark 2. Note that if we ignore the efficiency loss due to trimming, $K$ is equal to the efficiency bound in the semiparametric model defined only by the single-index condition $E(Y \mid Z) = E(Y \mid Z^\top\theta_0)$ when the variance condition (4.2) holds. To see this, apply the bound of Newey and Stoker (1993) with the true variance given by (4.2). Our two-stage estimator achieves this SIM efficiency bound if the variance is well-specified. However, this SIM bound is not necessarily the two moment conditions model bound. The latter should take into account the variance condition (see Newey 1993, section 3.2, for a similar discussion in the parametric nonlinear regression framework). In other words, our two-stage estimator has some optimality properties but it may not achieve the semiparametric efficiency bound of the two moment conditions model. The same remark applies for the two-stage semiparametric generalized least squares (GLS) procedure of Härdle, Hall and Ichimura (1993) [see also Picone and Butler (2000)]. Achieving semiparametric efficiency when the first two moments are specified would be possible, for instance, by estimating higher order conditional moments nonparametrically. However, in this case we face again the problem of the curse of dimensionality that we tried to avoid by assuming the SIM condition.

To complete the definition of the two-step procedure above, we have to indicate how to build a consistent sequence $\{\hat\alpha_n\}$. Such a sequence can be obtained from the moment condition (4.2) after replacing $r(z^\top\theta_0;\theta_0)$ by a suitable estimator.
This kind of procedure is commonly used in the semiparametric literature (e.g., Newey and McFadden 1994). For simplicity, let us only consider the Negative Binomial case where, for any $z$, we have

$$E\big[ (Y - E(Y \mid Z))^2 \mid Z = z \big] = r(z^\top\theta_0;\theta_0)\,\big[ 1 + \alpha_0\, r(z^\top\theta_0;\theta_0) \big]. \qquad (4.4)$$

Consider a set $B \subset \mathbb{R}^d$ such that, for any $\theta$ and any $z \in B$, we have $f(z^\top\theta;\theta) \ge c > 0$. We can write

$$E\Big\{ E\big[ (Y - r(Z^\top\theta_0;\theta_0))^2 - r(Z^\top\theta_0;\theta_0) \mid Z \big]\, I_B(Z) \Big\} = \alpha_0\, E\big\{ r(Z^\top\theta_0;\theta_0)^2\, I_B(Z) \big\}.$$

Consequently, we may estimate [5] $\alpha_0$ by

$$\hat\alpha_n = \frac{ \frac{1}{n}\sum_{i=1}^n \Big[ \big(Y_i - \hat r_{h_n}(Z_i^\top\theta_n;\theta_n)\big)^2 - \hat r_{h_n}(Z_i^\top\theta_n;\theta_n) \Big] I_B(Z_i) }{ \frac{1}{n}\sum_{i=1}^n \hat r_{h_n}(Z_i^\top\theta_n;\theta_n)^2\, I_B(Z_i) } \qquad (4.5)$$

with $\theta_n$ and $h_n$ from Step 1 and $\hat r_h$ the Nadaraya-Watson estimator with bandwidth $h$. Since $\theta_n \to \theta_0$, deduce that $\hat\alpha_n \to \alpha_0$, in probability (see also the arguments we used in subsection 2.4).

[5] One can expect little influence of the choice of the bandwidth used to construct $\hat\alpha_n$. This is indeed confirmed by the simulation experiments we report in section 5.1.

Now, let us comment on what happens with our two-step procedure if the second order moment condition is misspecified, while the SIM condition still holds. In general, the sequence $\hat\alpha_n$ one may derive from the conditional variance condition and the preliminary estimate of $\theta_0$ is still convergent to some pseudo-true value $\alpha^*$ of the nuisance parameter. [6] Then, the behavior of $(\hat\theta, \hat h)$ yielded by Step 2 is described by Theorem 3.1, that is, $\hat\theta$ is still asymptotically normal and $\hat h$ is still of order $n^{-1/5}$.

[6] For instance, $\hat\alpha_n$ defined in (4.5) is convergent in probability to

$$\alpha^* = \frac{ E\big[ (Y - r(Z^\top\theta_0;\theta_0))^2\, I_B(Z) \big] - E\big[ r(Z^\top\theta_0;\theta_0)\, I_B(Z) \big] }{ E\big[ r(Z^\top\theta_0;\theta_0)^2\, I_B(Z) \big] }.$$

To ensure that the limit of $\hat\alpha_n$ is positive, one may replace $\hat\alpha_n$ by $\max(\hat\alpha_n, \rho)$ for some small but positive $\rho$.

Finally, if the SIM condition does not hold, then $\hat\theta$ estimates a kind of first projection-pursuit direction. In this case, our procedure provides an alternative to the minimum average conditional variance estimation (MAVE) procedure of Xia et al. (2002). The novelty would be that the first projection direction is defined through a more flexible PML function than the usual least-squares criterion. This case will be analyzed elsewhere.

5 Empirical evidence

In our empirical section we consider the case of a count response variable $Y$. A benchmark model for studying event counts is the Poisson regression model. Different variants of the Poisson regression have been used in applications on the number of patents applied for and received by firms, bank failures, worker absenteeism, airline or car accidents, doctor visits, etc. Cameron and Trivedi (2013) provide an overview of the applications of Poisson regression. In the basic setup, the regression function is log-linear. An additional unobserved multiplicative random error term in the conditional mean function is usually used to account for unobserved heterogeneity. In this section we consider semiparametric single-index extensions of such models.

5.1 Monte Carlo simulations

To evaluate the finite sample performances of our estimator $\hat\theta$ and of the optimal bandwidth $\hat h$, we conduct a simulation experiment with 500 replications.

We consider three explanatory variables $Z = (Z_1, Z_2, Z_3)^\top \sim N(0, \Sigma)$ with $\Sigma = [\sigma_{ij}]_{3\times 3}$ and $\sigma_{ij} = 0.5^{|i-j|}$. The regression function is $E(Y \mid Z) = (Z^\top\theta_0)^2 + 0.5$ and $\theta_0 = (\theta_0^1, \theta_0^2, \theta_0^3)^\top = (1, 3, -2)^\top$. The conditional distribution of $Y$ given $Z$ and $\varepsilon$ is Poisson of mean $r(Z^\top\theta_0;\theta_0)\,\varepsilon$ with $\varepsilon$ independent of $Z$ and distributed according to Gamma(0.5, 2) or Uniform(0, 2). Thus, the conditional variance of $Y$ given $Z$ is given by the function $g(r, \alpha) = r(1 + \alpha r)$ with $\alpha_0 = 2$ for $\varepsilon \sim$ Gamma(0.5, 2) and $\alpha_0 = 1/3$ for $\varepsilon \sim$ Uniform(0, 2). For this simulation experiment we generate samples of size $n = 200$ and $n = 300$. For the nonparametric part we use a quartic kernel $K(u) = (15/16)(1 - u^2)^2\, I_{[-1,1]}(u)$.

To estimate the parameter $\theta_0$ and the regression $r(\cdot;\theta_0)$ we use two semiparametric two-step estimation procedures as defined in section 4:

i) a procedure with a Poisson PML in the first step and a Negative Binomial PML in the second step; let $\hat\theta_{NB\text{-}SP} = (1, \hat\theta^2_{NB\text{-}SP}, \hat\theta^3_{NB\text{-}SP})^\top$ denote the two-step estimator;

ii) a procedure with a least-squares method in the first step and a GLS method in the second step; let $\hat\theta_{GLS\text{-}SP} = (1, \hat\theta^2_{GLS\text{-}SP}, \hat\theta^3_{GLS\text{-}SP})^\top$ be the two-step estimator.

Note that $\hat\theta_{NB\text{-}SP}$ and $\hat\theta_{GLS\text{-}SP}$ have the same asymptotic variance. In both two-step procedures considered, we estimate $\alpha_0$ using the estimator defined in (4.5). The bandwidth $h_n$ is equal to $3 n^{-1/5}$. We also consider the parametric two-step GLS method as a benchmark. In this case the link function and the variance parameter are considered given; let $\hat\theta_{GLS\text{-}P} = (1, \hat\theta^2_{GLS\text{-}P}, \hat\theta^3_{GLS\text{-}P})^\top$ denote the corresponding estimator.

Table 1. Poisson regression with unobserved heterogeneity $\varepsilon \sim$ Gamma(0.5, 2). The true conditional variance of $Y$ given $Z$ is $r(Z^\top\theta_0;\theta_0)\,(1 + 2\, r(Z^\top\theta_0;\theta_0))$ with $r(t;\theta_0) = t^2 + 0.5$. The true vector $\theta_0$ is $(1, 3, -2)^\top$. Let $\hat\theta_{NB\text{-}SP}$ and $\hat\theta_{GLS\text{-}SP}$ denote the two-step estimators obtained from the Negative Binomial pseudo-likelihood and GLS criterion, respectively. The first step Poisson PML estimator is denoted by $\hat\theta_{POI\text{-}SP}$. The superscripts indicate the components of the vectors.

| $n$ | | $\hat\theta^2_{GLS\text{-}P}$ | $\hat\theta^2_{GLS\text{-}SP}$ | $\hat\theta^2_{POI\text{-}SP}$ | $\hat\theta^2_{NB\text{-}SP}$ | $\hat\theta^3_{GLS\text{-}P}$ | $\hat\theta^3_{GLS\text{-}SP}$ | $\hat\theta^3_{POI\text{-}SP}$ | $\hat\theta^3_{NB\text{-}SP}$ |
|---|---|---|---|---|---|---|---|---|---|
| 200 | mean | 2.8977 | 2.8019 | 3.0177 | 3.1249 | -1.9501 | -1.6954 | -1.7955 | -2.0520 |
| | std. | 0.8097 | 0.8986 | 0.9937 | 0.9481 | 0.6268 | 0.5170 | 0.6435 | 0.5580 |
| | MSE | 0.3822 | 0.8467 | 0.9879 | 0.9145 | 0.3929 | 0.3600 | 0.4559 | 0.3167 |
| 300 | mean | 2.9422 | 2.8261 | 2.9982 | 3.0758 | -1.9594 | -1.7215 | -1.8028 | -1.9569 |
| | std. | 0.4600 | 0.7741 | 0.9288 | 0.8297 | 0.5002 | 0.4568 | 0.5705 | 0.4670 |
| | MSE | 0.2150 | 0.6295 | 0.8628 | 0.6941 | 0.2519 | 0.2862 | 0.3643 | 0.2199 |

Table 2. The same setup as in Table 1 but with $\varepsilon \sim$ Uniform(0, 2) and the true conditional variance of $Y$ given $Z$ equal to $r(Z^\top\theta_0;\theta_0)\,(1 + (1/3)\, r(Z^\top\theta_0;\theta_0))$.

| $n$ | | $\hat\theta^2_{GLS\text{-}P}$ | $\hat\theta^2_{GLS\text{-}SP}$ | $\hat\theta^2_{POI\text{-}SP}$ | $\hat\theta^2_{NB\text{-}SP}$ | $\hat\theta^3_{GLS\text{-}P}$ | $\hat\theta^3_{GLS\text{-}SP}$ | $\hat\theta^3_{POI\text{-}SP}$ | $\hat\theta^3_{NB\text{-}SP}$ |
|---|---|---|---|---|---|---|---|---|---|
| 200 | mean | 2.9842 | 2.8460 | 2.9755 | 3.0613 | -1.9961 | -1.8702 | -1.9094 | -2.0127 |
| | std. | 0.2505 | 0.4537 | 0.6619 | 0.4917 | 0.2551 | 0.2874 | 0.4117 | 0.2921 |
| | MSE | 0.0630 | 0.2295 | 0.4387 | 0.2456 | 0.0651 | 0.0994 | 0.1777 | 0.0855 |
| 300 | mean | 2.9919 | 2.8956 | 2.9422 | 3.0618 | -1.9946 | -1.8999 | -1.8953 | -2.0052 |
| | std. | 0.2213 | 0.4279 | 0.5753 | 0.3658 | 0.2443 | 0.2647 | 0.3639 | 0.2237 |
| | MSE | 0.0497 | 0.1940 | 0.3343 | 0.1376 | 0.0597 | 0.0800 | 0.1433 | 0.0500 |
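The simulation design and the moment estimator (4.5) of the nuisance parameter can be sketched in a few lines. This is an illustration under stated assumptions rather than the authors' code: Gamma(0.5, 2) is read as shape 0.5 and scale 2 (mean 1 and variance 2, which is what makes $\alpha_0 = 2$), the regression values passed to the moment estimator are supplied by the caller instead of being estimated by Nadaraya-Watson, and the function names and the `keep` argument (standing in for the trimming $I_B$) are hypothetical.

```python
import numpy as np

def simulate_sample(n, heterogeneity="gamma", seed=0):
    """Draw (Y, Z, r) from the Poisson single-index design of Section 5.1."""
    rng = np.random.default_rng(seed)
    theta0 = np.array([1.0, 3.0, -2.0])
    idx = np.arange(3)
    sigma = 0.5 ** np.abs(idx[:, None] - idx[None, :])  # sigma_ij = 0.5^|i-j|
    z = rng.multivariate_normal(np.zeros(3), sigma, size=n)
    r = (z @ theta0) ** 2 + 0.5                          # E(Y | Z)
    if heterogeneity == "gamma":
        # Assumption: shape 0.5, scale 2, so E(eps) = 1 and Var(eps) = 2.
        eps = rng.gamma(shape=0.5, scale=2.0, size=n)
    elif heterogeneity == "uniform":
        eps = rng.uniform(0.0, 2.0, size=n)              # Var(eps) = 1/3
    else:
        raise ValueError("heterogeneity must be 'gamma' or 'uniform'")
    y = rng.poisson(r * eps)
    return y, z, r

def alpha_moment(y, r_hat, keep=None):
    """Ratio-of-moments estimator in the spirit of (4.5).

    keep is an optional 0/1 trimming indicator; all points are kept by default.
    """
    y = np.asarray(y, dtype=float)
    r_hat = np.asarray(r_hat, dtype=float)
    keep = np.ones_like(r_hat) if keep is None else np.asarray(keep, dtype=float)
    num = np.sum(((y - r_hat) ** 2 - r_hat) * keep)
    den = np.sum(r_hat ** 2 * keep)
    return num / den

y, z, r = simulate_sample(200, "gamma", seed=1)
print(alpha_moment(y, r))  # uses the true regression values in place of r_hat
```

In the two-step procedure, `r_hat` would be the Nadaraya-Watson fit evaluated at the Step 1 estimate; with heavy-tailed regression values such as these, the moment ratio can be quite variable in samples of a few hundred observations.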

The results on the estimates of the components of $\theta_0$ are provided in Table 1 and Table 2. We report the mean, the standard deviation and the estimated mean squared error (MSE) for each component. The two semiparametric estimators that incorporate the information on the conditional variance clearly outperform the semiparametric single-index estimator that ignores that information. Moreover, they behave reasonably well compared to the parametric benchmark.

5.2 A real data example

In order to further illustrate our methodology, we consider a real dataset on recreational trips as presented by Cameron and Trivedi (2013). This data, initially collected by Sellar, Stoll and Chavas (1985), is built from a survey that includes the number of recreational boating trips to Lake Somerville, Texas. We reproduce below the tables that describe the observed frequencies and the explanatory variables. We do not use all the explanatory variables for estimation since the variables C1, C3 and C4 are almost perfectly correlated in the sample. Indeed, Corr(C1, C3) = 0.977, Corr(C1, C4) = 0.987 and Corr(C3, C4) = 0.964. To avoid collinearity problems, we drop C3 and C4. We standardize the variables INC and C1.

Table 3. The recreational trips data set: actual frequency distribution.

| Number of trips | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Frequency | 417 | 68 | 38 | 34 | 17 | 13 | 11 | 2 | 8 | 1 | 13 |

| Number of trips | 11 | 12 | 15 | 16 | 20 | 25 | 26 | 30 | 40 | 50 | 88 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Frequency | 2 | 5 | 14 | 1 | 3 | 3 | 1 | 3 | 3 | 1 | 1 |

Table 4. Explanatory variables for the recreational trips counts.
| Variable | Definition | Mean | Std |
|---|---|---|---|
| TRIPS | Number of recreational boating trips in 1980 by a sample group | 2.244 | 6.292 |
| SO | Facility's subjective quality ranking on a scale of 1 to 5 | 1.419 | 1.812 |
| SKI | Equal 1 if engaged in water-skiing at the lake | 0.367 | 0.482 |
| INC | Household income of the head of the group ($10,000/year) | 0.385 | 0.185 |
| FC3 | Equal 1 if user's fee paid at Lake Somerville | 0.019 | 0.139 |
| C1 | Hundreds of dollar expenditure when visiting Lake Conroe | 0.554 | 0.467 |
| C3 | Hundreds of dollar expenditure when visiting Lake Somerville | 0.599 | 0.488 |
| C4 | Hundreds of dollar expenditure when visiting Lake Houston | 0.560 | 0.461 |

The model we consider is the one given by equations (2.1)-(2.2) with $g(r, \alpha) = r(1 + \alpha r)$. First, we assume that the regression function is log-linear, that is, we consider the standard Negative Binomial Parametric model (NB-P). Next, we no longer assume that the regression function is known and we apply our semiparametric methodology, the semiparametric

Negative Binomial pseudo-likelihood procedure. In the semiparametric procedures the coefficient of the variable SO is set to 1. For the nonparametric part we use the quartic kernel $K(u) = (15/16)(1 - u^2)^2\, I_{[-1,1]}(u)$. The parameter estimates and estimated standard errors are gathered in Table 5, the plot of the estimated link function is provided in Figure 1.

[Figure 1: The link function. Vertical axis: $\hat r(t)$; horizontal axis: $t$.]

Table 5. Estimation results: parametric NB-P versus semiparametric model NB-SP. Estimated standard errors are in parentheses.

| Parameters | NB-P | NB-SP |
|---|---|---|
| Intercept | -1.7452 (0.1441) | . |
| SO | 0.9017 (0.0430) | 1 |
| SKI | 0.4420 (0.1707) | -0.2489 (0.0405) |
| INC | -0.2245 (0.0906) | 0.1963 (0.0690) |
| FC3 | 1.5813 (0.4404) | -0.1399 (0.0702) |
| C1 | -0.3258 (0.1018) | -0.2987 (0.0995) |
| $\alpha$ | 2.2983 (0.2210) | 5.5764 |
| $h$ | . | 5.6530 |

Note that the estimate of the coefficient of SO in the parametric model is close to one, while in the semiparametric approach we fixed it to one. Thus the estimated values of the remaining parameters in the parametric and semiparametric cases are almost directly comparable. The results obtained with the semiparametric approach seem more realistic. For instance, the coefficient of the INC covariate is positive with NB-SP and the link function is strictly monotone. This suggests that a higher income more likely induces a larger number of recreational trips. The NB-P model leads to the opposite conclusion. The reported parametric and semiparametric standard errors cannot be directly compared on the same basis since we can only compute the standard error of a ratio of parameters in the semiparametric cases. The large bandwidth could be explained by the large conditional variance of the

response and a link function with a second derivative close to zero. This leads to a large constant $(C_2/(4C_1))^{1/5}$ in the expression of $h_n^{opt}$, see equation (3.1) above.

In order to evaluate the overall performance of the parametric and semiparametric models and of the estimation methods, we consider various goodness-of-fit measures such as the Pearson statistic, the deviance statistic and the deviance pseudo R-squared statistic. The Pearson statistic is given by

$$P = \sum_{i=1}^n \frac{(Y_i - \hat r_i)^2}{\hat\omega_i},$$

where $\hat r_i$ is the estimated conditional mean for individual $i$ and $\hat\omega_i$ is the estimated conditional variance computed according to equation (2.2). The deviance statistic is given by

$$D = 2\sum_{i=1}^n \left[ Y_i \ln\frac{Y_i}{\hat r_i} - (Y_i + 1/\hat\alpha)\, \ln\frac{Y_i + 1/\hat\alpha}{\hat r_i + 1/\hat\alpha} \right],$$

with $\hat\alpha$ the estimated value of the nuisance parameter with the values given in Table 5. Finally, if $\bar Y$ denotes the sample mean of the variable $Y$, the deviance pseudo R-squared statistic is

$$R^2_{DEV} = 1 - \frac{ \sum_{i=1}^n \left[ Y_i \ln(Y_i/\hat r_i) - (Y_i + 1/\hat\alpha)\, \ln\dfrac{Y_i + 1/\hat\alpha}{\hat r_i + 1/\hat\alpha} \right] }{ \sum_{i=1}^n \left[ Y_i \ln(Y_i/\bar Y) - (Y_i + 1/\hat\alpha)\, \ln\dfrac{Y_i + 1/\hat\alpha}{\bar Y + 1/\hat\alpha} \right] }.$$

Another model diagnostic is obtained when comparing fitted probabilities and actual probabilities by means of a chi-square type statistic. The statistic we consider is

$$\xi = n \sum_{j=1}^J \frac{(\bar p_j - \hat p_j)^2}{\bar p_j}, \quad [7]$$

where the possible values of $Y$ are aggregated in $J$ non-overlapping cells. The actual frequency for cell $j$ is denoted $\bar p_j$ while $\hat p_j$ is the corresponding predicted probability by the model under study. For both methods GLS-SP and NB-SP we used the probabilities of a negative binomial distribution to compute $\hat p_j$. We consider seven cells corresponding to the values TRIPS = 0, ..., 5 and TRIPS > 5. All the results are summarized in Table 6. The semiparametric model performs better than the parametric model. We also give the estimators of the probability in Table 7. We can see that our estimators are close to the empirical probability of TRIPS. The semiparametric approach greatly improves on the standard parametric modeling.

[7] The chi-square statistic we consider is not necessarily chi-square distributed under the null hypothesis of a well specified model. This is because it does not correctly take into account the estimation error in $\hat p_j$. See Andrews (1988) for the general definition of the chi-square goodness-of-fit test statistic in nondynamic regression models. Here, we only use $\xi$ as a crude diagnostic for the three types of fitted probabilities $\hat p_j$.

Table 6. Goodness-of-fit statistics: $P$ Pearson statistic, $D$ deviance statistic, $R^2_{DEV}$ deviance pseudo R-squared statistic and $\xi$ chi-square statistic.

| | NB-P | NB-SP |
|---|---|---|
| $P$ | 5296.506 | 608.7212 |
| $D$ | 1158.41 | 405.1771 |
| $R^2_{DEV}$ | 0.4780 | 0.1886 |
| $\xi$ | 968.9416 | 3.1922 |

Table 7. Empirical probability and estimated probability.

| TRIPS | 0 | 1 | 2 | 3 | 4 | 5 | > 5 |
|---|---|---|---|---|---|---|---|
| Empirical probability | 0.6327 | 0.1031 | 0.0576 | 0.0515 | 0.0257 | 0.0197 | 0.1092 |
| NB-P | 0.1111 | 0.1572 | 0.1596 | 0.1407 | 0.1147 | 0.0889 | 0.2273 |
| NB-SP | 0.6314 | 0.1045 | 0.0568 | 0.0381 | 0.02797 | 0.0215 | 0.1194 |

6 Conclusion

We consider a semiparametric single-index model (SIM) where an additional second order moment condition is specified. To estimate the parameter of interest $\theta$ we introduce a two-step semiparametric pseudo-maximum likelihood (PML) estimation procedure based on linear exponential families with nuisance parameter densities. This procedure extends the quasi-generalized pseudo-maximum likelihood method proposed by Gouriéroux, Monfort and Trognon (1984a, 1984b). We also provide a natural rule for choosing the bandwidth of the nonparametric smoother appearing in the estimation procedure. The idea is to maximize the pseudo-likelihood of the second step simultaneously in $\theta$ and the smoothing parameter $h$. The rate of the bandwidth is allowed to lie in a range between $n^{-1/4}$ and $n^{-1/8}$.

We derive the asymptotic behavior of $\hat\theta$, the two-step semiparametric PML we propose. If the SIM condition holds, then $\hat\theta$ is asymptotically normal. We also provide a consistent estimator of its variance. When the SIM condition holds and the conditional variance is correctly specified, then $\hat\theta$ has the best variance amongst the semiparametric PML estimators. The optimal bandwidth $\hat h$ obtained by joint maximization of the pseudo-likelihood function in the second step is shown to be equivalent to the minimizer of a weighted cross-validation function. From this we deduce that $n^{1/5}\hat h$ converges to a positive constant, in probability. In particular, our optimal bandwidth $\hat h$ has the rate expected when estimating a twice differentiable regression function nonparametrically.

We conduct a simulation experiment in which the data were generated using a Poisson single-index regression model with multiplicative unobserved heterogeneity.
The simulation confirms the significant advantage of estimators that incorporate the information on the conditional variance. We also applied our semiparametric approach to a benchmark real count data set and we obtain a much better fit than the standard parametric regression models for count data.
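As a supplement to the goodness-of-fit discussion of Section 5.2, the chi-square type diagnostic $\xi$ can be recomputed from the rounded probabilities of Table 7. The sketch below assumes that $\xi$ equals the sample size times the sum over cells of the squared difference between actual and fitted probabilities divided by the actual probability, and that the sample size is 659, the total of the frequencies in Table 3; the function and variable names are illustrative.

```python
import numpy as np

n_obs = 659  # total of the frequencies reported in Table 3

# Rounded probabilities from Table 7 (cells TRIPS = 0, ..., 5 and TRIPS > 5).
empirical = np.array([0.6327, 0.1031, 0.0576, 0.0515, 0.0257, 0.0197, 0.1092])
nb_p = np.array([0.1111, 0.1572, 0.1596, 0.1407, 0.1147, 0.0889, 0.2273])
nb_sp = np.array([0.6314, 0.1045, 0.0568, 0.0381, 0.02797, 0.0215, 0.1194])

def chi_square_stat(actual, fitted, n):
    """n * sum_j (actual_j - fitted_j)^2 / actual_j over the aggregated cells."""
    actual = np.asarray(actual, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    return n * np.sum((actual - fitted) ** 2 / actual)

xi_p = chi_square_stat(empirical, nb_p, n_obs)
xi_sp = chi_square_stat(empirical, nb_sp, n_obs)
print(round(xi_p, 1), round(xi_sp, 2))
```

With these rounded inputs the statistic comes out at roughly 970 for NB-P and roughly 3.19 for NB-SP, in line with the values 968.9416 and 3.1922 reported in Table 6 up to rounding of the tabulated probabilities.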

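A Appendix: Assumptions

Let $\Theta = \{1\} \times \tilde\Theta$ with $\tilde\Theta$ a compact subset of $\mathbb{R}^{d-1}$ with nonvoid interior. Depending on the context, $\Theta$ is considered a subset of $\mathbb{R}^{d-1}$ or a subset of $\mathbb{R}^d$.

Assumption A.1 The observations $(Y_1, Z_1), \ldots, (Y_n, Z_n)$ are independent copies of a random vector $(Y, Z) \in \mathbb{R}^{d+1}$.

Assumption A.2 Let $r(t;\theta) = E(Y \mid Z^\top\theta = t)$. There exists a unique $\theta_0$ interior point of $\Theta$ such that $E(Y \mid Z) = E(Y \mid Z^\top\theta_0) = r(Z^\top\theta_0;\theta_0)$.

Assumption A.3 For every $\theta \in \Theta$, the random variable $Z^\top\theta$ admits a density $f(\cdot;\theta)$ with respect to the Lebesgue measure on $\mathbb{R}$.

Assumption A.4 $E[\exp(\lambda \|Z\|)] < \infty$, for some $\lambda > 0$. Moreover, $E(|Y|^{4+\varepsilon}) < \infty$, for some $\varepsilon > 0$.

Assumption A.5 With probability one, the matrix $(1, Z^\top)^\top (1, Z^\top)$ is positive definite.

Assumption A.6 There exists $c_0 > 0$ and a positive integer $k_0$ such that, for any $\theta \in \Theta$ and $0 < c \le c_0$, the set $\{t : f(t;\theta) = c\}$ has at most $k_0$ elements.

The last two assumptions ensure that $P(f(Z^\top\theta_0;\theta_0) = c) = 0$, for any $0 < c \le c_0$.

CONDITION L A function $g : \Theta \times \mathbb{R} \to \mathbb{R}$ is said to satisfy Condition L if, for any $\Lambda$ a compact set on the real line, there exists $B > 0$ and $b \in (0, 1]$ such that

$$|g(\theta, t) - g(\theta', t')| \le B\, \|(\theta, t) - (\theta', t')\|^{b}, \qquad \theta, \theta' \in \Theta,\; t, t' \in \Lambda.$$

Assumption A.7
a) The function $(\theta, t) \mapsto f(t;\theta) \ge 0$, $\theta \in \Theta$, $t \in \mathbb{R}$, satisfies a Lipschitz condition, that is, there exists $a \in (0, 1]$ and $C > 0$ such that $|f(t;\theta) - f(t';\theta')| \le C\, \|(\theta, t) - (\theta', t')\|^{a}$ for $\theta, \theta' \in \Theta$ and $t, t' \in \mathbb{R}$.
b) The function $(\theta, t) \mapsto r(t;\theta)$, $\theta \in \Theta$, $t \in \mathbb{R}$, satisfies Condition L.
c) For any $\theta \in \Theta$, the functions $t \mapsto \gamma(t;\theta)$ and $t \mapsto f(t;\theta)$ are twice differentiable. Let $\gamma''(t;\theta)$ and $f''(t;\theta)$ denote the second order derivatives. The functions $(\theta, t) \mapsto \gamma''(t;\theta)$ and $(\theta, t) \mapsto f''(t;\theta)$, $\theta \in \Theta$, $t \in \mathbb{R}$, satisfy Condition L with $b = 1$.
d) For any $\theta \in \Theta$ and any component $Z^{(j)}$ of $Z$, the functions $t \mapsto E(Z^{(j)} \mid Z^\top\theta = t)$ and $t \mapsto E(Y Z^{(j)} \mid Z^\top\theta = t)$ are twice differentiable and their second order derivatives satisfy Condition L with $b = 1$.
e) For any $t \in \mathbb{R}$, the function $\theta \mapsto r(t;\theta)$ is twice continuously differentiable and, for any $\theta \in \Theta$, the functions $t \mapsto \partial_\theta r(t;\theta)$ and $t \mapsto \partial^2_{\theta\theta} r(t;\theta)$ are continuous. Moreover, the function $(\theta, t) \mapsto \partial_\theta r(t;\theta)$ satisfies Condition L with $b = 1$.

Let $v(t;\theta) = \mathrm{Var}(Y \mid Z^\top\theta = t)$ be the conditional variance of $Y$ given $Z^\top\theta = t$.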

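Assumption A.8 The function $(\theta, t) \mapsto v(t;\theta)$ satisfies Condition L.

Consider the functions $B, C : \mathcal{R} \times \mathcal{N} \to \mathbb{R}$, with $\mathcal{Y}, \mathcal{R}, \mathcal{N} \subset \mathbb{R}$. Define $\Lambda = \bigcup_{\theta \in \Theta} \{t : f(t;\theta) \ge c\}$, with $c, \delta > 0$, and $D(c, \delta) = \{r : \exists\, (\theta, t) \in \Theta \times \Lambda \text{ such that } |r - r(t;\theta)| \le \delta\}$.

Assumption A.9 If $c > 0$, there exists $\delta > 0$ such that $D(c, \delta)$ is strictly included in $\mathcal{R}$.

Assumption A.10 The kernel function $K$ is differentiable, symmetric, positive and compactly supported. Moreover, $K$ and the derivative $K'$ are of bounded variation.

Up to a term depending only on $y$ and $\alpha$, the three arguments function $\psi(\cdot, \cdot; \cdot)$ involved in equation (2.3) is defined as $\psi(y, r; \alpha) = B(r, \alpha) + C(r, \alpha)\, y$ where $l(y \mid r, \alpha) = \exp[B(r, \alpha) + C(r, \alpha)\, y + D(y, \alpha)]$ is a LEFN density with mean $r$ and variance $[\partial_r C(r, \alpha)]^{-1}$.

Assumption A.11 The functions $B(r, \alpha)$ and $C(r, \alpha)$ are twice differentiable in the first argument. Moreover, for any $c$ and $\delta > 0$ for which $D(c, \delta)$ is strictly included in $\mathcal{R}$, there exists a constant $M$ such that

$$\sup_{r \in D(c,\delta),\, \alpha \in \mathcal{N}} \Big( \big|\partial^2_{rr} G(r, \alpha)\big| + \big|\partial_r G(r, \alpha)\big| \Big) \le M,$$

$$\big|\partial^2_{rr} G(r, \alpha) - \partial^2_{rr} G(r', \alpha')\big| \le M\, \big( |r - r'| + \|\alpha - \alpha'\| \big), \qquad r, r' \in D(c,\delta),\; \alpha, \alpha' \in \mathcal{N},$$

where $G$ stands for $B$ or $C$. The functions $\partial_r B(r;\alpha)$ and $\partial_r C(r;\alpha)$ are continuously differentiable in $\alpha$.

Assumption A.12 For any $c$ and $\delta > 0$ for which $D(c, \delta)$ is strictly included in $\mathcal{R}$, we have $\partial_r C(r, \alpha) > 0$, $\forall r \in D(c, \delta)$, $\forall \alpha \in \mathcal{N}$.

Assumption A.12 ensures that the $(d-1)\times(d-1)$ matrix

$$J = -E\big[ \partial^2_{\theta\theta}\, \psi\big(Y, r(Z^\top\theta_0;\theta_0); \alpha^*\big)\, I_A(Z) \big] = E\big[ \partial_r C\big(r(Z^\top\theta_0;\theta_0);\alpha^*\big)\, \partial_\theta r(Z^\top\theta_0;\theta_0)\, \partial_\theta r(Z^\top\theta_0;\theta_0)^\top I_A(Z) \big]$$

is positive definite.

Let us notice that the asymptotic results remain valid even if the function $\psi(y, r; \alpha)$ is not the logarithm of a LEFN. It suffices to adapt Assumption A.11, to suppose that there exists $F(\cdot;\cdot)$ such that $|\psi(y, r; \alpha)| \le F(y;\alpha)$, $\forall r \in \mathcal{R}$, to ensure that $J$ is positive definite and to assume that, for any $\alpha$, $E\big[ \partial_2 \psi\big(Y, r(Z^\top\theta_0;\theta_0); \alpha\big) \mid Z \big] = 0$ and $E\big[ \partial_\theta \partial_2 \psi\big(Y, r(Z^\top\theta_0;\theta_0); \alpha\big) \mid Z^\top\theta_0 \big] = 0$.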
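To make the LEFN notation concrete, consider the Negative Binomial case used in the empirical section, with variance function $g(r, \alpha) = r(1 + \alpha r)$. A standard choice, assumed here rather than quoted from the paper, is $B(r, \alpha) = -\alpha^{-1}\ln(1 + \alpha r)$ and $C(r, \alpha) = \ln(\alpha r) - \ln(1 + \alpha r)$. The sketch below checks numerically that $[\partial_r C(r, \alpha)]^{-1}$ equals $r(1 + \alpha r)$ and that the derivative of $\psi$ in $r$ vanishes at $y = r$; the function names are hypothetical.

```python
import numpy as np

def nb_B(r, alpha):
    """B(r, alpha) for the Negative Binomial LEFN (assumed parametrization)."""
    return -np.log1p(alpha * r) / alpha

def nb_C(r, alpha):
    """C(r, alpha) for the Negative Binomial LEFN (assumed parametrization)."""
    return np.log(alpha * r) - np.log1p(alpha * r)

def nb_psi(y, r, alpha):
    """Pseudo-log-likelihood psi(y, r; alpha) = B + C*y, up to a term in (y, alpha)."""
    return nb_B(r, alpha) + nb_C(r, alpha) * y

def nb_variance(r, alpha):
    """Variance function g(r, alpha) = r (1 + alpha r)."""
    return r * (1.0 + alpha * r)

def central_diff(func, x, step=1e-6):
    """Symmetric finite-difference approximation of func'(x)."""
    return (func(x + step) - func(x - step)) / (2.0 * step)

r0, a0 = 2.0, 2.2983  # illustrative mean; alpha taken from the NB-P column of Table 5
dC = central_diff(lambda r: nb_C(r, a0), r0)
dB = central_diff(lambda r: nb_B(r, a0), r0)
print(dC * nb_variance(r0, a0))  # close to 1: variance = 1 / (dC/dr)
print(dB + dC * r0)              # close to 0: the score in r vanishes at y = r
```

The second check reflects the identity $\partial_r B(r, \alpha) = -r\, \partial_r C(r, \alpha)$, which gives $\partial_r \psi(y, r; \alpha) = \partial_r C(r, \alpha)\,(y - r)$ for a density with mean $r$.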