Distribution Free Estimation of Heteroskedastic Binary Response Models Using Probit/Logit Criterion Functions


Shakeeb Khan
Duke University
Revised: February

Abstract

In this paper estimators for distribution free heteroskedastic binary response models are proposed. The estimation procedures are based on relationships between distribution free models with a conditional median restriction and parametric models (such as Probit/Logit) exhibiting (multiplicative) heteroskedasticity. The first proposed estimator is based on the observational equivalence between the two models, and is a semiparametric sieve estimator (see, e.g., Gallant and Nychka (1987), Ai and Chen (2003), Chen, Hong and Tamer (2005)) for the regression coefficients, based on optimizing standard Logit/Probit criterion functions, such as NLLS and MLE. This procedure has the advantage that choice probabilities and regression coefficients are estimated simultaneously. The second proposed procedure is based on the equivalence between existing semiparametric estimators for the conditional median model (Manski (1975, 1985), Horowitz (1992)) and the standard parametric (Probit/Logit) NLLS estimator. This estimator has the advantage of being implementable with standard software packages such as Stata. Distribution theory is developed for both estimators and a Monte Carlo study indicates they both perform well in finite samples.

JEL Classification: C3, C4, C4

Key Words: binary response, heteroskedasticity, Probit/Logit, sieve estimation.

Corresponding author. Department of Economics, Duke University, Durham, NC 27708; e-mail: shakeebk@duke.edu. I am grateful to Co-editor T. Amemiya, an anonymous Associate Editor and anonymous referees, S. Chen, X. Chen, M. Coppejans, B. Honoré, A. Lewbel, W. Newey, W. Ploberger, J. Powell, and seminar participants at Boston College, Brown, McGill, Harvard/MIT, Rice, and Texas A&M for helpful comments. This research was supported in part by the National Science Foundation through grant SES-36.

1 Introduction

The binary response model has received a great deal of attention in both the theoretical and applied econometrics literature, as many economic variables of interest are of a qualitative nature. The model is usually represented by some variation of the following equation:

$$y_i = I[x_i'\beta_0 - \epsilon_i \geq 0] \qquad (1.1)$$

where $I[\cdot]$ is the usual indicator function, $y_i$ is the observed response variable, taking the values 0 or 1, and $x_i$ is an observed vector of covariates which affect the behavior of $y_i$. Both the disturbance term $\epsilon_i$ and the vector $\beta_0$ are unobserved, the latter often being the parameter estimated from a random sample of $(y_i, x_i')'$, $i = 1, 2, \ldots, n$.

The disturbance term $\epsilon_i$ is restricted in ways that ensure identification of $\beta_0$. Parametric restrictions specify the distribution of $\epsilon_i$ up to a finite number of parameters and assume it is distributed independently of the covariates $x_i$. Under such a restriction, $\beta_0$ can be estimated (up to scale) using maximum likelihood or nonlinear least squares. However, except in special cases, these estimators are inconsistent if the distribution of $\epsilon_i$ is misspecified or conditionally heteroskedastic.

Semiparametric, or distribution free, restrictions have also been imposed in the literature, resulting in a variety of estimation procedures for $\beta_0$. The first was the maximum score estimator proposed in Manski (1975). Identification of $\beta_0$ was based on a conditional median restriction:

$$\mathrm{med}(\epsilon_i \mid x_i) = 0 \qquad (1.2)$$

Manski's estimator maximized the following objective function:

$$M_n(\beta) = \frac{1}{n}\sum_{i=1}^{n} \Big( I[y_i = 1]\, I[x_i'\beta \geq 0] + I[y_i = 0]\, I[x_i'\beta < 0] \Big) \qquad (1.3)$$

Manski (1975, 1985) established the estimator's consistency. Kim and Pollard (1990) established its rate of convergence and limiting distribution, which were $n^{-1/3}$ and non-Gaussian, respectively. Horowitz (1992) modified the procedure by smoothing the objective function in (1.3). Specifically, his approach was to maximize the following objective function:

$$S_n(\beta) = \frac{1}{n}\sum_{i=1}^{n} \Big( I[y_i = 1]\, K_h(x_i'\beta) + I[y_i = 0]\, (1 - K_h(x_i'\beta)) \Big) \qquad (1.4)$$

where $K_h(\cdot) \equiv K(\cdot/h_n)$, with $K(\cdot)$ denoting a smooth kernel function and $h_n$ denoting a smoothing parameter, converging to 0 with the sample size.
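The two objective functions (1.3) and (1.4) can be evaluated directly, which may help make the contrast between the step-function score and its smoothed version concrete. The following is only a minimal sketch: the function names, the toy data, the sample-average normalization and the choice of the standard normal c.d.f. for $K$ are illustrative assumptions, not taken from the paper.

```python
import math

def normal_cdf(t):
    """Standard normal c.d.f.; used here as one possible smooth choice of K (an assumption)."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def index(x, beta):
    """Linear index x'beta."""
    return sum(xj * bj for xj, bj in zip(x, beta))

def max_score(beta, ys, xs):
    """Maximum score objective: share of observations whose outcome agrees with the sign of x'beta."""
    hits = 0
    for y, x in zip(ys, xs):
        nonneg = index(x, beta) >= 0
        if (y == 1 and nonneg) or (y == 0 and not nonneg):
            hits += 1
    return hits / len(ys)

def smoothed_max_score(beta, ys, xs, h):
    """Smoothed maximum score objective: the indicator of x'beta >= 0 is replaced by K(x'beta / h)."""
    total = 0.0
    for y, x in zip(ys, xs):
        k = normal_cdf(index(x, beta) / h)
        total += k if y == 1 else 1.0 - k
    return total / len(ys)

# Hypothetical toy sample: an intercept and one continuous regressor.
ys = [1, 0, 1, 0]
xs = [(1.0, 2.0), (1.0, -3.0), (1.0, 0.5), (1.0, -0.5)]

print(max_score((0.0, 1.0), ys, xs))
print(smoothed_max_score((0.0, 1.0), ys, xs, h=0.01))
```

As the bandwidth `h` shrinks, the smoothed objective approaches the step-function objective at any `beta` for which no index is exactly zero, which is the sense in which (1.4) approximates (1.3).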
Under stronger smoothness conditions on the distributions of $\epsilon_i$ and $x_i$, Horowitz showed that the estimator converges at the rate of $n^{-2/5}$ with an asymptotically normal distribution. By strengthening the conditions further, he was able to attain a rate of $n^{-p/(2p+1)}$, where $p$ is an integer related to the order of smoothness of the distributions of $\epsilon_i$ and $x_i'\beta_0$ in neighborhoods of 0.

These estimators have two disadvantages which this paper attempts to address. For one, both the maximum score and smoothed maximum score estimation procedures only provide an estimator of $\beta_0$. As discussed in Manski (1988), an estimator of $\beta_0$ permits structural analysis, which may be of interest for one of two reasons. First, the researcher may have a scientific interest in learning about the process yielding binary outcomes. The other motive is prediction, where structural analysis enables more precise and tractable prediction, as well as extrapolation. However, choice probabilities and marginal effects are also of interest in most practical applications; see Greene (1997) for an explanation. Unfortunately, the maximum score and smoothed maximum score procedures do not estimate these quantities.

Alternative semiparametric restrictions used in the literature were independence/index restrictions. These restrictions are much stronger than the median restriction mentioned, as they require the error term to be distributed independently of $x_i$, or to depend on $x_i$ only through the index $x_i'\beta_0$. Estimation procedures under this restriction include those proposed by Cosslett (1983), Powell et al. (1989), Ichimura (1993), Klein and Spady (1993), and Coppejans (2001). An advantage of most of these procedures is that they enable joint estimation of the regression coefficients and choice probabilities. However, because of the strength of the restrictions they are based on, they do not permit the general forms of heteroskedasticity that the conditional median restriction allows for.
Therefore, the first procedure proposed in this paper aims to address the drawbacks of the existing estimators mentioned. Specifically, the general heteroskedasticity of the conditional median restriction is maintained, yet the joint estimation of the regression coefficients and the choice probabilities is also permitted. The idea behind this approach is based on the observational equivalence between a distribution free model under a conditional median restriction and a (multiplicative) heteroskedastic parametric (e.g. probit, logit) model. This equivalence result motivates an estimator of the heteroskedastic parametric model, and the estimators proposed permit joint estimation of regression coefficients and choice probabilities. The procedures involve optimizing standard parametric criterion functions, such as MLE and NLLS probit/logit.

A second drawback of maximum and smoothed maximum score estimators is implementation. Specifically, their objective functions are non-standard and thus they cannot be computed using standard software packages. This motivates the second estimator, which can compute regression coefficients in the semiparametric binary choice model under median restrictions using the NLLS objective function for a parametric model such as Logit or Probit. Consequently, the regression coefficients can be estimated using standard software packages such as Stata.

The paper is organized as follows. The following section formally establishes an equivalence result which motivates the first estimation procedure. Section 3 proposes the estimation procedure and establishes its asymptotic properties. Section 4 proposes the estimation procedure for the regression coefficients that is very simple to implement in standard software packages. Section 5 explores the finite sample performance of these estimators via a simulation study. Section 6 concludes. Proofs of the asymptotic properties of the proposed estimators are left to the appendix.

[1] By prediction, we mean it in a somewhat crude sense. That is, one predicts the value of 0 or 1 based on the sign of the estimated index.

2 An Equivalence Result

The equivalence result is based on the following two models:

$$y_i = I[x_i'\beta_0 - \epsilon_i \geq 0] \qquad (2.1)$$

where

Model 1: Conditional Median Restriction

CM1 $x_i \in R^k$ is assumed to have density with respect to Lebesgue measure, which is positive on the set $\mathcal{X} \subseteq R^k$ [2]. In what follows, we will let $x_i^{[j,1]}$ denote [3] the $j$-th component of the vector $x_i$, $j = 1, 2, \ldots, k$.

CM2 Letting $F_0(t, x)$ denote $P(\epsilon_i \leq t \mid x_i = x)$, we assume

CM2.1 $F_0(\cdot, \cdot)$ is continuous on $R \times \mathcal{X}$.

CM2.2 $F_0'(t, x) \equiv \partial F_0(t, x)/\partial t$ exists and is continuous and positive on $R$ for all $x \in \mathcal{X}$.

CM2.3 $F_0(0, x) = 1/2$ for $x \in \mathcal{X}$.

CM2.4 $\lim_{t \to -\infty} F_0(t, x) = 0$, $\lim_{t \to +\infty} F_0(t, x) = 1$.

[2] This assumption is not required but will be maintained throughout the paper for notational convenience. Technically we require only one regressor to be continuously distributed and have positive density on the real line.

[3] More generally we will denote the $[a, b]$ component of a matrix $M$ by $M^{[a,b]}$ throughout this paper.

Model 2: Heteroskedastic Probit/Logit Model

HP1 Assumption CM1.

HP2 $\epsilon_i = \sigma_0(x_i) \cdot u_i$, where $\sigma_0(\cdot)$ is continuous and positive on $\mathcal{X}$ a.s., and $u_i$ is independent of $x_i$ with any known (e.g. logistic, normal) distribution with median 0 and has a density function which is positive and continuous on the real line.

Theorem 2.1 Under Assumptions CM1, CM2, HP1, HP2, Models 1 and 2 are observationally equivalent.

Proof: Note that the assumptions in Model 2 easily imply the assumptions in Model 1 are satisfied. Now assume the assumptions of Model 1 are satisfied. We will show that there exists a scale function $\sigma_0(\cdot)$ which satisfies Assumption HP2 such that the conditional distribution of the observed dependent variable is the same under the two models. Note it will suffice to show that $P(y_i = 1 \mid x_i = x)$ is the same ($x_i$-a.s.) in both models. Let $P_0(x) = F_0(x'\beta_0, x)$ denote this probability function for Model 1. Now define

$$\sigma_0(x) = \frac{x'\beta_0}{\Phi^{-1}(P_0(x))}\, I[x'\beta_0 \neq 0]$$

where $\Phi(\cdot)$ denotes the known c.d.f. of $u_i$. Note that $\sigma_0(x) > 0$ for all $x$ such that $x'\beta_0 \neq 0$. This is because $x'\beta_0 > 0 \Rightarrow P_0(x) > 1/2 \Rightarrow \Phi^{-1}(P_0(x)) > 0$, and similarly $x'\beta_0 < 0 \Rightarrow \Phi^{-1}(P_0(x)) < 0$. We immediately see that for the heteroskedastic probit model,

$$P(y_i = 1 \mid x_i = x) = \Phi(x'\beta_0/\sigma_0(x)) = \Phi(\Phi^{-1}(P_0(x))) = P_0(x).$$

Since $x'\beta_0 \neq 0$ with probability 1 under Assumption CM1, this establishes the equivalence of the two models.

Remark 2.1 Here we note the following implications of the established equivalence result:

- The above equivalence result is similar to the Lemma on page 737 in Manski (1988), who established a class of dual models. These models had nonlinear regression functions [4] and homoskedastic disturbance terms with known distribution. Here we have a linear regression function and a heteroskedastic [5] normal error term, which makes it relatively simple to extract the structural component of the model from the choice probabilities. This is enabled by two properties of the model: 1) the normal distribution has median zero and positive density everywhere; 2) the scale function is positive everywhere. These constraints can be easily imposed to simultaneously estimate $\beta_0$ and $\sigma_0(\cdot)$, as will be illustrated later in the paper.

- Another useful feature of the equivalence result is that it suggests other methods of estimating the model. The first model is generally estimated using the $L_1$ and smoothed $L_1$ norm estimators proposed in Manski (1975) and Horowitz (1992). This is a natural approach in the sense that models with conditional median restrictions are often estimated by minimizing least absolute deviation (LAD) objective functions. In the following section, we propose an estimator based on the observationally equivalent Model 2, and describe its advantages over the aforementioned existing estimators.

- Finally, it should be pointed out that the notion of equivalence is defined by equating the choice probability functions. Further notions can be used to distinguish different models. One such example is the order of smoothness of the probability function, which is one way to distinguish between the maximum score model and the smoothed maximum score models, resulting in different rates of convergence for estimating $\beta_0$. In a separate note (Khan()), a more refined equivalence result is established between Model 1 and Model 2 above. Specifically, they are equivalent under stated smoothness conditions in the sense that the optimal rates for estimating $\beta_0$ are the same in the two models.

[4] In fact there are several other structures that enable the choice probabilities to match up with those attained from Model 1. I am grateful to a referee for pointing this out to me. Note also that since the standard normal distribution is symmetrically distributed around its mean, the equivalence result here also implies equivalence between two distribution-free semiparametric restrictions: conditional median independence and conditional symmetry.

[5] The multiplicative form of the heteroskedasticity has been imposed elsewhere in the literature; see, e.g., Klein and Vella (2009).
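The construction in the proof of Theorem 2.1 can be checked numerically. The sketch below takes a hypothetical conditional error distribution with median zero (a logistic c.d.f. whose scale depends on the regressor; this choice, the function names and the evaluation points are all assumptions made for illustration), builds the scale function $\sigma_0(x) = x'\beta_0 / \Phi^{-1}(P_0(x))$, and confirms that the heteroskedastic probit probability reproduces the original choice probability with a positive scale.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal: PHI.cdf is Phi, PHI.inv_cdf is its inverse

def F0(t, x):
    """Hypothetical conditional c.d.f. of the error given x.

    Logistic with scale 1 + x[1]**2, so F0(0, x) = 1/2 (median zero) for every x.
    """
    scale = 1.0 + x[1] ** 2
    return 1.0 / (1.0 + math.exp(-t / scale))

def check_equivalence(x, beta0):
    """Return (sigma0(x), P0(x) under Model 1, probit probability under Model 2)."""
    idx = sum(xj * bj for xj, bj in zip(x, beta0))  # x'beta0, assumed nonzero here
    p_model1 = F0(idx, x)                           # P0(x) = F0(x'beta0, x)
    sigma = idx / PHI.inv_cdf(p_model1)             # scale function from the proof
    p_model2 = PHI.cdf(idx / sigma)                 # heteroskedastic probit probability
    return sigma, p_model1, p_model2

beta0 = (0.5, 1.0)
points = [(1.0, -2.0), (1.0, -0.2), (1.0, 0.3), (1.0, 1.5)]
for x in points:
    sigma, p1, p2 = check_equivalence(x, beta0)
    print(x, round(sigma, 4), round(p1, 6), round(p2, 6))
```

The scale is positive on both sides of the index because $x'\beta_0$ and $\Phi^{-1}(P_0(x))$ always share the same sign under the median restriction.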

3 Estimation Procedure

Results in the previous section suggest that one could estimate a heteroskedastic probit model which is distribution free. We note that the result matching choice probabilities to a distribution free model restricted the sign of the scale function to be positive everywhere on the support of $x_i$. This will have to be incorporated into the estimation procedure for consistent, distribution free estimation of $\beta_0$. The proposed estimators will consider joint estimation of the parameter $(\beta_0, \sigma_0(\cdot))$. This is analogous to existing estimators (e.g. Cosslett (1983), Klein and Spady (1993), Coppejans (2001)) of $(\beta_0, F(\cdot))$, where the function $F(\cdot)$ denotes the c.d.f. of the error term. As mentioned previously, these estimators assume independence between $x_i$ and the error term, ruling out conditional heteroskedasticity.

On the surface it appears that the approach adopted here is allowing for heteroskedasticity at the expense of requiring a parametrically specified error distribution, as well as restricting the heteroskedasticity to be multiplicative. However, this is not the case. The nonparametric component [6] $\sigma_0(\cdot)$ permits both an unknown error distribution and the conditional heteroskedasticity of a conditional median restriction. The normality assumption only serves to impose the conditional median restriction, and any distributional assumption on $u_i$ that has median 0 can be used for distribution free estimation [7].

[6] While the previous theorem illustrated identification of $\beta_0$ and $P_0$, $\sigma_0$ is also identified and easily estimable using the procedure discussed in the following section. It should be emphasized that this parameter by itself is of less interest, as it only provides the functional form of the heteroskedasticity when the errors are indeed normally distributed.

[7] Consequently, the normal c.d.f. used here can be interpreted as a particular kernel function, analogous to kernel functions used in smoothed maximum score estimation.

Before introducing the estimator, we introduce the notation we will adopt to account for the fact that the regression coefficients are only identified up to scale. Following convention we set the last coefficient value to 1 and estimate the $(k-1)$-vector $\theta_0$, where $(\theta_0', 1)' = \beta_0$.

The heteroskedastic probit model can be viewed as a likelihood model with infinite dimensional parameter space. This class of models has been studied extensively in the econometric and statistics literature. Work in this area includes Geman and Hwang (1982), Gallant and Nychka (1987), Wong and Severini (1991), Shen and Wong (1994), Shen (1997), Chen and Shen (1998), Coppejans (2001), Ai and Chen (2003), Chen et al. (2005), Chen and Pouzo (2009, ), and Bierens(). Most of these papers focus on the method of sieves, which will be used in the construction of an estimator in this paper.

The estimator introduced here is based on treating the scale function as an infinite dimensional parameter. This motivates constructing an estimator which optimizes a probit/logit criterion function that includes this function. Specifically we define the criterion function as [8]

$$\gamma_n(\theta, l) = \frac{1}{n}\sum_{i=1}^{n} \Big( y_i - \Phi\big( (\tilde{x}_i'\theta + x_i^{[k,1]}) \exp(l(x_i)) \big) \Big)^2 \qquad (3.1)$$

for $\alpha \equiv (\theta, l)$ in the (infinite dimensional) parameter space $\mathcal{A}$, whose properties will be detailed shortly; here $\tilde{x}_i$ denotes the first $k-1$ components of $x_i$. To be able to implement the NLLS procedure, since the parameter space is infinite dimensional, we propose a linear in parameters sieve estimator. Let $b_j(x_i)$ denote a sequence of known basis functions [9]. Denote $b^{\kappa_n}(x_i) = (b_1(x_i), \ldots, b_{\kappa_n}(x_i))'$ for some integer $\kappa_n$. An approximator of $g(x_i) \equiv \exp(l(x_i))$ in the above objective function is $g_n(x_i) = \exp(b^{\kappa_n}(x_i)'\Pi_n)$, where $\Pi_n$ is a vector of constants, and the exponential function serves to impose the positivity of the scale function needed for identification [10]. Let $\alpha_n \equiv (\theta, g_n) \in \mathcal{A}_n$, where $\mathcal{A}_n$ is the sieve space. We can formally define the estimator as [11]:

$$\hat{\alpha} = \arg\min_{\alpha_n \in \mathcal{A}_n} \frac{1}{n}\sum_{i=1}^{n} \big( y_i - \Phi( x_i'\beta\, g_n(x_i) ) \big)^2 \qquad (3.2)$$

[8] Here we have used a probit function, with $\Phi(\cdot)$ denoting the normal c.d.f., and have adopted the NLLS objective function, as its boundedness properties facilitate proofs. The MLE objective function could also be used, but as will be argued later on, this results in the same asymptotic variance matrix as NLLS in this context. We also note that this NLLS objective function is similar to the smoothed maximum score estimator when one sets $\exp(l(x_i)) = h_n^{-1}$, where $h_n \to 0$. The properties of this estimator are discussed in Section 4. Finally, note that the infinite dimensional parameter $l(\cdot)$ is the log of the scale function.

[9] See, e.g., Chen and Shen (1998) for examples of basis functions. For the problem at hand with a regressor that has unbounded support, certain basis functions (e.g. power series) will not achieve the desired approximation for the asymptotic theory to be valid. Consequently, we restrict ourselves to basis functions suitable for approximating functions of regressors with unbounded support; see, e.g., Chen et al. (2005), who use polynomial splines.

[10] The exponential function is not necessary and only adopted here for convenience. One could simply use the approximator $b^{\kappa_n}(x_i)'\Pi_n$ and impose constraints on $\Pi_n$ to ensure positivity of the scale function. Sieve estimators can easily incorporate such parameter constraints; see, e.g., Shen (1997).

[11] Effectively, we are simply optimizing the objective function with respect to the parameters $\beta$, $\Pi_n$. We note the objective function is smooth in these parameters and standard optimization routines can be used to find local optima. However, the objective function is not concave in the parameters, and a search amongst these local optima needs to be conducted. A similar problem is encountered with the smoothed maximum score estimator, and Horowitz (1992) suggested the use of the generalized simulated annealing algorithm in Bohachevsky et al. (1986). We note it is not difficult to implement a procedure where we impose positivity of the scale function by imposing parameter constraints in the optimization. In fact, since the objective function is smooth in the parameters, CO, an application module written in GAUSS, can be used for the problem at hand.
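To make the sieve NLLS criterion in (3.2) concrete, the following minimal sketch evaluates it for given values of $\theta$ and $\Pi_n$. Everything beyond the formula itself is an assumption made for illustration: the function names, the toy data, and in particular the basis, which here consists of piecewise linear "hat" functions of the last regressor on a fixed knot grid rather than the spline bases the paper refers to. No optimizer is shown; as noted above, the objective is not concave, so a global search would be needed in practice.

```python
import math

def normal_cdf(t):
    """Standard normal c.d.f. Phi."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def hat_basis(x, knots, width):
    """Hypothetical basis: piecewise linear hat functions of the last regressor."""
    z = x[-1]
    return [max(0.0, 1.0 - abs(z - c) / width) for c in knots]

def sieve_nlls(theta, pi, ys, xs, knots, width):
    """Sample average of (y_i - Phi(x_i'beta * exp(b(x_i)'pi)))^2 with beta = (theta, 1)."""
    beta = list(theta) + [1.0]  # scale normalization: last coefficient set to 1
    total = 0.0
    for y, x in zip(ys, xs):
        idx = sum(xj * bj for xj, bj in zip(x, beta))
        log_scale = sum(b * p for b, p in zip(hat_basis(x, knots, width), pi))
        # exp(.) keeps the multiplicative scale function positive
        total += (y - normal_cdf(idx * math.exp(log_scale))) ** 2
    return total / len(ys)

# Hypothetical toy data generated as y = I[0.5 * x1 + x2 >= 0].
xs = [(1.0, -2.0), (1.0, -0.2), (1.0, 0.3), (1.0, 1.5)]
ys = [0, 1, 1, 1]
knots = [-2.0, -1.0, 0.0, 1.0, 2.0]

good = sieve_nlls((0.5,), [3.0] * 5, ys, xs, knots, width=1.0)
plain = sieve_nlls((0.5,), [0.0] * 5, ys, xs, knots, width=1.0)  # scale 1: ordinary probit NLLS
bad = sieve_nlls((-5.0,), [3.0] * 5, ys, xs, knots, width=1.0)
print(good, plain, bad)
```

Setting all sieve coefficients to zero recovers the homoskedastic probit NLLS criterion, which shows how the sieve terms only add flexibility in the scale of the index.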

Remark 3.1 The idea of minimizing a probit or logit criterion function that includes a growing number of basis functions is not new to the econometrics or statistics literature. The first was in the seminal work of McFadden (1974), who introduced the Mother Logit model. Stone (1994) estimated choice probabilities in a binary choice model by replacing the index $x_i'\beta$ with a linear in parameters series, inside a probit or logit likelihood function. While his approach can estimate the probability function by estimating the function $\Phi(g(x_i))$, it cannot estimate the structural parameter $\beta_0$ as the proposed procedure can [12].

[12] However, this estimator can be used in a first stage to estimate choice probabilities, which can then be projected onto $\Phi(x_i'\beta\, g_n(x_i))$ to form an estimator of $\beta_0$. Since the first stage involves a concave objective function if MLE is used, this approach may have computational advantages over the approach suggested here.

We now detail the conditions under which the asymptotic properties of this estimator will be derived. The first property we will establish is consistency. We first introduce some notation which will be used in imposing smoothness and compactness conditions; it is identical to that used in Ai and Chen (2003) and Chen et al. (2005). For any $k$-vector $v = (v_1, v_2, \ldots, v_k)$, let $|v|$ denote $\sum_{i=1}^{k} v_i$. Let $h(\cdot)$ denote any function on $\mathcal{X}$. We denote the $|v|$-th derivative of the function $h(\cdot)$ as:

$$\nabla^v h(x) = \frac{\partial^{|v|}}{\partial x_1^{v_1} \cdots \partial x_k^{v_k}}\, h(x)$$

Also, for $\gamma > 0$ we let $\Lambda^{\gamma}(\mathcal{X})$ denote the space of functions which have up to $[\gamma]$ (here $[\cdot]$ denotes the integer operator) continuous derivatives, with the highest derivatives being Hölder continuous [13] with exponent $(\gamma - [\gamma])$. Let $\|\cdot\|_E$ denote the Euclidean norm. For a real valued function $h(\cdot) \in \Lambda^{\gamma}(\mathcal{X})$ we define its Hölder norm as

$$\|h\|_{\Lambda^{\gamma}} = \sup_{x \in \mathcal{X}} |h(x)| + \max_{|v| = [\gamma]}\ \sup_{x \neq \bar{x}} \frac{\|\nabla^v h(x) - \nabla^v h(\bar{x})\|_E}{\left(\sqrt{(x - \bar{x})'(x - \bar{x})}\right)^{\gamma - [\gamma]}}$$

Finally we denote a space of functions that will be used in defining the parameter space:

$$\Lambda_c^{\gamma}(\mathcal{X}, w_1) \equiv \left\{ h \in \Lambda^{\gamma}(\mathcal{X}) : \|h(\cdot)(1 + x'x)^{-w_1/2}\|_{\Lambda^{\gamma}} \leq c < \infty \right\}$$

where $w_1 > 0$ and $c$ is a known constant.

[13] See Ai and Chen (2003) for a formal definition of Hölder continuity and a more detailed discussion of Hölder spaces.

The weighting function of the regressors, $(1 + x'x)^{-w_1/2}$, goes to 0 as $\|x\|_E$ goes to infinity and permits $h(\cdot)$ and its derivatives to be unbounded [14]. With our weighting function we can introduce the weighted sup norm defined as:

$$\|h(x)\|_{\infty, w_2} = \sup_{x \in \mathcal{X}} \left\| h(x)(1 + x'x)^{-w_2/2} \right\|_E$$

[14] This is the weighting function used in Chen et al. (2005). Examples of other weighting functions, such as $\exp(-x_i'x_i)$, can be found in Gallant and Nychka (1987).

Our assumptions for consistency are:

RC1 (Parameter Space) Recall our notation that $\beta = (\theta', 1)'$. Let $B = \Theta \times \{1\}$. The parameter space $\mathcal{A}$ consists of all pairs $(\beta, l(\cdot))$ such that

  i. $\beta \in B$, a compact subset of $R^k$.

  ii. $l(x) \in \Lambda_c^{p}(\mathcal{X}, w_1)$, where p > .

RC2 (Regressor Distribution) Recall that $\mathcal{X}$ denotes the support of the regressors. For simplicity, we assume the regressor vector is continuously distributed and denote its joint density function as $f_X(\cdot)$.

  i. The $k$-th regressor, conditional on the other regressors, has density function with respect to Lebesgue measure that is positive on $R$. The first $k-1$ components of $x_i$, denoted by $\tilde{x}_i$, are assumed to have bounded support.

  ii. The support of the distribution of $x_i$ is not contained in any proper linear subspace of $R^k$.

  iii. $\int (1 + \|x\|_E^2)^{w_2} f_X(x)\,dx < \infty$, where $w_2 > w_1$.

RC3 $E[b^{\kappa_n}(x_i)\, b^{\kappa_n}(x_i)']$ is nonsingular for all $n$.

RC4 The vector $(y_i, x_i')'$ is i.i.d. and satisfies

$$P(y_i = 1 \mid x_i) = \Phi(x_i'\beta_0\, g_0(x_i)) \equiv \Phi(x_i'\beta_0 \exp(l_0(x_i)))$$

Remark 3.2 Before establishing consistency, we comment on some of the regularity conditions imposed:

- RC1(ii) is a type of compactness condition on the functional space, and is often imposed in the sieve literature. See, e.g., Chen et al. (2005). With our definition of the sieve space, we will have a sieve approximation error which converges to 0 with respect to a weighted sup norm.

- Assumption RC2(i) imposes regressor support conditions. The condition on the $k$-th regressor is used for identification. The bounded support condition on $\tilde{x}_i$ is only made to simplify arguments in the proofs and can be relaxed to this subvector having finite fourth moments.

- Assumption RC3 is mainly useful to ensure point identification of the sieve coefficients [15].

[15] It may not be necessary for consistently estimating the regression coefficients and choice probability, as established here. I am grateful to a referee for pointing this out.

The above conditions are sufficient to establish consistency of the estimator of the regression coefficients. The proof is omitted as it follows from virtually identical arguments as in Chen et al. (2003).

Theorem 3.1 Under Assumptions RC1-RC4, if $\kappa_n \to \infty$ and $\kappa_n/n \to 0$, we have

$$\hat{\beta} - \beta_0 = o_p(1) \qquad (3.3)$$

While the above result is an important first step, as mentioned in the introduction, there are several estimators for the model considered here for which the regression coefficients can be estimated consistently. The motivation for the SNLLS estimator proposed here was also to consistently estimate the choice probability function, to which we now turn attention. We first note that consistency of the proposed estimator of the scale function (with respect to the weighted sup norm) also follows from conditions RC1-RC4 (see, e.g., Proposition A.1 in Chen et al. (2005)). Hence the choice probability function estimator is also consistent. But more importantly, a faster rate with respect to a different norm can be attained under additional conditions. This norm, the Fisher norm (see Ai and Chen (2003) and Hu and Schennach (2008)), will prove useful on many fronts. For one, a convergence result for this norm will be directly tied to a convergence rate for the probability function, as this function is a functional satisfying a Lipschitz condition. Second, the asymptotic distribution of the regression coefficient estimator, which we will also establish shortly, will be related to features of this norm. We define the Fisher norm on $\mathcal{A}$, and denote it by $\|\cdot\|_F$, as follows. For $\alpha_1 = (\theta_1, l_1)$ and $\alpha_2 = (\theta_2, l_2)$ we define

$$\|\alpha_1 - \alpha_2\|_F^2 \equiv E\Big[ \phi_{0i}^2\, g_{0i}^2\, \big( \tilde{x}_i'(\theta_1 - \theta_2) + (x_i'\beta_0)(l_1 - l_2) \big)' \big( \tilde{x}_i'(\theta_1 - \theta_2) + (x_i'\beta_0)(l_1 - l_2) \big) \Big]$$

where $\phi_{0i} = \phi(x_i'\beta_0\, g_0(x_i))$ and $g_{0i} = g_0(x_i)$.

Our rate result [16] is stated in the following theorem, whose proof is omitted as it can be shown using similar arguments to those used in Ai and Chen (2003) and Chen et al. (2005). Before stating the theorem, we impose the following locally quadratic condition on the objective function we adopted:

RC5 There exist positive constants $c_1, c_2$, $c_1 < c_2$, such that

$$c_1 E\Big[ \big\{ \Phi(x_i'\beta_0 \exp(l_0(x_i))) - \Phi(x_i'\beta \exp(l(x_i))) \big\}^2 \Big] \leq \|\alpha - \alpha_0\|_F^2 \leq c_2 E\Big[ \big\{ \Phi(x_i'\beta_0 \exp(l_0(x_i))) - \Phi(x_i'\beta \exp(l(x_i))) \big\}^2 \Big] \qquad (3.4)$$

for all $\alpha \in \mathcal{A}$ such that $\|\beta - \beta_0\|_E = o(1)$, $\|l - l_0\|_{\infty, w_2} = o(1)$.

Theorem 3.2 Suppose Assumptions RC1-RC5 hold, but with the added conditions $p > k/2$ and $w_1 > p$. Then

$$\|\hat{\alpha} - \alpha_0\|_F = O_p\left( \sqrt{\frac{\kappa_n}{n}} + \kappa_n^{-p/k} \right) \qquad (3.5)$$

Remark 3.3 Assumption RC5 imposes that the population criterion function can be approximated locally by a quadratic function, effectively assuming that the remainder term in a mean value expansion gets small as the parameter $\alpha$ approaches $\alpha_0$. This condition will also be used when deriving the limiting distribution theory for $\hat{\theta}$.

Remark 3.4 We note the above rate coincides with that attained in Newey (1997) for estimating a regression function using series estimation. From our conditions it will also imply the same rate of convergence for the choice probability function estimator [17]:

$$E\Big[ \big( \Phi(x_i'\beta_0 \exp(l_0(x_i))) - \Phi(x_i'\hat{\beta} \exp(\hat{l}(x_i))) \big)^2 \Big] \qquad (3.6)$$

We next turn attention to the limiting distribution theory of the estimator $\hat{\theta}$. For this we require the additional assumptions:

[16] This particular rate result is with respect to the Fisher norm, which, as we will see shortly, will provide us rates for the choice probability functional as well. In fact, consistency of $\hat{\alpha}$ with respect to the stronger weighted sup norm follows from Assumptions RC1-RC4. We then will derive a rate result and distribution theory for the estimator of $\beta_0$.

[17] As discussed in Ai and Chen (2003), attaining $L_2$ rates generally requires stronger conditions than those needed for rates with respect to the Fisher norm. In the current setting, Assumption RC5 is what enables us to get the same rate under both norms.

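AD1 $\beta_0 \in \mathrm{int}(B)$.

AD2 Reparameterizing the function $g_0(x_i) \equiv \sigma_0(x_i)^{-1}$ as $g_0(z_i, \tilde{x}_i)$ with $z_i \equiv x_i'\beta_0$, the matrix

$$Q = E\left[\phi(0)^2\, \tilde{x}_i \tilde{x}_i'\, g_0(0, \tilde{x}_i)^2 f_{Z|X}(0|\tilde{x}_i)\right]$$

is non-singular, where $\phi(\cdot)$ is the standard normal density function and $f_{Z|X}(\cdot)$ denotes the conditional density of $z_i \equiv x_i'\beta_0$ given $\tilde{x}_i$.

AD3 $f_{Z|X}(z|\tilde{x})$ is continuously differentiable in $z$ in a neighborhood of 0 and for all $\tilde{x}$.

The main theorem establishes a linear representation for the sieve estimator. The linear representation exposes the bias and variance of the estimator as a function of the number of basis functions in the sieve, $\kappa_n$, and can be used to derive the rate at which $\kappa_n \to \infty$ in order for $\hat{\beta}$ to converge to $\beta_0$ at the fastest rate in terms of MSE. The linear representation below requires the introduction of some new notation. Let $\phi_{0i}$, $g_{0i}$ denote $\phi(x_i'\beta_0 g_0(x_i))$ and $g_0(x_i)$ respectively. Let $w_i^*$ be a $(k-1)$ vector function of $x_i$ which satisfies:

1. $(\theta_0, w_i^*) \in \mathcal{A}_n$.
2. $E\left[\phi_{0i}^2 g_{0i}^2 (\tilde{x}_i + (x_i'\beta_0) w_i^*)(\tilde{x}_i + (x_i'\beta_0) w_i^*)'\right] \le E\left[\phi_{0i}^2 g_{0i}^2 (\tilde{x}_i + (x_i'\beta_0) w_i)(\tilde{x}_i + (x_i'\beta_0) w_i)'\right]$ for all $(k-1)$ functions $w_i$ such that $(\theta_0, w_i) \in \mathcal{A}_n$.

Then, let $\tilde{x}_i^*$ denote $-(x_i'\beta_0) w_i^*$, and let $l_n^*(x_i)$ satisfy $(\theta_0, l_n^*(x_i)) \in \mathcal{A}_n$ and minimize $\|\alpha_n^* - \alpha_0\|_F$, where $\alpha_n^* = (\beta_0, l_n^*(x_i))$ and $\alpha_0 = (\beta_0, l_0(x_i))$.

Theorem 3.3 Under Assumptions RC1-RC5, AD1-AD3, if $\kappa_n \to \infty$ and $\kappa_n^{1/k}/n \to 0$, then

$$\hat{\theta} - \theta_0 = c(p)\, Q^{-1} \frac{\kappa_n^{1/k}}{n} \sum_{i=1}^{n} \phi(x_i'\bar{\beta}\, \bar{g}(x_i))\, \bar{g}(x_i)\, (\tilde{x}_i - \tilde{x}_i^*)\, (y_i - \Phi(x_i'\bar{\beta}\, \bar{g}(x_i))) + o_p\left(\sqrt{\kappa_n^{1/k}/n}\right) + o_p(\kappa_n^{-p/k}) \tag{3.7}$$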

where $\bar{\beta}$ denotes a sequence of values in between $\hat{\beta}$ and $\beta_0$, $\bar{g}(x_i)$ denotes analogous intermediate values for the scale function, and where $c(p)$ is a constant depending on the assumed order of smoothness $p$, and whose expression can be found in (A.8).

Remark 3.5 From the linear representation in the theorem, we can see that the bias is of order $\kappa_n^{-p/k}$, the rate at which the functions $\Phi(x_i'\beta_0 g_0(x_i))$ can be approximated well (with respect to our weighted norm) by our basis function approximation (see, e.g., Chen et al. (2005)). The variance is of order $\kappa_n^{1/k}/n$, the rate we are dividing the summation by.⁸ Equating the squared bias and the variance to derive the optimal rate at which the sequence $\kappa_n$ increases, we get $\kappa_n = O(n^{k/(2p+1)})$. This implies that the rate of convergence of the MSE of $\hat{\theta}$ is $O(n^{-2p/(2p+1)})$, which is slower than the parametric (root-n) rate. Chen and Khan (2003) show that the parametric rate is not achievable for a similar model, and it is conjectured that the MSE rate attained here is the optimal rate of convergence for the model under Assumptions RC1-RC5, AD1-AD3; see the manuscript by Khan listed in the references.

An immediate corollary to the above theorem is the limiting distribution theory for the sieve NLLS estimator:

Corollary 3.1 Consider the sequence $\kappa_n = O(n^{(k+\epsilon_3)/(2p+k)})$, where $\epsilon_3 > 0$ is an arbitrarily small constant, and $p > k/2$. It follows that:

$$\sqrt{\frac{n}{\kappa_n^{1/k}}}\,(\hat{\theta} - \theta_0) \Rightarrow N\left(0, \tfrac{1}{4}\, c(p)\, Q^{-1}\right) \tag{3.8}$$

We conclude this section with some comments on the form of the limiting distribution.

Remark 3.6 Recall that the estimator was motivated by the fact that the heteroskedastic probit model probabilities could be equated to the probabilities in a distribution free model by setting

$$g_0(z_i, \tilde{x}_i) = \frac{\Phi^{-1}(P_0(z_i, \tilde{x}_i))}{z_i}$$

where recall $z_i = x_i'\beta_0$ and $P_0(\cdot, \cdot)$ denotes the probability function reparameterized as a function of two arguments. Taking limits as $z_i \to 0$ (keeping $\tilde{x}_i$ fixed) we get, by L'Hôpital's rule and the fact that $P_0(0, \tilde{x}_i) = 1/2$,

$$g_0(0, \tilde{x}_i) = \phi(0)^{-1} P_0'(0, \tilde{x}_i)$$

where here $P_0'(\cdot, \cdot)$ denotes the partial derivative of $P_0(\cdot, \cdot)$ with respect to its first argument. From this we see

$$Q = E\left[\tilde{x}_i \tilde{x}_i'\, P_0'(0, \tilde{x}_i)^2 f_{Z|X}(0|\tilde{x}_i)\right] \tag{3.9}$$

and we note that the form of $Q$ is independent of the fact that the normal c.d.f. was used in the objective function.

Remark 3.7 We note that the variance covariance matrix is not of a sandwich form. While this feature usually occurs for MLE estimators, it is a feature of the sieve NLLS estimator because all the information for $\beta_0$ is at $x_i'\beta_0 = 0$. In fact the structure of the variance matrix is the same as would be obtained with an infeasible weighted NLLS estimator, with more weight being given to observations where the true index $x_i'\beta_0$ lies in a (vanishing) neighborhood of 0. This causes the usual sandwich form found in NLLS estimators to collapse, since here the outer-score term, which contains $\mathrm{Var}(y_i|x_i) = P(y_i = 1|x_i)(1 - P(y_i = 1|x_i))$, is now equal to the constant $1/4$ when $x_i'\beta_0 = 0$. This makes the outer-score term proportional to the Hessian term, causing the collapse.

We conclude this section by illustrating a further advantage of the proposed estimation procedure. In addition to estimating the structural parameters $\beta_0$, the sieve approach also permits estimation of other functionals of the probability function. One relevant functional is the (weighted) average marginal effect, which we define here as:

$$W_0 = \int w_W(x)\, \frac{\partial P_0(x)}{\partial x}\, dx \tag{3.10}$$

where recall $P_0(\cdot)$ denotes the choice probability function and $w_W(\cdot)$ denotes a weighting function (assumed here to have compact support) satisfying

$$w_W(x) \ge 0 \quad \text{and} \quad \int w_W(x)\, dx = 1 \tag{3.11}$$

Let $\hat{W}$ denote the estimator obtained by replacing $P_0$ with our proposed sieve estimator of the choice probability in (3.10). The following theorem establishes the limiting distribution theory of this estimator. Its proof is omitted as it follows arguments virtually identical to those used in proving the previous theorems.

⁸ Details on how these rates are derived can be found in the derivations in the Appendix.

Theorem 3.4 Under the conditions imposed in Theorem 3.2, if $\sqrt{n}\,\kappa_n^{-p/k} \to 0$, then

$$\sqrt{n}\,(\hat{W} - W_0) \Rightarrow N(0, V_W) \tag{3.12}$$

where

$$V_W = E_X\left[v_W(x_i)\, v_W(x_i)'\, P_0(x_i)(1 - P_0(x_i))\right] \tag{3.13}$$

with

$$v_W(x_i) = -f_X(x_i)^{-1}\, \frac{\partial w_W(x_i)}{\partial x_i} \tag{3.14}$$

Remark 3.8 A natural and illustrative example of the usefulness of the above theorem is the conventional averaged derivative estimator. In this case we would let $w_W(x) = f_X(x)$, where $f_X(x)$ denotes the regressor density function. Then we can plug $\hat{\alpha}$ into $\Phi(\cdot)$ to get an estimate $\hat{P}(x)$ of the choice probability function, differentiate it with respect to $x$ to get an estimate of the marginal effect, and average across regressor values to get

$$\hat{W} = \frac{1}{n}\sum_{i=1}^{n} \frac{\partial \hat{P}(x_i)}{\partial x_i} \tag{3.15}$$

Remark 3.9 We note that this limiting distribution corresponds to that obtained in Theorem 3 in Newey (1997), who estimated the probability function by a series regression and did not obtain an estimator of $\beta_0$. This result agrees with the general conclusion in Shen (1997), which is that the two main conditions affecting the limiting distribution of a smooth functional are the rate of convergence of the sieve estimator and the smoothness of the functional. Since the rate of convergence attained in Theorem 3.2 aligns with Theorem 1 in Newey (1997), one would then expect the limiting distributions of the same smooth functional to coincide.

Remark 3.10 While the above theorem is for smooth functionals, distribution theory for nonsmooth functionals, such as pointwise probability function estimators, should also be attainable following arguments used in Chen and Pouzo (2009, 2012). The list of formal regularity conditions and the proof of such a theorem are left for future work.

4 Local NLLS Estimators

This section proposes a procedure which again relates median based semiparametric estimators for binary choice models to standard estimation procedures for parametric binary choice models. Like the previously proposed estimator, the estimator optimizes an NLLS parametric objective function. It differs in that it does not estimate choice probabilities as the previous procedure does, but it has the advantage of being implementable in standard software packages such as Stata. The estimators we propose involve combining the maximum score and smoothed maximum score objective functions introduced earlier. First we note that the objective function of the maximum score estimator:

$$\frac{1}{n}\sum_{i=1}^{n} \left|y_i - I[x_i'\beta \ge 0]\right| \tag{4.1}$$

is identical to the squared loss objective function

$$\frac{1}{n}\sum_{i=1}^{n} \left(y_i - I[x_i'\beta \ge 0]\right)^2 \tag{4.2}$$

since both $y_i$ and $I[\cdot]$ are 0-1 variables. Next we smooth this objective function, as was done for the smoothed maximum score estimator, by replacing the indicator function with a kernel function. For the smoothed maximum score estimator, the kernel function serves to approximate a c.d.f. We do the same here, using the c.d.f. of the standard normal distribution,⁹ which as before we denote by $\Phi(\cdot)$, and whose p.d.f. we denote by $\phi(\cdot)$. To formally define the estimator, we let $h_n$ denote a sequence of positive numbers, decreasing to 0 with the sample size ($h_n$ can be viewed as a bandwidth sequence of the kind found in nonparametric kernel estimation). We adopt the usual scale normalization in semiparametric models (e.g. Horowitz (1992)), where we set the coefficient on the k-th regressor to be 1, and consider estimation of $\theta_0$, where $\beta_0 = (\theta_0', 1)'$. Our NLLS estimator $\hat{\beta} = (\hat{\theta}', 1)'$ is defined as

$$\hat{\beta} = \arg\min_{\beta \in \Theta} \frac{1}{n}\sum_{i=1}^{n} \left(y_i - \Phi\left(\frac{x_i'\beta}{h_n}\right)\right)^2 \tag{4.3}$$

The main advantage of this procedure is that it involves the standard NLLS objective function. In fact, it is the standard NLLS Probit estimator used to estimate parametric binary choice models. Thus standard software packages, such as Stata, can be used to compute the estimator of $\theta_0$.¹⁰

⁹ Actually, the c.d.f. of other random variables can be used as well, so for example NLLS Logit can also be used as an estimator. We only use the normal c.d.f. since its values can be easily computed using standard software packages.
¹⁰ For example, in Stata, the nl command fits an arbitrary nonlinear function by least squares. The Probit regression function can be constructed using Stata's norm( ) function, which returns cumulative probabilities from the standard normal distribution.
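The local NLLS estimator in (4.3) can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in the paper: it handles a single free coefficient, replaces a proper nonlinear least squares routine with a grid search, and uses the bandwidth $h_n = n^{-1/3}$. The function names, the grid, the seed, and the simulated design (two normal regressors and a logistic error with variance 1) are all hypothetical choices made for the example.

```python
import math
import random


def normal_cdf(t):
    """Standard normal c.d.f., written with the error function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def objective(theta, sample, h):
    """Average squared residual (1/n) * sum_i (y_i - Phi((theta*x1_i + x2_i)/h))^2."""
    total = 0.0
    for y, x1, x2 in sample:
        index = theta * x1 + x2  # coefficient on x2 is normalized to 1
        total += (y - normal_cdf(index / h)) ** 2
    return total / len(sample)


def local_nlls(sample, grid):
    """Grid-search minimizer of the objective with bandwidth h = n^(-1/3)."""
    h = len(sample) ** (-1.0 / 3.0)
    return min(grid, key=lambda theta: objective(theta, sample, h))


def simulate(n, theta0, rng):
    """Hypothetical design: y = 1[theta0*x1 + x2 >= u], u logistic with variance 1."""
    scale = math.sqrt(3.0) / math.pi  # logistic variance is scale^2 * pi^2 / 3
    sample = []
    for _ in range(n):
        x1, x2 = rng.gauss(0.0, 1.0), rng.gauss(1.0, 1.0)
        p = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # guard against log(0)
        u = scale * math.log(p / (1.0 - p))
        sample.append((1 if theta0 * x1 + x2 >= u else 0, x1, x2))
    return sample


rng = random.Random(12345)
sample = simulate(500, 1.0, rng)
grid = [i / 20.0 for i in range(-20, 61)]  # candidate values from -1.0 to 3.0
theta_hat = local_nlls(sample, grid)
print("local NLLS estimate of theta:", theta_hat)
```

A grid search is used here only because it keeps the example dependency-free. Since the objective is smooth in the parameter, any standard nonlinear least squares routine could be substituted for it.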

Regarding asymptotic properties of this estimator, we impose conditions that are identical to those in Horowitz (1992).

A1 $\theta_0$ is in the interior of a compact set $\Theta$.

A2 The vector $x_i$ has bounded support.

A3 The density function of $x_i'\beta_0$ conditional on $\tilde{x}_i$, denoted by $f_{Z|X}(\cdot)$, is positive and continuously differentiable with bounded derivative.

A4 The conditional probability function of $y_i$, expressed as a function of $\tilde{x}_i$ and $x_i'\beta_0$, is twice continuously differentiable with respect to $x_i'\beta_0$ with bounded derivatives for $x_i'\beta_0$ in a neighborhood of 0, for all $\tilde{x}_i$.

A5 The matrix

$$Q_H = E\left[P_0'(0, \tilde{x}_i)\, \tilde{x}_i \tilde{x}_i'\, f_{Z|X}(0|\tilde{x}_i)\right] \tag{4.4}$$

is nonsingular, where $P_0(x_i'\beta_0, \tilde{x}_i)$ denotes the conditional probability of $y_i = 1$ given $x_i$, which we reparameterized as a function of $\tilde{x}_i$ and $x_i'\beta_0$, and $P_0'(\cdot, \cdot)$ denotes the partial derivative of $P_0(\cdot, \cdot)$ with respect to its first argument.

The following theorem characterizes the estimator's rate of convergence and limiting distribution as a function of $h_n$. The proof of the theorem is omitted as it follows from arguments that are similar to those used in Horowitz (1992).

Theorem 4.1 Assume CM1, CM2, A1-A5 hold and $h_n \to 0$. Then:

1. If $n h_n^3 \to \infty$, then $h_n^{-1}(\hat{\theta} - \theta_0) \to_p \kappa$, where $\kappa$ is a $(k-1)$ dimensional vector of constants.
2. At the rate $h_n = O(n^{-1/3})$, $n^{1/3}(\hat{\theta} - \theta_0) \to_d B$, where the random vector $B$ has a non-standard (i.e. non-Gaussian) distribution.

As the above theorem indicates, the local NLLS estimator has asymptotic properties that are similar to those of the maximum score estimator proposed in Manski (1975, 1985). Specifically, its rate of convergence can be as fast as $O(n^{-1/3})$, the same rate as the maximum score estimator, and it has a non-Gaussian limiting distribution. For the NLLS estimator, the non-Gaussianity stems from the result that the Hessian term in its linear representation converges to a random matrix, implying that the estimator has an asymptotically mixed normal distribution. See, for example, Section 9.6 in van der Vaart (1998).

However, the slow rate of convergence (relative to the smoothed maximum score estimator in Horowitz (1992)) is due to a bias condition, where the bias of the estimator converges at the rate of $h_n$, in contrast to the rate of $h_n^2$ for the smoothed maximum score estimator. Thus the different rates of convergence for the two estimators (NLLS and SMS) are loosely analogous to the differing rates of convergence for one-sided and two-sided kernel estimators in nonparametric density and regression estimation. Fortunately, this bias condition in NLLS is easily correctable. For example, an alternative kernel function to the normal c.d.f. could be used to reduce the order of the bias, or other bias reducing mechanisms, such as jackknifing, could be implemented to achieve the same rate as SMS, as well as an asymptotic normal distribution. The asymptotic properties of such approaches are left for future work.

5 Monte Carlo Results

In this section, we investigate the small-sample performance of the estimators introduced in this paper by way of a small-scale Monte Carlo study. We begin by considering the designs used in Horowitz (1992). These are based on the model:

$$y_i = I[x_{1i} + \beta_0 x_{2i} \ge u_i]$$

with $\beta_0 = 1$, $x_{1i} \sim N(0, 1)$ and $x_{2i} \sim N(1, 1)$. There are 4 designs corresponding to 4 different distributions of $u_i$. They are:

1. $u_i$ logistic, median 0, variance 1.
2. $u_i$ uniform, median 0, variance 1.
3. $u_i$ Student's t with 3 degrees of freedom, normalized to have variance 1.
4. $u_i = 0.25(1 + 2z_i^2 + z_i^4)v_i$, where $z_i = x_{1i} + x_{2i}$ and $v_i$ is logistic with median 0 and variance 1.

The estimators studied are the sieve NLLS (SNLLS), the sieve MLE (SMLE), maximum score (MS), the smoothed maximum score (SMS), the proposed local NLLS estimator (LNLLS) and its jackknifed version (JKNLLS). To implement SMS, the feasible optimal bandwidth sequence introduced in Horowitz (1992) was used. For the sieve estimators a series was used in the expansion of the log scale function, with a polynomial of degree 1 for $n = 250$ and 2 otherwise.¹² For the LNLLS a bandwidth sequence of $n^{-1/3}$ was used. For JKNLLS, the weights used were $4/3$ and $-1/3$, and the bandwidths were $c_1 n^{-1/5}$, $c_2 n^{-1/5}$ with constants $1/4$ and 1.

Tables I-IV report the mean bias and MSE for each of the estimators for $n = 250, 500, 1000$ across the Monte Carlo replications. The MS and SMS results reported are those found in Horowitz (1992). The sieve estimators were computed using the Nelder-Mead simplex algorithm,¹³ with multiple randomly generated starting values.¹⁴ The sieve estimators generally perform better than SMS across designs, with the exception of Design 3, where results are very similar. The SMLE and SNLLS perform quite similarly, in accordance with the theory, as the MSE for the SMLE is not smaller than that of the SNLLS. The local NLLS estimators also perform quite well. One surprise in the simulation results is that in terms of RMSE, for some designs, the standard NLLS performs as well as, if not better than, the other estimators despite its slower rate of convergence. The jackknife procedure generally results in a lower bias than the LNLLS, but it appears this sometimes comes at the expense of a larger variance.

As mentioned in the paper, an advantage of the sieve NLLS and the sieve MLE is that they simultaneously estimate choice probabilities as well as regression coefficients. Figures I-IV plot the mean value of the estimated choice probabilities using SNLLS on a grid of regressor values for each of the 4 designs, for sample sizes of $n = 250, 500, 1000$, again across the Monte Carlo replications. Also reported in parentheses are the values of the average mean square errors (AMSE), which average the MSE across the points on the grid. As the results indicate, the SNLLS does an adequate job of estimating choice probabilities, and the values of the AMSE go down with the sample size. The estimator performs the worst in the heteroskedastic design, both in terms of the level of the AMSE and the rate at which it decreases with the sample size.
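The four error designs described above can be sketched as a small data generator. This is a hedged illustration using only the Python standard library, not the GAUSS code behind the reported tables. The function names and the seed handling are hypothetical, and the numerical constants (regressor means and variances, $\beta_0 = 1$, and the multiplier in Design 4) follow the reading of the Horowitz (1992) designs given in the text and should be treated as assumptions.

```python
import math
import random


def draw_logistic(rng):
    """Logistic draw with median 0 and variance 1 (inverse c.d.f. method)."""
    scale = math.sqrt(3.0) / math.pi  # logistic variance is scale^2 * pi^2 / 3
    p = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # guard against log(0)
    return scale * math.log(p / (1.0 - p))


def draw_error(design, x1, x2, rng):
    """Draw u_i for one of the four designs; x1 and x2 only matter in Design 4."""
    if design == 1:
        return draw_logistic(rng)
    if design == 2:
        half_width = math.sqrt(3.0)  # uniform on [-sqrt(3), sqrt(3)] has variance 1
        return rng.uniform(-half_width, half_width)
    if design == 3:
        # t(3) = N(0,1) / sqrt(chi2(3)/3); chi2(3) is Gamma(shape=1.5, scale=2).
        chi2 = rng.gammavariate(1.5, 2.0)
        t3 = rng.gauss(0.0, 1.0) / math.sqrt(chi2 / 3.0)
        return t3 / math.sqrt(3.0)  # t(3) has variance 3, so rescale to variance 1
    if design == 4:
        z = x1 + x2
        return 0.25 * (1.0 + 2.0 * z ** 2 + z ** 4) * draw_logistic(rng)
    raise ValueError("design must be 1, 2, 3 or 4")


def simulate_design(design, n, beta0=1.0, seed=0):
    """Return a list of (y, x1, x2) with y = 1[x1 + beta0*x2 >= u]."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        x1 = rng.gauss(0.0, 1.0)
        x2 = rng.gauss(1.0, 1.0)
        u = draw_error(design, x1, x2, rng)
        sample.append((1 if x1 + beta0 * x2 >= u else 0, x1, x2))
    return sample


for d in (1, 2, 3, 4):
    data = simulate_design(d, 250, seed=d)
    share = sum(y for y, _, _ in data) / len(data)
    print("design", d, "share of y = 1:", round(share, 3))
```

Fixing the seed per call makes each simulated sample reproducible, which is convenient when comparing estimators on identical draws.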
As a final component of our simulation study, we explore how each of the estimators performs in a higher dimensional, more complicated design. Specifically, we allow for more covariates and a form of heteroskedasticity that is not a function of the index $x_i'\beta_0$, but a more general form of the covariates. The following model was simulated:

$$y_i = I[x_{1i} + \beta^{(1)} x_{2i} + \beta^{(2)} x_{3i} + \beta^{(3)} x_{4i} \ge u_i]$$

where here the coefficients $\beta^{(1)}$, $\beta^{(2)}$, $\beta^{(3)}$ are set to a common value, $x_{1i}$, $x_{2i}$ and $x_{4i}$ are normally distributed, and $x_{3i}$ has a chi-squared distribution. The heteroskedastic error term was distributed logistically, with a scale function of exponential form in the regressors, including $x_{3i}$. Table V reports results for the estimator of one of these coefficients for the same estimators, sample sizes and number of replications. For implementation of the sieve estimators, we increased the order of the polynomial for each sample size to account for the fact that there are more regressors. To implement the SMS, we used the fourth-order kernel function described in Horowitz (1992), and at first used the plug-in method described in Horowitz (1992) to select the smoothing parameter. However, this resulted in unstable results for $n = 250$, so we implemented an extra iteration in the plug-in strategy. That is, we implemented the plug-in method to get an initial estimator of the regression coefficients, which we used to estimate the constant in the smoothing parameter. This led to improved results for $n = 250$.

As the results in the table indicate, all estimators perform reasonably well. The MSE of the SMLE is smaller than that of the SNLLS for $n = 250, 500$, but the reverse is true for $n = 1000$, providing further evidence that neither estimator is more efficient. The SMS exhibits large values of MSE for $n = 250$ but stabilizes afterwards. Nonetheless, even for $n = 1000$ it has a larger MSE than either sieve estimator. MS exhibits the largest bias and MSE except for $n = 250$, when its MSE is smaller than that of SMS, though larger than those of the sieve estimators. The NLLS and JKNLLS estimators perform well in this design as well, but not as well as the sieve estimators for large sample sizes. The results for this design are encouraging for the sieve estimators, demonstrating that they do not suffer any more in higher dimensional designs than existing estimators.

¹² Precisely, to estimate with a polynomial of order 2, the scale function was approximated with $\exp(\Pi_0 + \Pi_1 x_{1i} + \Pi_2 x_{2i} + \Pi_3 x_{1i}^2 + \Pi_4 x_{2i}^2 + \Pi_5 x_{1i} x_{2i})$. Similar orders were also tried; they did not change the results much and are not reported.

¹³ As mentioned previously, since the objective function is smooth in the parameters, more standard, gradient based algorithms may be used. They were not adopted here to avoid potential instability problems associated with near singularity of Hessian matrices, and also because the relatively low dimensionality of the designs permits the Nelder-Mead algorithm to be computationally feasible.

¹⁴ The simulation was performed in GAUSS.

6 Conclusions

In this paper new estimation procedures for a distribution free heteroskedastic binary response model were proposed.
The sieve estimators enable joint estimation of the regression coefficients and choice probabilities. The regression coefficient estimators were shown to converge at a one-dimensional nonparametric rate, as was found for the (smoothed) maximum score estimator. While the choice probability function estimator converged at a nonparametric rate, a smooth functional was shown to converge at the parametric rate with a limiting Gaussian distribution. The proposed local NLLS estimators estimated only regression coefficients but had the advantage of being very simple to implement with standard software packages. A simulation study indicates that these estimators perform adequately well in finite samples.

The work here suggests areas for future research. Limiting distribution theory for the (pointwise) choice probability and marginal effects estimators, as well as smooth functionals thereof, needs to be derived. It would also be useful to explore whether further restrictions on the model, by constraining the behavior of $\sigma_0(x_i)$, would enable improving upon the optimal rates attained here and in Horowitz (1993a,b). Such further restrictions would be relatively easy to impose using the sieve estimation approach adopted here.

References

[1] Ai, C. and X. Chen (2003), Efficient Estimation of Models with Conditional Moment Restrictions Containing Unknown Functions, Econometrica, 71, 1795-1844.
[2] Bierens, H.J. (1987), Kernel Estimators of Regression Functions, in T.F. Bewley, ed., Advances in Econometrics, Fifth World Congress, Vol. 1, Cambridge: Cambridge University Press.
[3] Bierens, H.J., Consistency and Asymptotic Normality of Sieve Estimators Under Weak and Verifiable Conditions, Penn State working paper.
[4] Bohachevsky, I.O., M.E. Johnson and M.L. Stein (1986), Generalized Simulated Annealing, Technometrics, 28, 209-217.
[5] Chen, S. and S. Khan (2003), Rates of Convergence for Estimating Regression Coefficients in Heteroskedastic Discrete Response Models, Journal of Econometrics, 117, 245-278.
[6] Coppejans, M. (2001), Estimation of the Binary Response Model using a Mixture of Distributions Estimator (MOD), Journal of Econometrics, 102, 231-269.
[7] Chen, X., H. Hong, and E. Tamer (2005), Nonlinear Models with Measurement Error and Auxiliary Data, Review of Economic Studies, 72, 343-366.
[8] Chen, X., O.B. Linton, and I. van Keilegom (2003), Estimation of Semiparametric Models when the Criterion Function is not Smooth, Econometrica, 71, 1591-1608.
[9] Chen, X. and D. Pouzo (2009), Efficient Estimation of Semiparametric Conditional Moment Models with Possibly Nonsmooth Residuals, Journal of Econometrics, 152, 46-60.

[10] Chen, X. and D. Pouzo (2012), Efficient Estimation of Semiparametric Conditional Moment Models with Possibly Nonsmooth Generalized Residuals, Econometrica, 80, 277-321.
[11] Chen, X. and X. Shen (1998), Sieve Extremum Estimates for Weakly Dependent Data, Econometrica, 66, 289-314.
[12] Cosslett, S.R. (1983), Distribution-Free Maximum Likelihood Estimator of the Binary Choice Model, Econometrica, 51, 765-782.
[13] Gallant, A.R. and D.W. Nychka (1987), Semi-nonparametric Maximum Likelihood Estimation, Econometrica, 55, 363-390.
[14] Geman, S. and C. Hwang (1983), Nonparametric Maximum Likelihood by the Method of Sieves, Annals of Statistics, 10, 401-414.
[15] Greene, W.H. (1997), Econometric Analysis, Upper Saddle River, NJ: Prentice Hall.
[16] Horowitz, J.L. (1992), A Smoothed Maximum Score Estimator for the Binary Response Model, Econometrica, 60, 505-531.
[17] Horowitz, J.L. (1993a), Optimal Rates of Convergence of Parameter Estimators in the Binary Response Model with Weak Distributional Assumptions, Econometric Theory, 9, 1-18.
[18] Horowitz, J.L. (1993b), Semiparametric and Nonparametric Estimation of Quantal Response Models, in G.S. Maddala, C.R. Rao, H.D. Vinod, eds., Handbook of Statistics - Econometrics, Amsterdam: North Holland.
[19] Hu, Y. and S.M. Schennach (2008), Instrumental Variable Treatment of Nonclassical Measurement Error Models, Econometrica, 76, 195-216.
[20] Ichimura, H. (1993), Semiparametric Least Squares and Weighted SLS Estimation of Single-Index Models, Journal of Econometrics, 58, 71-120.
[21] Khan, S., Optimal Rates for Regression Coefficients Heteroskedastic Binary Response Models, manuscript, available at http://econ.duke.edu/~shakeebk/optimalrates.pdf
[22] Kim, J. and D. Pollard (1990), Cube Root Asymptotics, Annals of Statistics, 18, 191-219.
[23] Klein, R.W. and R.H. Spady (1993), An Efficient Semiparametric Estimator for Discrete Choice Models, Econometrica, 61, 387-421.
[24] Klein, R.W. and F. Vella (2009), A Semiparametric Model for Binary Response and Continuous Outcomes Under Index Heteroscedasticity, Journal of Applied Econometrics, 24, 735-762.

[25] Manski, C.F. (1975), Maximum Score Estimation of the Stochastic Utility Model of Choice, Journal of Econometrics, 3, 205-228.
[26] Manski, C.F. (1985), Semiparametric Analysis of Discrete Response: Asymptotic Properties of Maximum Score Estimation, Journal of Econometrics, 27, 313-334.
[27] Manski, C.F. (1988), Identification of Binary Response Models, Journal of the American Statistical Association, 83, 729-738.
[28] McFadden, D. (1974), Conditional Logit Analysis of Qualitative Choice Behavior, in P. Zarembka (ed.), Frontiers in Econometrics, pp 37-53. New York: Academic Press.
[29] Newey, W.K. (1997), Convergence Rates and Asymptotic Normality for Series Estimators, Journal of Econometrics, 79, 147-168.
[30] Powell, J.L., J.H. Stock, and T.M. Stoker (1989), Semiparametric Estimation of Index Coefficients, Econometrica, 57, 1403-1430.
[31] Schumaker, L.L. (1981), Spline Functions: Basic Theory, New York: John Wiley and Sons.
[32] Shen, X. (1997), On Method of Sieves and Penalization, Annals of Statistics, 25, 2555-2591.
[33] Shen, X. and W.H. Wong (1994), Convergence Rates of Sieve Estimates, Annals of Statistics, 22, 580-615.
[34] Sherman, R.P. (1994), U-Processes in the Analysis of a Generalized Semiparametric Regression Estimator, Econometric Theory, 10, 372-395.
[35] Stone, C.J. (1994), The Use of Polynomial Splines and Their Tensor Products in Multivariate Function Estimation, Annals of Statistics, 22, 118-171.
[36] van der Vaart, A.W. (1998), Asymptotic Statistics, Cambridge, U.K.: Cambridge University Press.
[37] van der Vaart, A.W. and J.A. Wellner, Weak Convergence and Empirical Processes, New York: Springer.
[38] Wong, H.W. and T.A. Severini (1991), On Maximum Likelihood Estimation in Infinite Dimensional Parameter Spaces, Annals of Statistics, 19, 603-632.

A Appendix

A.1 Proof of Theorem 3.3

Before we derive the linear representation for the estimator $\hat{\theta}$, recall that we defined the Fisher norm, denoted by $\|\cdot\|_F$, as

$$\|\alpha_1 - \alpha_2\|_F^2 \equiv E\left[\phi_{0i}^2 g_{0i}^2 \left(\tilde{x}_i'(\theta_1 - \theta_2) - (x_i'\beta_0)(l_{1i} - l_{2i})\right)^2\right] \tag{A.1}$$

where $\phi_{0i}$, $g_{0i}$, $l_{1i}$, $l_{2i}$ denote $\phi(x_i'\beta_0 g_0(x_i))$, $g_0(x_i)$, $l_1(x_i)$, $l_2(x_i)$ respectively. As we will see, deriving the form of the linear representation will rely heavily on convergence of certain terms with respect to this norm. We note that arguments similar to those used in, e.g., Ai and Chen (2003) can be used to conclude that

$$\|\hat{\alpha} - \alpha_0\|_F = O_p\left(\sqrt{\frac{\kappa_n}{n}} + \kappa_n^{-p/k}\right) \tag{A.2}$$

To establish the limiting distribution theory of the estimator, we note that there are many results in the literature on the asymptotic theory for smooth functionals; see, e.g., Shen (1997), Chen and Shen (1998), Ai and Chen (2003) and Chen et al. (2005). However, these results apply only to the root-n case, which is not possible here.¹⁵ Our proof strategy is to follow the arguments used in Ai and Chen (2003) and Chen et al. (2005), but make the necessary modifications to account for the fact that the estimator does not converge at the parametric (root-n) rate.

In the rest of this section we will scalarize the problem by deriving the linear representation for $t'(\hat{\theta} - \theta_0)$, where $t$ is a $(k-1)$ nonzero vector. Following Ai and Chen (2003), we wish to find the $(k-1)$ vector $w_i$ which minimizes:

$$t' E\left[\phi_{0i}^2 g_{0i}^2 (\tilde{x}_i + z_i w_i)(\tilde{x}_i + z_i w_i)'\right] t \tag{A.3}$$

and satisfies $(\theta, w_i) \in \mathcal{A}$ for each $\theta \in \Theta$. Clearly, the above expectation can be set to 0 by setting $w_i = I[z_i \ne 0](-\tilde{x}_i/z_i)$, as $z_i$ is continuously distributed around 0. The fact that we can make this expectation as small as possible relates to the impossibility of attaining the root-n rate for $\hat{\theta}$. What will determine the rate of convergence of the estimator is the rate of convergence of the above expectation to 0 when we replace $w_i$ with $w_{ni}$, where $(\theta, w_{ni}) \in \mathcal{A}_n$. So we will aim to find

$$\inf_{w_{ni}}\; t' E\left[\phi_{0i}^2 g_{0i}^2 (\tilde{x}_i + z_i w_{ni})(\tilde{x}_i + z_i w_{ni})'\right] t \tag{A.4}$$

as a function of $\kappa_n$.

¹⁵ See Chen and Khan (2003) for a related impossibility result. A result on upper bounds on achievable rates is available from the author.