Nonparametrics: density estimation, nonparametric conditional mean estimation, semiparametric conditional mean estimation. Gabriel Montes-Rojas


[Figure: Motivation - regression discontinuity (Angrist & Pischke). Outcome plotted against X in three panels: A. Linear E[Y0i|Xi]; B. Nonlinear E[Y0i|Xi]; C. Nonlinearity mistaken for discontinuity.]

Objectives of the slides: Overview of nonparametric density estimation, which plays a central role in nonparametric analysis. Methods for estimating conditional means: Nadaraya-Watson kernel regression. Semiparametric models: partially linear models (Robinson, 1988; Yatchew, 1998) and single index models (Ichimura, 1993; Klein and Spady, 1993). Become familiar with implementation in Stata.

Histograms
Kernel density estimation

Distribution function

Definition: Cumulative distribution function. The cumulative distribution function (c.d.f.) of a random variable $X$, denoted $F_X(\cdot)$, is a function with domain the real line and counterdomain the interval $[0, 1]$ which satisfies $F_X(x) = P[X \le x] = P[\{\omega : X(\omega) \le x\}]$ for every real number $x$.
- $F_X(-\infty) = 0$ and $F_X(+\infty) = 1$.
- $F_X(\cdot)$ is a monotone non-decreasing function, i.e. $F_X(a) \le F_X(b)$ if $a < b$.
- $F_X(\cdot)$ is continuous from the right, i.e. $\lim_{0 < h \to 0} F_X(x + h) = F_X(x)$.

Discrete random variables

Definition: Discrete random variable. A random variable is called discrete if its range $\mathcal{X}$ is countable. If a random variable is discrete, then its cumulative distribution function is also called discrete.

Definition: Discrete density function. If $X$ is a discrete random variable with values $x_1, x_2, \dots, x_n, \dots$, the function $f_X(x) = P[X = x_j]$ if $x = x_j$, $j = 1, \dots, n, \dots$, and zero otherwise, is called the discrete density function of $X$.

Continuous random variables

Definition: Continuous random variable. A random variable is called continuous if there exists a function $f_X(\cdot)$ such that $F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$ for every real number $x$.

Definition: Probability density function. If $X$ is a continuous random variable, the function $f_X(\cdot)$ in $F_X(x) = \int_{-\infty}^{x} f_X(u)\,du$ is called the probability density function (or continuous density function).

Note: it is important to recognize that $f$ is not by itself a probability. Instead, the probability that $X$ lies in the interval $(x, x + dx)$ is $f(x)\,dx$, and for a finite interval $(a, b)$ it is $\int_a^b f(x)\,dx$. Any function $f(\cdot)$ with domain the real line and counterdomain $[0, \infty)$ is defined to be a probability density function if:
- $f(x) \ge 0$ for all $x$;
- $\int f(x)\,dx = 1$.

Histograms

Constructing a histogram is straightforward. If $X$ is a discrete random variable with domain $\{x_1, x_2, \dots\}$, then select $M = \#\mathcal{X}$ point-bins. Then,
$$\hat{f}(x) = \frac{\sum_i 1[x_i = x]}{n}$$
If $X$ is a continuous random variable with domain $\mathcal{X}$, consider a series of bins, i.e. intervals that cover the domain of $X$, assumed to be bounded. Let $M$ be the number of bins, indexed by $m$, each of width $2h$ (half-width $h$), of the form
$$[x_m - h, x_m + h), \quad m = 1, 2, \dots, M, \quad x_1 < x_2 < \dots < x_{M-1} < x_M, \quad x_m + h = x_{m+1} - h$$
Then,
$$\hat{f}(x) = \frac{\sum_i 1[x_i \text{ in same bin as } x]}{2nh}$$
How do we select the bins optimally? In particular, this depends on the bandwidth $h$. A small code sketch follows.
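As a quick illustration of the continuous-case formula, here is a minimal Python/NumPy sketch; the sample, the bin origin `x0`, and the half-width `h` are hypothetical choices, not from the slides:

```python
import numpy as np

def hist_density(x, x0, h, grid):
    """Histogram density estimate on `grid`: bins of width 2h starting at x0,
    f_hat(x) = (# observations in the same bin as x) / (2*n*h)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    bin_obs = np.floor((x - x0) / (2 * h))                       # bin index of each x_i
    bin_grid = np.floor((np.asarray(grid, float) - x0) / (2 * h))
    return np.array([(bin_obs == b).sum() / (n * 2 * h) for b in bin_grid])

rng = np.random.default_rng(0)
sample = rng.normal(size=500)
grid = np.linspace(-3, 3, 61)
f_hat = hist_density(sample, x0=-4.0, h=0.25, grid=grid)
```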

Kernel

The kernel $K$ is a symmetric function satisfying:
1. $\int K(\psi)\,d\psi = 1$;
2. $\int \psi K(\psi)\,d\psi = 0$;
3. $\int \psi^2 K(\psi)\,d\psi = \mu_2 < \infty$.
Unless otherwise specified (i.e. unless the domain of $X$ is bounded with known bounds), the limits of integration are $(-\infty, \infty)$.

Kernel density estimation

Let $h$ be the bandwidth and let $x$ be a particular value at which we want to estimate $f(x)$. The kernel density estimator is
$$\hat{f}_h(x) = \frac{1}{nh} \sum_i K\!\left(\frac{x_i - x}{h}\right)$$
In practice, we need to report a density for the entire domain of $x$, $\mathcal{X}$, and not only for one particular $x$. Then we specify a grid of values $X = [x(1), x(2), \dots, x(M)]$, where $\#X = M$, at which to estimate the density. A density estimate then corresponds to the graph $\{\hat{f}_h(x(m)), x(m)\}_{m=1}^{M}$. We could also consider different bandwidths $h(x(m))$.
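A minimal sketch of this estimator in Python/NumPy, assuming a Gaussian kernel; the data and bandwidth are hypothetical:

```python
import numpy as np

def kde(x, grid, h):
    """Kernel density estimate f_h(x) = (1/(n*h)) * sum_i K((x_i - x)/h),
    with a Gaussian kernel K, evaluated at each grid point."""
    psi = (np.asarray(x)[None, :] - np.asarray(grid)[:, None]) / h  # (M, n)
    K = np.exp(-0.5 * psi**2) / np.sqrt(2 * np.pi)
    return K.sum(axis=1) / (len(x) * h)

rng = np.random.default_rng(1)
sample = rng.normal(size=1000)
grid = np.linspace(-3, 3, 121)
f_hat = kde(sample, grid, h=0.3)   # the graph {f_hat(x(m)), x(m)}
```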

Kernel density estimation: mean squared error and asymptotic requirements

How do we evaluate kernel density estimators? Mean squared error (MSE):
$$MSE(x) = E[(\hat{f}_h(x) - f(x))^2] = var(\hat{f}_h(x)) + \left(bias(\hat{f}_h(x))\right)^2$$
Following Pagan & Ullah (pp. 23-24):
$$bias(\hat{f}_h(x)) \approx \frac{h^2}{2} f''(x)\,\mu_2, \qquad var(\hat{f}_h(x)) \approx \frac{f(x)}{nh} \int K(\psi)^2\,d\psi$$
Then note that for a consistent estimator of $f(x)$ we face a trade-off between bias and variance:
1. $h \to 0$ as $n \to \infty$. As the sample size increases, we can make the bins smaller to get a more precise estimate of $f(x)$, i.e. to reduce the bias.
2. However, as the bins become smaller the variance increases! So we also require $nh \to \infty$, that is, $h$ must shrink more slowly than $n$ grows.

Kernel density estimation: rate of convergence

Since we want to evaluate the density over the entire domain $\mathcal{X}$, we consider the integrated mean squared error (IMSE):
$$IMSE = \int_{\mathcal{X}} MSE(x)\,dx \approx \frac{\int_{\mathcal{X}} f(x)\,dx}{nh} \int K(\psi)^2\,d\psi + \frac{\mu_2^2 h^4}{4} \int_{\mathcal{X}} f''(x)^2\,dx$$
We can now minimize with respect to $h$, to get
$$h_{opt} = \left( \frac{\int K(\psi)^2\,d\psi \,\int_{\mathcal{X}} f(x)\,dx}{\mu_2^2 \int_{\mathcal{X}} f''(x)^2\,dx} \right)^{1/5} n^{-1/5}$$
That is, $h$ should decrease at rate $n^{-1/5}$. The resulting rate of convergence is
$$\hat{f}_h(x) - f(x) = O_p(n^{-2/5})$$
which is slower than the $O_p(n^{-1/2})$ rate of the (optimal) maximum likelihood estimator of $f(x)$ under a correctly specified parametric density, i.e. $\sqrt{n}$-consistency.

Kernel density estimation: bandwidth selection

The most important step in nonparametric density estimation is selecting a bandwidth.
- If $h \to 0$, i.e. too small, there is no smoothing: the density estimator has too many spikes, one for each observation.
- If $h \to \infty$, i.e. too large, there is too much smoothing: the density estimator fits the density of the selected kernel.
Note that for choosing $h_{opt}$ we need:
- $\int K(\psi)^2\,d\psi$: this depends on the chosen kernel and is readily available.
- $\mu_2 = \int \psi^2 K(\psi)\,d\psi$: this depends on the chosen kernel and is readily available.
- $\int_{\mathcal{X}} f''(x)^2\,dx$: this depends on the unknown $f$. Bandwidth selection hinges on this term.

Kernel density estimation: bandwidth selection

- Rule-of-thumb: use a standard family of distributions to construct $\int_{\mathcal{X}} f''(x)^2\,dx$, e.g. Gaussian. Then $\int f''(x)^2\,dx = \frac{3}{8\sqrt{\pi}\,\sigma^5}$, and
$$\hat{h}_{opt} = 1.059\,\hat{\sigma}\,n^{-1/5}, \quad \text{with } \hat{\sigma} = \sqrt{\frac{1}{n}\sum_i (x_i - \bar{x})^2}$$
- Plug-in: use the rule-of-thumb to get a prior estimate of $f$, then compute $f''$ and plug in again.
- Cross-validation: choose $h$ by minimizing an estimate of $IMSE(\hat{f}_h)$. This is computationally intensive, since it requires estimating $f$ leaving each observation out of the sample and using the rest. The rate of convergence is extremely slow and the resulting bandwidth is volatile. (See Pagan & Ullah pp. 50-52.)
- Use intuition! Bandwidth selection is an art...
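The rule-of-thumb is one line of code; a minimal sketch, assuming the Gaussian-kernel constant 1.059 from the slides:

```python
import numpy as np

def rule_of_thumb_h(x):
    """Gaussian-reference rule of thumb: h_opt = 1.059 * sigma_hat * n^(-1/5)."""
    x = np.asarray(x, dtype=float)
    sigma_hat = x.std()   # sqrt((1/n) * sum (x_i - xbar)^2), NumPy's default ddof=0
    return 1.059 * sigma_hat * len(x) ** (-1 / 5)
```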

Histograms Kernel density estimation Histograms can be created with the hist command. This can be accessed in www.stata.com/manuals13/rhistogram.pdf Kernel density estimation can be implemented using the kdensity command. This can be accessed in www.stata.com/manuals13/rkdensity.pdf

Local constant conditional mean
Marginal effect

Model misspecification

Assume that $E(y|x) = m(x)$, where $m(\cdot)$ is a continuous and differentiable (not necessarily linear) function of $x$. Then we can always define $y = m(x) + e$ with $E(e|x) = 0$. What does OLS estimate in this case? Suppose we estimate the model $y_i = \beta x_i + u_i$. Then the OLS estimator $\hat{\beta}$ satisfies
$$\beta = \frac{Cov(y, x)}{Var(x)} = \frac{Cov(m(x) + e, x)}{Var(x)} = \frac{Cov(m(x), x)}{Var(x)}$$

Model misspecification

Consider the Taylor expansion of $m(x)$ around $x^*$:
$$m(x) = m(x^*) + m'(x^*)(x - x^*) + \frac{m''(x^*)}{2}(x - x^*)^2 + \dots$$
Then,
$$\beta = m'(x^*) + \frac{m''(x^*)}{2}\,\frac{Cov(x^2, x) - 2x^* Var(x)}{Var(x)} + \dots$$
- If $m(x) = a + bx$ then $\beta = b$.
- If $m(x) = a + bx + cx^2$ then $\beta = b + 2cx^* + c\,\frac{Cov(x^2, x) - 2x^* Var(x)}{Var(x)} = b + c\,\frac{Cov(x^2, x)}{Var(x)}$ (the same as treating $x^2$ as an omitted variable). A simulation check follows.
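A quick simulation check of the quadratic case, with a hypothetical data-generating process ($m(x) = a + bx + cx^2$ and $x \sim N(1, 1)$):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200_000
a, b, c = 1.0, 2.0, 0.5
x = rng.normal(loc=1.0, size=n)
y = a + b * x + c * x**2 + rng.normal(size=n)   # m(x) = a + b*x + c*x^2

# OLS slope of y on x (with intercept) equals Cov(y, x) / Var(x)
beta_ols = np.cov(y, x)[0, 1] / x.var(ddof=1)
# the omitted-variable formula from the slide
beta_formula = b + c * np.cov(x**2, x)[0, 1] / x.var(ddof=1)
print(beta_ols, beta_formula)   # both close to 3.0 here, since Cov(x^2, x) = 2
```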

Local constant conditional mean

This estimator locally averages those values of $y$ that are close in terms of $x$. Consider the estimation of $m(x) \equiv E[y|x]$ at a point $x$:
$$E[y|x] = \int y f(y|x)\,dy = \int y\,\frac{f(y, x)}{f(x)}\,dy = \frac{g(x)}{f(x)}$$
where $f(y|x)$ is the conditional pdf of $y$ given $x$, defined by $f(y|x) \equiv \frac{f(y, x)}{f(x)}$; $f(y, x)$ is the joint density of $y$ and $x$; and $g(x) \equiv \int y f(y, x)\,dy$.

Nadaraya-Watson kernel regression estimator

The Nadaraya-Watson estimator is a weighted average of those $y_i$'s that correspond to $x_i$ in a neighborhood of $x$. Consider the kernel-based estimator of the conditional mean. Define
$$\psi_i(x) = \frac{x_i - x}{h}$$
where $h$ is the (fixed) bandwidth that weights distances of each $x_i$ to the corresponding value of $x$, and define a kernel $K(\cdot)$. Then we have
$$\hat{m}_h(x) = \frac{\sum_i y_i K(\psi_i)}{\sum_i K(\psi_i)}$$
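A minimal Nadaraya-Watson sketch in Python/NumPy with a Gaussian kernel; the simulated data and bandwidth are hypothetical:

```python
import numpy as np

def nadaraya_watson(x, y, grid, h):
    """m_hat_h(x) = sum_i y_i K(psi_i) / sum_i K(psi_i), psi_i = (x_i - x)/h."""
    psi = (np.asarray(x)[None, :] - np.asarray(grid)[:, None]) / h
    K = np.exp(-0.5 * psi**2)   # Gaussian kernel; its constant cancels in the ratio
    return (K * np.asarray(y)[None, :]).sum(axis=1) / K.sum(axis=1)

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 500)
y = np.sin(2 * np.pi * x) + 0.3 * rng.normal(size=500)
grid = np.linspace(0.05, 0.95, 50)
m_hat = nadaraya_watson(x, y, grid, h=0.05)
```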

Nadaraya-Watson kernel regression estimator

Note that as $h \to \infty$, $\hat{m}_h(x) \to \bar{y} = n^{-1}\sum_i y_i$, the unconditional mean of $y$. What does this mean? For a large bandwidth, $\lim_{h \to \infty} \psi_i(x) = \lim_{h \to \infty} \frac{x_i - x}{h} = 0$, and then $\lim_{h \to \infty} K(\psi_i(x)) = K(0) = \max K = $ constant. In this case there is no differential weighting based on $x$. Result: too much smoothing.

Note that as $h \to 0$, $\hat{m}_h(x)$ becomes the nearest neighbor (NN) estimator. What does this mean? For a small bandwidth, $\lim_{h \to 0} \psi_i(x) = \lim_{h \to 0} \frac{x_i - x}{h} = \pm\infty$ except when $x_i = x$. Note that $K(\pm\infty) = 0$, and then only the $x_i$'s equal to $x$ are considered. This is identical to setting
$$\hat{m}_0(x) = \frac{\sum_i 1[x_i = x]\,y_i}{\sum_i 1[x_i = x]}$$
that is, for each value of $x$ it takes the corresponding value of $y$ if there is only one pair $(y_i, x_i = x)$; it takes the average of all values of $y$ that have $x_i = x$ if there are more observations; or it takes the average over the observations with $x_j$ closest to $x$. Result: no smoothing.

Local constant response - marginal effect

Consider now the marginal effect of $x$ on $y$. Define
$$\beta(x) \equiv \frac{d\,m(x)}{dx} = m'(x) = \frac{f(x)g'(x) - g(x)f'(x)}{f^2(x)} = \frac{g'(x)}{f(x)} - \frac{g(x)}{f(x)}\,\frac{f'(x)}{f(x)}$$
Then a local kernel estimator of the marginal effect is
$$\hat{\beta}(x) = \frac{\hat{g}'(x)}{\hat{f}(x)} - \frac{\hat{g}(x)}{\hat{f}(x)}\,\frac{\hat{f}'(x)}{\hat{f}(x)}$$
with
$$\hat{g}(x) = \frac{1}{nh}\sum_i y_i K(\psi_i), \quad \hat{g}'(x) = \frac{1}{nh^2}\sum_i y_i K'(\psi_i), \quad \hat{f}(x) = \frac{1}{nh}\sum_i K(\psi_i), \quad \hat{f}'(x) = \frac{1}{nh^2}\sum_i K'(\psi_i)$$
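A sketch of $\hat{\beta}(x)$ with a Gaussian kernel, where $\frac{d}{dx}K\!\left(\frac{x_i - x}{h}\right) = \psi_i K(\psi_i)/h$ follows from the chain rule with $\psi_i = (x_i - x)/h$; this sign convention and all names are assumptions of the sketch:

```python
import numpy as np

def beta_hat(x, y, grid, h):
    """Local kernel estimator of the marginal effect
    beta(x) = g'(x)/f(x) - (g(x)/f(x)) * f'(x)/f(x), Gaussian kernel."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    psi = (x[None, :] - np.asarray(grid)[:, None]) / h
    K = np.exp(-0.5 * psi**2) / np.sqrt(2 * np.pi)
    f = K.sum(axis=1) / (n * h)               # f_hat(x)
    g = (K * y).sum(axis=1) / (n * h)         # g_hat(x)
    fp = (psi * K).sum(axis=1) / (n * h**2)   # f_hat'(x): d/dx K(psi) = psi*K(psi)/h
    gp = (psi * K * y).sum(axis=1) / (n * h**2)
    return gp / f - (g / f) * (fp / f)
```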

Local polynomial regression

The Nadaraya-Watson estimator can be obtained as $\hat{m}_h(x) = \arg\min_a \sum_i (y_i - a)^2 K(\psi_i)$. Now consider the extension
$$(\hat{a}_h(x), \hat{b}_h(x)) = \arg\min_{(a,b)} \sum_i (y_i - a - b(x_i - x))^2 K(\psi_i)$$
Note that this is a weighted regression estimator, where the weights are given by the kernel:
$$\begin{pmatrix} \hat{a}_h(x) \\ \hat{b}_h(x) \end{pmatrix} = \left[ \sum_i K(\psi_i) \begin{pmatrix} 1 & (x_i - x) \\ (x_i - x) & (x_i - x)^2 \end{pmatrix} \right]^{-1} \sum_i K(\psi_i) \begin{pmatrix} y_i \\ y_i (x_i - x) \end{pmatrix}$$
This can be extended to a higher order polynomial, i.e. $y_i - a - b(x_i - x) - c(x_i - x)^2$. In fact, for the estimation of $\beta(x) = \frac{d\,m(x)}{dx}$ it is better to include a quadratic term to reduce bias. Note that as $h \to \infty$, the local linear regression estimator becomes the OLS estimator: $\lim_{h \to \infty} \hat{a}_h(x) = \hat{\beta}_0 + \hat{\beta}_1 x$ and $\lim_{h \to \infty} \hat{b}_h(x) = \hat{\beta}_1$.
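The weighted-least-squares form translates directly into code. A minimal sketch of the local linear case with Gaussian weights (a quadratic column could be appended when the slope is the target, per the bias remark above); data names are hypothetical:

```python
import numpy as np

def local_linear(x, y, grid, h):
    """Local linear regression: at each x0, minimize
    sum_i (y_i - a - b*(x_i - x0))^2 * K((x_i - x0)/h)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    a_hat, b_hat = [], []
    for x0 in grid:
        d = x - x0
        w = np.exp(-0.5 * (d / h) ** 2)             # Gaussian kernel weights
        X = np.column_stack([np.ones_like(d), d])   # columns (1, x_i - x0)
        coef = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
        a_hat.append(coef[0])
        b_hat.append(coef[1])
    return np.array(a_hat), np.array(b_hat)         # local fit and local slope
```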

Some asymptotic properties: assumptions

Consider the following assumptions:
1. $m$ and $f$ are twice differentiable in a neighborhood of $x$; $f$ is bounded in a neighborhood of $x$; $x \in int(\mathcal{X})$. [What does it mean? It rules out points with jumps.]
2. The kernel $K$ is a symmetric function satisfying (i) $\int K(\psi)\,d\psi = 1$; (ii) $\int \psi K(\psi)\,d\psi = 0$; (iii) $\int \psi^2 K(\psi)\,d\psi = \mu_2 < \infty$. [What does it mean? The same properties as for density estimation.]
3. $h = h_n \to 0$, $nh \to \infty$ as $n \to \infty$. [What does it mean? The same properties as for density estimation.]
4. The $x_i$'s are i.i.d. and independent of the error term in the model $y = m(x) + u$. [What does it mean? An exogeneity assumption.]

Some asymptotic properties: Nadaraya-Watson estimator

Theorem (Pagan & Ullah p. 101). Under the assumptions above,
$$BIAS(\hat{m}_h(x)) = \frac{h^2}{2f}\,\mu_2\,(m''f + 2f'm') + O(n^{-1}h^{-1}) + o(h^2)$$
$$V(\hat{m}_h(x)) = \frac{\sigma^2}{nhf} \int K^2(\psi)\,d\psi + o(n^{-1}h^{-1})$$
Then the optimal bandwidth should satisfy $h_n \propto n^{-1/5}$.

Some asymptotic properties: local linear regression

Theorem (Pagan & Ullah p. 105). Under the assumptions above,
$$BIAS(\hat{m}_h(x)) = \frac{m''}{2}\,h^2 \mu_2 + O(n^{-1}h^{-1}) + o(h^2)$$
$$V(\hat{m}_h(x)) = \frac{\sigma^2}{nhf} \int K^2(\psi)\,d\psi + o(n^{-1}h^{-1})$$

The curse of dimensionality

The results above can be adjusted for multiple covariates, say $q$ of them. As the number of covariates increases, the rate of convergence deteriorates; in particular it becomes $O_p(n^{-2/(q+4)})$. Compare this with OLS models, where the rate of convergence is $O_p(n^{-1/2})$ for any $q$. In particular,
$$h_{opt} \propto n^{-1/(q+4)}$$

Local constant conditional mean Marginal effect Nadaraya-Watson and local polynomial kernel regression can be implemented using the lpoly command. This can be accessed in http://www.stata.com/manuals13/rlpoly.pdf

Partially linear models
Index models

Partially linear models: $y = x'\beta + g(z) + u$

A semiparametric partially linear model is given by
$$y_i = x_i'\beta + g(z_i) + u_i, \quad i = 1, 2, \dots, n,$$
where $x_i$ is a $p \times 1$ vector of covariates; $z_i$ is a $q \times 1$ vector of covariates; $g(\cdot)$ is an unspecified function; and $u_i \equiv y_i - E(y_i|x_i, z_i)$, so that $E(u_i|x_i, z_i) = 0$ and $E(u_i^2|x_i, z_i) = \sigma^2(x_i, z_i)$ (i.e. potentially heteroskedastic).

Example: suppose we are able to assume linearity in some covariates (i.e. $x$) but we cannot assume it for others (i.e. $z$).

Partially linear models: Robinson's (1988) and Yatchew's (1998) estimators

This model avoids the curse of dimensionality if few variables, or a single variable, enter $z$. Separating covariates into a linear part and a nonlinear part increases the precision of the estimates. Moreover, $\hat{\beta}$ is $\sqrt{n}$-consistent (the usual OLS asymptotics apply to the linear part).

Partially linear models: Robinson's (1988) estimator

This model can be estimated by Robinson's (1988) estimator. Consider
$$E(y_i|z_i) = E(x_i|z_i)'\beta + g(z_i) + E(u_i|z_i) = E(x_i|z_i)'\beta + g(z_i) \quad \text{(because } E(u|z) = 0\text{)}$$
Then, subtracting this from the original equation,
$$y_i - E(y_i|z_i) = (x_i - E(x_i|z_i))'\beta + u_i.$$
Denoting $\tilde{y}_i = y_i - E(y_i|z_i)$ and $\tilde{x}_i = x_i - E(x_i|z_i)$, we get
$$\beta = \left[\sum_i^n \tilde{x}_i \tilde{x}_i'\right]^{-1} \sum_i^n \tilde{x}_i \tilde{y}_i$$
However, we do not observe the conditional expectations $E(y|z)$ and $E(x|z)$; these are estimated nonparametrically to obtain $\hat{E}(y|z)$ and $\hat{E}(x|z)$. $g(z_i)$ can then be estimated from a nonparametric regression of $y_i - x_i'\hat{\beta}$ on $z_i$.
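A minimal sketch of Robinson's double-residual idea for scalar $x$ and $z$, using a NW estimator for the conditional means; all names and the bandwidth are hypothetical, and a real implementation would add trimming where the density of $z$ is small:

```python
import numpy as np

def nw_fit(t, z, h):
    """NW estimate of E(t | z) evaluated at the sample points z."""
    K = np.exp(-0.5 * ((z[None, :] - z[:, None]) / h) ** 2)
    return (K * t[None, :]).sum(axis=1) / K.sum(axis=1)

def robinson(y, x, z, h):
    """y = x*beta + g(z) + u: residualize y and x on z, then OLS."""
    y_t = y - nw_fit(y, z, h)             # y_i - E_hat(y_i | z_i)
    x_t = x - nw_fit(x, z, h)             # x_i - E_hat(x_i | z_i)
    beta = (x_t @ y_t) / (x_t @ x_t)      # OLS on the residualized variables
    g_hat = nw_fit(y - x * beta, z, h)    # nonparametric fit of y - x*beta on z
    return beta, g_hat
```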

Partially linear models: Yatchew's (1998) estimator

Sort the data according to $z$, that is, $z_{(1)} \le z_{(2)} \le \dots \le z_{(n)}$. Consider the regression in first differences,
$$\Delta y = \Delta x'\beta + \Delta g(z) + \Delta u$$
If $g(\cdot)$ is smooth and single-valued with bounded first derivative on a compact support, then $\Delta g(z) \to 0$ as $n \to \infty$. Then $\beta$ can be estimated from a regression of $\Delta y$ on $\Delta x$; call the estimate $\hat{\beta}$. [Note that we do not need to know the form of $g(\cdot)$.] $g(z)$ can then be estimated from a nonparametric regression of $y - x'\hat{\beta}$ on $z$.
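Yatchew's differencing step is even simpler to sketch (scalar $x$; names are hypothetical):

```python
import numpy as np

def yatchew(y, x, z):
    """Sort by z, first-difference to remove g(z), then OLS of dy on dx."""
    order = np.argsort(z)
    dy = np.diff(y[order])
    dx = np.diff(x[order])
    return (dx @ dy) / (dx @ dx)   # beta_hat; recover g via a NW fit of y - x*beta on z
```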

Index models: $y = g(x'\beta) + u$

A semiparametric single index model is given by
$$y_i = g(x_i'\beta) + u_i, \quad i = 1, 2, \dots, n,$$
where $x_i$ is a $q \times 1$ vector of covariates and $g(\cdot)$ is an unspecified function. Note: this is different from a model where $y_i = g(x_i) + u_i$. Here $u_i \equiv y_i - E(y_i|x_i)$, so $E(u_i|x_i) = 0$ and $E(u_i^2|x_i) = \sigma^2(x_i)$ (i.e. potentially heteroskedastic).

Index models: $y = g(x'\beta) + u$

Ichimura's (1993) method consists of assuming there is one parameter value, $\beta_0$, such that $y_i = g(x_i'\beta_0) + u_i$, $i = 1, 2, \dots, n$. However, we can define $E(y|x'\beta)$ for any $\beta$; note that $E(y|x'\beta) \ne g(x'\beta)$ unless $\beta = \beta_0$.
- Consider a grid of values $\beta \in B$, $B = [\beta_{(1)}, \beta_{(2)}, \dots, \beta_{(M)}]$.
- For each $j = 1, 2, \dots, M$ and each observation $i = 1, 2, \dots, n$, compute $\hat{g}_{-i}(x_i'\beta_{(j)})$, a leave-one-out nonparametric kernel estimator of $g(x_i'\beta_{(j)})$.
- Choose $\hat{\beta} = \arg\min_{\beta \in B} \sum_i^n (y_i - \hat{g}_{-i}(x_i'\beta))^2$, as in the sketch below.
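A sketch of the grid-search version described above, with a Gaussian leave-one-out NW estimator; the grid, bandwidth, and scale normalization of $\beta$ are hypothetical choices:

```python
import numpy as np

def loo_nw(v, y, h):
    """Leave-one-out NW estimate of E(y | v) at each observation of the index v."""
    K = np.exp(-0.5 * ((v[None, :] - v[:, None]) / h) ** 2)
    np.fill_diagonal(K, 0.0)   # drop observation i from its own fit
    return (K * y[None, :]).sum(axis=1) / K.sum(axis=1)

def ichimura(y, X, beta_grid, h):
    """Semiparametric least squares over a grid of candidate betas (X is n x q)."""
    sse = [((y - loo_nw(X @ b, y, h)) ** 2).sum() for b in beta_grid]
    return beta_grid[int(np.argmin(sse))]
```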

Binary choice semiparametric single index models

Suppose now $y$ is a binary variable, i.e. $y \in \{0, 1\}$. The single index model can be applied as an alternative to logit or probit models. This is the Klein and Spady (1993) estimator. Let $g(x_i'\beta) = \Pr[y = 1|x_i]$. Then apply Ichimura's method but maximize a quasi-log-likelihood:
$$\hat{\beta} = \arg\max_{\beta \in B} \sum_i^n \left[(1 - y_i)\ln(1 - \hat{g}_{-i}(x_i'\beta)) + y_i \ln(\hat{g}_{-i}(x_i'\beta))\right]$$
Note: compare with probit and logit models, where
$$\hat{\beta} = \arg\max_{\beta} \sum_i^n \left[(1 - y_i)\ln(1 - F(x_i'\beta)) + y_i \ln(F(x_i'\beta))\right]$$
with $F$ either a normal or logistic cdf.
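Klein and Spady's estimator only changes the objective. A sketch reusing `loo_nw` from the Ichimura code above; the clipping constant is a hypothetical numerical safeguard, not part of the original method:

```python
import numpy as np

def klein_spady(y, X, beta_grid, h, eps=1e-6):
    """Maximize the quasi-log-likelihood with g_hat = leave-one-out NW
    estimate of Pr[y = 1 | x'beta] (see loo_nw above)."""
    def qll(b):
        p = np.clip(loo_nw(X @ b, y, h), eps, 1 - eps)   # keep logs finite
        return (y * np.log(p) + (1 - y) * np.log(1 - p)).sum()
    return beta_grid[int(np.argmax([qll(b) for b in beta_grid]))]
```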

Robinson's (1988) semiparametric partially linear model can be estimated with the semipar command: ssc install semipar. Yatchew's (1998) semiparametric partially linear model can be estimated with the plreg command. sml fits univariate binary-choice models by the semiparametric maximum likelihood estimator of Klein and Spady (1993). See http://www.stata-journal.com/sjpdf.html?articlenum=st0144

References

These slides are based on:
- Gutierrez, R.G., Linhart, J.M. and Pitblado, J.S. (2003), "From the help desk: Local polynomial regression and Stata plugins," Stata Journal, 3(4), 412-419.
- Ichimura, H. (1993), "Semiparametric least squares (SLS) and weighted SLS estimation of single-index models," Journal of Econometrics, 58, 71-120.
- Klein, R.W. and Spady, R.H. (1993), "An efficient semiparametric estimator for binary response models," Econometrica, 61, 387-421.
- Pagan, A. and Ullah, A. (1999), Nonparametric Econometrics. Cambridge: Cambridge University Press.
- Racine, J. (2008), "Nonparametric Econometrics: A Primer," Foundations and Trends in Econometrics, 3(1), 1-88.
- Robinson, P. (1988), "Root-n-consistent semi-parametric regression," Econometrica, 56, 931-954.
- Yatchew, A. (1998), "Nonparametric regression techniques in economics," Journal of Economic Literature, 36, 669-721.