Nonparametric Econometrics
1 Applied Microeconometrics with Stata: Nonparametric Econometrics, Spring Term
2 Contents
Introduction
The histogram estimator
The kernel density estimator
Nonparametric regression estimators
Semi- and nonparametric extensions
Summary and references
3 Introduction
Up to now, all regression models relied on functions (or densities) that depend on an unknown finite-dimensional parameter. A finite-dimensional parameter is an element of $\mathbb{R}^q$ with $q \in \mathbb{N}$. For example, linear regression models use an additive combination of the covariates ($x'\beta$). Nonlinear regression models specify a known function of (a linear index of) the covariates. ML theory is based on an assumption about the density of the data, which depends on a finite-dimensional parameter. If the functional form or the distributional assumption is wrong, however, the parameter estimators of these models are inconsistent. To circumvent this kind of misspecification problem, nonparametric methods can be used, which do not impose a functional form. In the following, nonparametric density and regression estimators are described, which form the basis of most nonparametric econometric models.
4 The Histogram Estimator
Consider a sample $\{X_i\}_{i=1}^n$ of a random variable $X$ with unknown density $f_X(x)$. First, the support of $X$ is divided into $K$ intervals $[a_{j-1}, a_j)$ with $\min(X) = a_0 < a_1 < \dots < a_K = \max(X)$. The histogram estimator for the probability that $X$ takes a value in the interval $[a_{j-1}, a_j)$ is then defined by
$$\widehat{\Pr}(X \in [a_{j-1}, a_j)) = \frac{1}{n} \sum_{i=1}^n 1\{X_i \in [a_{j-1}, a_j)\}.$$
This estimator can be viewed as a (crude) approximation of $f_X$ at some point $x \in [a_{j-1}, a_j)$. Furthermore, the histogram estimator is a step function even if $f_X$ is continuous (see the examples on the next slide). A solution is to use smaller intervals $[a_{j-1}, a_j)$, which converge in the limit to a point. This approach is called a local estimation approach, and is the basic idea of the kernel density estimator.
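The bin-probability estimator above can be sketched in a few lines. The following is an illustrative Python implementation (not from the lecture): equally spaced breakpoints between the sample minimum and maximum, synthetic standard-normal data, and a closed last bin so that every observation falls in exactly one interval.

```python
import numpy as np

def histogram_estimator(x, n_bins):
    """Estimate Pr(X in [a_{j-1}, a_j)) for equal-width bins spanning the data."""
    a = np.linspace(x.min(), x.max(), n_bins + 1)  # breakpoints a_0 < ... < a_K
    probs = np.empty(n_bins)
    for j in range(n_bins):
        # last bin is closed on the right so max(x) is counted
        upper = x <= a[j + 1] if j == n_bins - 1 else x < a[j + 1]
        probs[j] = np.mean((x >= a[j]) & upper)  # (1/n) * sum of indicators
    return a, probs

rng = np.random.default_rng(0)        # synthetic data, for illustration only
x = rng.normal(size=1000)
edges, probs = histogram_estimator(x, 20)
```

Because the bins partition the range of the data, the estimated probabilities sum to one; dividing each `probs[j]` by the bin width would turn this into the usual histogram density estimate.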
5 The Histogram Estimator
The graphs show histograms of the logarithm of wage (from the data set Mroz.dta). The histogram in the top left panel divides the data into 20 classes (the default of Stata's histogram command), that in the top right panel uses 30 classes, and those in the bottom left and right panels use 50 and 100 classes, respectively. To the histogram in the bottom right panel, a normal density fitted to the data is added (blue line). Note that the normal density does not seem to fit the data well.
6 The Kernel Density Estimator
Consider now the local estimation of a density function. For some intuition about the kernel density estimator, note first that the following holds by the definition of a derivative:
$$f(x) = \frac{d}{dx} F(x) = \lim_{h \to 0} \frac{F(x+h) - F(x-h)}{2h}.$$
The numerator of the last term can be rewritten as $F(x+h) - F(x-h) = \Pr(X \in (x-h, x+h))$. Define now the following kernel function:
$$k(z) = \begin{cases} 1/2 & \text{if } |z| \le 1, \\ 0 & \text{otherwise.} \end{cases}$$
The kernel function can be viewed as a weighting function. From this definition of $k(z)$, it follows that
$$k\left(\frac{X_i - x}{h}\right) = \frac{1}{2}\, 1\{X_i \in (x-h, x+h)\}.$$
See the next slides for a proof of this claim.
7 The Kernel Density Estimator
Consider first the following general result: for some $a$ and $\varepsilon > 0$,
$$|a| < \varepsilon \;\Rightarrow\; a < \varepsilon \text{ and } -a < \varepsilon, \qquad\text{and hence}\qquad -\varepsilon < a < \varepsilon.$$
The first claim holds as $a \le |a| < \varepsilon \Rightarrow a < \varepsilon$ and $-a \le |a| < \varepsilon \Rightarrow -a < \varepsilon$. By $-a < \varepsilon \Leftrightarrow -\varepsilon < a$, and by combining the inequalities $a < \varepsilon$ and $-\varepsilon < a$, the second claim ($-\varepsilon < a < \varepsilon$) follows. Next, as $h$ is positive, it holds that
$$\left|\frac{X_i - x}{h}\right| < 1 \;\Leftrightarrow\; |X_i - x| < h.$$
By the inequality just derived (with $a = X_i - x$ and $\varepsilon = h$), it holds that
$$|X_i - x| < h \;\Leftrightarrow\; -h < X_i - x < h \;\Leftrightarrow\; x - h < X_i < x + h.$$
8 The Kernel Density Estimator
It follows therefore that
$$k\left(\frac{X_i - x}{h}\right) = \begin{cases} 1/2 & \text{if } x - h < X_i < x + h, \\ 0 & \text{otherwise,} \end{cases} \;=\; \frac{1}{2}\, 1\{X_i \in (x-h, x+h)\}.$$
Now, recall that for some event $A$, $\Pr(A) = E[1\{A\}]$, where the expectation can be estimated by $\hat E[1\{A\}] = \frac{1}{n} \sum_{i=1}^n 1\{A_i\}$. Using these results, the kernel density estimator of $f_X(x)$ is given by
$$\hat f(x) = \frac{1}{nh} \sum_{i=1}^n k\left(\frac{X_i - x}{h}\right).$$
The factor $h^{-1}$ originates from the definition of $f(x)$ as the derivative of the cdf $F(x)$.
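The estimator $\hat f(x) = \frac{1}{nh}\sum_i k((X_i - x)/h)$ can be sketched directly. The following illustrative Python function (not from the lecture) implements it with the uniform kernel of the slides and, as an alternative, the Epanechnikov kernel introduced below; the data are synthetic draws from a standard normal, whose true density at zero is about 0.399.

```python
import numpy as np

def kde(x_grid, data, h, kernel="uniform"):
    """Kernel density estimate f_hat(x) = (1/(n*h)) * sum_i k((X_i - x)/h)."""
    z = (data[None, :] - np.asarray(x_grid, dtype=float)[:, None]) / h
    if kernel == "uniform":
        k = 0.5 * (np.abs(z) <= 1)               # k(z) = 1/2 on |z| <= 1
    elif kernel == "epanechnikov":
        k = 0.75 * (1 - z**2) * (np.abs(z) <= 1)  # k(z) = (3/4)(1 - z^2) on |z| <= 1
    else:
        raise ValueError("unknown kernel")
    return k.mean(axis=1) / h                    # average of kernels, scaled by 1/h

rng = np.random.default_rng(1)                   # synthetic N(0,1) sample
data = rng.normal(size=5000)
f_uni = kde([0.0], data, h=0.4)[0]
f_epa = kde([0.0], data, h=0.4, kernel="epanechnikov")[0]
```

Both estimates should land near the true value $\varphi(0) \approx 0.399$, illustrating the point made later that the kernel choice matters far less than the bandwidth.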
9 The Kernel Density Estimator
The parameter $h$ is called the bandwidth (or smoothing parameter or window width) of the kernel density estimator. The bandwidth is the most important parameter of the kernel density estimation problem. Note that $\hat f(x)$ is the value of the density of $X$ at some given point $x$, that is, it is a local estimate. To obtain an estimate of $f$ over the whole support of $X$, estimates at several points $x \in \operatorname{supp}(X)$ are necessary. The kernel used above is called the uniform kernel. Different kernel functions may be used, as long as they have certain properties (which are listed later). Examples of kernel functions are the standard normal density, i.e., $k(z) = \varphi(z)$, or the Epanechnikov kernel, which is defined as
$$k(z) = \begin{cases} \frac{3}{4}(1 - z^2) & \text{if } |z| \le 1, \\ 0 & \text{otherwise.} \end{cases}$$
The choice of the kernel function is of minor importance in practice. The choice of $h$ has a much larger impact on the estimation results.
10 The Kernel Density Estimator
The graphs show kernel density estimates of the same variable as in the previous example. The top left panel shows the estimate using the default bandwidth of the kdensity command. The graph in the top right panel adds a normal density function (red line) to the default kernel density estimate. The bottom left and right panels use a smaller and a larger bandwidth, respectively (compared to the default value of the kdensity command).
11 The Kernel Density Estimator
Consider now the following general properties which have to hold for a kernel function. A kernel function $k$ integrates to one, i.e., $\int k(v)\,dv = 1$; is symmetric, $k(v) = k(-v)$; and has a finite second moment, i.e., $\int v^2 k(v)\,dv = \kappa_2 < \infty$. Note that by the symmetry condition it holds that $\int v k(v)\,dv = 0$. Every kernel function with these properties can be used for a kernel-based local estimation method.
12 The Kernel Density Estimator
To proceed, consider first the definition of the mean squared error. The mean squared error $\mathrm{MSE}(\hat\theta)$ of an estimator $\hat\theta$ of $\theta$ is defined as
$$\mathrm{MSE}(\hat\theta) = E\left[(\hat\theta - \theta)^2\right] = E\left[\left(\hat\theta - E[\hat\theta] + E[\hat\theta] - \theta\right)^2\right]$$
$$= E\left[\left(\hat\theta - E[\hat\theta]\right)^2\right] + 2\left(E[\hat\theta] - \theta\right) E\left[\hat\theta - E[\hat\theta]\right] + \left(E[\hat\theta] - \theta\right)^2 = E\left[\left(\hat\theta - E[\hat\theta]\right)^2\right] + \left(E[\hat\theta] - \theta\right)^2,$$
as it holds for the cross-product term that
$$E\left[\left(\hat\theta - E[\hat\theta]\right)\left(E[\hat\theta] - \theta\right)\right] = \left(E[\hat\theta] - \theta\right) E\left[\hat\theta - E[\hat\theta]\right] = 0.$$
The MSE can therefore be rewritten as
$$\mathrm{MSE}(\hat\theta) = \mathrm{Var}(\hat\theta) + \left(\mathrm{bias}(\hat\theta)\right)^2,$$
where the bias term is defined by $\mathrm{bias}(\hat\theta) = E[\hat\theta] - \theta$. MSE convergence implies convergence in probability.
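The decomposition $\mathrm{MSE} = \mathrm{Var} + \mathrm{bias}^2$ can be checked numerically. The following sketch (an illustration, not from the lecture) uses the biased sample-variance estimator (dividing by $n$ rather than $n-1$) of the variance of a standard normal, replicated over many synthetic samples; computed with the corresponding empirical moments, the decomposition holds as an exact algebraic identity.

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 1.0  # true variance of N(0,1)

# replicate the biased variance estimator (np.var divides by n) over 5000 samples
est = np.array([np.var(rng.normal(size=50)) for _ in range(5000)])

mse = np.mean((est - theta) ** 2)       # Monte Carlo MSE
variance = np.var(est)                  # spread around the mean of the estimates
bias_sq = (est.mean() - theta) ** 2     # squared bias
# mse == variance + bias_sq up to floating-point error
```

Replacing `np.var(...)` with the unbiased version (`ddof=1`) would drive `bias_sq` toward zero while leaving a slightly larger variance, a classic bias-variance trade-off.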
13 The Kernel Density Estimator
To derive asymptotic properties of the kernel density estimator, the following assumptions are used. The data $\{X_i\}_{i=1}^n$ are an iid sample; this is a standard assumption. The true density $f(x)$ is three times differentiable, where the derivatives are denoted by $f^{(s)}(x)$, with $s \in \{1, 2, 3\}$; the derivatives of the density are used for a Taylor series expansion. The kernel function satisfies the assumptions stated previously; by these properties, some elements of the Taylor series expansion are equal to zero. The point $x$ at which the density is estimated is an interior point of the support of $X$; for a point at the boundary of the support of $X$, certain terms of the expansion are unequal to zero, which leads to a larger bias. For $n \to \infty$, $h \to 0$ and $nh \to \infty$. This assumption means that the bandwidth $h$ converges to zero more slowly than $n$ tends to infinity. As $n \to \infty$, observations from a smaller neighborhood of $x$ are used for estimation (local approach).
14 The Kernel Density Estimator
To describe the results, order notation is useful. In the following, let $a_n$ and $b_n$ denote some sequences, and consider the case $n \to \infty$. Boundedness can be described by the $O(\cdot)$ notation: $a_n = O(1) \Leftrightarrow |a_n| < C$ for some positive constant $C$ (and $n \to \infty$). More generally, $a_n = O(b_n) \Leftrightarrow |a_n / b_n| < C$. Convergence to zero is denoted by $a_n = o(b_n) \Leftrightarrow a_n / b_n \to 0$. A special case is $b_n \equiv 1$, i.e., $a_n = o(1) \Leftrightarrow a_n \to 0$. This notation is used to abbreviate terms which converge faster to zero or to some constant than the other remaining terms. For random variables $X_n$, $X_n = O_p(1)$ denotes that for every $\varepsilon > 0$ there is an $M$ such that $\Pr(|X_n| > M) \le \varepsilon$ for $n \to \infty$. $X_n \overset{p}{\to} 0$ is denoted by $X_n = o_p(1)$. $O_p(Y_n)$ and $o_p(Y_n)$ are defined analogously to $O(b_n)$ and $o(b_n)$.
15 The Kernel Density Estimator
Now, consider the MSE of the kernel density estimator at point $x$, which is given by
$$\mathrm{MSE}(\hat f(x)) = \mathrm{Var}(\hat f(x)) + \left(\mathrm{bias}(\hat f(x))\right)^2.$$
First, the bias term is analyzed, which is equal to
$$E\left[\frac{1}{nh} \sum_{i=1}^n k\left(\frac{X_i - x}{h}\right)\right] - f(x) = \frac{1}{h} E\left[k\left(\frac{X_1 - x}{h}\right)\right] - f(x).$$
This equality follows from the identical distribution of the sample (which allows replacing $X_i$ by $X_1$ for all $i$):
$$E\left[\frac{1}{nh} \sum_{i=1}^n k\left(\frac{X_i - x}{h}\right)\right] = \frac{1}{nh} \sum_{i=1}^n E\left[k\left(\frac{X_i - x}{h}\right)\right] = \frac{1}{nh} \sum_{i=1}^n E\left[k\left(\frac{X_1 - x}{h}\right)\right] = \frac{1}{nh}\, n\, E\left[k\left(\frac{X_1 - x}{h}\right)\right] = \frac{1}{h} E\left[k\left(\frac{X_1 - x}{h}\right)\right].$$
16 The Kernel Density Estimator
Next, by the definition of the expectation, the bias can be written as
$$\frac{1}{h} E\left[k\left(\frac{X_1 - x}{h}\right)\right] - f(x) = \frac{1}{h} \int k\left(\frac{x_1 - x}{h}\right) f(x_1)\,dx_1 - f(x).$$
Consider now the following change of the integration variable:
$$\frac{x_1 - x}{h} = v \;\Leftrightarrow\; x_1 = x + vh.$$
Using this, the bias can be rewritten as follows (see the next slide for a derivation):
$$\frac{1}{h} \int k(v) f(x + vh)\, h\,dv - f(x) = \int k(v) f(x + vh)\,dv - f(x).$$
Note that by this reformulation, the properties of the kernel function can be used to derive the properties of the bias term. Furthermore, a suitable Taylor expansion of $f(x + vh)$ leads to an expression which is interpretable using the basic assumptions stated above.
17 The Kernel Density Estimator
Consider now the equality just stated:
$$\frac{1}{h} \int k\left(\frac{x_1 - x}{h}\right) f(x_1)\,dx_1 = \frac{1}{h} \int k(v) f(x + vh)\, h\,dv.$$
To see this, the formula for integration by substitution is used: for suitable functions $w$ and $g$ it holds that
$$\int_{g(a)}^{g(b)} w(x)\,dx = \int_a^b w(g(t))\, g'(t)\,dt.$$
For the problem at hand, $w(x_1) = k((x_1 - x)/h)\, f(x_1)$ and $g(v) = x + vh\ (= x_1)$. Let $\bar I$ and $\underline I$ be the upper and lower bounds of the integral. Then $\bar I = g(b)$ and hence $b = g^{-1}(\bar I) = (\bar I - x)/h$. For $\bar I = \infty$ (and similarly for $\underline I = -\infty$), $b = g^{-1}(\bar I) = \infty$. For finite $\bar I$ and/or $\underline I$, the boundaries may change due to the substitution. Using $g'(v) = h$, the equality claimed above follows.
18 The Kernel Density Estimator
A Taylor expansion of an $m$-times differentiable function $g(x)$ at some point $x_0$ is given by
$$g(x) = g(x_0) + g^{(1)}(x_0)(x - x_0) + \frac{1}{2!} g^{(2)}(x_0)(x - x_0)^2 + \dots + \frac{1}{m!} g^{(m)}(\xi)(x - x_0)^m,$$
where $g^{(j)}(x_0) = \partial^j g(x)/\partial x^j \big|_{x = x_0}$ and $\xi$ lies between $x$ and $x_0$. Now, consider a Taylor expansion of the density $f(x + vh)$ at the point $x$:
$$f(x + vh) = f(x) + f^{(1)}(x)(x + vh - x) + \frac{1}{2} f^{(2)}(x)(x + vh - x)^2 + \dots = f(x) + f^{(1)}(x)vh + \frac{1}{2} f^{(2)}(x)v^2 h^2 + \dots$$
For the next term in the expansion it holds that $\frac{1}{3!} f^{(3)}(x) v^3 h^3 = O(h^3)$, as $f^{(3)}(x)$ and $v^3$ are bounded (by assumption).
19 The Kernel Density Estimator
Now insert $f(x + vh) = f(x) + f^{(1)}(x)vh + \frac{1}{2} f^{(2)}(x)v^2 h^2 + O(h^3)$ into the bias expression:
$$\int k(v) f(x + vh)\,dv - f(x) = \int k(v) \left( f(x) + f^{(1)}(x)vh + \frac{1}{2} f^{(2)}(x)v^2 h^2 + O(h^3) \right) dv - f(x).$$
By using this expansion, the term $f(x + vh)$ can be expressed by terms involving $x$ alone, which enables the computation of the integral with respect to $v$. The bias is therefore equal to
$$\int k(v) f(x + vh)\,dv - f(x) = f(x) \int k(v)\,dv + f^{(1)}(x)\, h \int v\, k(v)\,dv + \frac{1}{2} f^{(2)}(x)\, h^2 \int v^2 k(v)\,dv + O(h^3) - f(x).$$
20 The Kernel Density Estimator
As it holds by the properties of a kernel function that $\int k(v)\,dv = 1$ and $\int v\, k(v)\,dv = 0$, the bias is equal to
$$\mathrm{bias}(\hat f(x)) = \frac{h^2}{2} f^{(2)}(x) \int v^2 k(v)\,dv + O(h^3).$$
A similar expression can be derived for the variance of the kernel density estimator at point $x$. Combining these results, the MSE of the kernel density estimator is given by
$$\mathrm{MSE}(\hat f(x)) = \frac{h^4}{4} \kappa_2^2 \left(f^{(2)}(x)\right)^2 + \frac{\kappa f(x)}{nh} + o\!\left(h^4 + \frac{1}{nh}\right),$$
where $\kappa = \int k^2(v)\,dv$ and $\kappa_2 = \int v^2 k(v)\,dv$. From this result, pointwise consistency of the kernel density estimator follows (i.e., for a given point $x$).
21 The Kernel Density Estimator
To compute the kernel density estimator, a value of $h$ is needed. It can be shown that the MSE of $\hat f(x)$ is minimized by the following bandwidth:
$$h_{\mathrm{opt}} = \left( \frac{\kappa f(x)}{\left(\kappa_2 f^{(2)}(x)\right)^2} \right)^{1/5} n^{-1/5}.$$
This expression depends on $\kappa$ and $\kappa_2$, which can be computed from the kernel function used, and on the true density function $f(x)$ and its second derivative, which are unknown. One simple way to address this problem is to use a pilot bandwidth $h_{\mathrm{pilot}}$ to estimate $f(x)$ and $f^{(2)}(x)$ nonparametrically. Of course, this bandwidth also needs to be chosen. One solution is to assume (for deriving a pilot bandwidth) that $f(x)$ is a normal density with variance $\sigma^2$. Then one can derive $h_{\mathrm{pilot}} \approx 1.06\, \sigma\, n^{-1/5}$ (Silverman's rule of thumb). This expression can be used to estimate the quantities needed to derive $h_{\mathrm{opt}}$. Often $h_{\mathrm{pilot}}$ is used directly as $h_{\mathrm{opt}}$. Various other methods for bandwidth choice exist.
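Silverman's rule of thumb is a one-liner in code. The following illustrative sketch (synthetic data; $\sigma$ replaced by the sample standard deviation, a common plug-in choice that the slides do not spell out) computes $h_{\mathrm{pilot}} = 1.06\,\hat\sigma\, n^{-1/5}$:

```python
import numpy as np

def silverman_bandwidth(data):
    """Rule-of-thumb pilot bandwidth h = 1.06 * sigma_hat * n^(-1/5)."""
    n = len(data)
    return 1.06 * np.std(data) * n ** (-1 / 5)

rng = np.random.default_rng(3)            # synthetic N(0, 4) sample
data = rng.normal(scale=2.0, size=1000)
h = silverman_bandwidth(data)
```

With $\sigma \approx 2$ and $n = 1000$ this gives $h \approx 1.06 \cdot 2 \cdot 1000^{-1/5} \approx 0.53$. Software implementations (e.g., Stata's kdensity) often use a variant of this rule that guards against outliers, so the exact defaults can differ.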
22 The Kernel Density Estimator
Kernel density estimators can also be used to estimate multivariate densities. Given a sample $\{X_i\}_{i=1}^n$, where $X_i \in \mathbb{R}^q$ and $q > 1$, the density $f(x) = f(x_1, x_2, \dots, x_q)$ can be estimated by
$$\hat f(x) = \frac{1}{n h_1 \cdots h_q} \sum_{i=1}^n K\left(\frac{X_i - x}{h}\right) = \frac{1}{n h_1 \cdots h_q} \sum_{i=1}^n k\left(\frac{X_{i1} - x_1}{h_1}\right) \cdots k\left(\frac{X_{iq} - x_q}{h_q}\right),$$
where $K(\cdot)$ is called a product kernel, and the functions $k(\cdot)$ are univariate kernels (as used previously). Under some assumptions, it can be shown that
$$\sqrt{n h_1 \cdots h_q} \left( \hat f(x) - f(x) - \frac{\kappa_2}{2} \sum_{s=1}^q h_s^2 f_{ss}(x) \right) \overset{d}{\to} N\left(0, \kappa^q f(x)\right).$$
Here, $f_{ss}$ is the second partial derivative of the density $f$ with respect to its $s$-th argument. Note that an asymptotic bias term occurs here.
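The product-kernel construction is a small extension of the univariate code. The following illustrative sketch (not from the lecture) uses a Gaussian univariate kernel in each coordinate, a coordinate-wise bandwidth vector, and synthetic bivariate standard-normal data, whose true density at the origin is $1/(2\pi) \approx 0.159$:

```python
import numpy as np

def product_kernel_kde(x, data, h):
    """Multivariate KDE at point x in R^q with a Gaussian product kernel."""
    z = (data - x) / h                               # (n, q): coordinate-wise scaling
    k = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)     # univariate Gaussian kernels
    return np.prod(k, axis=1).mean() / np.prod(h)    # product kernel, then average

rng = np.random.default_rng(4)                       # synthetic N(0, I_2) sample
data = rng.normal(size=(5000, 2))
f0 = product_kernel_kde(np.zeros(2), data, h=np.array([0.3, 0.3]))
```

As the asymptotic result above suggests, the estimate carries a small negative bias at the mode (the $f_{ss}$ terms are negative there), but with $n = 5000$ it stays close to the true value.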
23 The Kernel Density Estimator
$\hat f(x)$ is (for univariate and multivariate estimators alike) a consistent estimator of $f(x)$. From the expression of the asymptotic distribution it follows that the speed of convergence of nonparametric kernel density estimates is much smaller than for parametric estimators, and decreases with an increasing number of variables, i.e., with increasing $q$. To understand the concept of convergence speed, consider some parametric estimator $\hat\theta$, for which $\hat\theta \overset{p}{\to} \theta_0$ (that is, $\hat\theta - \theta_0 = o_p(1)$). Multiplying this difference by $\sqrt{n}$ leads to an expression which does not tend to infinity or to zero, but to a random variable; $\sqrt{n}$ is called the speed of convergence of the estimator. As can be seen from the expression on the previous slide, the kernel density estimator has a much slower speed, as $\sqrt{n h_1 \cdots h_q}$ (or $\sqrt{nh}$ in the univariate case) is smaller than $\sqrt{n}$, because the bandwidths converge to zero. In practice, this means that nonparametric methods should only be applied if large samples are available.
24 Nonparametric Regression
Consider the general regression model $y_i = g(X_i) + \varepsilon_i$, where $g(x) = E[y \mid X = x]$, and no assumptions on the form of $g(x)$ are imposed. The conditional expectation of $y$ is defined by
$$E[y \mid X = x] = \int y\, f_{Y|X}(y \mid x)\,dy = \int y\, \frac{f_{Y,X}(y, x)}{f_X(x)}\,dy = \frac{\int y\, f_{Y,X}(y, x)\,dy}{f_X(x)},$$
where the second equality follows by Bayes' law. The denominator $f_X(x)$ can be estimated by a nonparametric density estimator. Consider now the following kernel estimator of $f_{Y,X}(y, x)$:
$$\hat f_{Y,X}(y, x) = \frac{1}{n h_0 h_1 \cdots h_q} \sum_{i=1}^n K\left(\frac{X_i - x}{h}\right) k\left(\frac{y_i - y}{h_0}\right),$$
where $K(\cdot)$ is the product kernel for multivariate covariates defined earlier.
25 Nonparametric Regression
Consider now the numerator of the estimator of $g(x)$, i.e., consider $\int y\, f_{Y,X}(y, x)\,dy$ with $f_{Y,X}$ replaced by the estimator $\hat f_{Y,X}$ just defined:
$$\int y\, \hat f_{Y,X}(y, x)\,dy = \int y\, \frac{1}{n h_0 h_1 \cdots h_q} \sum_{i=1}^n K\left(\frac{X_i - x}{h}\right) k\left(\frac{y_i - y}{h_0}\right) dy = \frac{1}{n h_0 h_1 \cdots h_q} \sum_{i=1}^n K\left(\frac{X_i - x}{h}\right) \int y\, k\left(\frac{y_i - y}{h_0}\right) dy.$$
By a change of variables ($(y_i - y)/h_0 = v \Leftrightarrow y = y_i - v h_0$), the right-hand side of this expression can be rewritten as
$$\frac{1}{n h_0 h_1 \cdots h_q} \sum_{i=1}^n K\left(\frac{X_i - x}{h}\right) \int (y_i - v h_0)\, k(v)\, h_0\,dv.$$
As $\int k(v)\,dv = 1$ and $\int v\, k(v)\,dv = 0$, this simplifies to
$$\frac{1}{n h_1 \cdots h_q} \sum_{i=1}^n K\left(\frac{X_i - x}{h}\right) y_i.$$
26 Nonparametric Regression
Using this result, the estimator of $g(x)$ can be written as:
$$\hat g(x) = \frac{\int y\, \hat f_{Y,X}(y, x)\,dy}{\hat f_X(x)} = \frac{\sum_{i=1}^n K\left(\frac{X_i - x}{h}\right) y_i}{\sum_{i=1}^n K\left(\frac{X_i - x}{h}\right)}.$$
This estimator is called the local constant regression or Nadaraya-Watson estimator. Note that $\hat g(x)$ is a weighted average of $y$:
$$\hat g(x) = \sum_{i=1}^n \omega_i y_i, \qquad\text{where}\qquad \omega_i = \frac{K\left(\frac{X_i - x}{h}\right)}{\sum_{j=1}^n K\left(\frac{X_j - x}{h}\right)},$$
and the weights have the properties $\omega_i \ge 0$ and $\sum_{i=1}^n \omega_i = 1$. The nonparametric regression estimator can thus be viewed as a weighted average of $y$, where the weights depend on the distance of the covariates $X_i$ to the point $x$ at which $\hat g(x)$ is estimated.
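The weighted-average form of the Nadaraya-Watson estimator translates directly into code. The following illustrative sketch (not from the lecture) uses a univariate covariate, a Gaussian kernel (whose normalizing constant cancels in the ratio), and synthetic data from $y = \sin(X) + \varepsilon$:

```python
import numpy as np

def nadaraya_watson(x_grid, X, y, h):
    """Local constant regression: kernel-weighted average of y at each grid point."""
    z = (X[None, :] - np.asarray(x_grid, dtype=float)[:, None]) / h
    K = np.exp(-0.5 * z**2)                    # Gaussian kernel; constant cancels
    w = K / K.sum(axis=1, keepdims=True)       # weights are >= 0 and sum to one
    return w @ y                               # g_hat(x) = sum_i w_i * y_i

rng = np.random.default_rng(5)                 # synthetic regression data
X = rng.uniform(-2, 2, size=2000)
y = np.sin(X) + 0.2 * rng.normal(size=2000)
g_hat = nadaraya_watson([0.0, 1.0], X, y, h=0.2)
```

The estimates at $x = 0$ and $x = 1$ should be close to $\sin(0) = 0$ and $\sin(1) \approx 0.84$, with the small smoothing bias predicted by the asymptotics on the next slide.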
27 Nonparametric Regression
Under some assumptions, the following holds:
$$\sqrt{n h_1 \cdots h_q} \left( \hat g(x) - g(x) - \sum_{s=1}^q h_s^2 B_s(x) \right) \overset{d}{\to} N\left(0, \frac{\kappa^q \sigma^2(x)}{f(x)}\right).$$
Again, an asymptotic bias term occurs. The bandwidth parameter(s) can be determined by several methods, for example by the plug-in approach or by cross-validation. $\hat g(x)$ can be expressed as the solution of an optimization problem:
$$\hat g(x) = \arg\min_{\mu} \sum_{i=1}^n (y_i - \mu)^2\, K\left(\frac{X_i - x}{h}\right).$$
That is, the nonparametric regression estimate at point $x$ is the constant of a weighted OLS regression without further covariates. The covariates appear only in the kernel function, i.e., they are only used for computing the weights $K(\cdot)$. Better asymptotic properties may be obtained by more general models, which will be discussed next.
28 Nonparametric Regression
Consider as a first extension of the local constant regression model the local linear regression model. Here, a local linear instead of a local constant function is used to approximate the unknown $g(x)$. The local linear regression at a point $x$ is determined by the following optimization problem:
$$\hat\delta(x) = \begin{pmatrix} \hat a(x) \\ \hat b(x) \end{pmatrix} = \arg\min_{a, b} \sum_{i=1}^n \left( y_i - a - (X_i - x)' b \right)^2 k\left(\frac{X_i - x}{h}\right).$$
To derive a closed-form expression for the estimators, define the following $[n \times n]$ and $[n \times (k+1)]$ dimensional matrices:
$$K_x = \operatorname{diag}\left( k\left(\frac{X_1 - x}{h}\right), \dots, k\left(\frac{X_n - x}{h}\right) \right), \qquad X_x = \left( 1,\ (X_i - x)' \right)_{i=1,\dots,n}.$$
The optimization problem can be restated as
$$\hat\delta(x) = \arg\min_{\delta(x)} \left( y - X_x \delta(x) \right)' K_x \left( y - X_x \delta(x) \right).$$
29 Nonparametric Regression
The solution can be expressed explicitly as
$$\hat\delta(x) = \left( X_x' K_x X_x \right)^{-1} X_x' K_x y.$$
The estimate $\hat g(x)$ of the mean function at point $x$ is equal to $\hat a(x)$. A further generalization is to approximate $g(x)$ locally by a polynomial of order $p$. For univariate $X$, the objective function for the local parameters $\delta(x) = (\delta_0(x), \dots, \delta_p(x))'$ is given by
$$\hat\delta(x) = \arg\min_{\delta} \sum_{i=1}^n \left( y_i - \delta_0 - (X_i - x)\delta_1 - \dots - (X_i - x)^p \delta_p \right)^2 k\left(\frac{X_i - x}{h}\right).$$
In this case, the covariate matrix $X_x$ is defined as:
$$X_x = \begin{pmatrix} 1 & X_1 - x & (X_1 - x)^2 & \dots & (X_1 - x)^p \\ 1 & X_2 - x & (X_2 - x)^2 & \dots & (X_2 - x)^p \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & X_n - x & (X_n - x)^2 & \dots & (X_n - x)^p \end{pmatrix}.$$
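The closed form $\hat\delta(x) = (X_x' K_x X_x)^{-1} X_x' K_x y$ is easy to implement without ever forming the $n \times n$ diagonal matrix $K_x$. The following illustrative sketch (not from the lecture) fits a local linear regression with a Gaussian kernel to the same kind of synthetic $y = \sin(X) + \varepsilon$ data used above:

```python
import numpy as np

def local_linear(x0, X, y, h):
    """Local linear fit at x0: solves the weighted least squares problem."""
    Xx = np.column_stack([np.ones_like(X), X - x0])   # design rows (1, X_i - x0)
    k = np.exp(-0.5 * ((X - x0) / h) ** 2)            # Gaussian kernel weights
    XtK = Xx.T * k                                    # Xx' Kx without diag(k)
    delta = np.linalg.solve(XtK @ Xx, XtK @ y)        # (Xx'Kx Xx)^{-1} Xx'Kx y
    return delta                                      # delta[0] = g_hat(x0), delta[1] ~ g'(x0)

rng = np.random.default_rng(6)                        # synthetic regression data
X = rng.uniform(-2, 2, size=2000)
y = np.sin(X) + 0.2 * rng.normal(size=2000)
delta = local_linear(1.0, X, y, h=0.3)
```

Here `delta[0]` estimates $g(1) = \sin(1)$ and, as the next slide notes for the general polynomial case, `delta[1]` estimates the first derivative $g'(1) = \cos(1)$.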
30 Nonparametric Regression
With $K_x$ as defined above, the parameter estimators are given by
$$\hat\delta(x) = \left( X_x' K_x X_x \right)^{-1} X_x' K_x y.$$
Again, $\hat g(x)$ is equal to the (weighted) constant of the polynomial expression above, i.e., $\hat g(x) = \hat\delta_0(x)$, where $\delta_0(x)$ is the first element of the vector $\delta(x)$. The additional parameters $\hat\delta_s$ for $s = 1, \dots, p$ are estimators of the derivatives of the regression function, i.e.,
$$\delta_s(x) = \frac{1}{s!} \frac{\partial^s g(x)}{\partial x^s}.$$
In fact, the polynomial expression used above corresponds to a Taylor approximation of $g(x)$, which explains the factor $\frac{1}{s!}$. When estimating the $s$-th derivative of $g(x)$, $p - s$ should be odd to lower the bias of the estimate. The advantage of local polynomial regression models is their smaller bias compared to local constant regressions. Extensions to multivariate covariates exist, which have a somewhat involved notation.
31 Nonparametric Regression
For all nonparametric methods presented here (kernel density and local regression estimators), it is assumed that the covariates are continuous. Therefore, indicator variables and discrete ordered variables cannot be used directly for nonparametric regression. One possibility to use multivariate nonparametric estimators in the presence of discrete variables is to divide the data set into cells defined by the values of the discrete variables and to compute the nonparametric estimators using the remaining continuous variables. As an example, consider the regression estimate for a dependent variable $y$ and two covariates $x_1$ and $x_2$. If $x_1$ and $x_2$ are continuous, $\hat m(x_1, x_2)$ is a function with two (continuous) arguments. Assume now that $x_2$ is an indicator variable. The procedure above computes two regressions, $\hat m(x_1, x_2 = 1)$ and $\hat m(x_1, x_2 = 0)$. The problem with this approach is that the number of observations in each cell can be quite low when there are several discrete variables. If the data set contains only discrete variables, the nonparametric regressions correspond to the cell means of $y$.
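The last point, that with only discrete covariates the nonparametric regression reduces to cell means, can be made concrete. The following illustrative sketch (synthetic data, a single binary covariate) computes the two cell means, which estimate $E[y \mid x_2 = 0] = 1$ and $E[y \mid x_2 = 1] = 1.5$:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 500
x2 = rng.integers(0, 2, size=n)            # binary covariate (indicator variable)
y = 1.0 + 0.5 * x2 + rng.normal(size=n)    # true cell means: 1.0 and 1.5

# with only discrete covariates, the nonparametric regression is the cell mean
g_hat = {c: y[x2 == c].mean() for c in (0, 1)}
```

With several discrete variables the number of cells multiplies and each cell mean is computed from fewer observations, which is exactly the small-cell problem noted above.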
32 Semi- and Nonparametric Extensions
The convergence rate of nonparametric regression estimators decreases with an increasing number of regressors (curse of dimensionality). A solution to this problem is to impose some further structure. One possibility for such restrictions are additive models, which model the conditional mean of the dependent variable by a sum of univariate unknown functions of the covariates. A second class of models which circumvent the curse of dimensionality are semiparametric models, which assume a parametric specification for a part of the model. Consider first the basic additive nonparametric model:
$$E[y \mid x] = g_1(x_1) + g_2(x_2) + \dots + g_k(x_k).$$
Here, the rate of convergence is equal to that of a univariate nonparametric regression, which is faster than in the case of a general multivariate nonparametric model. The unknown functions $g_j(x_j)$ can be estimated by a backfitting algorithm, for example.
33 Semi- and Nonparametric Extensions
A second possibility to circumvent the curse of dimensionality is to impose a parametric structure on a part of the model. One example of this model class is the partial linear model, which uses an unknown function $g(x_2)$ to specify the true model as
$$y = x_1'\beta + g(x_2) + \varepsilon.$$
To estimate the model, consider first the expectation of $y$ given $x_2$:
$$E[y \mid x_2] = E[x_1 \mid x_2]'\beta + g(x_2).$$
Now, the following difference does not contain $g(x_2)$ and can be estimated by OLS to obtain an estimator of $\beta$:
$$y - \hat E[y \mid x_2] = \left( x_1 - \hat E[x_1 \mid x_2] \right)'\beta + \varepsilon.$$
The nonparametric part can then be estimated by $\hat g(x_2) = \hat E[y \mid x_2] - \hat E[x_1 \mid x_2]'\hat\beta$. A further example is the single-index model, which uses a linear index of $X$ and an unknown function $g(\cdot)$ to specify the conditional mean as $E[y \mid x] = g(x'\beta)$.
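The two-step idea for the partial linear model can be sketched end to end. The following illustrative Python code (not from the lecture) estimates the conditional means $E[y \mid x_2]$ and $E[x_1 \mid x_2]$ with a Nadaraya-Watson smoother, then runs OLS on the residuals to recover $\beta$; the data, the Gaussian kernel, and the bandwidth are all assumptions for the sketch.

```python
import numpy as np

def nw(x_grid, X, y, h):
    """Nadaraya-Watson fit of y on X, evaluated at x_grid (all 1-d arrays)."""
    z = (X[None, :] - x_grid[:, None]) / h
    K = np.exp(-0.5 * z**2)                  # Gaussian kernel weights
    return (K @ y) / K.sum(axis=1)

rng = np.random.default_rng(7)
n = 2000
x2 = rng.uniform(0, 1, size=n)
x1 = x2 + rng.normal(size=n)                 # x1 is correlated with x2
beta = 2.0
y = beta * x1 + np.sin(2 * np.pi * x2) + 0.3 * rng.normal(size=n)

# step 1: difference out the nonparametric part via conditional means given x2
y_res = y - nw(x2, x2, y, h=0.05)
x1_res = x1 - nw(x2, x2, x1, h=0.05)
# step 2: OLS of residual on residual estimates beta
beta_hat = (x1_res @ y_res) / (x1_res @ x1_res)
```

Because $g(x_2)$ is differenced out before the OLS step, `beta_hat` should be close to the true value 2 even though $\sin(2\pi x_2)$ is highly nonlinear and $x_1$ and $x_2$ are correlated.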
34 Review and Definitions
Parametric models depend on a finite-dimensional parameter, which is an element of $\mathbb{R}^q$ for $q \in \mathbb{N}$. Nonparametric models are based on one (or several) parameter(s) of infinite dimension, that is, the parameter(s) cannot be represented by a finite-dimensional value, i.e., by a scalar (or a vector thereof). The parameter of a nonparametric model is a function, like a conditional expectation (i.e., a regression) or a probability density. Nonparametric models weaken functional form assumptions (like the linearity assumption of the OLS model). Nonlinear models can also be estimated by OLS or by NLS, however; the difference from nonparametric models is that the structure of the nonlinearity is given by the parametric form, i.e., by assumption. A broad class of nonparametric models (all of those presented here) are based on a local estimation approach, where the estimation is based mainly on observations near the point of interest. There are also nonparametric models which are not based on a local approach (e.g., the Nelson-Aalen or Kaplan-Meier estimators).
35 Review and Definitions
Semiparametric estimators depend on finite- as well as on infinite-dimensional parameters. Examples are the partial linear and single-index models presented previously. Two-step estimators with nonparametric first-step estimates can also be viewed as semiparametric estimators. The parametric part of these estimators is the (scalar) second-step estimator, which is usually an average of the nonparametric first-step estimates evaluated for all observations of the sample. Examples are nonparametric matching or reweighting evaluation estimators (see the following lecture on evaluation methods). These estimators are averages of nonparametrically estimated functions. Nevertheless, they are $\sqrt{n}$-consistent, as they have the structure of a U-statistic. For more on U-statistics, see Pagan and Ullah, Nonparametric Econometrics, Cambridge University Press 1999, p. 358, or Powell, Stock, and Stoker (Econometrica, 1989).
36 Summary
Parametric specifications of regression equations and density functions may lead to inconsistent estimates. Nonparametric models circumvent the need for distributional or functional form assumptions. A large number of nonparametric methods are based on the local estimation approach. Density functions can be estimated by kernel density estimators. A basic local nonparametric regression estimator is the local constant (or Nadaraya-Watson) regression estimator. The local linear and local polynomial regression models are generalizations of the Nadaraya-Watson regression estimator and improve the asymptotic properties. Several semi- and nonparametric extensions of the basic models were proposed to address the curse of dimensionality.
37 Basic and Additional References
Basic references: Cameron and Trivedi (2005), ch. 9; Cameron and Trivedi (2009), sec.
General textbooks on nonparametric methods:
Q. Li and J. S. Racine, Nonparametric Econometrics, Princeton University Press.
A. Pagan and A. Ullah, Nonparametric Econometrics, Cambridge University Press.
D. Ruppert, M. P. Wand, and R. J. Carroll, Semiparametric Regression, Cambridge University Press.
A. Yatchew, Semiparametric Regression for the Applied Econometrician, Cambridge University Press.
Survey articles:
J. DiNardo and J. L. Tobias, Nonparametric Density and Regression Estimation, Journal of Economic Perspectives, vol. 15(4), Fall 2001, p.
A. Yatchew, Nonparametric Regression Techniques in Economics, Journal of Economic Literature, vol. 36 (1998), p.
More informationRank Estimation of Partially Linear Index Models
Rank Estimation of Partially Linear Index Models Jason Abrevaya University of Texas at Austin Youngki Shin University of Western Ontario October 2008 Preliminary Do not distribute Abstract We consider
More informationIntroduction to Regression
Introduction to Regression p. 1/97 Introduction to Regression Chad Schafer cschafer@stat.cmu.edu Carnegie Mellon University Introduction to Regression p. 1/97 Acknowledgement Larry Wasserman, All of Nonparametric
More informationECO Class 6 Nonparametric Econometrics
ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................
More informationNonparametric Regression. Badr Missaoui
Badr Missaoui Outline Kernel and local polynomial regression. Penalized regression. We are given n pairs of observations (X 1, Y 1 ),...,(X n, Y n ) where Y i = r(x i ) + ε i, i = 1,..., n and r(x) = E(Y
More informationCURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University
CURRENT STATUS LINEAR REGRESSION By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University We construct n-consistent and asymptotically normal estimates for the finite
More informationNonparametric Estimation of Regression Functions In the Presence of Irrelevant Regressors
Nonparametric Estimation of Regression Functions In the Presence of Irrelevant Regressors Peter Hall, Qi Li, Jeff Racine 1 Introduction Nonparametric techniques robust to functional form specification.
More informationGenerated Covariates in Nonparametric Estimation: A Short Review.
Generated Covariates in Nonparametric Estimation: A Short Review. Enno Mammen, Christoph Rothe, and Melanie Schienle Abstract In many applications, covariates are not observed but have to be estimated
More informationAPEC 8212: Econometric Analysis II
APEC 8212: Econometric Analysis II Instructor: Paul Glewwe Spring, 2014 Office: 337a Ruttan Hall (formerly Classroom Office Building) Phone: 612-625-0225 E-Mail: pglewwe@umn.edu Class Website: http://faculty.apec.umn.edu/pglewwe/apec8212.html
More informationFrom Histograms to Multivariate Polynomial Histograms and Shape Estimation. Assoc Prof Inge Koch
From Histograms to Multivariate Polynomial Histograms and Shape Estimation Assoc Prof Inge Koch Statistics, School of Mathematical Sciences University of Adelaide Inge Koch (UNSW, Adelaide) Poly Histograms
More informationExam C Solutions Spring 2005
Exam C Solutions Spring 005 Question # The CDF is F( x) = 4 ( + x) Observation (x) F(x) compare to: Maximum difference 0. 0.58 0, 0. 0.58 0.7 0.880 0., 0.4 0.680 0.9 0.93 0.4, 0.6 0.53. 0.949 0.6, 0.8
More informationIntroduction to Nonparametric and Semiparametric Estimation. Good when there are lots of data and very little prior information on functional form.
1 Introduction to Nonparametric and Semiparametric Estimation Good when there are lots of data and very little prior information on functional form. Examples: y = f(x) + " (nonparametric) y = z 0 + f(x)
More informationNonparametric Modal Regression
Nonparametric Modal Regression Summary In this article, we propose a new nonparametric modal regression model, which aims to estimate the mode of the conditional density of Y given predictors X. The nonparametric
More informationRegression #3: Properties of OLS Estimator
Regression #3: Properties of OLS Estimator Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #3 1 / 20 Introduction In this lecture, we establish some desirable properties associated with
More informationPenalized Splines, Mixed Models, and Recent Large-Sample Results
Penalized Splines, Mixed Models, and Recent Large-Sample Results David Ruppert Operations Research & Information Engineering, Cornell University Feb 4, 2011 Collaborators Matt Wand, University of Wollongong
More informationSingle Index Quantile Regression for Heteroscedastic Data
Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR
More informationEC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)
1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For
More informationDESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA
Statistica Sinica 18(2008), 515-534 DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA Kani Chen 1, Jianqing Fan 2 and Zhezhen Jin 3 1 Hong Kong University of Science and Technology,
More informationOn the Robust Modal Local Polynomial Regression
International Journal of Statistical Sciences ISSN 683 5603 Vol. 9(Special Issue), 2009, pp 27-23 c 2009 Dept. of Statistics, Univ. of Rajshahi, Bangladesh On the Robust Modal Local Polynomial Regression
More informationSome Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model
Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model 1. Introduction Varying-coefficient partially linear model (Zhang, Lee, and Song, 2002; Xia, Zhang, and Tong, 2004;
More informationModel-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego
Model-free prediction intervals for regression and autoregression Dimitris N. Politis University of California, San Diego To explain or to predict? Models are indispensable for exploring/utilizing relationships
More informationTeruko Takada Department of Economics, University of Illinois. Abstract
Nonparametric density estimation: A comparative study Teruko Takada Department of Economics, University of Illinois Abstract Motivated by finance applications, the objective of this paper is to assess
More informationNonparametric Regression Härdle, Müller, Sperlich, Werwarz, 1995, Nonparametric and Semiparametric Models, An Introduction
Härdle, Müller, Sperlich, Werwarz, 1995, Nonparametric and Semiparametric Models, An Introduction Tine Buch-Kromann Univariate Kernel Regression The relationship between two variables, X and Y where m(
More informationECON Introductory Econometrics. Lecture 16: Instrumental variables
ECON4150 - Introductory Econometrics Lecture 16: Instrumental variables Monique de Haan (moniqued@econ.uio.no) Stock and Watson Chapter 12 Lecture outline 2 OLS assumptions and when they are violated Instrumental
More informationIntroduction to Estimation Methods for Time Series models Lecture 2
Introduction to Estimation Methods for Time Series models Lecture 2 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 2 SNS Pisa 1 / 21 Estimators:
More informationSome Monte Carlo Evidence for Adaptive Estimation of Unit-Time Varying Heteroscedastic Panel Data Models
Some Monte Carlo Evidence for Adaptive Estimation of Unit-Time Varying Heteroscedastic Panel Data Models G. R. Pasha Department of Statistics, Bahauddin Zakariya University Multan, Pakistan E-mail: drpasha@bzu.edu.pk
More informationLecture 3: Statistical Decision Theory (Part II)
Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical
More informationQuantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation
Quantile Regression for Panel Data Models with Fixed Effects and Small T : Identification and Estimation Maria Ponomareva University of Western Ontario May 8, 2011 Abstract This paper proposes a moments-based
More information1 Glivenko-Cantelli type theorems
STA79 Lecture Spring Semester Glivenko-Cantelli type theorems Given i.i.d. observations X,..., X n with unknown distribution function F (t, consider the empirical (sample CDF ˆF n (t = I [Xi t]. n Then
More informationMinimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model.
Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model By Michael Levine Purdue University Technical Report #14-03 Department of
More informationconditional cdf, conditional pdf, total probability theorem?
6 Multiple Random Variables 6.0 INTRODUCTION scalar vs. random variable cdf, pdf transformation of a random variable conditional cdf, conditional pdf, total probability theorem expectation of a random
More informationUltra High Dimensional Variable Selection with Endogenous Variables
1 / 39 Ultra High Dimensional Variable Selection with Endogenous Variables Yuan Liao Princeton University Joint work with Jianqing Fan Job Market Talk January, 2012 2 / 39 Outline 1 Examples of Ultra High
More informationSEMIPARAMETRIC APPLICATIONS IN ECONOMIC GROWTH. Mustafa Koroglu. A Thesis presented to The University of Guelph
SEMIPARAMETRIC APPLICATIONS IN ECONOMIC GROWTH by Mustafa Koroglu A Thesis presented to The University of Guelph In partial fulfilment of requirements for the degree of Doctor of Philosophy in Economics
More informationIntroduction to Regression
Introduction to Regression David E Jones (slides mostly by Chad M Schafer) June 1, 2016 1 / 102 Outline General Concepts of Regression, Bias-Variance Tradeoff Linear Regression Nonparametric Procedures
More informationQuantile methods. Class Notes Manuel Arellano December 1, Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be
Quantile methods Class Notes Manuel Arellano December 1, 2009 1 Unconditional quantiles Let F (r) =Pr(Y r). Forτ (0, 1), theτth population quantile of Y is defined to be Q τ (Y ) q τ F 1 (τ) =inf{r : F
More informationGoodness-of-fit tests for the cure rate in a mixture cure model
Biometrika (217), 13, 1, pp. 1 7 Printed in Great Britain Advance Access publication on 31 July 216 Goodness-of-fit tests for the cure rate in a mixture cure model BY U.U. MÜLLER Department of Statistics,
More informationBoundary Correction Methods in Kernel Density Estimation Tom Alberts C o u(r)a n (t) Institute joint work with R.J. Karunamuni University of Alberta
Boundary Correction Methods in Kernel Density Estimation Tom Alberts C o u(r)a n (t) Institute joint work with R.J. Karunamuni University of Alberta November 29, 2007 Outline Overview of Kernel Density
More informationNew Local Estimation Procedure for Nonparametric Regression Function of Longitudinal Data
ew Local Estimation Procedure for onparametric Regression Function of Longitudinal Data Weixin Yao and Runze Li The Pennsylvania State University Technical Report Series #0-03 College of Health and Human
More informationIntroduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones
Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive
More informationNonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix
Nonparametric Identification of a Binary Random Factor in Cross Section Data - Supplemental Appendix Yingying Dong and Arthur Lewbel California State University Fullerton and Boston College July 2010 Abstract
More informationSTAT 512 sp 2018 Summary Sheet
STAT 5 sp 08 Summary Sheet Karl B. Gregory Spring 08. Transformations of a random variable Let X be a rv with support X and let g be a function mapping X to Y with inverse mapping g (A = {x X : g(x A}
More informationMotivational Example
Motivational Example Data: Observational longitudinal study of obesity from birth to adulthood. Overall Goal: Build age-, gender-, height-specific growth charts (under 3 year) to diagnose growth abnomalities.
More informationEstimation theory. Parametric estimation. Properties of estimators. Minimum variance estimator. Cramer-Rao bound. Maximum likelihood estimators
Estimation theory Parametric estimation Properties of estimators Minimum variance estimator Cramer-Rao bound Maximum likelihood estimators Confidence intervals Bayesian estimation 1 Random Variables Let
More informationConfidence intervals for kernel density estimation
Stata User Group - 9th UK meeting - 19/20 May 2003 Confidence intervals for kernel density estimation Carlo Fiorio c.fiorio@lse.ac.uk London School of Economics and STICERD Stata User Group - 9th UK meeting
More informationLocal regression I. Patrick Breheny. November 1. Kernel weighted averages Local linear regression
Local regression I Patrick Breheny November 1 Patrick Breheny STA 621: Nonparametric Statistics 1/27 Simple local models Kernel weighted averages The Nadaraya-Watson estimator Expected loss and prediction
More informationIntroduction to Regression
Introduction to Regression Chad M. Schafer May 20, 2015 Outline General Concepts of Regression, Bias-Variance Tradeoff Linear Regression Nonparametric Procedures Cross Validation Local Polynomial Regression
More informationAdditive Isotonic Regression
Additive Isotonic Regression Enno Mammen and Kyusang Yu 11. July 2006 INTRODUCTION: We have i.i.d. random vectors (Y 1, X 1 ),..., (Y n, X n ) with X i = (X1 i,..., X d i ) and we consider the additive
More informationNonparametric Identication of a Binary Random Factor in Cross Section Data and
. Nonparametric Identication of a Binary Random Factor in Cross Section Data and Returns to Lying? Identifying the Effects of Misreporting When the Truth is Unobserved Arthur Lewbel Boston College This
More information13 Endogeneity and Nonparametric IV
13 Endogeneity and Nonparametric IV 13.1 Nonparametric Endogeneity A nonparametric IV equation is Y i = g (X i ) + e i (1) E (e i j i ) = 0 In this model, some elements of X i are potentially endogenous,
More informationOn variable bandwidth kernel density estimation
JSM 04 - Section on Nonparametric Statistics On variable bandwidth kernel density estimation Janet Nakarmi Hailin Sang Abstract In this paper we study the ideal variable bandwidth kernel estimator introduced
More informationCross-fitting and fast remainder rates for semiparametric estimation
Cross-fitting and fast remainder rates for semiparametric estimation Whitney K. Newey James M. Robins The Institute for Fiscal Studies Department of Economics, UCL cemmap working paper CWP41/17 Cross-Fitting
More informationPeter Hoff Minimax estimation October 31, Motivation and definition. 2 Least favorable prior 3. 3 Least favorable prior sequence 11
Contents 1 Motivation and definition 1 2 Least favorable prior 3 3 Least favorable prior sequence 11 4 Nonparametric problems 15 5 Minimax and admissibility 18 6 Superefficiency and sparsity 19 Most of
More informationEstimation of partial effects in non-linear panel data models
Estimation of partial effects in non-linear panel data models by Jason Abrevaya and Yu-Chin Hsu This version: September 2011 ABSTRACT Nonlinearity and heterogeneity complicate the estimation and interpretation
More informationA Bootstrap Test for Conditional Symmetry
ANNALS OF ECONOMICS AND FINANCE 6, 51 61 005) A Bootstrap Test for Conditional Symmetry Liangjun Su Guanghua School of Management, Peking University E-mail: lsu@gsm.pku.edu.cn and Sainan Jin Guanghua School
More informationUNIVERSITY OF CALIFORNIA Spring Economics 241A Econometrics
DEPARTMENT OF ECONOMICS R. Smith, J. Powell UNIVERSITY OF CALIFORNIA Spring 2006 Economics 241A Econometrics This course will cover nonlinear statistical models for the analysis of cross-sectional and
More informationGaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012
Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature
More informationIdentification and Estimation of Partially Linear Censored Regression Models with Unknown Heteroscedasticity
Identification and Estimation of Partially Linear Censored Regression Models with Unknown Heteroscedasticity Zhengyu Zhang School of Economics Shanghai University of Finance and Economics zy.zhang@mail.shufe.edu.cn
More informationNonparametric Density Estimation
Nonparametric Density Estimation Advanced Econometrics Douglas G. Steigerwald UC Santa Barbara D. Steigerwald (UCSB) Density Estimation 1 / 20 Overview Question of interest has wage inequality among women
More informationEconomics 536 Lecture 7. Introduction to Specification Testing in Dynamic Econometric Models
University of Illinois Fall 2016 Department of Economics Roger Koenker Economics 536 Lecture 7 Introduction to Specification Testing in Dynamic Econometric Models In this lecture I want to briefly describe
More informationRobust covariance estimation for quantile regression
1 Robust covariance estimation for quantile regression J. M.C. Santos Silva School of Economics, University of Surrey UK STATA USERS GROUP, 21st MEETING, 10 Septeber 2015 2 Quantile regression (Koenker
More informationSemiparametric Models and Estimators
Semiparametric Models and Estimators Whitney Newey October 2007 Semiparametric Models Data: Z 1,Z 2,... i.i.d. Model: F aset of pdfs. Correct specification: pdf f 0 of Z i in F. Parametric model: F = {f(z
More informationON SOME TWO-STEP DENSITY ESTIMATION METHOD
UNIVESITATIS IAGELLONICAE ACTA MATHEMATICA, FASCICULUS XLIII 2005 ON SOME TWO-STEP DENSITY ESTIMATION METHOD by Jolanta Jarnicka Abstract. We introduce a new two-step kernel density estimation method,
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationMax. Likelihood Estimation. Outline. Econometrics II. Ricardo Mora. Notes. Notes
Maximum Likelihood Estimation Econometrics II Department of Economics Universidad Carlos III de Madrid Máster Universitario en Desarrollo y Crecimiento Económico Outline 1 3 4 General Approaches to Parameter
More informationA Probability Review
A Probability Review Outline: A probability review Shorthand notation: RV stands for random variable EE 527, Detection and Estimation Theory, # 0b 1 A Probability Review Reading: Go over handouts 2 5 in
More informationMinimum Hellinger Distance Estimation in a. Semiparametric Mixture Model
Minimum Hellinger Distance Estimation in a Semiparametric Mixture Model Sijia Xiang 1, Weixin Yao 1, and Jingjing Wu 2 1 Department of Statistics, Kansas State University, Manhattan, Kansas, USA 66506-0802.
More informationNew Developments in Econometrics Lecture 16: Quantile Estimation
New Developments in Econometrics Lecture 16: Quantile Estimation Jeff Wooldridge Cemmap Lectures, UCL, June 2009 1. Review of Means, Medians, and Quantiles 2. Some Useful Asymptotic Results 3. Quantile
More informationIntroduction to Estimation Methods for Time Series models. Lecture 1
Introduction to Estimation Methods for Time Series models Lecture 1 Fulvio Corsi SNS Pisa Fulvio Corsi Introduction to Estimation () Methods for Time Series models Lecture 1 SNS Pisa 1 / 19 Estimation
More informationIntroduction to Regression
Introduction to Regression Chad M. Schafer cschafer@stat.cmu.edu Carnegie Mellon University Introduction to Regression p. 1/100 Outline General Concepts of Regression, Bias-Variance Tradeoff Linear Regression
More informationLECTURE 2 LINEAR REGRESSION MODEL AND OLS
SEPTEMBER 29, 2014 LECTURE 2 LINEAR REGRESSION MODEL AND OLS Definitions A common question in econometrics is to study the effect of one group of variables X i, usually called the regressors, on another
More information12 - Nonparametric Density Estimation
ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6
More information