An introduction to nonparametric and semi-parametric econometric methods


1 An introduction to nonparametric and semi-parametric econometric methods Robert Breunig Australian National University March 1,

2 Outline
1. Introduction
2. Density Estimation: (a) Kernel techniques (b) Bandwidth selection (c) Estimating derivatives of densities (d) Non-kernel techniques
3. Conditional Mean Estimation
4. Semi-parametric estimation: (a) Robinson's method (b) Differencing (c) Binary choice models (d) Mixed categorical and continuous variables

3 Objectives 1. Introduce nonparametric and semiparametric techniques 2. Introduce some of the key issues in the literature 3. Introduce several key tools and techniques 4. Provide examples of the use of techniques 5. Provide reference literature so that interested students can pursue these techniques in their applied work 2

4 Objects of Interest. All statistical objects studied by applied econometricians may be expressed as functions of unknown distributions. Measurement of inequality: $F_a(x) - F_b(x) = \int_{-\infty}^{x} f_a(t)\,dt - \int_{-\infty}^{x} f_b(t)\,dt$. Regression modelling: $m(x) = E[Y|x] = \int y\,\frac{f(y,x)}{f_1(x)}\,dy$

5 Measurement of response: $\beta(x) = \frac{dE[Y|x]}{dx} = \frac{d}{dx}\left[\int y\,\frac{f(y,x)}{f_1(x)}\,dy\right]$. Market risk: $\sigma^2(x) = \int \left(y - E[Y|x]\right)^2 \frac{f(y,x)}{f_1(x)}\,dy$. Discrete choice: $\mathrm{Prob}[Y=1|x] = \frac{f(1,x)}{f_1(x)}$, with $Y=1$ predicted when this probability exceeds $0.5$

6 Parametric Models. Parametric econometric methods require the prior specification of the functional form of the object being estimated. For example, one might assume that the conditional mean function is linear: $m(x) = E[Y|x] = \int y\,\frac{f(y,x)}{f_1(x)}\,dy = \beta_0 + \beta_1 x$. This specification implies a constant response: $\beta(x) = \frac{dE[Y|x]}{dx} = \frac{d}{dx}\left[\int y\,\frac{f(y,x)}{f_1(x)}\,dy\right] = \beta_1$

7 Parametric Methods: Drawbacks. Parametric models impose a priori structure on the underlying DGP. Having assumed that this structure is known, we then estimate a handful of unknown parameters. The choice of model is frequently not based upon any attempt to select the correct parametric specification from the space of admissible models. Rather, model selection is usually made on the basis of tractability and ease of interpretation. The risk is that inference, prediction, and policy are all based upon an incorrectly specified parametric model. The consequences of such mis-specification are well known.

8 Nonparametric Methods Nonparametric estimators estimate objects of interest to economists by replacing unknown densities and distribution functions with their nonparametric density estimators. They are consistent under less restrictive assumptions than those underlying their parametric counterparts. When there is sufficient data, these estimators frequently reveal features of the data that are invisible under parametric techniques. Different features and structures revealed by nonparametric estimators often lead to different conclusions and policy prescriptions than those based upon parametric methods. 7

9 Four uses of nonparametric methods: 1. Visualizing the data 2. Testing and comparing models 3. Conditional mean estimation (regression) 4. Combining parametric and nonparametric methods (semi-parametric estimation)

10 Basic building block: the nonparametric kernel density estimator, $\hat f(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x_i - x}{h}\right)$

11 We would like to estimate the density, $f(x)$, from a sample $x_1, x_2, \ldots, x_n$. Histogram / naive nonparametric (local histogram) estimator: $\hat f_I(x) = \frac{1}{nh}\sum_{i=1}^{n} I\!\left(-\tfrac{1}{2} < \frac{x_i - x}{h} < \tfrac{1}{2}\right) = \frac{n_x}{nh}$, where $n_x$ is the number of points which lie between $x - \frac{h}{2}$ and $x + \frac{h}{2}$. The choice of $h$ determines the smoothness of the estimate.

12 Replace the indicator function with a smooth weighting function, called a kernel, satisfying $\int K(\psi)\,d\psi = 1$: $\hat f(y) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{y_i - y}{h}\right)$. $K(\psi)$ should be large for small $|\psi|$, small for large $|\psi|$, and $K(\cdot)$ should be symmetric.

13 A large class of functions satisfy these assumptions, for example: (i) Standard normal: $K(\psi) = (2\pi)^{-1/2}\exp\left(-\tfrac{1}{2}\psi^2\right)$. (ii) Uniform: $K(\psi) = (2c)^{-1}$ for $-c < \psi < c$, and $0$ otherwise. (iii) Epanechnikov (1969) [optimal kernel]: $K_0(\psi) = \tfrac{3}{4}\left(1 - \psi^2\right)$ for $|\psi| \le 1$, and $0$ otherwise.
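As an illustrative sketch (not from the slides; function names are my own), the estimator $\hat f(x) = (nh)^{-1}\sum_i K((x_i - x)/h)$ with the standard normal kernel might be coded as:

```python
import numpy as np

def gaussian_kernel(psi):
    """Standard normal kernel: K(psi) = (2*pi)^(-1/2) exp(-psi^2 / 2)."""
    return np.exp(-0.5 * psi ** 2) / np.sqrt(2.0 * np.pi)

def kde(x_eval, data, h, kernel=gaussian_kernel):
    """Kernel density estimate f_hat evaluated at each point of x_eval."""
    data = np.asarray(data, dtype=float)
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    n = len(data)
    # (n, n_eval) array of kernel arguments (x_i - x) / h
    psi = (data[:, None] - x_eval[None, :]) / h
    return kernel(psi).sum(axis=0) / (n * h)
```

Because the kernel integrates to one, so does the resulting estimate, whatever the bandwidth.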

14 In order to implement this estimator, we have to make two choices. 1. Kernel (weight) function: K 2. The smoothing parameter (bandwidth): h It turns out that the choice of kernel does not have much effect on the optimality of the estimator, but that the choice of bandwidth (or window width) has important repercussions for our results. 13

15 Bandwidth selection methods 1. Plug-in methods 2. Likelihood cross-validation 3. Least-squares cross-validation 14

16 All of these methods begin from the same starting point: the bandwidth, $h$, should be chosen so that the estimated density, $\hat f(x)$, is as close as possible to the true density, $f(x)$. Most of the time we employ some kind of global criterion. The most common is the integrated squared error, $\mathrm{ISE} = \int \left(\hat f(x) - f(x)\right)^2 dx$, or its expected value, the mean integrated squared error, $\mathrm{MISE} = E\int \left(\hat f(x) - f(x)\right)^2 dx$. These two quantities correspond to loss and risk, respectively.

17 For independent and identically distributed (i.i.d.) data, it is straightforward to show that $\mathrm{Bias}(\hat f) = E\hat f - f = \int K(\psi)\left[f(h\psi + x) - f(x)\right] d\psi$ and $V(\hat f) = (nh)^{-1}\int K^2(\psi)\,f(h\psi + x)\,d\psi - n^{-1}\left[\int K(\psi)\,f(h\psi + x)\,d\psi\right]^2$

18 The expressions for exact bias and variance are not useful without knowledge of the quantity that we are attempting to estimate: the true underlying density. We can, however, derive approximations to these quantities by expanding $f(h\psi + x)$ in a Taylor series for small $h$: $f(h\psi + x) = f(x) + h\psi f^{(1)}(x) + \frac{h^2}{2}\psi^2 f^{(2)}(x) + \cdots$

19 Given the i.i.d. assumption above, the assumptions made regarding the kernel function, and the following additional assumptions: (A3) the second-order derivatives of $f$ are continuous and bounded in some neighborhood of $x$; (A4) $h = h_n \to 0$ as $n \to \infty$; (A5) $nh_n \to \infty$ as $n \to \infty$; we can show that up to $O(h^2)$ the bias is given by $\mathrm{Bias}(\hat f) = \frac{h^2}{2}\mu_2 f^{(2)}(x)$, where $\mu_2 = \int \psi^2 K(\psi)\,d\psi$, and up to $O\!\left((nh)^{-1}\right)$ the variance is given by $V(\hat f) = (nh)^{-1} f(x)\int K^2(\psi)\,d\psi$

20 $\mathrm{MISE} = \int\left[\left(\mathrm{Bias}(\hat f)\right)^2 + \mathrm{Var}(\hat f)\right] dx$. The approximate MISE, using the above expressions, is $\mathrm{AMISE} = \frac{h^4}{4}\mu_2^2\int\left(f^{(2)}(x)\right)^2 dx + (nh)^{-1}\int f(x)\,dx\int K^2(\psi)\,d\psi = \frac{1}{4}\lambda_1 h^4 + \lambda_2 (nh)^{-1}$, where $\lambda_1 = \mu_2^2\int\left(f^{(2)}(x)\right)^2 dx$, $\lambda_2 = \int K^2(\psi)\,d\psi$, and $\int f(x)\,dx = 1$. (1)

21 The optimal window width, in the sense that the approximate mean integrated squared error is minimized, will be $h^{*} = c\,n^{-1/5}$, where $c = (\lambda_2/\lambda_1)^{1/5}$ follows from minimizing (1) with respect to $h$.

22 Assuming a normal kernel and a normal density, $f(x)$, both $\lambda_1$ and $\lambda_2$ can be evaluated numerically. This provides $h^{*} = 1.06\,\sigma_x\,n^{-1/5}$. Software packages which implement nonparametric density estimation (SAS, Shazam, Stata) use this as the default window width. For non-normal distributions it works well as a first approximation. It can also provide a good starting point for data-driven methods of bandwidth selection (see below). It is by far the most commonly used window width in the literature.

23 Silverman (1986) provides several other alternatives which work well for heavily skewed or multi-modal data. A simple improvement is to replace $\sigma$ by a robust estimator of spread; he specifies two alternatives that seem to work well: $h = 0.79\,R\,n^{-1/5}$ and $h = 0.9\,A\,n^{-1/5}$, where $R$ is the inter-quartile range and $A = \min\left(\sigma, R/1.34\right)$.
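These rules of thumb are trivial to compute; a sketch (function and key names are illustrative):

```python
import numpy as np

def silverman_bandwidths(x):
    """Normal-reference rule h = 1.06 * sigma * n^(-1/5) and Silverman's
    robust variants using the inter-quartile range R and A = min(sigma, R/1.34)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    sigma = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75.0, 25.0])
    R = q75 - q25
    A = min(sigma, R / 1.34)
    scale = n ** (-1.0 / 5.0)
    return {
        "normal_reference": 1.06 * sigma * scale,
        "robust_R": 0.79 * R * scale,
        "robust_A": 0.9 * A * scale,
    }
```

Since $A \le \sigma$, the robust bandwidth $0.9\,A\,n^{-1/5}$ never exceeds the normal-reference one, so it guards against oversmoothing.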

24 Least Squares Cross-Validation. This is a data-driven technique for choosing the optimal bandwidth; the idea is to minimize a particular criterion function. In least-squares cross-validation the function minimized is $\mathrm{ISE}(h) = \int\left(\hat f(x) - f(x)\right)^2 dx = \int \hat f^2\,dx + \int f^2\,dx - 2\int \hat f f\,dx$. Since $\int f^2\,dx$ does not depend upon $h$, the function minimized in practice is actually $\int \hat f^2\,dx - 2\int \hat f f\,dx$.

25 Further manipulation yields $\mathrm{ISE}^{*}(h) = n^{-2}h^{-1}\sum_{i=1}^{n}\sum_{j=1}^{n} \bar K\!\left(\frac{x_i - x_j}{h}\right) - 2n^{-1}\sum_{i=1}^{n} \hat f_{-i}(x_i)$ as the function that is actually minimized, where $\bar K = K * K$ denotes the convolution of the kernel with itself. Here $\hat f_{-i}(x_i)$ is the leave-one-out estimator, formed as a standard kernel density estimator omitting the $i$th observation, and $n^{-1}\sum_{i=1}^{n}\hat f_{-i}(x_i)$ provides an unbiased estimate of $\int \hat f f\,dx$. Most programs actually implement the leave-one-out estimator as the density estimate itself, since it minimizes the influence of solitary observations.
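For a Gaussian kernel the double sum has a closed form, because the convolution of two standard normal kernels is the $N(0,2)$ density. A sketch of the criterion plus a simple grid search over $h$ (all names illustrative):

```python
import numpy as np

def lscv_criterion(h, data):
    """Least-squares CV objective for a Gaussian kernel:
    integral of f_hat^2 minus twice the mean leave-one-out density."""
    x = np.asarray(data, dtype=float)
    n = len(x)
    d = (x[:, None] - x[None, :]) / h
    # K*K for the standard normal kernel is the N(0, 2) density
    kbar = np.exp(-0.25 * d ** 2) / (2.0 * np.sqrt(np.pi))
    term1 = kbar.sum() / (n ** 2 * h)
    # leave-one-out densities: drop the i == j terms
    k = np.exp(-0.5 * d ** 2) / np.sqrt(2.0 * np.pi)
    np.fill_diagonal(k, 0.0)
    loo = k.sum(axis=1) / ((n - 1) * h)
    return term1 - 2.0 * loo.mean()

def lscv_bandwidth(data, grid):
    """Return the h on `grid` minimizing the LSCV criterion."""
    scores = [lscv_criterion(h, data) for h in grid]
    return float(grid[int(np.argmin(scores))])
```

The criterion blows up as $h \to 0$ (the diagonal of the double sum dominates), which is what prevents the degenerate zero-bandwidth solution.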

26 Likelihood Cross-Validation. The basic idea behind this method is to choose an $h$ which maximizes the likelihood $\log L = \sum_{i=1}^{n}\log f(x_i)$. An estimated (pseudo) log-likelihood can be written as $\log \hat L = \sum_{i=1}^{n}\log \hat f(x_i) = \log L(h)$, where $\hat f(x_i)$ is a density estimator of $f$ and depends on $h$. Maximizing $\log \hat L$ with respect to $h$ produces a trivial maximum at $h = 0$. To overcome this problem, the cross-validation principle may be adopted, in which $\hat f(x_i)$ is replaced by the leave-one-out estimator $\hat f_{-i}(x_i)$.

27 This leave-one-out version of the estimator can be written as $\hat f_{-i}(x_i) = \left((n-1)h\right)^{-1}\sum_{j \ne i} K\!\left(\frac{x_j - x_i}{h}\right)$. Thus the likelihood CV principle is to choose $h$ such that $\log L(h) = \sum_{i=1}^{n}\log \hat f_{-i}(x_i)$ is a maximum. The procedure is also known as Kullback-Leibler cross-validation, in the sense that it gives an $h$ for which the Kullback-Leibler distance between the two densities $f$ and $\hat f$, $I(f, \hat f) = \int f(x)\log\left\{\frac{f(x)}{\hat f(x)}\right\} dx$, is a minimum; see Hall (1987).

28 A disadvantage of the $h$ obtained by likelihood CV is that it can be severely affected by the tail behavior of $f$. Furthermore, Hall (1987) has indicated that selecting $h$ by minimizing the Kullback-Leibler measure may be useful for the statistical discrimination problem but not for curve estimation. Thus the likelihood CV procedure has not proven to be of much current interest in the literature.

29 Other density estimation techniques: Nearest Neighbor Density Estimation. Let $d(x_1, x)$ represent the distance of the point $x_1$ from the point $x$, and for each $x$ denote by $d_k(x)$ the distance of $x$ from its $k$th nearest neighbor ($k$-NN) among $x_1, \ldots, x_n$. Then, taking $h = 2d_k(x)$, the estimator can be written as $\hat f_{k\text{-NN}}(x) = \frac{\#\{x_1, \ldots, x_n\} \text{ in } \left[x - d_k(x),\, x + d_k(x)\right]}{2nd_k(x)} = \frac{k}{2nd_k(x)} = \frac{1}{2nd_k(x)}\sum_{i=1}^{n} I\!\left(\left|\frac{x_i - x}{2d_k(x)}\right| < \tfrac{1}{2}\right)$. The degree of smoothing is controlled by the integer $k$, typically $k \approx n^{1/2}$.
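A direct sketch of the univariate $k$-NN estimator (names illustrative):

```python
import numpy as np

def knn_density(x_eval, data, k):
    """k-nearest-neighbour density estimate f_hat(x) = k / (2 n d_k(x)),
    with d_k(x) the distance from x to its k-th nearest sample point."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    out = []
    for x in np.atleast_1d(np.asarray(x_eval, dtype=float)):
        d_k = np.sort(np.abs(data - x))[k - 1]
        out.append(k / (2.0 * n * d_k))
    return np.array(out)
```

Unlike the fixed-bandwidth kernel estimator, the effective window $2d_k(x)$ adapts to the local density; the price is that the resulting estimate has heavy tails and does not integrate to one.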

30 Series estimation. Suppose $X$ is a random variable with density $f$ on the unit interval $[0,1]$. Under these circumstances it can be expressed as the Fourier series $f(x) = \sum_{j=0}^{\infty} a_j \zeta_j(x)$, where, for each $j \ge 0$, the coefficients are $a_j = \int_0^1 f(x)\zeta_j(x)\,dx = E\zeta_j(X)$, and the sequence $\zeta_j(x)$ is given by $\zeta_0(x) = 1$, $\zeta_j(x) = \sqrt{2}\cos \pi(j+1)x$ when $j$ is odd, and $\zeta_j(x) = \sqrt{2}\sin \pi j x$ when $j$ is even.

31 Using $\hat a_j = n^{-1}\sum_{i=1}^{n}\zeta_j(x_i)$ as an estimator of $a_j$, the orthogonal series estimator is defined as $\hat f(x) = \sum_{j=0}^{m}\hat a_j\zeta_j(x)$, where $m$ is the cutoff point in the infinite sum and determines the amount of smoothing. The regression analog of this is to express the conditional mean of $y$ as an infinite polynomial in $x$.
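A sketch of the series estimator on $[0,1]$, with the basis as defined on the previous slide (function names illustrative):

```python
import numpy as np

def zeta(j, x):
    """Orthonormal Fourier basis on [0,1]: zeta_0 = 1,
    zeta_j = sqrt(2) cos(pi (j+1) x) for odd j, sqrt(2) sin(pi j x) for even j."""
    x = np.asarray(x, dtype=float)
    if j == 0:
        return np.ones_like(x)
    if j % 2 == 1:
        return np.sqrt(2.0) * np.cos(np.pi * (j + 1) * x)
    return np.sqrt(2.0) * np.sin(np.pi * j * x)

def series_density(x_eval, data, m):
    """Orthogonal series estimator f_hat(x) = sum_{j<=m} a_hat_j zeta_j(x),
    where a_hat_j is the sample mean of zeta_j."""
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    data = np.asarray(data, dtype=float)
    f = np.zeros_like(x_eval)
    for j in range(m + 1):
        f += zeta(j, data).mean() * zeta(j, x_eval)
    return f
```

Since $\hat a_0 = 1$ by construction and the other basis functions integrate to zero, the estimate always integrates to one, though it can go negative where the series truncation bites.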

32 Variable window width estimators. Another option is to let the window width vary with each point in the data according to some rule. The estimator will then have the form $\hat f_{vww}(x) = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{h_{ni}} K\!\left(\frac{x_i - x}{h_{ni}}\right)$. In general, the rule should allow larger $h$ in regions where there are few observations and smaller $h$ where observations are densely located.

33 Penalized likelihood estimators Local Log-Likelihood estimators Both of these techniques treat f(x) as an unknown parameter and try to employ likelihood methods to estimate the unknown quantity. The global likelihood has no finite maximum over the class of all densities, so options are to instead maximize a penalized likelihood function (which imposes some pre-determined amount of smoothness on the function) or the local, kernel-weighted, log-likelihood. 32

34 Example 1: Eruption length of the Old Faithful geyser in Yellowstone National Park. Example 2: Hamilton and Lin (1996) model of excess stock returns from the Standard and Poor's 500. Example 3: Aït-Sahalia (1996) nonparametric test of interest rate diffusion models.

35 Multivariate Density Estimation. Consider a bivariate distribution where the $i$th sample observation is given by $(y_i, x_i)$ and $z = (y, x)$ is a fixed point. This can be estimated nonparametrically by $\hat f(y, x) = \hat f(z) = \frac{1}{nh^2}\sum_{i=1}^{n} K_1\!\left(\frac{z_i - z}{h}\right)$

36 The kernel estimator of the marginal density $f_1(x)$ of $X$ is $\hat f_1(x) = \int \hat f(y, x)\,dy = \frac{1}{nh^2}\sum_{i=1}^{n}\int K_1\!\left(\frac{y_i - y}{h}, \frac{x_i - x}{h}\right) dy = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x_i - x}{h}\right)$, where $K(x) = \int K_1(y, x)\,dy$ is such that $\int K(x)\,dx = 1$. The estimator of the conditional density of $Y$ given $X$ can then be written as $\hat f(y|x) = \frac{\hat f(y, x)}{\hat f_1(x)}$

37 In general, for a multivariate density estimation problem of dimension $d$, the optimal $h$ which minimizes the approximate MISE can be found by substituting $nh^d$ for $nh$ in the MISE expression given earlier and minimizing with respect to $h$. It is easy to show that $h^{*} = c\,n^{-1/(4+d)}$, and, for this $h$, $\mathrm{AMISE} = O\!\left(n^{-4/(4+d)}\right)$. When the kernel is multivariate standard normal, $c = \left\{4/(2d+1)\right\}^{1/(d+4)}$.

38 Curse of dimensionality. It is clear from this result that the higher the dimension $d$, the slower the speed of convergence of $\hat f$ to $f$. Thus one may need a large sample to estimate a multivariate density in high dimensions.

39 Multivariate Kernels. Standard multivariate normal density, where $d = \dim(\psi)$: $K(\psi) = (2\pi)^{-d/2}\exp\left(-\tfrac{1}{2}\psi'\psi\right)$. Multivariate Epanechnikov kernel: $K_c(\psi) = \tfrac{1}{2}\,c_d^{-1}(d+2)\left(1 - \psi'\psi\right)$ if $\psi'\psi < 1$, and $0$ otherwise, where $c_d$ is the volume of the unit $d$-dimensional sphere ($c_1 = 2$, $c_2 = \pi$, $c_3 = 4\pi/3$).

40 One disadvantage with direct application of the kernels above is that the variables may exhibit disparate variation. To overcome this problem it is good practice to work with standardized data, i.e., normalized by the standard deviation or some measure of scale. Then each of the elements in ψ will have unit variance and application of a kernel such as the multivariate standard normal is appropriate. 39

41 Conditional mean estimation. Consider $q + 1 = p$ economic variables $(Y, X')$ where $Y$ is the dependent variable and $X$ is a $(q \times 1)$ vector of regressors; these $p$ variables are taken to be completely characterized by their unknown joint density $f(y, x_1, \ldots, x_q) = f(y, x)$ at the points $y, x$. As noted in the introduction, interest frequently centres upon the conditional mean $m(x) = E(Y|X = x)$, where $x$ is some fixed value of $X$. Now suppose that we have $n$ data points $(y_i, x_i)$. By definition, $Y_i = E(Y_i|X_i = x_i) + u_i = m(x_i) + u_i$, where the error term $u_i$ has the properties $E(u_i|x_i) = 0$ and $E(u_i^2|x_i) = \sigma^2(x_i)$.

42 Parametric Estimation. Parametric methods specify a form for $m(x_i)$; in the case of a linear specification, $y_i = \alpha + x_i\beta + u_i$. The least squares estimators of $\alpha$ and $\beta$ are $\hat\alpha = \bar y - \bar x\hat\beta$ and $\hat\beta = \left(\sum_{i=1}^{n}(x_i - \bar x)^2\right)^{-1}\left(\sum_{i=1}^{n}(x_i - \bar x)y_i\right)$. The best unbiased parametric estimator of $m(x) = \alpha + x\beta$ is $m^{*}(x) = \hat\alpha + x\hat\beta = \sum_{i=1}^{n} a_{ni}(x)\,y_i$ (2), where $a_{ni}(x) = n^{-1} + (x - \bar x)(x_i - \bar x)\left(\sum_{i=1}^{n}(x_i - \bar x)^2\right)^{-1}$. The $m^{*}$ in (2) is a weighted sum of the $y_i$, where the weights $a_{ni}$ are linear in $x$ and depend on the distance of $x_i$ from $\bar x$.

43 The assumption that $m(x_i) = \alpha + x_i\beta$ implies certain assumptions about the data generating process (joint density). For example, if $(y_i, x_i)$ is bivariate normal then it can be shown that the mean of the conditional density of $y_i$ given $x_i$ is $E(y_i|x_i) = \alpha + x_i\beta$, where $\alpha = Ey_i - (Ex_i)\beta$ and $\beta = \left(\mathrm{var}(x_i)\right)^{-1}\mathrm{cov}(x_i, y_i)$. This implies that the assumption of a linear specification for $m(x)$ holds if the data come from the normal distribution. However, if the true distribution is not normal then the linear specification for the conditional expectation may be invalid, and the least squares estimator of $m(x)$ will be biased and inconsistent.

44 For example, suppose the true relationship is $y_i = \alpha + x_i\beta + x_i^2\gamma + u_i$; then the parameter of interest is $\beta + 2\gamma x_i = \partial E(y_i|x_i)/\partial x_i$. However, if a linear approximation is taken, $\partial E(y_i|x_i)/\partial x_i$ is being estimated under the false restriction that $\gamma = 0$. Typically, the exact functional form connecting $m(x)$ with $x$ is unknown. Because forcing the function to be linear or quadratic may affect the accuracy of estimation of $m(x)$, it is worthwhile considering nonparametric estimation of the unknown function, and this task is taken up in the following sections.

45 Kernel-Based Estimation. Suppose that the $x_i$ are i.i.d. random variables. Because $m(x_i)$ is the mean of the conditional density $f(y_i|x_i) = f(y|X = x_i)$, there is a potential to employ the methods of density estimation seen earlier. By definition the conditional mean is $m = \int \left(y\,f(y, x)/f_1(x)\right) dy$, (3) where $f_1(x)$ is the marginal density of $X$ at $x$. Nadaraya (1964) and Watson (1964) therefore proposed that $m$ be estimated by replacing $f(y, x)$ by $\hat f(y, x)$ and $f_1(x)$ by $\hat f_1(x)$, where these density estimators are the kernel estimators discussed above.

46 The expressions for $\hat f(y, x)$ and $\hat f_1(x)$ from the first part of this talk may be substituted into (3) to give $\hat m = \int y\left[\frac{(nh^p)^{-1}\sum_{i=1}^{n} K_1\!\left(\frac{y_i - y}{h}, \frac{x_i - x}{h}\right)}{(nh^q)^{-1}\sum_{i=1}^{n} K\!\left(\frac{x_i - x}{h}\right)}\right] dy$, (4) where $p = q + 1$ and $h$ is the window width. Some simplification yields $\hat m = \left[(nh^q)^{-1}\sum_{i=1}^{n} y_i K\!\left(\frac{x_i - x}{h}\right)\right] \Big/ \left[(nh^q)^{-1}\sum_{i=1}^{n} K\!\left(\frac{x_i - x}{h}\right)\right] = \sum_{i=1}^{n} y_i K\!\left(\frac{x_i - x}{h}\right) \Big/ \sum_{i=1}^{n} K\!\left(\frac{x_i - x}{h}\right)$
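The final ratio is easy to code; a sketch for scalar $x$ with a Gaussian kernel (names illustrative):

```python
import numpy as np

def nadaraya_watson(x_eval, x, y, h):
    """Nadaraya-Watson estimator:
    m_hat(x) = sum_i y_i K((x_i - x)/h) / sum_i K((x_i - x)/h)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    psi = (x[:, None] - np.atleast_1d(np.asarray(x_eval, dtype=float))[None, :]) / h
    w = np.exp(-0.5 * psi ** 2)  # unnormalized Gaussian kernel weights
    return (w * y[:, None]).sum(axis=0) / w.sum(axis=0)
```

Note that any normalizing constant of the kernel cancels between numerator and denominator, so unnormalized weights suffice.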

47 A feature of the Nadaraya-Watson estimator is that it is a weighted sum of those $y_i$'s that correspond to $x_i$ in a neighborhood of $x$. The weights are low for $x_i$'s far away from $x$ and high for $x_i$'s closer to $x$. With this motivation, a general class of nonparametric estimators of $m(x)$ can be written as $\hat m = \hat m(x) = \sum_{i=1}^{n} w_{ni}(x)\,y_i$, where $w_{ni}(x) = w_n(x_i, x)$ represents the weight assigned to the $i$th observation $y_i$, and depends on the distance of $x_i$ from the point $x$. Note that the parametric estimator $m^{*}(x)$ in (2) is a special case with linear weights $w_{ni}(x) = a_{ni}(x)$ such that $\sum w_{ni}(x) = 1$, though $w_{ni}(x) \ge 0$ is not necessarily true.

48 An implicit assumption in nonparametric estimation is that $m(x)$ is smooth over $x$, implying that $y_i$ contains information about $m(x)$ whenever $x_i$ is near $x$. The estimator $\hat m(x)$ is a smoothed estimator in the sense that it is constructed, at every point, by local averaging of the observations $y_i$ corresponding to those $x_i$ close to $x$ in some sense. In parametric regression, a functional form is specified for the conditional mean $m(x)$. This functional form, say $m(x, \beta)$, depends on a finite number of unknown parameters $\beta$. The least squares estimate of $m = m(x)$ is $m(x, \hat\beta)$, where $\hat\beta$ is chosen to minimize $\sum_{i=1}^{n}\left(y_i - m(x_i, \hat\beta)\right)^2$. (5)

49 Compare (5) with the following weighted least squares criterion for the nonparametric estimation of $m(x)$: $\sum_{i=1}^{n} w^{*}_{ni}(x)\left[y_i - m(x)\right]^2$. (6) In (6), $m(x)$ replaces the $m(x, \beta)$ that appears in (5). If $m(x)$ is regarded as a single unknown parameter $m$, it may be estimated by minimizing $\sum_{i=1}^{n} w^{*}_{ni}(x)\left[y_i - m\right]^2$. (7) The resulting estimate, $\hat m$, of $m(x)$ is precisely the Nadaraya-Watson estimator. Thus the kernel estimator $\hat m$ is also a least squares estimator, with $w^{*}_{ni}(x) = K\left((x_i - x)/h\right)$.

50 One might also think of $\hat m(x)$ as a method of moments estimator. Since $E(u_i|x_i) = 0$, we have $E\,w^{*}_{ni}(x)\left(y_i - m(x_i)\right) = 0$, (8) that is, $E\left[w^{*}_{ni}(x)(y_i - m) + w^{*}_{ni}(x)(m - m(x_i))\right] = 0$. (9) If the second term in (9) is ignored and a sample estimate of the first, $n^{-1}\sum_{i=1}^{n} w^{*}_{ni}(x)(y_i - m)$, is used, the value of $m$ for which this is zero is again the Nadaraya-Watson estimator.

51 Whether the second term can be ignored depends upon the weights $w^{*}_{ni}(x)$. If the weights were the indicator functions of the local histogram presented earlier, the second term would be identically zero, whereas with kernel weights it is only asymptotically zero. Because the orthogonality relation only holds as $n \to \infty$, the situation is outside the framework described by Hansen (1982), but it is close to work reported in Powell (1986), in that the expected value of the function the parameter solves changes with the sample size (through $h$), and so its large-sample limit has to be used instead.

52 Local Linear Nonparametric Regression. The Nadaraya-Watson estimator of $m(x)$ minimizes $\sum_{i=1}^{n}\left(y_i - \alpha\right)^2 K\!\left(\frac{x_i - x}{h}\right)$ with respect to $\alpha$, giving $\hat m(x) = \hat\alpha = \left[\sum K\!\left(\frac{x_i - x}{h}\right)\right]^{-1}\sum K\!\left(\frac{x_i - x}{h}\right) y_i$. Stone (1977) and Cleveland (1979) suggested that one instead minimize $\sum_{i=1}^{n}\left(y_i - \alpha - (x_i - x)\beta\right)^2 K\!\left(\frac{x_i - x}{h}\right)$ with respect to $\alpha$ and $\beta$, and set $\hat m(x)$ equal to the resulting estimate of $\alpha$.

53 This estimate can be found by performing a weighted least squares regression of $y_i$ against $z_i = (1, (x_i - x))'$ with weights $\left[K\!\left(\frac{x_i - x}{h}\right)\right]^{1/2}$. Thus, while the Nadaraya-Watson estimator fits a constant to the data close to $x$, the local linear approximation fits a straight line. This local linear smoothing estimator has been extensively investigated by Fan (1992a, 1993), Fan and Gijbels (1992), and Ruppert and Wand (1994).

54 The resulting estimator has the form $\hat m_{LL}(x) = \sum_{i=1}^{n} w^{LL}_{ni}(x)\,y_i$, with weights $w^{LL}_{ni} = e_1'\left(\sum_i z_i K_i z_i'\right)^{-1} z_i K_i$, where $e_1$ is a column vector of the same dimension as $z_i$ with unity as first element and zeros elsewhere. One advantage of this estimator is that it can be analysed with standard regression techniques, and it has the same first-order statistical properties irrespective of whether the $x_i$ are stochastic or non-stochastic. The optimal window width is proportional to $n^{-1/5}$.
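A sketch of the weighted-least-squares form (names illustrative). A useful property worth noting: because the local fit includes a slope term, the estimator reproduces a linear $m(x)$ exactly, which is also the intuition for its reduced boundary bias relative to Nadaraya-Watson.

```python
import numpy as np

def local_linear(x_eval, x, y, h):
    """Local linear estimator: at each point x0, regress y on (1, x_i - x0)
    by kernel-weighted least squares and keep the intercept."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    out = []
    for x0 in np.atleast_1d(np.asarray(x_eval, dtype=float)):
        d = x - x0
        sqrt_k = np.exp(-0.25 * (d / h) ** 2)  # square root of Gaussian weights
        Z = np.column_stack([np.ones_like(d), d])
        coef, *_ = np.linalg.lstsq(sqrt_k[:, None] * Z, sqrt_k * y, rcond=None)
        out.append(coef[0])  # alpha_hat = m_hat(x0)
    return np.array(out)
```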

55 Applications of the idea in econometrics include McManus (1994) on estimation of cost functions, Gouriéroux and Scaillet (1994) on the term structure, Lin and Shu (1994) on estimation of a disequilibrium transition model, Bossaerts and Hillion (1997) on option prices and their determinants, and Ullah and Roy (1996) on a nutrition/income relation. Implementation and computation are discussed in Cleveland et al. (1988). Hastie and Loader (1993) provide an excellent account of the history and potential of the method.

56 The logic of local linear regression smoothing can be seen by expanding $m(x_i)$ around $x$ to get $m(x_i) = m(x) + \frac{\partial m}{\partial x}(x^{*})\,(x_i - x)$, (10) where $x^{*}$ lies between $x_i$ and $x$. This may be expressed as $m(x_i) = \alpha + \beta(x^{*})(x_i - x)$. (11)

57 Now, since $E(y_i|x_i) = m(x_i)$, the objective function $\sum\left(y_i - m(x_i)\right)^2 K_i = \sum\left(y_i - \alpha - \beta(x^{*})(x_i - x)\right)^2 K_i$ is essentially the residual sum of squares from a regression using only observations close to $x_i = x$. Notice that this means $\beta(x^{*})$ will be very close to constant, as $x^{*}$ must lie between $x_i$ and $x$. This also points to the fact that improvements might be available from expanding $m(x_i)$ as a $j$th-order polynomial in $(x_i - x)$, but doing so requires the derivatives $m^{(j)}$ to exist.

58 Example 4 Eruption Length of Old Faithful Geyser Conditional on Waiting Time 57

59 Other Notes. The optimal $h$ can be found by minimizing the MISE, as in the density case, and it can be shown that $h_{opt} \propto n^{-1/(q+4)}$. Cross-validation may be performed by minimizing the estimated prediction error (EPE), $n^{-1}\sum\left(y_i - \hat m_{-i}(x_i)\right)^2$, where $\hat m_{-i}(x_i)$ is computed as the leave-one-out estimator deleting the $i$th observation from the sums. To appreciate why minimizing EPE is sensible, notice that when the leave-one-out estimator is employed and observations are independent, $\hat m_{-i}$ is independent of $y_i$, so that $E\left[(\hat m_{-i} - m_i)(y_i - m_i)\right] = 0$, and hence $E(\mathrm{EPE}) = \sigma^2 + E\left(n^{-1}\sum(\hat m_{-i} - m_i)^2\right) = \sigma^2 + \mathrm{MASE}$

60 Minimizing $E(\mathrm{EPE})$ with respect to $h$ is therefore equivalent to minimizing MASE with respect to $h$. Unfortunately, minimizing the sample EPE tends to produce an estimator of $h$ that converges only extremely slowly, at order $n^{-1/10}$, to the value of $h$ minimizing $E(\mathrm{EPE})$. The curse of dimensionality means that pure nonparametric regression is difficult to use in higher-dimensional problems.

61 Semi-parametric estimation. A number of models in the literature have the distinguishing feature that part of the model is linear and part constitutes an unknown non-linear form: $y_i = x_{1i}'\beta + g_1(x_{2i}) + u_i$, (12) which could be written in matrix form as $y = X_1\beta + g_1 + u$. (13) In (12), $x_{1i}$ cannot have unity as an element.

62 This intercept restriction is an identification condition arising from the fact that $g_1(x_{2i})$ is unconstrained and can therefore have a constant term as part of its definition. Hence, it would always be possible to add any constant to (12) and then absorb it into $g_1(x_{2i})$, showing that, without some further restriction upon the nature of $g_1(x_{2i})$, it is impossible to consistently estimate an intercept. This issue of identification of parameters, particularly as regards the intercept, but sometimes a scale parameter as well, arises a good deal in the semi-parametric literature and needs to be dealt with by imposing some restrictions. The parameter of interest is $\beta$, so the issue is how to estimate it in the presence of the unknown function $g_1$.

63 A Semi-Parametric Estimator of $\beta$. Taking the conditional expectation of (13) leads to $E(y_i|x_{2i}) = E(x_{1i}|x_{2i})'\beta + g_1(x_{2i})$. Consequently, $y_i - E(y_i|x_{2i}) = \left(x_{1i} - E(x_{1i}|x_{2i})\right)'\beta + u_i$ (14) and $g_1(x_{2i}) = E(y_i|x_{2i}) - E(x_{1i}|x_{2i})'\beta$. (15)

64 Since (14) has the properties of a linear regression model with dependent variable $y_i - E(y_i|x_{2i})$ and independent variables $x_{1i} - E(x_{1i}|x_{2i})$, an obvious estimator of $\beta$ is $\hat\beta = \left[\sum_{i=1}^{n}\left(x_{1i} - \hat m_{12i}\right)\left(x_{1i} - \hat m_{12i}\right)'\right]^{-1}\left[\sum_{i=1}^{n}\left(x_{1i} - \hat m_{12i}\right)\left(y_i - \hat m_{2i}\right)\right]$, (16) where $\hat m_{12i}$ and $\hat m_{2i}$ are the kernel-based estimators of $m_{12i} = E(x_{1i}|x_{2i})$ and $m_{2i} = E(y_i|x_{2i})$.

65 Once $\hat\beta$ is found, $g_1(x_{2i})$ can be estimated from (15) as $\hat g_1(x_{2i}) = \hat m_{2i} - \hat m_{12i}'\hat\beta$. (17) For example, Stock (1989) works with this model but is particularly interested in estimating $g_1(x_{2i})$ rather than $\beta$. The kernel estimator for $\beta$ in the context of (13) was analyzed by Robinson (1988).
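A sketch of the double-residual procedure for scalar $x_1$ and $x_2$, with a Nadaraya-Watson first stage (all names and the simulated DGP in the test are illustrative, not from the slides):

```python
import numpy as np

def nw(x_eval, x, y, h):
    """Scalar Nadaraya-Watson regression with a Gaussian kernel."""
    psi = (np.asarray(x, dtype=float)[:, None]
           - np.atleast_1d(np.asarray(x_eval, dtype=float))[None, :]) / h
    w = np.exp(-0.5 * psi ** 2)
    return (w * np.asarray(y, dtype=float)[:, None]).sum(axis=0) / w.sum(axis=0)

def robinson(y, x1, x2, h):
    """Robinson-style estimator of beta in y = x1*beta + g1(x2) + u:
    regress y - E_hat[y|x2] on x1 - E_hat[x1|x2], as in (14)-(17)."""
    y = np.asarray(y, dtype=float)
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    m2 = nw(x2, x2, y, h)    # E_hat[y | x2] at the sample points
    m12 = nw(x2, x2, x1, h)  # E_hat[x1 | x2] at the sample points
    ry, rx = y - m2, x1 - m12
    beta = (rx @ ry) / (rx @ rx)
    g_hat = m2 - m12 * beta  # g1_hat as in (17)
    return beta, g_hat
```

The key point is that only the part of $x_1$ not explained by $x_2$ identifies $\beta$, so $x_1$ must vary independently of $x_2$ to some degree.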

66 Differencing. Consider again the partial linear model $y_i = x_{1i}\beta + g_1(x_{2i}) + \varepsilon_i$, (18) where $x_1$ is a scalar. Order the data by $x_2$, from smallest to largest, so that $x_{21} \le x_{22} \le \cdots \le x_{2n}$. Suppose that $x_1$ is a smooth function of $x_2$, with $E[x_1|x_2] = g(x_2)$, and therefore $x_1 = g(x_2) + u$.

67 Differencing adjacent observations gives $y_i - y_{i-1} = \left(x_{1i} - x_{1,i-1}\right)\beta + \left(g_1(x_{2i}) - g_1(x_{2,i-1})\right) + \varepsilon_i - \varepsilon_{i-1} = \left(g(x_{2i}) - g(x_{2,i-1})\right)\beta + (u_i - u_{i-1})\beta + \left(g_1(x_{2i}) - g_1(x_{2,i-1})\right) + \varepsilon_i - \varepsilon_{i-1}$. Provided that the functions $g_1$ and $g$ are sufficiently smooth and the data are sufficiently dense, the differences $g_1(x_{2i}) - g_1(x_{2,i-1})$ and $g(x_{2i}) - g(x_{2,i-1})$ should be very small, justifying the approximations $x_{1i} - x_{1,i-1} \approx u_i - u_{i-1}$ and $y_i - y_{i-1} \approx (u_i - u_{i-1})\beta + \varepsilon_i - \varepsilon_{i-1}$

68 The non-parametric difference estimator of $\beta$ is simply $\hat\beta_{diff} = \frac{\sum\left(x_{1i} - x_{1,i-1}\right)\left(y_i - y_{i-1}\right)}{\sum\left(x_{1i} - x_{1,i-1}\right)^2}$, which converges at the usual $\sqrt{n}$ rate, with limiting normal distribution $\hat\beta_{diff} \overset{D}{\sim} N\!\left(\beta,\ \frac{1.5\,\sigma_\varepsilon^2}{n\,\sigma_u^2}\right)$
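The estimator is a one-liner once the data are sorted by $x_2$; a sketch on simulated data (the DGP in the test is illustrative):

```python
import numpy as np

def difference_estimator(y, x1, x2):
    """First-difference estimator of beta in y = x1*beta + g1(x2) + eps:
    sort by x2, difference adjacent observations, regress through the origin."""
    order = np.argsort(np.asarray(x2, dtype=float))
    y = np.asarray(y, dtype=float)[order]
    x1 = np.asarray(x1, dtype=float)[order]
    dy, dx = np.diff(y), np.diff(x1)
    return float((dx @ dy) / (dx @ dx))
```

Note that no bandwidth is needed at all; the price is the 1.5 inflation factor in the asymptotic variance shown above.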

69 Example 5: Yatchew and No (2001) Gasoline Demand in Canada 68

70 Binary Choice Models. We often start with the idea of an underlying linear (latent variable) model $y_i^{*} = x_i'\beta + u_i$, (19) with $y_i = 1$ when $y_i^{*} > 0$ and $y_i = 0$ otherwise. The standard approach to estimating $\beta$ in (19) is via maximum likelihood. The log-likelihood function for a sample of size $n$ is $\log L = \sum_{i=1}^{n}\left[y_i\ln(G_i) + (1 - y_i)\ln(1 - G_i)\right]$, (20) where $G_i = \int_{-\infty}^{x_i'\beta} g(u)\,du = \mathrm{Prob}(u_i < x_i'\beta)$

71 $G$ is assumed to be normal (probit) or logistic (logit) in most applications. Klein and Spady (1993) propose to estimate a smooth version of the likelihood that locally approximates the parametric likelihood. Note that $x_i'\beta$ could be written in more general terms, but Klein and Spady retain the linear index function in their method. The key transformation is to note that $G$ in (20) is the probability that $u_i$ is less than $x_i'\beta$, conditional on the index function and the parameter $\beta$. By Bayes' rule ($\mathrm{Prob}(A|B) = \mathrm{Prob}(B|A)\,\mathrm{Prob}(A)/\mathrm{Prob}(B)$), this can be written as $G[x_i'\beta; \beta] = \mathrm{Prob}(y = 1)\,\frac{g_{\upsilon|y=1}}{g_{\upsilon}}$, (21) where $g_{\upsilon|y=1}$ is the density of the index function conditional on $y = 1$ and $g_{\upsilon}$ is the unconditional density of the index function.

72 These can both be estimated nonparametrically using standard kernel techniques while the Prob(y = 1) can be estimated as the sample fraction of observations with y i =1. 71

73 Ichimura and Thompson (1998) propose a wider class of estimators based upon a random coefficients approach: $y_i^{*} = x_i'\beta_i + u_i$, (22) with $y_i = 1$ when $y_i^{*} > 0$ and $y_i = 0$ otherwise. The distribution of $\beta_i$ is estimated by nonparametric methods with few restrictions. Ai and Chen (Econometrica, 2003) have proposed a better method for estimating binary choice models which is currently considered the state of the art.

74 Additional notes on bandwidth selection Plug-in methods Usually reserved for simple density estimation Fan and Gijbels (1996) provide plug-in estimators for regression estimation Least-squares cross-validation popular in many applications Ichimura and Todd (2004, Handbook of Econometrics V) find that this method works well in a simulation study The biggest problem with least-squares cross-validation happens when the data are sparse. In this case the method tends to choose a bandwidth which is too large in order to avoid having zero densities in any area (the criterion takes on an unbounded value if the density is zero at any point). 73

75 Variable bandwidth selection methods result in estimates that are no longer densities. Thus global bandwidth selection methods tend to be preferred There are also bootstrap bandwidth selection methods which tend to be very computationally intensive 74

76 Reducing the curse of dimensionality Restricting the class of models ex: Separable models of Robinson and Yatchew ex: Klein and Spady Binary Choice Model Changing the Parameter of Interest ex: Average derivative methods 75

77 Specifying different stochastic assumptions; see Powell (1984, J. of Econometrics). I won't discuss this last one, but these methods essentially involve making some restriction on the conditional distribution of observable variables, though not enough to estimate the model parametrically. Powell applies these to various limited dependent variable models, including the Tobit model.

78 Average Derivative Method. Consider the model $y_i = g(x_i) + u_i$. (23) Suppose that instead of estimating the derivative $g'(x)$ at every point, we are interested in $E(g'(x))$. (24) The advantage is that by averaging over all points, the curse of dimensionality is eliminated: even though the function $g$ cannot be estimated at the parametric rate of convergence, the average of its derivatives can.

79 These estimators have achieved great popularity and are discussed in Stoker (1986, Econometrica), Härdle and Stoker (1989, JASA), and Powell, Stock and Stoker (1989, Econometrica). The simplest form is the direct average derivative estimator, $\hat\beta = \frac{\sum_{i=1}^{n} \frac{\partial \hat E(y_i|x_i)}{\partial x}\,t_i}{\sum_{i=1}^{n} t_i}$, (25) where $t$ is a trimming function that removes points which have zero or negative densities.

80 What affects the results? Bandwidth choice and trimming.

81 Trimming Trimming essentially refers to the practice of dropping some observations which meet a particular criterion. In other cases, it may mean rounding values at or near zero up to some acceptable level. (ex: Klein and Spady.) Practical reasons In all of the regression estimators that we have looked at, some type of density estimate appears in the denominator of the expression. If this is zero or near zero, the estimate of the conditional mean function is undefined. So it is sometimes necessary to drop data points in order to avoid the boundary problem. 80

82 Technical reasons. Semiparametric estimators use nonparametric estimators in their construction, and the nonparametric estimators need uniform rates of convergence in order to establish the asymptotic properties of the semiparametric estimators. This generally involves the use of bounded kernels and bounded densities (for $x$, typically). So most technical proofs involve the introduction of some trimming function. (See Robinson (1988) or Klein and Spady (1993) for examples.)

83 Additively Separable Models. This represents another way to restrict the class of models: $y_i = \beta_0 + g_1(x_{i1}) + g_2(x_{i2}) + \cdots + g_k(x_{ik}) + u_i$. (26) This is less restrictive than it appears, because some variables could involve interactions with other variables. Estimates achieve the univariate rate of convergence, $n^{2/5}$. These models are complicated to estimate: use backfitting or the integration approach of Newey (1994, Econometric Theory) and Härdle and Linton (1996, Biometrika). They are less commonly applied than the partially linear model.

84 Partially Linear Models: Recent developments Refinements have been proposed by Ahn and Powell (1993, Journal of Econometrics) and Heckman, Ichimura, and Todd (1998, U. of Chicago, still unpublished). These deal with the case where instrumental variables are needed and where a sample selection correction of unknown functional form is estimated. 83
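For concreteness, Robinson's method for the partially linear model y = xβ + g(z) + u (listed earlier in the outline) can be sketched as a residual-on-residual regression. Everything below — kernel, bandwidth, simulated design — is an illustrative assumption, not the authors' implementation.

```python
import numpy as np

def nw(z0, z, v, h):
    """Nadaraya-Watson estimate of E[v | Z = z0] with a Gaussian kernel."""
    w = np.exp(-0.5 * ((z0 - z) / h) ** 2)
    return np.sum(w * v) / np.sum(w)

def robinson(y, x, z, h=0.15):
    """Robinson (1988) two-step estimator for y = x*beta + g(z) + u:
    regress y - E[y|z] on x - E[x|z].  Univariate sketch; bandwidth
    and kernel are illustrative choices."""
    ey = np.array([nw(zi, z, y, h) for zi in z])
    ex = np.array([nw(zi, z, x, h) for zi in z])
    yt, xt = y - ey, x - ex
    return np.sum(xt * yt) / np.sum(xt ** 2)

# simulated design where x is correlated with z, so a naive OLS of
# y on x alone is biased while the partially linear estimate is not
rng = np.random.default_rng(3)
n = 500
z = rng.uniform(0, 1, n)
x = z + rng.normal(scale=0.5, size=n)
y = 1.5 * x + z ** 2 + rng.normal(scale=0.2, size=n)

b_hat = robinson(y, x, z)
b_ols = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
```

In this design the omitted nonlinear term g(z) = z² is correlated with x, so b_ols is pushed away from the true β = 1.5 while the Robinson step removes the dependence on z before regressing.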

85 Other Notes The book by Pagan and Ullah (1998) remains an excellent reference. The new book by Li and Racine (2006) is written to serve more as a teaching text, complete with problem sets and examples. More recent developments are discussed by Ichimura and Todd (Handbook of Econometrics, Volume 5, 2004). I particularly like their section on bandwidth selection (chapter 6) for semi-parametric, parametric, and average derivative regression estimation techniques. 84


More information

Statistical inference on Lévy processes

Statistical inference on Lévy processes Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline

More information

Working Paper No Maximum score type estimators

Working Paper No Maximum score type estimators Warsaw School of Economics Institute of Econometrics Department of Applied Econometrics Department of Applied Econometrics Working Papers Warsaw School of Economics Al. iepodleglosci 64 02-554 Warszawa,

More information

Transformation and Smoothing in Sample Survey Data

Transformation and Smoothing in Sample Survey Data Scandinavian Journal of Statistics, Vol. 37: 496 513, 2010 doi: 10.1111/j.1467-9469.2010.00691.x Published by Blackwell Publishing Ltd. Transformation and Smoothing in Sample Survey Data YANYUAN MA Department

More information

Applied Health Economics (for B.Sc.)

Applied Health Economics (for B.Sc.) Applied Health Economics (for B.Sc.) Helmut Farbmacher Department of Economics University of Mannheim Autumn Semester 2017 Outlook 1 Linear models (OLS, Omitted variables, 2SLS) 2 Limited and qualitative

More information

Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis"

Ninth ARTNeT Capacity Building Workshop for Trade Research Trade Flows and Trade Policy Analysis Ninth ARTNeT Capacity Building Workshop for Trade Research "Trade Flows and Trade Policy Analysis" June 2013 Bangkok, Thailand Cosimo Beverelli and Rainer Lanz (World Trade Organization) 1 Selected econometric

More information

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University.

Summer School in Statistics for Astronomers V June 1 - June 6, Regression. Mosuk Chow Statistics Department Penn State University. Summer School in Statistics for Astronomers V June 1 - June 6, 2009 Regression Mosuk Chow Statistics Department Penn State University. Adapted from notes prepared by RL Karandikar Mean and variance Recall

More information

Linear Regression. Junhui Qian. October 27, 2014

Linear Regression. Junhui Qian. October 27, 2014 Linear Regression Junhui Qian October 27, 2014 Outline The Model Estimation Ordinary Least Square Method of Moments Maximum Likelihood Estimation Properties of OLS Estimator Unbiasedness Consistency Efficiency

More information