ECON 721: Lecture Notes on Nonparametric Density and Regression Estimation
Petra E. Todd
Fall 2014


Contents

1 Review of Stochastic Order Symbols

2 Nonparametric Density Estimation
  2.1 Histogram Estimator
    2.1.1 What is the variance and bias of this estimator?
    2.1.2 Selecting the bin size
  2.2 Symmetric Binning Histogram Estimator
  2.3 Standard kernel density estimator
    2.3.1 Bias
    2.3.2 Variance
    2.3.3 Choice of Bandwidth
  2.4 Alternative Estimator - k nearest neighbor
  2.5 Asymptotic Distribution of Kernel Density

3 Nonparametric Regression
  3.1 Overview
  3.2 The Local approach
    3.2.1 Two types of potential problems
    3.2.2 Interpretation of Local Polynomial Estimators as a local regression
  3.3 Properties of Nonparametric Regression Estimators
    3.3.1 Consistency
    3.3.2 Asymptotic Normality
    3.3.3 How might we choose the bandwidth?

4 Bandwidth Selection for Density Estimation
  4.1 First Generation Methods
    4.1.1 Plug-in method (local version)
    4.1.2 Global plug-in method
    4.1.3 Rule-of-thumb Method (Silverman)
    4.1.4 Global Method #2: Least Squares Cross-validation
    4.1.5 Biased Cross-validation
  4.2 Second Generation Methods

5 Bandwidth selection for Nonparametric Regression
  5.1 Plug-in Estimator (local method)
  5.2 Global Plug-in Estimator
  5.3 Bandwidth Choice in Nonparametric Regression
  5.4 Global MISE criterion
  5.5 Blocking method
  5.6 Bandwidth Selection for Nonparametric Regression - Cross-Validation

Chapter 1: Review of Stochastic Order Symbols

Let $X_n$ be a sequence of random variables. Then
\[
X_n = o_P(1) \quad \text{if } X_n \xrightarrow{P} 0 \text{ as } n \to \infty,
\]
and
\[
X_n = O_P(a_n) \quad \text{if } X_n/a_n \text{ is stochastically bounded.}
\]
Recall that $X_n/a_n$ is stochastically bounded if
\[
\forall \varepsilon > 0, \ \exists M < \infty : \ P\big(|X_n/a_n| > M\big) < \varepsilon \quad \forall n.
\]
If a sequence is $o_P(1)$, then it is $O_P(1)$: convergence in probability implies stochastic boundedness, but not vice versa. For example, if $x_1, \dots, x_n$ are iid with mean $\mu$ and finite variance, then the CLT implies $\bar{x}_n - \mu = O_P(n^{-1/2})$.


Chapter 2: Nonparametric Density Estimation

Nonparametric density estimators allow estimation of densities without having to specify the functional form of the density. Instead, weaker assumptions such as continuity and differentiability are imposed.

2.1 Histogram Estimator

The most commonly used nonparametric density estimator is the histogram estimator. Suppose we estimate the density by a histogram with bin width equal to $h$. The proportion of data in a bin approximates
\[
\frac{1}{n}\sum_{i=1}^n 1\big(x_i \in [x_0, x_0 + h]\big) \approx \hat f(x_0)\,h.
\]
This suggests the density estimator
\[
\hat f(x_0) = \frac{1}{nh}\sum_{i=1}^n 1\big(x_i \in [x_0, x_0 + h]\big).
\]
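To make the construction concrete, here is a minimal sketch of the histogram estimator evaluated at a point, in Python with NumPy (the function names and the simulated data are my own illustration, not part of the notes):

```python
import numpy as np

def hist_density(x0, data, h):
    """Histogram density estimate at x0: share of observations in
    [x0, x0 + h], divided by the bin width h."""
    n = len(data)
    in_bin = (data >= x0) & (data <= x0 + h)
    return in_bin.sum() / (n * h)

# Example: estimate a standard normal density at 0 with n = 10,000 draws.
rng = np.random.default_rng(0)
data = rng.standard_normal(10_000)
print(hist_density(0.0, data, h=0.2))   # close to phi(0) ~ 0.399
```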

2.1.1 What is the variance and bias of this estimator?

Bias

\[
E\hat f(x_0) = \frac{1}{h}\,E\,1\big(x_i \in [x_0, x_0+h]\big) = \frac{1}{h}\int_{x_0}^{x_0+h} f(x_i)\,dx_i
= \frac{1}{h}\int_{x_0}^{x_0+h} \big\{f(x_0) + f'(\bar x)(x_i - x_0)\big\}\,dx_i, \qquad \bar x \in [x_0, x_i],
\]
\[
= \underbrace{f(x_0)\,\frac{1}{h}\int_{x_0}^{x_0+h} dx_i}_{f(x_0)} + \underbrace{\frac{1}{h}\int_{x_0}^{x_0+h} f'(\bar x)(x_i - x_0)\,dx_i}_{R(x_0,h)}.
\]
Assume the first derivative is bounded: $|f'(x)| \le C < \infty$ for all $x$. In that case, the remainder term can be bounded from above by
\[
|R(x_0,h)| \le \frac{1}{h}\int_{x_0}^{x_0+h} |f'(\bar x)|\,|x_i - x_0|\,dx_i \le C\,\frac{h^2}{2}\,\frac{1}{h} = O(h).
\]
Thus $|E\hat f(x_0) - f(x_0)| \le Ch$. Consistency of the histogram requires that $h \to 0$ as $n \to \infty$. Note that these derivations assume $f$ is differentiable with a bounded derivative.

Variance

Mean-square consistency requires that the variance go to zero as well. If observations are iid, we can write
\[
\operatorname{var}\hat f(x_0) = \frac{1}{n^2h^2}\sum_{i=1}^n \operatorname{var}\big\{1(x_i \in [x_0, x_0+h])\big\}.
\]
We will show that the variance is bounded above by something that goes to zero (from below, it is bounded by 0):
\[
\operatorname{var}\big\{1(x_i \in [x_0,x_0+h])\big\} = E\big(1(x_i \in [x_0,x_0+h])^2\big) - \big(E\,1(\cdot)\big)^2
\le E\big(1(x_i \in [x_0,x_0+h])^2\big) = E\,1\big(x_i \in [x_0,x_0+h]\big).
\]
We already analyzed this term above in studying the bias, where we showed that
\[
E\,1\big(x_i \in [x_0, x_0+h]\big) = hf(x_0) + O(h^2).
\]
Assume that $f(x_0) \le C$, so
\[
\operatorname{var}\hat f(x_0) \le \frac{Cnh}{n^2h^2} = \frac{C}{nh} = O\Big(\frac{1}{nh}\Big).
\]
Consistency therefore requires $nh \to \infty$.

Remarks
(i) The estimator is not root-n consistent, because the variance gets large as $h \to 0$.
(ii) For consistency, we require both $h \to 0$ and $nh \to \infty$.

2.1.2 Selecting the bin size

How might you choose $h$? One way is to minimize the mean-squared error (MSE), focusing on the lowest-order terms (the terms that converge the slowest):
\[
MSE = C_2 h^2 + \frac{C_1}{nh}.
\]

\[
\frac{\partial MSE}{\partial h} = 2C_2h - \frac{C_1}{nh^2} = 0
\;\Longrightarrow\; 2C_2h^3 = \frac{C_1}{n}
\;\Longrightarrow\; h^* = \underbrace{\Big[\frac{C_1}{2C_2}\Big]^{1/3} n^{-1/3}}_{\text{optimal choice of bin width}}.
\]

2.2 Symmetric Binning Histogram Estimator

Instead of constructing the bins as before, consider constructing the bins symmetrically around the point of evaluation:
\[
\hat f(x_0) = \frac{1}{n}\sum_{i=1}^n \frac{1}{2h}\,1\big(|x_0 - x_i| < h\big).
\]
This can be rewritten as
\[
\hat f(x_0) = \frac{1}{nh}\sum_{i=1}^n \underbrace{\frac{1}{2}\,1\Big(\Big|\frac{x_0 - x_i}{h}\Big| < 1\Big)}_{\text{uniform kernel}}.
\]
Note that the area under the kernel function integrates to 1 and the kernel function is symmetric.

[Figure 2.1: Symmetric uniform kernel function]

2.3 Standard kernel density estimator

Instead of the uniform kernel, use a kernel with weights that give more weight to closer observations and go smoothly to zero. We also require that it integrate to 1 and be symmetric. Let
\[
s = \frac{x_0 - x_i}{h_n}.
\]
Examples:
\[
k(s) = \frac{15}{16}(1 - s^2)^2 \ \text{ if } |s| \le 1, \quad 0 \text{ otherwise} \qquad \text{(biweight or quartic kernel)},
\]
\[
k(s) = \frac{1}{\sqrt{2\pi}}\,e^{-s^2/2} \qquad \text{(normal kernel)}.
\]
With the normal kernel, the weights are always positive. The standard kernel density estimator is given by
\[
\hat f(x_0) = \frac{1}{nh_n}\sum_{i=1}^n k\Big(\frac{x_0 - x_i}{h_n}\Big).
\]
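A minimal sketch of this estimator in Python (NumPy only; the function names are my own, and the bandwidth is taken as given here):

```python
import numpy as np

def quartic_kernel(s):
    """Biweight/quartic kernel: (15/16)(1 - s^2)^2 on |s| <= 1."""
    return np.where(np.abs(s) <= 1, (15 / 16) * (1 - s**2) ** 2, 0.0)

def gaussian_kernel(s):
    """Normal kernel: standard normal density."""
    return np.exp(-0.5 * s**2) / np.sqrt(2 * np.pi)

def kde(x0, data, h, kernel=quartic_kernel):
    """Kernel density estimate: (1/(n h)) * sum_i k((x0 - x_i)/h)."""
    s = (x0 - data) / h
    return kernel(s).mean() / h

rng = np.random.default_rng(0)
data = rng.standard_normal(5_000)
for k in (quartic_kernel, gaussian_kernel):
    print(k.__name__, kde(0.0, data, h=0.3, kernel=k))
```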

[Figure 2.2: Kernel function with weights that go smoothly to zero]

2.3.1 Bias

\[
E\hat f(x_0) = \frac{1}{h_n}\int k\Big(\frac{x_0 - x_i}{h_n}\Big) f(x_i)\,dx_i.
\]
Change variables: let
\[
s = \frac{x_0 - x_i}{h_n}, \qquad ds = -\frac{dx_i}{h_n}, \qquad x_i = x_0 - sh_n,
\]
so that (the limits of integration flip, cancelling the minus sign)
\[
E\hat f(x_0) = \int k(s)\,f(x_0 - sh_n)\,ds.
\]
Assume $f$ is $r$ times differentiable with bounded $r$th derivative, and Taylor expand:
\[
f(x_0 - sh_n) = f(x_0) + f'(x_0)(-sh_n) + \frac{1}{2}f''(x_0)s^2h_n^2 + \dots + \frac{1}{r!}f^{(r)}(x_0)(-sh_n)^r
+ \underbrace{\frac{1}{r!}\big[f^{(r)}(\bar x) - f^{(r)}(x_0)\big](-sh_n)^r}_{\text{remainder term } R(x_0,s)}.
\]
If
\[
\int k(s)s^R\,ds = 0 \qquad \text{for } R = 1,\dots,r,
\]

then the bias is
\[
\int k(s)\,R(x_0,s)\,ds \le h_n^r\,\frac{2C}{r!}\int |s|^r\,|k(s)|\,ds = O(h_n^r)
\]
(since we assumed a bounded $r$th derivative). Thus, this estimator improves on the histogram if some moments of the kernel equal 0. Typically, $k$ satisfies
\[
\int k(s)\,ds = 1, \qquad \int k(s)s\,ds = 0.
\]
If we additionally require $\int k(s)s^2\,ds = 0$, then $k(s)$ must be negative in places, as in the figure below.

Remark. If $f^{(r)}$ satisfies a Lipschitz condition, i.e. $|f^{(r)}(\bar x) - f^{(r)}(x_0)| \le m|\bar x - x_0|$, then
\[
\text{Bias} \le C_0\,h_n^{r+1}\int |s|^{r+1}\,|k(s)|\,ds = O(h_n^{r+1}).
\]

2.3.2 Variance

\[
\operatorname{Var}\hat f(x_0) = \frac{1}{n^2h_n^2}\sum_{i=1}^n \operatorname{Var} k\Big(\frac{x_0 - x_i}{h_n}\Big)
= \frac{1}{nh_n^2}\Big[E\,k^2\Big(\frac{x_0 - x_i}{h_n}\Big) - \Big(E\,k\Big(\frac{x_0 - x_i}{h_n}\Big)\Big)^2\Big]
\le \frac{C_1}{nh_n} = O\Big(\frac{1}{nh_n}\Big),
\]
which can be shown using the same techniques as before. The variance does not improve on the histogram.

[Figure 2.3: Kernel function that is sometimes negative]

2.3.3 Choice of Bandwidth

Choose $h_n$ to minimize the asymptotic mean-squared error (AMSE):
\[
\min_{h_n}\,AMSE(h_n) = C_2^2 h_n^{2r} + \frac{C_1}{nh_n},
\]
\[
\frac{\partial AMSE}{\partial h_n} = 2rC_2^2 h_n^{2r-1} - \frac{C_1}{nh_n^2} = 0
\;\Longrightarrow\; h_n^{2r+1} = \frac{C_1}{2rC_2^2\,n}
\;\Longrightarrow\; h_n^* = \Big(\frac{C_1}{2rC_2^2}\Big)^{\frac{1}{2r+1}}\,n^{-\frac{1}{2r+1}},
\]
which shrinks to zero as $n$ grows large.

Now let's look at the order of the AMSE (without making the Lipschitz assumption). Plug in the optimal value for $h_n$ obtained above, and let $C_3 = \big(\frac{C_1}{2rC_2^2}\big)^{\frac{1}{2r+1}}$.

\[
MSE(h_n^*) = C_2^2 C_3^{2r}\,n^{-\frac{2r}{2r+1}} + \frac{C_1}{C_3}\,n^{-\frac{2r}{2r+1}} = O\big(n^{-\frac{2r}{2r+1}}\big),
\]
so
\[
RMSE = \sqrt{MSE} = O\big(n^{-\frac{r}{2r+1}}\big), \qquad \frac{r}{2r+1} < \frac{1}{2},
\]
which is slower than the parametric rate $n^{-1/2}$. Recall that for OLS,
\[
\sqrt{N}\big(\hat\beta_{ols} - \beta_0\big) \sim N\Big(0,\ \sigma^2\underbrace{\Big(\frac{X'X}{N}\Big)^{-1}}_{\text{Var}}\Big),
\]
so the RMSE of $\hat\beta_{ols}$ is $O_p(N^{-1/2})$.

How do you generalize the kernel density estimator to more dimensions? With a product kernel,
\[
\hat f(x_0, z_0) = \frac{1}{n\,h_{n,x}\,h_{n,z}}\sum_{i=1}^n k_1\Big(\frac{x_0 - x_i}{h_{n,x}}\Big)\,k_2\Big(\frac{z_0 - z_i}{h_{n,z}}\Big).
\]

2.4 Alternative Estimator - k nearest neighbor

Alternatively, the bandwidth can be chosen so as to include a certain number of neighbors in each point estimation. In this case, the bandwidth is variable. Let $k = \#\text{data in bin}$ and let $r_k$ be the distance from $x_0$ to its $k$th nearest neighbor, so the bin is the interval of width $2r_k$ around $x_0$. Then
\[
\hat f(x_0) = \frac{k-1}{2r_k\,n}.
\]
For $m$-dimensional data,
\[
\hat f(x_0) = \frac{k-1}{(2r_k)^m\,n}.
\]
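A minimal sketch of the one-dimensional k-nearest-neighbor density estimator, assuming the $(k-1)/(2r_k n)$ form reconstructed above (function and variable names are my own):

```python
import numpy as np

def knn_density(x0, data, k):
    """k-NN density estimate at x0: (k - 1) / (2 * r_k * n), where r_k is
    the distance from x0 to its k-th nearest observation."""
    n = len(data)
    dist = np.sort(np.abs(data - x0))
    r_k = dist[k - 1]               # distance to the k-th nearest neighbor
    return (k - 1) / (2 * r_k * n)

rng = np.random.default_rng(0)
data = rng.standard_normal(5_000)
print(knn_density(0.0, data, k=100))  # variable bandwidth: r_k adapts to x0
```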

2.5 Asymptotic Distribution of Kernel Density

We already did consistency (we showed the conditions required for convergence in mean square):
\[
\hat f(x_0) = \frac{1}{nh_n}\sum_{i=1}^n k\Big(\frac{x_0 - x_i}{h_n}\Big) \xrightarrow{p} f(x_0),
\]
since $E\hat f(x_0) \to f(x_0)$ as $h_n \to 0$, $n \to \infty$ (by the LLN) and $\operatorname{Var}\hat f(x_0) \to 0$ as $nh_n \to \infty$.

Now consider the asymptotic distribution. Decompose:
\[
\sqrt{nh_n}\Big(\frac{1}{nh_n}\sum_{i=1}^n k\Big(\frac{x_0 - x_i}{h_n}\Big) - f(x_0)\Big)
= \underbrace{\sqrt{nh_n}\Big(\frac{1}{nh_n}\sum_{i=1}^n k\Big(\frac{x_0 - x_i}{h_n}\Big) - E\,\frac{1}{h_n}k\Big(\frac{x_0 - x_i}{h_n}\Big)\Big)}_{(1)}
+ \underbrace{\sqrt{nh_n}\Big(E\,\frac{1}{h_n}k\Big(\frac{x_0 - x_i}{h_n}\Big) - f(x_0)\Big)}_{(2)}
\]

Analyze term (1): we can find a CLT for this part.
\[
(1) = \frac{1}{\sqrt n}\sum_{i=1}^n \Big[\frac{1}{\sqrt{h_n}}k\Big(\frac{x_0 - x_i}{h_n}\Big) - E\,\frac{1}{\sqrt{h_n}}k\Big(\frac{x_0 - x_i}{h_n}\Big)\Big]
\xrightarrow{d} N\Big(0,\ \lim \operatorname{VAR}\,\frac{1}{\sqrt{h_n}}k\Big(\frac{x_0 - x_i}{h_n}\Big)\Big).
\]
Assume $\int k(s)\,ds = 1$ and $\int k(s)s\,ds = 0$. Then
\[
\operatorname{VAR}\,\frac{1}{\sqrt{h_n}}k\Big(\frac{x_0 - x_i}{h_n}\Big)
= \underbrace{\frac{1}{h_n}\,E\,k^2\Big(\frac{x_0 - x_i}{h_n}\Big)}_{\to f(x_0)\int k^2(s)\,ds}
- \underbrace{\frac{1}{h_n}\Big[E\,k\Big(\frac{x_0 - x_i}{h_n}\Big)\Big]^2}_{O(h_n)\,\to\,0},
\]
so
\[
\text{Term (1)} \xrightarrow{d} N\Big(0,\ f(x_0)\int k^2(s)\,ds\Big).
\]
Now show that Term (2) $\xrightarrow{p} 0$ (this will require more stringent conditions on the bandwidth). Note
\[
E\,\frac{1}{h_n}k\Big(\frac{x_0 - x_i}{h_n}\Big) = \frac{1}{h_n}\int k\Big(\frac{x_0 - x_i}{h_n}\Big)f(x_i)\,dx_i
= \int k(s)\,f(x_0 - sh_n)\,ds \qquad \Big(s = \frac{x_0 - x_i}{h_n},\ ds = -\frac{dx_i}{h_n}\Big)
\]
\[
= \int k(s)\Big[f(x_0) + f'(x_0)(-sh_n) + f''(\bar x)\frac{s^2h_n^2}{2}\Big]\,ds
= f(x_0)\int k(s)\,ds - f'(x_0)h_n\int k(s)s\,ds + \frac{h_n^2}{2}\int f''(\bar x)s^2k(s)\,ds,
\]
where $\bar x$ lies between $x_0$ and $x_0 - sh_n$. Assume $|f''(\bar x)| \le M$ and $\int k(s)s\,ds = 0$ (satisfied if $k$ is symmetric).

Then
\[
E\,\frac{1}{h_n}k\Big(\frac{x_0 - x_i}{h_n}\Big) - f(x_0) = O(h_n^2),
\]
which implies that
\[
\sqrt{nh_n}\Big(E\,\frac{1}{h_n}k\Big(\frac{x_0 - x_i}{h_n}\Big) - f(x_0)\Big) = O\big(\sqrt{nh_n}\,h_n^2\big) = O\big(\sqrt{nh_n^5}\big).
\]
Hence, we require $nh_n^5 \to 0$ in addition to $h_n \to 0$ and $nh_n \to \infty$. The condition required for asymptotic normality is stronger than the one required for consistency.
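As an illustration of this result, here is a small Monte Carlo sketch (my own, not from the notes) that compares the simulated standard deviation of $\sqrt{nh_n}(\hat f(x_0) - f(x_0))$ with the asymptotic value $\sqrt{f(x_0)\int k^2(s)\,ds}$, using a Gaussian kernel, for which $\int k^2(s)\,ds = 1/(2\sqrt{\pi})$:

```python
import numpy as np
from scipy.stats import norm

def kde(x0, data, h):
    # Gaussian-kernel density estimate at x0.
    return norm.pdf((x0 - data) / h).mean() / h

rng = np.random.default_rng(1)
n, x0 = 20_000, 0.0
h = n ** (-1 / 4)                      # satisfies n h^5 -> 0 and n h -> infinity
draws = [np.sqrt(n * h) * (kde(x0, rng.standard_normal(n), h) - norm.pdf(x0))
         for _ in range(500)]
print("simulated sd:", np.std(draws))
print("asymptotic sd:", np.sqrt(norm.pdf(x0) / (2 * np.sqrt(np.pi))))
```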

Chapter 3: Nonparametric Regression

3.1 Overview

Consider the model
\[
y_i = g(x_i) + \epsilon_i,
\]
where we assume
\[
E(\epsilon_i\,|\,x_i) = 0, \qquad E(\epsilon_i^2\,|\,x_i) = c < \infty,
\]
and we would like to estimate $g(x)$ without having to impose functional form assumptions on $g$, apart from assumptions such as continuity and differentiability.

There are two types of nonparametric estimation methods: local and global. These two approaches reflect two different ways to reduce the problem of estimating a function to the estimation of real numbers.

Local approaches consider a real-valued function $g(x)$ at a single point $x = x_0$. The problem of estimating a function becomes that of estimating a real number, $g(x_0)$. If we are interested in evaluating the function in the neighborhood of the point $x_0$, we can approximate the function by $g(x_0)$ or, if $g(x)$ is continuously differentiable at $x_0$, a better approximation might be $g(x_0) + g'(x_0)(x - x_0)$. Thus, the problem of estimating a function at a

point may be thought of as estimating two real numbers, $g(x_0)$ and $g'(x_0)$, making use of observations in the neighborhood. Either way, if we want to estimate the function over a wider range of $x$ values, the same pointwise problem can be solved at the different points of evaluation.

Global approaches introduce a coordinate system in a space of functions, which reduces the problem of estimating a function to that of estimating a set of real numbers. Any element $v$ in a $d$-dimensional vector space can be uniquely expressed using a system of independent vectors $\{b_j\}_{j=1}^d$ as $v = \sum_{j=1}^d \theta_j b_j$, where one can think of $\{b_j\}_{j=1}^d$ as a system of coordinates and $(\theta_1, \dots, \theta_d)$ as the representation of $v$ using that coordinate system. Likewise, given an appropriate set of linearly independent functions $\{\phi_j(x)\}_{j=1}^\infty$ as coordinates, any square integrable real-valued function $g(x)$ has unique coefficients $\{\theta_j\}_{j=1}^\infty$ such that
\[
g(x) = \sum_{j=1}^\infty \theta_j \phi_j(x).
\]
One can think of $\{\phi_j(x)\}_{j=1}^\infty$ as a system of coordinates and $(\theta_1, \theta_2, \dots)$ as the representation of $g(x)$ using the coordinate system. This observation allows us to translate the problem of estimating a function into a problem of estimating a sequence of real numbers $\{\theta_j\}_{j=1}^\infty$. Well known bases are polynomial series and Fourier series; these bases are infinitely differentiable everywhere. Other well known bases are polynomial spline bases and wavelet bases.

The literature on nonparametric estimation methods is vast. These notes will focus on local methods, in particular kernel-based methods, which are among the most commonly used (at least by economists!).

3.2 The Local approach

Examples: kernel regression, local polynomial regression.

Early smoothing methods date back to the 1860s in the actuarial literature (see Cleveland (1979, JASA)), where they were used to study life expectancies at different ages.

One possibility is to estimate the conditional mean by
\[
E(y|x) = \int y\,f(y|x)\,dy = \int y\,\frac{f(x,y)}{f(x)}\,dy, \qquad
\hat E(y|x) = \int y\,\frac{\hat f(x,y)}{\hat f(x)}\,dy.
\]
Or, with repeated data at each point, one could estimate $g(x_0)$ by
\[
\hat g(x_0) = \frac{\sum_i y_i\,1(x_i = x_0)}{\sum_i 1(x_i = x_0)}.
\]
A drawback is that we can only estimate this way at points where we have data. We need a way of interpolating or extrapolating between and outside the data observations. We could create cells as in the histogram estimator:
\[
\hat g(x_0) = \frac{\sum_{i=1}^n y_i\,1\big(x_i \in [x_0, x_0 + h_n]\big)}{\sum_{i=1}^n 1\big(x_i \in [x_0, x_0 + h_n]\big)}
\qquad \text{or, binning symmetrically,} \qquad
\hat g(x_0) = \frac{\sum_{i=1}^n y_i\,1\big(\big|\frac{x_0 - x_i}{h_n}\big| < 1\big)}{\sum_{i=1}^n 1\big(\big|\frac{x_0 - x_i}{h_n}\big| < 1\big)}.
\]
If we want $\hat g(x)$ to be smooth, we need to choose a kernel function that goes to 0 smoothly.

[Figure 3.1: Repeated data estimator]

\[
\hat g(x_0) = \frac{\sum_{i=1}^n y_i\,k\big(\frac{x_0 - x_i}{h_n}\big)}{\sum_{i=1}^n k\big(\frac{x_0 - x_i}{h_n}\big)} = \sum_{i=1}^n y_i\,w_i(x_0),
\qquad
w_i(x_0) = \frac{k\big(\frac{x_0 - x_i}{h_n}\big)}{\sum_{j=1}^n k\big(\frac{x_0 - x_j}{h_n}\big)}, \qquad \sum_{i=1}^n w_i(x_0) = 1.
\]
Heuristically, dividing numerator and denominator by $nh_n$,
\[
\hat g(x_0) = \frac{\frac{1}{nh_n}\sum_{i=1}^n y_i\,k\big(\frac{x_0 - x_i}{h_n}\big)}{\frac{1}{nh_n}\sum_{i=1}^n k\big(\frac{x_0 - x_i}{h_n}\big)}
\xrightarrow{p} \frac{g(x_0)f(x_0)}{f(x_0)} = g(x_0).
\]
Note: we don't need $\int k(s)\,ds = 1$, but we do require that the kernels used in the numerator and denominator integrate to the same value.
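A minimal sketch of this (Nadaraya-Watson) estimator, with my own function names and the quartic kernel defined earlier:

```python
import numpy as np

def nw_regression(x0, x, y, h):
    """Nadaraya-Watson estimator: kernel-weighted average of y near x0."""
    s = (x0 - x) / h
    w = np.where(np.abs(s) <= 1, (15 / 16) * (1 - s**2) ** 2, 0.0)  # quartic
    return (w * y).sum() / w.sum()   # division-by-zero risk where f(x0) ~ 0

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 2_000)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(2_000)
print(nw_regression(0.5, x, y, h=0.1))   # close to sin(pi) = 0
```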

3.2.1 Two types of potential problems

(i) When the density at $x_0$ is low, the denominator tends to be close to 0 and we get a division-by-zero problem.
(ii) At boundary points, the estimator has higher-order bias, even with an appropriately chosen $k(\cdot)$. This problem is known as boundary bias.

3.2.2 Interpretation of Local Polynomial Estimators as a local regression

Solve this problem at each point of evaluation $x_0$ (which may or may not correspond to points in the data):
\[
(\hat a, \hat b_1, \dots, \hat b_k) = \operatorname*{argmin}_{\{a, b_1, \dots, b_k\}}\ \sum_{i=1}^n \big(y_i - a - b_1(x_i - x_0) - b_2(x_i - x_0)^2 - \dots - b_k(x_i - x_0)^k\big)^2\,k_i,
\qquad k_i = k\Big(\frac{x_0 - x_i}{h_n}\Big).
\]
Here $\hat a$ provides an estimator for $g(x_0)$, $\hat b_1$ for $g'(x_0)$, and $\hat b_2$ for $g''(x_0)/2$.

If $k = 0$,
\[
\hat a = \frac{\sum_{i=1}^n y_i\,k\big(\frac{x_0 - x_i}{h_n}\big)}{\sum_{i=1}^n k\big(\frac{x_0 - x_i}{h_n}\big)} = \sum_{i=1}^n y_i\,w_i(x_0), \qquad \sum_{i=1}^n w_i(x_0) = 1,
\]
the standard Nadaraya-Watson kernel regression estimator.

If $k = 1$ (local linear regression, LLR), the weighted least squares problem has the closed-form solution
\[
\hat a = \frac{\sum_{i=1}^n y_i k_i \sum_{j=1}^n k_j (x_j - x_0)^2 - \sum_{k=1}^n y_k k_k (x_k - x_0)\,\sum_{l=1}^n k_l (x_l - x_0)}
{\sum_{i=1}^n k_i \sum_{j=1}^n k_j (x_j - x_0)^2 - \big\{\sum_{k=1}^n k_k (x_k - x_0)\big\}^2},
\qquad k_i = k\Big(\frac{x_0 - x_i}{h_n}\Big).
\]
We can again write the expression as $\sum_{i=1}^n y_i\,w_i(x_0)$, where the weights satisfy
\[
\sum_{i=1}^n w_i(x_0) = 1 \qquad \text{and} \qquad \sum_{i=1}^n w_i(x_0)(x_i - x_0) = 0.
\]
The $k = 0$ weights (the standard Nadaraya-Watson kernel estimator) do not satisfy this second property.

The LLR estimator ($k = 1$) improves on the kernel estimator in two ways:
(a) the asymptotic bias does not depend on the design density of the data (i.e. $f(x)$ does not appear in the bias expression);
(b) the bias has a higher order of convergence at boundary points.

Intuitively, the standard kernel estimator fits a local constant, so when there is no data on the other side of $x_0$, the estimate will tend to be too high or too low: it does not capture the slope. LLR gets the slope. A code sketch of the estimator is given below.
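A minimal local linear regression sketch via weighted least squares (my own implementation; np.polyfit is used for the local fit, and since its `w` argument multiplies the residuals before squaring, we pass the square root of the kernel weights):

```python
import numpy as np

def quartic(s):
    return np.where(np.abs(s) <= 1, (15 / 16) * (1 - s**2) ** 2, 0.0)

def llr(x0, x, y, h):
    """Local linear regression at x0: weighted LS of y on (1, x - x0).
    The intercept estimates g(x0); the slope estimates g'(x0)."""
    w = quartic((x0 - x) / h)
    keep = w > 0                            # drop zero-weight points
    # np.polyfit minimizes sum((w_i * resid_i)^2), so pass sqrt of weights.
    slope, intercept = np.polyfit(x[keep] - x0, y[keep], 1, w=np.sqrt(w[keep]))
    return intercept, slope

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 2_000)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(2_000)
g0, g1 = llr(0.0, x, y, h=0.1)   # boundary point: LLR still captures the slope
print(g0, g1)                     # roughly sin(0) = 0 and 2*pi*cos(0) ~ 6.28
```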

[Figure 3.2: Boundary bias - how LLR improves on kernel regression]

With $y_i = g(x_i) + \varepsilon_i$,
\[
\hat g_{LLR}(x_0) - g(x_0) = \sum_{i=1}^n w_i(x_0)\,y_i - g(x_0)
= \sum_{i=1}^n w_i(x_0)\varepsilon_i + \sum_{i=1}^n w_i(x_0)\big(g(x_i) - g(x_0)\big)
\]
\[
= \sum_{i=1}^n w_i(x_0)\varepsilon_i + \sum_{i=1}^n w_i(x_0)\big[g'(x_0)(x_i - x_0) + \big(g'(\bar x) - g'(x_0)\big)(x_i - x_0)\big]
\]
\[
= \underbrace{\sum_{i=1}^n w_i(x_0)\varepsilon_i}_{\text{variance part}}
+ g'(x_0)\underbrace{\sum_{i=1}^n w_i(x_0)(x_i - x_0)}_{=0 \text{ with LLR}}
+ \sum_{i=1}^n w_i(x_0)\big(g'(\bar x) - g'(x_0)\big)(x_i - x_0).
\]

If $g'$ satisfies a Lipschitz condition, the last term can be bounded.

3.3 Properties of Nonparametric Regression Estimators

1. Consistency (we will show convergence in mean square)
2. Asymptotic Normality

In showing the asymptotic properties, we will work with the standard kernel regression estimator, because it is simpler to analyze than LLR and the basic techniques are the same:
\[
\hat m(x_0) = \frac{\frac{1}{nh_n}\sum_i y_i\,k\big(\frac{x_0 - x_i}{h_n}\big)}{\frac{1}{nh_n}\sum_i k\big(\frac{x_0 - x_i}{h_n}\big)}.
\]
Assume
(i) $h_n \to 0$, $nh_n \to \infty$;
(ii) $\int sk(s)\,ds = 0$, $\int k(s)\,ds \ne 0$, $\int k(s)s^2\,ds \le C$;
(iii) $f(x)$ and $g(x)$ are $C^2$ with bounded second derivatives.

3.3.1 Consistency

We already showed
\[
\frac{1}{nh_n}\sum_i k\Big(\frac{x_0 - x_i}{h_n}\Big) \xrightarrow{MS} f(x_0)\int k(s)\,ds \qquad \text{(denominator)}.
\]
We will now show
\[
\frac{1}{nh_n}\sum_i y_i\,k\Big(\frac{x_0 - x_i}{h_n}\Big) \xrightarrow{MS} f(x_0)g(x_0)\int k(s)\,ds \qquad \text{(numerator)},
\]
and then apply the Mann-Wald theorem (the plim of a continuous function is the continuous function of the plim) to conclude that the ratio $\xrightarrow{P} g(x_0)$.

Denominator:

\[
\operatorname{VAR}\Big(\frac{1}{nh_n}\sum_i k\Big(\frac{x_0 - x_i}{h_n}\Big)\Big) \le \frac{1}{nh_n^2}\,E\,k^2\Big(\frac{x_0 - x_i}{h_n}\Big) = \frac{1}{nh_n^2}\,O(h_n) = O\Big(\frac{1}{nh_n}\Big).
\]
Thus
\[
\frac{1}{nh_n}\sum_i k\Big(\frac{x_0 - x_i}{h_n}\Big) \xrightarrow{MS} E\Big(\frac{1}{h_n}k\Big(\frac{x_0 - x_i}{h_n}\Big)\Big) = f(x_0)\int k(s)\,ds + O(h_n^2).
\]
Numerator:
\[
E\Big(\frac{1}{nh_n}\sum_i y_i\,k\Big(\frac{x_0 - x_i}{h_n}\Big)\Big) = \frac{1}{h_n}E\Big(E(y_i|x_i)\,k\Big(\frac{x_0 - x_i}{h_n}\Big)\Big)
= \frac{1}{h_n}E\Big(g(x_i)\,k\Big(\frac{x_0 - x_i}{h_n}\Big)\Big)
= \frac{1}{h_n}\int g(x_i)\,k\Big(\frac{x_0 - x_i}{h_n}\Big)f(x_i)\,dx_i.
\]
Let $\phi(x_i) = f(x_i)g(x_i)$ and Taylor expand (as before):
\[
= \phi(x_0)\int k(s)\,ds + O(h_n^2), \qquad \text{where } \phi(x_0) = g(x_0)f(x_0). \tag{3.1}
\]
Also, one can show that
\[
\operatorname{VAR}\Big(\frac{1}{nh_n}\sum_i y_i\,k\Big(\frac{x_0 - x_i}{h_n}\Big)\Big) \le c_n, \qquad c_n \to 0 \text{ as } n \to \infty. \tag{3.2}
\]
Hence,

\[
\text{ratio} \xrightarrow{P} \frac{g(x_0)f(x_0)\int k(s)\,ds}{f(x_0)\int k(s)\,ds} = g(x_0),
\]
provided that $h_n \to 0$ and $nh_n \to \infty$.

3.3.2 Asymptotic Normality

We will show
\[
\sqrt{nh_n}\big(\hat g(x_0) - g(x_0)\big) \xrightarrow{d}
N\Big(\lim \Big[\frac{1}{2}g''(x_0) + \frac{g'(x_0)f'(x_0)}{f(x_0)}\Big]\sqrt{nh_n}\,h_n^2\int u^2k(u)\,du,\;
f(x_0)^{-1}\sigma^2(x_0)\int k^2(u)\,du\Big),
\]
where $\sigma^2(x_0) = E(\varepsilon_i^2\,|\,x_i = x_0)$ is the conditional variance. Write
\[
\sqrt{nh_n}\Big[\frac{\sum y_i k_i}{\sum k_i} - g(x_0)\Big] = \sqrt{nh_n}\,\frac{\sum (y_i - g(x_0))k_i}{\sum k_i}. \tag{3.3}
\]
With $y_i = g(x_i) + \varepsilon_i$, $E(\varepsilon_i|x_i) = 0$:
\[
= \sqrt{nh_n}\Bigg[\underbrace{\frac{\sum \varepsilon_i k_i}{\sum k_i}}_{\text{variance part}} + \underbrace{\frac{\sum (g(x_i) - g(x_0))k_i}{\sum k_i}}_{\text{bias part}}\Bigg]
= \frac{\frac{1}{\sqrt{nh_n}}\sum \varepsilon_i k_i}{\frac{1}{nh_n}\sum k_i} + \frac{\frac{1}{\sqrt{nh_n}}\sum (g(x_i) - g(x_0))k_i}{\frac{1}{nh_n}\sum k_i}.
\]
Assuming $\int k(s)\,ds = 1$, we showed before that

\[
\frac{1}{nh_n}\sum k_i \xrightarrow{MS} f(x_0) + O(h_n^2).
\]
We will now analyze the distribution of the numerator of the variance part:
\[
\frac{1}{\sqrt{nh_n}}\sum_i \varepsilon_i k_i = \frac{1}{\sqrt n}\sum_i \frac{1}{\sqrt{h_n}}\,\varepsilon_i\,k\Big(\frac{x_0 - x_i}{h_n}\Big),
\]
and apply a CLT to get
\[
\xrightarrow{d} N\Big(0,\ \lim \operatorname{VAR}\Big(\frac{1}{\sqrt{h_n}}\,\varepsilon_i\,k\Big(\frac{x_0 - x_i}{h_n}\Big)\Big)\Big). \tag{3.4}
\]
\[
\operatorname{VAR}\Big(\frac{1}{\sqrt{h_n}}\,\varepsilon_i\,k\Big(\frac{x_0 - x_i}{h_n}\Big)\Big)
= \frac{1}{h_n}E\Big(\varepsilon_i^2\,k^2\Big(\frac{x_0 - x_i}{h_n}\Big)\Big) + \text{higher order terms}
= \frac{1}{h_n}E\Big(\underbrace{E(\varepsilon_i^2|x_i)}_{=\sigma^2(x_i)}\,k^2\Big(\frac{x_0 - x_i}{h_n}\Big)\Big)
\]
\[
= \frac{1}{h_n}\int \sigma^2(x_i)\,k^2\Big(\frac{x_0 - x_i}{h_n}\Big)f(x_i)\,dx_i
= \sigma^2(x_0)f(x_0)\int k^2(s)\,ds + O(h_n) \qquad \text{(by change of variables)}.
\]
Now analyze the bias term:

\[
\frac{1}{\sqrt{nh_n}}\sum_i \big(g(x_i) - g(x_0)\big)k_i = \sqrt{\frac{n}{h_n}}\;\frac{1}{n}\sum_i \big(g(x_i) - g(x_0)\big)k\Big(\frac{x_0 - x_i}{h_n}\Big)
\]
\[
\approx \sqrt{\frac{n}{h_n}}\int \big(g(x_i) - g(x_0)\big)k\Big(\frac{x_0 - x_i}{h_n}\Big)f(x_i)\,dx_i
= \sqrt{nh_n}\int \big(g(x_0 - sh_n) - g(x_0)\big)k(s)\,f(x_0 - sh_n)\,ds
\]
\[
= \sqrt{nh_n}\int \Big\{g'(x_0)(-sh_n) + \frac{1}{2}g''(x_0)s^2h_n^2 + \frac{1}{2}\big[g''(\bar x) - g''(x_0)\big]s^2h_n^2\Big\}
\Big\{f(x_0) - sh_n f'(x_0) + \frac{s^2h_n^2}{2}f''(\bar x)\Big\}k(s)\,ds
+ \text{higher order (will ignore)}.
\]
If $g''$ satisfies a Lipschitz condition, $|g''(\bar x) - g''(x_0)| \le C|\bar x - x_0|$, then the $[g''(\bar x) - g''(x_0)]s^2h_n^2$ term is of higher order. Multiplying out, the other terms are
\[
\underbrace{-g'(x_0)f(x_0)\,h_n\int sk(s)\,ds}_{=0} + \big[g'(x_0)f'(x_0) + \tfrac{1}{2}g''(x_0)f(x_0)\big]h_n^2\int s^2k(s)\,ds
\underbrace{-\;\tfrac{1}{2}g'(x_0)f''(\bar x)\,h_n^3\int s^3k(s)\,ds}_{\text{higher order}} + \text{higher order terms},
\]
so we get
\[
\frac{1}{\sqrt{nh_n}}\sum_i \big(g(x_i) - g(x_0)\big)k_i \;\to\; \big[g'(x_0)f'(x_0) + \tfrac{1}{2}g''(x_0)f(x_0)\big]\,\sqrt{nh_n}\,h_n^2\int s^2k(s)\,ds.
\]

This requires $\sqrt{nh_n}\,h_n^2 \to 0$, i.e. $nh_n^5 \to 0$. Putting these results together, we get
\[
\sqrt{nh_n}\big(\hat g(x_0) - g(x_0)\big) \xrightarrow{d}
N\Big(\lim \Big[\frac{g'(x_0)f'(x_0)}{f(x_0)} + \frac{1}{2}g''(x_0)\Big]\sqrt{nh_n}\,h_n^2\int k(s)s^2\,ds,\;
f(x_0)^{-1}\sigma^2(x_0)\int k(s)^2\,ds\Big),
\]
which is what we set out to show.

Remarks: Using the asymptotic distribution requires plug-in estimators of $g'$, $g''$, $f$, $f'$, and $\sigma^2(x_0)$ (all the ingredients of the bias and variance).

$g'$, $g''$: obtain by local linear or local quadratic regression.

$f'$: the derivative of the estimator for $f$ gives an estimator of $f'$:
\[
\hat f'(x_0) = \frac{1}{nh_n^2}\sum_{i=1}^n k'\Big(\frac{x_0 - x_i}{h_n}\Big).
\]

$f$: obtain by the standard density estimator.

$\int k^2(s)\,ds$, $\int k(s)s^2\,ds$: constants that depend on the kernel function $k(s)$.

$\sigma^2(x_0)$: the conditional variance. Estimate by
\[
\hat E(\varepsilon_i^2\,|\,x_0) = \frac{\sum_i w_i(x_0)\,\hat\varepsilon_i^2}{\sum_i w_i(x_0)}, \qquad \hat\varepsilon_i = y_i - \hat m(x_i) \ \text{(fitted residuals)}.
\]

We showed the distribution of the kernel estimator. What about the distribution of LLR? Fan (JASA, 1992) showed
\[
\sqrt{nh_n}\big(\hat g_{LLR}(x_0) - g(x_0)\big) \xrightarrow{d}
N\Big(\underbrace{\lim \frac{1}{2}g''(x_0)\,\sqrt{nh_n}\,h_n^2\int k(s)s^2\,ds}_{\text{one less term in the bias expression}},\;
\underbrace{f(x_0)^{-1}\sigma^2(x_0)\int k^2(s)\,ds}_{\text{same variance}}\Big).
\]
Remarks:

The bias of LLR does not depend on $f(x_0)$ (the design density of the data). Because of this feature, Fan refers to the estimator as being "design-adaptive."

Fan showed that, in general, it is better to use an odd-order local polynomial: there is no cost in variance, and the bias of an odd-order fit does not depend on $f(x)$.

The LLR estimator does not suffer from the boundary bias problem.

3.3.3 How might we choose the bandwidth?

Could choose the bandwidth to minimize the pointwise asymptotic MSE (AMSE):
\[
AMSE_{LLR} = \Big[h_n^2\,\frac{1}{2}g''(x_0)\int k(s)s^2\,ds\Big]^2 + \frac{f(x_0)^{-1}\sigma^2(x_0)\int k^2(s)\,ds}{nh_n},
\]
\[
\frac{\partial AMSE}{\partial h_n} = h_n^3\,g''(x_0)^2\Big[\int k(s)s^2\,ds\Big]^2 - \frac{f(x_0)^{-1}\sigma^2(x_0)\int k^2(s)\,ds}{nh_n^2} = 0,
\]
\[
\Longrightarrow\ h_n^* = \underbrace{\Bigg[\frac{f(x_0)^{-1}\sigma^2(x_0)\int k^2(s)\,ds}{g''(x_0)^2\big[\int k(s)s^2\,ds\big]^2}\Bigg]^{1/5}}_{\text{constant}} n^{-1/5}.
\]
Remarks:
(1) Higher variance $\sigma^2(x_0)$: wider bandwidth.
(2) More variability in the function (larger $g''(x_0)$): narrower bandwidth.
(3) More data: narrower bandwidth. (With kernel regression, i.e. a local polynomial of degree 0, the bias term also depends on $f(x)$, and therefore the optimal bandwidth choice will depend on $f(x)$.)
(4) A difficulty in obtaining the optimal bandwidth is that it requires estimates of $\sigma^2(x_0)$ and $g''(x_0)$ (and, in the case of the usual kernel regression estimator, $f(x_0)$). This is the problem of how to choose a pilot bandwidth. One could assume normality in choosing a pilot for $f(x_0)$ (this is Silverman's rule-of-thumb method). One could also use a fixed bandwidth or nearest-neighbor methods for $\sigma^2(x_0)$ and $g''(x_0)$.

[Figure 3.3: Functions of different smoothness]

(5) Could minimize the asymptotic mean integrated squared error,
\[
AMISE = \int E\big(\hat g(x_0) - g(x_0)\big)^2\,dx_0,
\]
and pick a global bandwidth.

For further information on choosing the bandwidth, see below and see the book by Fan and Gijbels (1996).


Chapter 4: Bandwidth Selection for Density Estimation

4.1 First Generation Methods

4.1.1 Plug-in method (local version)

One could derive the optimal bandwidth at each point of evaluation $x_0$, which would lead to a pointwise, localized bandwidth estimator:
\[
\hat h(x_0) = \arg\min_h AMSE\big(\hat f(x_0)\big), \qquad
AMSE = E\big(\hat f(x_0) - f(x_0)\big)^2 = \frac{h^4}{4}\,f''(x_0)^2\Big[\int k(s)s^2\,ds\Big]^2 + \frac{1}{nh}\,f(x_0)\int k^2(s)\,ds,
\]
\[
\hat h(x_0) = \Bigg[\frac{\int k^2(s)\,ds\;f(x_0)}{\big[\int k(s)s^2\,ds\big]^2\,f''(x_0)^2}\Bigg]^{1/5}\,n^{-1/5}.
\]
We need to estimate $f(x_0)$ and $f''(x_0)$ to get the optimal plug-in bandwidth, and the optimal bandwidth would be different at different points of estimation.

4.1.2 Global plug-in method

Because the above method is computationally intensive, one could instead derive an optimal bandwidth common to all points of evaluation by minimizing a different criterion, the mean integrated squared error:

\[
MISE\big(\hat f\big) = \int E\big(\hat f(x_0) - f(x_0)\big)^2\,dx_0 \qquad \text{(removes $x_0$ by integration)}
\]
\[
= \frac{1}{nh}\underbrace{\int k^2(s)\,ds}_{A} + \frac{h^4}{4}\underbrace{\Big(\int k(s)s^2\,ds\Big)^2\int f''(x_0)^2\,dx_0}_{B},
\]
using $\int f(x_0)\,dx_0 = 1$, so
\[
h_{MISE} = \Big(\frac{A}{B}\Big)^{1/5}\,n^{-1/5}.
\]
Note: $\int f''(x_0)^2\,dx_0$ is a measure of the variation in $f$. If $f$ is highly variable, then $h$ will be small. Here we still need an estimate of $f''(x_0)$, but we no longer require an estimate of $f(x_0)$ (it was eliminated by integrating).

4.1.3 Rule-of-thumb Method (Silverman)

If $x$ is normally distributed, then Silverman shows (in his book Density Estimation) that, for the normal kernel,
\[
h_{MISE} = \Big(\frac{4}{3}\Big)^{1/5} SD(x)\,n^{-1/5} \approx 1.06\,SD(x)\,n^{-1/5}, \qquad SD = \text{standard deviation}.
\]
This method is often applied to nonnormal data, but it has a tendency to oversmooth, particularly when the data density is multimodal. Sometimes the standard deviation is replaced by the 75%-25% interquartile range. This is an older method, now available in some software packages, but its performance is not so good; other methods are better.
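A one-line implementation of the rule of thumb (my own sketch, using the Gaussian-kernel constant 1.06 and the common IQR-robustified variant; the min(SD, IQR/1.34) form is my assumption, not the notes' exact formula):

```python
import numpy as np

def silverman_bandwidth(data):
    """Rule-of-thumb bandwidth: 1.06 * min(SD, IQR/1.34) * n^(-1/5).
    The IQR/1.34 term robustifies the scale estimate; 1.34 is roughly
    the interquartile range of a standard normal."""
    n = len(data)
    sd = data.std(ddof=1)
    iqr = np.subtract(*np.percentile(data, [75, 25]))
    return 1.06 * min(sd, iqr / 1.34) * n ** (-1 / 5)

rng = np.random.default_rng(0)
print(silverman_bandwidth(rng.standard_normal(1_000)))
```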

4.1.4 Global Method #2: Least Squares Cross-validation

The idea is to minimize an estimate of the integrated squared error (ISE):
\[
\min_h ISE = \int \big\{\hat f(x_0) - f(x_0)\big\}^2\,dx_0
= \underbrace{\int \hat f(x_0)^2\,dx_0}_{\text{term } \#1}
\;-\; \underbrace{2\int \hat f(x_0)f(x_0)\,dx_0}_{\text{term } \#2}
\;+\; \underbrace{\int f(x_0)^2\,dx_0}_{\text{doesn't depend on } h}.
\]
Term #1:
\[
\int \hat f(x_0)^2\,dx_0 = \frac{1}{n^2h^2}\sum_i\sum_j \int k\Big(\frac{x_i - x_0}{h}\Big)k\Big(\frac{x_j - x_0}{h}\Big)\,dx_0.
\]
Let $s = \frac{x_j - x_0}{h}$, so $x_0 = x_j - sh$ and $\frac{x_i - x_0}{h} = \frac{x_i - x_j}{h} + s$:
\[
= \frac{1}{n^2h}\sum_i\sum_j \int k\Big(\frac{x_i - x_j}{h} + s\Big)k(s)\,ds
= \frac{1}{n^2h}\sum_i\sum_j \bar k\Big(\frac{x_i - x_j}{h}\Big),
\]
where $\bar k(a) = \int k(a - s)k(s)\,ds$. Note: $\bar k$ is the density of the sum of two random variables each with density $k$ (the convolution).

Now examine Term #2: $2\int \hat f(x_0)f(x_0)\,dx_0 = 2\,E\hat f(x)$, the expectation being taken over a new draw $x$ from $f$, independent of the sample. An unbiased estimator of this term is
\[
\frac{2}{N(N-1)h}\sum_i\sum_{j\ne i} k\Big(\frac{x_i - x_j}{h}\Big) \approx \frac{2}{N^2h}\sum_i\sum_{j\ne i} k\Big(\frac{x_i - x_j}{h}\Big).
\]
Note that we need to use the leave-one-out estimator,
\[
\hat f_{-i}(x_i) = \frac{1}{(n-1)h}\sum_{j\ne i} k\Big(\frac{x_i - x_j}{h}\Big),
\]
to get independence: the $i$th point does not get used, which amounts to removing the own-observation term $k(0)/(nh)$ from $\hat f(x_i)$. Putting the pieces together,
\[
\hat h_{CV} = \arg\min_h\; \underbrace{\frac{1}{n^2h}\sum_i\sum_j \bar k\Big(\frac{x_i - x_j}{h}\Big)}_{\text{need to calculate the convolution for this part}}
\;-\; \frac{2}{n^2h}\sum_i\sum_{j\ne i} k\Big(\frac{x_i - x_j}{h}\Big).
\]
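A minimal LSCV sketch for the Gaussian kernel, for which the convolution $\bar k$ is the $N(0,2)$ density (my own implementation; grid search as in remark (3) below):

```python
import numpy as np
from scipy.stats import norm

def lscv_criterion(h, data):
    """LSCV objective: term #1 (convolution part) minus 2 * leave-one-out term."""
    n = len(data)
    d = (data[:, None] - data[None, :]) / h          # all pairwise (x_i - x_j)/h
    term1 = norm.pdf(d, scale=np.sqrt(2)).sum() / (n**2 * h)   # kbar = N(0, 2)
    off_diag = norm.pdf(d).sum() - n * norm.pdf(0)   # drop the i == j terms
    term2 = 2 * off_diag / (n * (n - 1) * h)
    return term1 - term2

rng = np.random.default_rng(0)
data = rng.standard_normal(300)
grid = np.linspace(0.05, 1.0, 60)
h_cv = grid[np.argmin([lscv_criterion(h, data) for h in grid])]
print(h_cv)
```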

Remarks:
(1) It turns out that $\hat h_{CV}$ converges to the optimal bandwidth only at rate $n^{-1/10}$ (a result due to Stone), which is very slow. However, in Monte Carlo studies LSCV often does better than the rule-of-thumb method.
(2) Marron (1990) found a tendency for local minima of the criterion near small values of $h$, which leads to a tendency to undersmooth.
(3) LSCV is usually implemented by searching over a grid of bandwidth values.

4.1.5 Biased Cross-validation

\[
AMISE = \frac{1}{nh}\int k^2(s)\,ds + \frac{h^4}{4}\Big(\int s^2k(s)\,ds\Big)^2\int f''(x)^2\,dx + o\Big(\frac{1}{nh} + h^4\Big).
\]
What if we plug in $\hat f''(x)$? Note:
\[
\hat f(x) = \frac{1}{nh}\sum_i k\Big(\frac{x_i - x}{h}\Big), \qquad
\hat f'(x) = -\frac{1}{nh^2}\sum_i k'\Big(\frac{x_i - x}{h}\Big), \qquad
\hat f''(x) = \frac{1}{nh^3}\sum_i k''\Big(\frac{x_i - x}{h}\Big),
\]
\[
\hat f''(x)^2 = \frac{1}{n^2h^6}\sum_{i=1}^n\sum_{j=1}^n k''\Big(\frac{x_i - x}{h}\Big)k''\Big(\frac{x_j - x}{h}\Big).
\]
What is the bias in estimating $\int f''(x)^2\,dx$? One can show
\[
E\Big\{\int \hat f''(x)^2\,dx\Big\} = \int f''(x)^2\,dx + \frac{1}{nh^5}\int k''(s)^2\,ds + \text{higher order terms}.
\]

\[
h_{BCV} = \arg\min_h\; \frac{1}{nh}\int k^2(s)\,ds + \frac{h^4}{4}\Big(\int s^2k(s)\,ds\Big)^2\Big[\int \hat f''(x)^2\,dx \underbrace{-\ \frac{1}{nh^5}\int k''(s)^2\,ds}_{\text{subtract the bias here}}\Big].
\]

4.2 Second Generation Methods

Sheather and Jones (1991): Solve-the-equation Plug-in Method

\[
h_{SJPI} = \Bigg[\frac{\int k^2(s)\,ds}{\big[\int s^2k(s)\,ds\big]^2\,\int \hat f''_{g(h)}(x)^2\,dx}\Bigg]^{1/5}\,n^{-1/5};
\]
find the $h$ that solves this equation. We need to plug in an estimate of $f''(x)$, and the bandwidth that is optimal for estimating $f(x)$ is not the same as the optimal bandwidth for estimating $f''(x)$. Writing $R(f'') = \int f''(x)^2\,dx$, we can find an analogue of the AMSE for $R(\hat f'')$ and get
\[
g = C_1(k)\,\{R(f''')\}^{-1/7}\,n^{-1/7},
\]
so $R(f''') = \int f'''(x)^2\,dx$ enters into the formula for the optimal bandwidth $g$; $C_1(k)$ and the constants below are functions of the kernel. Now solve for $g$ as a function of $h$:
\[
g(h) = C_3(k)\,\Big\{\frac{R(f'')}{R(f''')}\Big\}^{1/7}\,h^{5/7}.
\]
Now we need an estimate of $f'''$. At this point, SJ suggest using a rule-of-thumb method based on a normal density assumption.

What is the main advantage of this method over other methods? For most methods,
\[
\frac{\hat h - h_{MISE}}{h_{MISE}} \sim n^{-p}.
\]

Recall that for CV, $p = 1/10$. For SJPI, $p = 5/14$ (close to $1/2$), so the rate of convergence of the bandwidth is faster. This method is available in some software packages.

Chapter 5: Bandwidth selection for Nonparametric Regression

Recall that the distribution of the LLR estimator was
\[
\sqrt{nh_n}\big(\hat g_{LLR}(x_0) - g(x_0)\big) \xrightarrow{d}
N\Big(\lim \frac{1}{2}g''(x_0)\,\sqrt{nh_n}\,h_n^2\int k(s)s^2\,ds,\;
f(x_0)^{-1}\sigma^2(x_0)\int k^2(s)\,ds\Big).
\]

5.1 Plug-in Estimator (local method)

Choose the bandwidth to minimize the (data-dependent) AMSE:
\[
\min_{h_n}\;\underbrace{\Big[\frac{1}{2}g''(x_0)\int k(s)s^2\,ds\Big]^2 h_n^4}_{\text{Bias}^2\,=\,C_0h_n^4}
+\ \underbrace{\frac{f(x_0)^{-1}\sigma^2(x_0)\int k^2(s)\,ds}{nh_n}}_{\text{Variance}\,=\,C_1n^{-1}h_n^{-1}}.
\]
Note:
\[
\min_{h_n}\ C_0h_n^4 + C_1n^{-1}h_n^{-1}: \quad 4C_0h_n^3 - C_1n^{-1}h_n^{-2} = 0
\;\Longrightarrow\; h_n^5 = \frac{C_1n^{-1}}{4C_0}
\;\Longrightarrow\; h_n^* = \Big(\frac{C_1}{4C_0}\Big)^{1/5}n^{-1/5},
\]
i.e.
\[
h_n^*(x_0) = \Bigg[\frac{f(x_0)^{-1}\sigma^2(x_0)\int k^2(s)\,ds}{g''(x_0)^2\big[\int k(s)s^2\,ds\big]^2}\Bigg]^{1/5}\,n^{-1/5},
\]
a variable (pointwise) bandwidth.

Remarks:
$g''$ high $\Rightarrow$ smaller bandwidth: we want a smaller bandwidth in regions where the function is highly variable.
$\sigma^2(x_0)$ high $\Rightarrow$ use a larger bandwidth.

5.2 Global Plug-in Estimator

Minimize a global criterion such as the MISE. For LLR this gives
\[
h_n = \Bigg[\frac{\int \sigma^2(x_0)\,dx_0\,\int k^2(s)\,ds}{\big[\int k(s)s^2\,ds\big]^2\,\int g''(x_0)^2 f(x_0)\,dx_0}\Bigg]^{1/5}\,n^{-1/5}.
\]
Reference: Fan & Gijbels (1996), Local Polynomial Regression.

5.3 Bandwidth Choice in Nonparametric Regression

More generally, for the local polynomial estimator of
\[
y_i = m(x_i) + \varepsilon_i,
\]
\[
\min \sum_{i=1}^n \big\{y_i - \alpha - \beta_1(x_i - x) - \dots - \beta_p(x_i - x)^p\big\}^2\,k\Big(\frac{x_i - x}{h_n}\Big),
\]
how do we choose $h_n$, given $p$?

5.4 Global MISE criterion

\[
h_{MISE} = \Bigg[\frac{\big((p+1)!\big)^2\,R(k_p)\,\overbrace{\int \sigma^2(x)\,dx}^{\operatorname{VAR}(y|x)}}{2(p+1)\,\mu_{p+1}(k_p)^2\,\int m^{(p+1)}(x)^2 f(x)\,dx}\Bigg]^{\frac{1}{2p+3}}\,n^{-\frac{1}{2p+3}},
\]
for $p$ odd (Fan & Gijbels (1996), Local Polynomial Regression). For LLR, $p = 1$ and we get the rate $n^{-1/5}$. Here $k_p$ is the equivalent kernel for the $p$th-order fit (a function of the kernel),
\[
R(k_p) = \int k_p(s)^2\,ds, \qquad \mu_l(k_p) = \int \mu^l\,k_p(\mu)\,d\mu.
\]
For $p = 1$ this reduces to
\[
h_{MISE} = \Bigg[\frac{R(k)\int \sigma^2(x)\,dx}{\mu_2(k)^2\int m''(x)^2 f(x)\,dx}\Bigg]^{1/5}\,n^{-1/5}.
\]

The unknowns are $\sigma^2(x)$, $m''(x)$, and $f(x)$. Apply the same plug-in idea to estimate $\hat m''(x)$ pointwise and then take an average. We still need $\int \sigma^2(x)\,dx$.

5.5 Blocking method

(1) Divide the support of $x$ into $N$ blocks $X_1, \dots, X_N$.
(2) Estimate $y = m(x) + \varepsilon$ by a $Q$-degree polynomial within each block, with fitted values $\hat m_j^Q(x)$ in block $j$. Then
\[
\hat\sigma^2(N) = \frac{1}{n - (Q+1)N}\sum_i\sum_j \big\{y_i - \hat m_j^Q(x_i)\big\}^2\,1(x_i \in X_j).
\]
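A sketch of the blocking variance estimator (my own; quartic within-block fits, i.e. Q = 4, with the degrees-of-freedom correction written as n - (Q+1)N, as reconstructed above):

```python
import numpy as np

def blocked_sigma2(x, y, n_blocks, q=4):
    """Estimate the error variance from residuals of block-wise
    degree-q polynomial fits of y on x."""
    n = len(x)
    edges = np.quantile(x, np.linspace(0, 1, n_blocks + 1))
    block = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_blocks - 1)
    rss = 0.0
    for j in range(n_blocks):
        inb = block == j
        coef = np.polyfit(x[inb], y[inb], q)          # Q-degree fit in block j
        rss += ((y[inb] - np.polyval(coef, x[inb])) ** 2).sum()
    return rss / (n - (q + 1) * n_blocks)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 2_000)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(2_000)
print(blocked_sigma2(x, y, n_blocks=5))   # roughly 0.3^2 = 0.09
```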

5.6 Bandwidth Selection for Nonparametric Regression - Cross-Validation

Recall that the LLR estimator is defined by
\[
\hat a = \arg\min_{a,b}\ \sum_{i=1}^n \big(y_i - a - b(x_i - x_0)\big)^2\,k\Big(\frac{x_i - x_0}{h_n}\Big).
\]
Let
\[
y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \qquad
X = \begin{pmatrix} 1 & (x_1 - x_0) \\ \vdots & \vdots \\ 1 & (x_N - x_0) \end{pmatrix}, \qquad
W = \operatorname{diag}\Big(k\Big(\frac{x_1 - x_0}{h}\Big), \dots, k\Big(\frac{x_N - x_0}{h}\Big)\Big),
\]
where the weights depend on $x_0$ and $h$. Then
\[
\hat g_{LLR}(x_0) = [1\ \ 0]\,(X'WX)^{-1}X'Wy,
\]
the intercept coefficient from a weighted least squares problem. Stacking the fitted values at the observation points $x_1, \dots, x_N$ defines the $N \times N$ smoother matrix $H(h)$, whose $i$th row is $[1\ \ 0](X'WX)^{-1}X'W$ evaluated at $x_0 = x_i$, so that
\[
\hat g_{LLR} = H(h)\,y.
\]
Choose $h$ to minimize $E\big((\hat g_{LLR} - g)'(\hat g_{LLR} - g)\,\big|\,x\big)$, the sum of conditional MSEs at the points of evaluation:
\[
E\big((H(h)y - g)'(H(h)y - g)\,|\,x\big)
= E\big(\big((H(h) - I)g + H(h)\varepsilon\big)'\big((H(h) - I)g + H(h)\varepsilon\big)\,|\,x\big)
\]
\[
= g'(H(h) - I)'(H(h) - I)g + E\big(\varepsilon'H(h)'H(h)\varepsilon\,|\,x\big)
= g'(H(h) - I)'(H(h) - I)g + \sigma^2\operatorname{tr}\big(H(h)'H(h)\big),
\]
using $E(\varepsilon|x) = 0$, $E(\varepsilon\varepsilon'|x) = \sigma^2 I$, $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, $\operatorname{tr}(A+B) = \operatorname{tr}(A) + \operatorname{tr}(B)$, and the fact that the trace is a linear mapping. We need an estimator for this criterion. Consider what is estimated by the sum of squared fitted residuals:
\[
(y - H(h)y)'(y - H(h)y) = y'(I - H(h))'(I - H(h))y.
\]
With $y = g + \varepsilon$:
\[
= g'(I - H(h))'(I - H(h))g + 2g'(I - H(h))'(I - H(h))\varepsilon + \varepsilon'(I - H(h))'(I - H(h))\varepsilon,
\]
and, conditional on $x$,
\[
E\big(g'(I - H(h))'(I - H(h))g\,|\,x\big) = g'(I - H(h))'(I - H(h))g,
\]
\[
E\big(g'(I - H(h))'(I - H(h))\varepsilon\,|\,x\big) = 0,
\]
\[
E\big(\varepsilon'(I - H(h))'(I - H(h))\varepsilon\,|\,x\big)
= \sigma^2\operatorname{tr}\big((I - H(h))'(I - H(h))\big)
= \sigma^2\big[\operatorname{tr} I - 2\operatorname{tr} H(h) + \operatorname{tr}\big(H(h)'H(h)\big)\big].
\]
Together,
\[
E\big[(y - H(h)y)'(y - H(h)y)\,|\,x\big] = g'(I - H(h))'(I - H(h))g + \sigma^2\operatorname{tr} I
\underbrace{-\ 2\sigma^2\operatorname{tr} H(h)}_{\text{don't want this term}} + \sigma^2\operatorname{tr}\big(H(h)'H(h)\big).
\]
Apart from the constant $\sigma^2\operatorname{tr} I = N\sigma^2$, this matches the criterion except for the unwanted $-2\sigma^2\operatorname{tr} H(h)$ term. If we use $\hat g_{-i}(x_i)$ instead of $\hat g(x_i)$ (the leave-one-out estimator),

then the corresponding smoother matrix has zeros on its diagonal, so $\operatorname{tr} H(h) = 0$, and
\[
\hat h_{CV} = \arg\min_{h \in H}\ \sum_{i=1}^n \big(y_i - \hat g_{-i}(x_i)\big)^2.
\]
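A minimal leave-one-out cross-validation sketch for the bandwidth (my own implementation; it uses Nadaraya-Watson rather than LLR to keep the code short, but the leave-one-out logic is the same):

```python
import numpy as np

def loo_cv_score(h, x, y):
    """Leave-one-out CV criterion for Nadaraya-Watson with a Gaussian kernel."""
    d = (x[:, None] - x[None, :]) / h
    w = np.exp(-0.5 * d**2)          # kernel weights; row i = weights for x_i
    np.fill_diagonal(w, 0.0)         # leave one out: drop the own observation
    g_loo = (w @ y) / w.sum(axis=1)  # leave-one-out fitted values
    return ((y - g_loo) ** 2).sum()

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 500)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(500)
grid = np.linspace(0.01, 0.3, 30)
h_cv = grid[np.argmin([loo_cv_score(h, x, y) for h in grid])]
print(h_cv)
```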


More information

Pointwise convergence rates and central limit theorems for kernel density estimators in linear processes

Pointwise convergence rates and central limit theorems for kernel density estimators in linear processes Pointwise convergence rates and central limit theorems for kernel density estimators in linear processes Anton Schick Binghamton University Wolfgang Wefelmeyer Universität zu Köln Abstract Convergence

More information

1 Empirical Likelihood

1 Empirical Likelihood Empirical Likelihood February 3, 2016 Debdeep Pati 1 Empirical Likelihood Empirical likelihood a nonparametric method without having to assume the form of the underlying distribution. It retains some of

More information

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.

Fall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A. 1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

Nonparametric Inference in Cosmology and Astrophysics: Biases and Variants

Nonparametric Inference in Cosmology and Astrophysics: Biases and Variants Nonparametric Inference in Cosmology and Astrophysics: Biases and Variants Christopher R. Genovese Department of Statistics Carnegie Mellon University http://www.stat.cmu.edu/ ~ genovese/ Collaborators:

More information

Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13)

Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) 1. Weighted Least Squares (textbook 11.1) Recall regression model Y = β 0 + β 1 X 1 +... + β p 1 X p 1 + ε in matrix form: (Ch. 5,

More information

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures

9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models

More information

Lecture Notes 15 Prediction Chapters 13, 22, 20.4.

Lecture Notes 15 Prediction Chapters 13, 22, 20.4. Lecture Notes 15 Prediction Chapters 13, 22, 20.4. 1 Introduction Prediction is covered in detail in 36-707, 36-701, 36-715, 10/36-702. Here, we will just give an introduction. We observe training data

More information

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1 36. Multisample U-statistics jointly distributed U-statistics Lehmann 6.1 In this topic, we generalize the idea of U-statistics in two different directions. First, we consider single U-statistics for situations

More information

Single Equation Linear GMM with Serially Correlated Moment Conditions

Single Equation Linear GMM with Serially Correlated Moment Conditions Single Equation Linear GMM with Serially Correlated Moment Conditions Eric Zivot October 28, 2009 Univariate Time Series Let {y t } be an ergodic-stationary time series with E[y t ]=μ and var(y t )

More information