ECON 721: Lecture Notes on Nonparametric Density and Regression Estimation. Petra E. Todd
Fall, 2014
Contents

1 Review of Stochastic Order Symbols
2 Nonparametric Density Estimation
  2.1 Histogram Estimator
  2.2 Symmetric Binning Histogram Estimator
  2.3 Standard Kernel Density Estimator
  2.4 Alternative Estimator: k-Nearest Neighbor
  2.5 Asymptotic Distribution of the Kernel Density Estimator
3 Nonparametric Regression
  3.1 Overview
  3.2 The Local Approach
  3.3 Properties of Nonparametric Regression Estimators
4 Bandwidth Selection for Density Estimation
  4.1 First Generation Methods (plug-in, rule of thumb, least squares cross-validation, biased cross-validation)
  4.2 Second Generation Methods
5 Bandwidth Selection for Nonparametric Regression
  5.1 Plug-in Estimator (local method)
  5.2 Global Plug-in Estimator
  5.3 Bandwidth Choice in Nonparametric Regression
  5.4 Global MISE Criterion
  5.5 Blocking Method
  5.6 Cross-Validation
Chapter 1: Review of Stochastic Order Symbols

Let $X_n$ be a sequence of random variables. Then

$$X_n = o_P(1) \quad \text{if } X_n \xrightarrow{P} 0 \text{ as } n \to \infty,$$

and

$$X_n = O_P(a_n) \quad \text{if } X_n/a_n \text{ is stochastically bounded.}$$

Recall that $X_n/a_n$ is stochastically bounded if

$$\forall \varepsilon > 0, \ \exists M < \infty : \ P(|X_n/a_n| > M) < \varepsilon \quad \forall n.$$

If a sequence is $o_P(1)$ then it is $O_P(1)$: convergence in probability implies stochastic boundedness, but not vice versa.
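As a concrete illustration of these symbols, the sample mean of iid draws satisfies $\bar X_n - \mu = O_P(n^{-1/2})$: the $\sqrt{n}$-scaled deviation stays stochastically bounded as $n$ grows. A minimal simulation sketch (function name and seed are illustrative, not from the notes):

```python
import random

def scaled_mean_deviation(n, seed):
    """Return sqrt(n) * (sample mean - 0.5) for n Uniform(0,1) draws."""
    rng = random.Random(seed)
    xbar = sum(rng.random() for _ in range(n)) / n
    return (n ** 0.5) * (xbar - 0.5)

# X_bar_n - mu = O_P(n^{-1/2}): the sqrt(n)-scaled deviation stays bounded
# as n grows, while the unscaled deviation X_bar_n - mu is o_P(1).
devs = [abs(scaled_mean_deviation(n, seed=0)) for n in (100, 10_000, 1_000_000)]
```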
Chapter 2: Nonparametric Density Estimation

Nonparametric density estimators allow estimation of densities without having to specify the functional form of the density. Instead, weaker assumptions such as continuity and differentiability can be assumed.

2.1 Histogram Estimator

The most commonly used nonparametric density estimator is the histogram estimator. Suppose we estimate the density by a histogram with bin width equal to $h$. The proportion of the data in a bin approximates the density times the bin width:

$$\frac{1}{n}\sum_{i=1}^n 1(x_i \in [x_0, x_0 + h]) \approx \hat f(x_0)\, h.$$

This suggests a density estimator:

$$\hat f(x_0) = \frac{1}{nh}\sum_{i=1}^n 1(x_i \in [x_0, x_0 + h]).$$
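A direct implementation of this estimator is short; the sketch below (names and test data are illustrative) checks it on Uniform(0,1) data, whose density is 1 on the unit interval:

```python
import random

def histogram_density(x0, data, h):
    """Histogram estimator: (1/(n h)) * #{i : x_i in [x0, x0 + h]}."""
    n = len(data)
    count = sum(1 for xi in data if x0 <= xi <= x0 + h)
    return count / (n * h)

# Uniform(0,1) data has density 1 on the unit interval, so the
# estimate at an interior point should be close to 1.
rng = random.Random(1)
data = [rng.random() for _ in range(50_000)]
fhat = histogram_density(0.4, data, h=0.05)
```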
What is the variance and bias of this estimator?

Bias:

$$E\hat f(x_0) = \frac{1}{h} E[1(x_i \in [x_0, x_0+h])] = \frac{1}{h}\int_{x_0}^{x_0+h} f(x_i)\, dx_i$$
$$= \frac{1}{h}\int_{x_0}^{x_0+h} \{f(x_0) + f'(\bar x)(x_i - x_0)\}\, dx_i, \qquad \bar x \in [x_0, x_i]$$
$$= f(x_0)\underbrace{\frac{1}{h}\int_{x_0}^{x_0+h} dx_i}_{=1} + \underbrace{\frac{1}{h}\int_{x_0}^{x_0+h} f'(\bar x)(x_i - x_0)\, dx_i}_{R(x_0,h)}.$$

Assume the first derivative is bounded: $|f'(x)| \le C < \infty$ for all $x$. In that case, the remainder term can be bounded from above:

$$|R(x_0,h)| \le \frac{1}{h}\int_{x_0}^{x_0+h} |f'(\bar x)|\,|x_i - x_0|\, dx_i \le C\,\frac{h^2}{2}\,\frac{1}{h} = O(h).$$

Thus $|E\hat f(x_0) - f(x_0)| \le Ch$. Consistency of the histogram requires that $h \to 0$ as $n \to \infty$. Note that these derivations assumed $f$ is differentiable with bounded derivative.
Variance:

Mean-square consistency requires that the variance go to 0 as well. If the observations are iid, we can write

$$\operatorname{var}\hat f(x_0) = \frac{1}{n^2h^2}\sum_{i=1}^n \operatorname{var}\{1(x_i \in [x_0, x_0+h])\}.$$

We will show that the variance is bounded above by something that goes to zero (from below, it is bounded by 0). Since $1(\cdot)^2 = 1(\cdot)$,

$$\operatorname{var}\{1(x_i \in [x_0, x_0+h])\} = E[1(x_i \in [x_0, x_0+h])^2] - \{E[1(x_i \in [x_0, x_0+h])]\}^2 \le E[1(x_i \in [x_0, x_0+h])].$$

We already analyzed this term above in studying the bias; we showed that

$$E[1(x_i \in [x_0, x_0+h])] = h f(x_0) + O(h^2).$$

Assume that $f(x_0) \le C$, so

$$\operatorname{var}\hat f(x_0) \le \frac{Cnh}{n^2h^2} = \frac{C}{nh} = O\!\left(\frac{1}{nh}\right).$$

Consistency therefore requires $nh \to \infty$.

Remarks:
(i) The estimator is not root-n consistent, because the variance gets large as $h \to 0$.
(ii) For consistency, we require both $h \to 0$ and $nh \to \infty$.

Selecting the bin size

How might you choose $h$? One way is to minimize the mean-squared error (MSE), focusing on the lowest-order terms (the terms that converge the slowest):

$$MSE = C_2 h^2 + \frac{C_1}{nh},$$

where $C_2$ comes from the squared bias and $C_1$ from the variance bound.
$$\frac{\partial MSE}{\partial h} = 2C_2 h - \frac{C_1}{nh^2} = 0 \;\Rightarrow\; 2C_2 h^3 = \frac{C_1}{n} \;\Rightarrow\; \underbrace{h^* = \left(\frac{C_1}{2C_2}\right)^{1/3} n^{-1/3}}_{\text{optimal choice of bin width}}.$$

2.2 Symmetric Binning Histogram Estimator

Instead of constructing the bins as before, consider constructing the bins symmetrically around the point of evaluation:

$$\hat f(x_0) = \frac{1}{n}\sum_{i=1}^n \frac{1}{2h}\, 1(|x_0 - x_i| < h).$$

This can be rewritten as

$$\hat f(x_0) = \frac{1}{nh}\sum_{i=1}^n \underbrace{\frac{1}{2}\, 1\!\left(\left|\frac{x_0 - x_i}{h}\right| < 1\right)}_{\text{uniform kernel}}.$$

Note that the area under the kernel function integrates to 1 and the kernel function is symmetric.
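The symmetric-bin version differs from the one-sided histogram only in how the bin is placed around $x_0$; a minimal sketch (test data illustrative):

```python
import random

def symmetric_bin_density(x0, data, h):
    """Symmetric-bin histogram estimator: (1/(2 n h)) * #{i : |x0 - x_i| < h}."""
    n = len(data)
    return sum(1 for xi in data if abs(x0 - xi) < h) / (2 * n * h)

rng = random.Random(1)
data = [rng.random() for _ in range(50_000)]      # true density is 1 on (0, 1)
fhat = symmetric_bin_density(0.5, data, h=0.05)
```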
Figure 2.1: Symmetric uniform kernel function

2.3 Standard kernel density estimator

Instead of the uniform kernel, use a kernel with weights that give more weight to closer observations and also go smoothly to zero. We also require that it integrate to 1 and be symmetric. Let

$$s = \frac{x_0 - x_i}{h}.$$

Examples:

$$k(s) = \frac{15}{16}(s^2 - 1)^2\, 1(|s| \le 1) \qquad \text{(biweight or quartic kernel)}$$

$$k(s) = \frac{1}{\sqrt{2\pi}} e^{-s^2/2} \qquad \text{(normal kernel)}$$

With the normal kernel, the weights are always positive. The standard kernel density estimator (the Rosenblatt-Parzen estimator) is given by

$$\hat f(x_0) = \frac{1}{nh}\sum_{i=1}^n k\!\left(\frac{x_0 - x_i}{h}\right).$$
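A minimal implementation with the two example kernels (the 15/16 normalization makes the quartic kernel integrate to one; helper names and test data are illustrative):

```python
import math
import random

def quartic_kernel(s):
    """Biweight/quartic kernel (15/16)(1 - s^2)^2 on [-1, 1]; integrates to 1."""
    return (15 / 16) * (1 - s * s) ** 2 if abs(s) <= 1 else 0.0

def normal_kernel(s):
    return math.exp(-0.5 * s * s) / math.sqrt(2 * math.pi)

def kernel_density(x0, data, h, kernel):
    """f_hat(x0) = (1/(n h)) * sum_i k((x0 - x_i)/h)."""
    return sum(kernel((x0 - xi) / h) for xi in data) / (len(data) * h)

# Check that the quartic kernel integrates to (approximately) one on a grid,
# then estimate the N(0,1) density at 0 (true value 1/sqrt(2*pi) ~ 0.3989).
quartic_mass = sum(quartic_kernel(-1 + 0.001 * i) * 0.001 for i in range(2001))
rng = random.Random(2)
data = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
fhat0 = kernel_density(0.0, data, h=0.2, kernel=normal_kernel)
```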
Figure 2.2: Kernel function with weights that go smoothly to zero

Bias:

$$E\hat f(x_0) = \frac{1}{h}\int k\!\left(\frac{x_0 - x_i}{h}\right) f(x_i)\, dx_i.$$

Let $s = (x_0 - x_i)/h$, so $x_i = x_0 - sh$ and $dx_i = -h\,ds$; when $x_i = \mp\infty$, $s = \pm\infty$. Then

$$E\hat f(x_0) = \int k(s)\, f(x_0 - sh)\, ds.$$

Assume $f$ is $r$ times differentiable with bounded $r$th derivative:

$$f(x_0 - sh) = f(x_0) + f'(x_0)(-sh) + \cdots + \frac{1}{r!}f^{(r)}(x_0)(-sh)^r + \underbrace{\frac{1}{r!}\left[f^{(r)}(\bar x) - f^{(r)}(x_0)\right](-sh)^r}_{\text{remainder term } R(x_0,s)}.$$

If $\int k(s)\, s^j\, ds = 0$ for $j = 1, \ldots, r$,
then the bias is

$$\int k(s)\, R(x_0, s)\, ds \le h^r C_2 \int |s|^r\, |k(s)|\, ds = O(h^r)$$

(since we assumed a bounded $r$th derivative). Thus, this estimator improves on the histogram if some moments of the kernel equal 0. Typically, $k$ satisfies

$$\int k(s)\,ds = 1, \qquad \int k(s)\,s\,ds = 0.$$

If we additionally require $\int k(s)\,s^2\,ds = 0$, then $k(s)$ must be negative in places, as in the figure below.

Remark: If $f^{(r)}$ satisfies a Lipschitz condition, i.e. $|f^{(r)}(\bar x) - f^{(r)}(x_0)| \le m|\bar x - x_0|$, then

$$\text{Bias} \le C_0\, h^{r+1}.$$

Variance:

$$\operatorname{Var}\hat f(x_0) = \frac{1}{n^2h^2}\sum_{i=1}^n \operatorname{Var} k\!\left(\frac{x_0-x_i}{h}\right) = \frac{1}{nh^2}\Big[E k^2\!\left(\frac{x_0-x_i}{h}\right) - \underbrace{\Big\{E k\!\left(\frac{x_0-x_i}{h}\right)\Big\}^2}_{\text{higher order}}\Big] \le \frac{C_1}{nh} = O\!\left(\frac{1}{nh}\right),$$

which can be shown using the same techniques as before. The variance does not improve on the histogram.
Figure 2.3: Kernel function that is sometimes negative

Choice of Bandwidth

Choose $h_n$ to minimize the asymptotic mean-squared error (AMSE):

$$\min_{h_n} AMSE(h_n) = C_2 h_n^{2r} + \frac{C_1}{nh_n},$$

$$\frac{\partial AMSE}{\partial h_n} = 2rC_2 h_n^{2r-1} - \frac{C_1}{nh_n^2} = 0 \;\Rightarrow\; h_n^{2r+1} = \frac{C_1}{2rC_2\, n} \;\Rightarrow\; h_n^* = \left(\frac{C_1}{2rC_2}\right)^{\frac{1}{2r+1}} n^{-\frac{1}{2r+1}},$$

which shrinks to zero as $n$ grows large. Now let's look at the order of the AMSE (without making the Lipschitz assumption). Plug in the optimal value for $h_n$ that was obtained above, and let $C_3 = \left(\frac{C_1}{2rC_2}\right)^{\frac{1}{2r+1}}$.
$$MSE(h_n^*) = C_2 C_3^{2r}\, n^{-\frac{2r}{2r+1}} + C_1 C_3^{-1}\, n^{-\frac{2r}{2r+1}} = O\!\left(n^{-\frac{2r}{2r+1}}\right),$$

so $RMSE = \sqrt{MSE} = O\!\left(n^{-\frac{r}{2r+1}}\right)$, and since $\frac{r}{2r+1} < \frac{1}{2}$, the rate is slower than the parametric rate $n^{-1/2}$. Recall that for OLS,

$$\sqrt{N}\left(\hat\beta_{ols} - \beta_0\right) \to N\!\left(0,\ \sigma^2 \underbrace{\left(\frac{X'X}{N}\right)^{-1}}_{\text{Var}}\right),$$

so $RMSE$ for $\hat\beta_{ols}$ is $O_p(N^{-1/2})$.

How do you generalize the kernel density estimator to more dimensions? Use a product kernel:

$$\hat f(x_0, z_0) = \frac{1}{n\, h_{n,x}\, h_{n,z}}\sum_{i=1}^n k_1\!\left(\frac{x_0 - x_i}{h_{n,x}}\right) k_2\!\left(\frac{z_0 - z_i}{h_{n,z}}\right).$$
2.4 Alternative Estimator: k-Nearest Neighbor

Alternatively, the bandwidth can be chosen so as to include a certain number of neighbors in each point estimation. In this case, the bandwidth will be variable. With $k$ = number of data points in the bin and $r_k$ the distance to the $k$th nearest neighbor,

$$\hat f(x_0) = \frac{k}{n \cdot 2r_k}.$$

For $m$-dimensional data,

$$\hat f(x_0) = \frac{k}{n\,(2r_k)^m}.$$

2.5 Asymptotic Distribution of the Kernel Density Estimator

We already did consistency (we showed the conditions required for convergence in mean square):

$$E\hat f(x_0) \to f(x_0) \text{ as } h_n \to 0, \qquad \operatorname{Var}\hat f(x_0) \to 0 \text{ as } nh_n \to \infty.$$

Now, the asymptotic distribution. Decompose

$$\sqrt{nh_n}\left(\hat f(x_0) - f(x_0)\right) = \underbrace{\sqrt{nh_n}\left(\hat f(x_0) - E\hat f(x_0)\right)}_{(1)} + \underbrace{\sqrt{nh_n}\left(E\hat f(x_0) - f(x_0)\right)}_{(2)},$$

where $\hat f(x_0) = \frac{1}{nh_n}\sum_{i=1}^n k\!\left(\frac{x_0-x_i}{h_n}\right)$.
Analyze term (1): we can find a CLT for this part.

$$\sqrt{nh_n}\left(\hat f(x_0) - E\hat f(x_0)\right) = \frac{1}{\sqrt n}\sum_{i=1}^n\left[\frac{1}{\sqrt{h_n}}k\!\left(\frac{x_0-x_i}{h_n}\right) - E\frac{1}{\sqrt{h_n}}k\!\left(\frac{x_0-x_i}{h_n}\right)\right] \to N\!\left(0,\ \lim \operatorname{Var}\frac{1}{\sqrt{h_n}}k\!\left(\frac{x_0-x_i}{h_n}\right)\right).$$

Assume $\int k(s)\,ds = 1$ and $\int k(s)\,s\,ds = 0$. Then

$$\operatorname{Var}\frac{1}{\sqrt{h_n}}k\!\left(\frac{x_0-x_i}{h_n}\right) = \underbrace{\frac{1}{h_n}Ek^2\!\left(\frac{x_0-x_i}{h_n}\right)}_{\to f(x_0)\int k^2(s)ds} - \underbrace{\frac{1}{h_n}\left[Ek\!\left(\frac{x_0-x_i}{h_n}\right)\right]^2}_{O(h_n)\ \to\ 0},$$

so term (1) $\to N\!\left(0,\ f(x_0)\int k^2(s)\,ds\right)$.

Now, show that term (2) $\to 0$ (this will require more stringent conditions on the bandwidth):

$$E\hat f(x_0) = \frac{1}{h_n}\int k\!\left(\frac{x_0-x_i}{h_n}\right) f(x_i)\,dx_i = \int k(s)\, f(x_0 - sh_n)\,ds \qquad \left(s = \frac{x_0-x_i}{h_n},\ dx_i = -h_n\,ds\right)$$
$$= \int k(s)\left[f(x_0) + f'(x_0)(-sh_n) + f''(\bar x)\frac{s^2h_n^2}{2}\right]ds = f(x_0)\underbrace{\int k(s)\,ds}_{=1} - f'(x_0)h_n\underbrace{\int k(s)\,s\,ds}_{=0} + \frac{h_n^2}{2}\int f''(\bar x)\,k(s)\,s^2\,ds,$$

with $\bar x$ between $x_0$ and $x_0 - sh_n$. Assume $|f''(\bar x)| \le M$ and $\int k(s)\,s\,ds = 0$ (satisfied if $k$ is symmetric).
Then

$$E\hat f(x_0) - f(x_0) = O(h_n^2),$$

which implies that

$$\sqrt{nh_n}\left(E\hat f(x_0) - f(x_0)\right) = O\!\left(\sqrt{nh_n}\,h_n^2\right) = O\!\left(\sqrt{nh_n^5}\right).$$

Hence, we require $nh_n^5 \to 0$ in addition to $h_n \to 0$ and $nh_n \to \infty$. The required condition for asymptotic normality is stronger than that for consistency.
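The k-nearest-neighbor estimator of Section 2.4 is equally short to code; a sketch (the choice $k = 500$ and the test data are illustrative):

```python
import random

def knn_density(x0, data, k):
    """One-dimensional k-NN estimator: f_hat(x0) = k / (n * 2 * r_k),
    where r_k is the distance from x0 to its k-th nearest observation."""
    n = len(data)
    r_k = sorted(abs(xi - x0) for xi in data)[k - 1]
    return k / (n * 2 * r_k)

rng = random.Random(3)
data = [rng.random() for _ in range(20_000)]   # Uniform(0,1): density 1 inside (0,1)
fhat = knn_density(0.5, data, k=500)
```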
Chapter 3: Nonparametric Regression

3.1 Overview

Consider the model

$$y_i = g(x_i) + \epsilon_i, \qquad E(\epsilon_i|x_i) = 0, \qquad E(\epsilon_i^2|x_i) = c < \infty,$$

where we would like to estimate $g(x)$ without having to impose functional form assumptions on $g$, except assumptions such as continuity and differentiability.

There are two types of nonparametric estimation methods: local and global. These two approaches reflect two different ways to reduce the problem of estimating a function into estimation of real numbers. Local approaches consider a real-valued function $g(x)$ at a single point $x = x_0$. The problem of estimating a function becomes estimating a real number, $g(x_0)$. If we are interested in evaluating the function in the neighborhood of the point $x_0$, we can approximate the function by $g(x_0)$ or, if $g(x)$ is continuously differentiable at $x_0$, then a better approximation might be $g(x_0) + g'(x_0)(x - x_0)$. Thus, the problem of estimating a function at a
point may be thought of as estimating two real numbers, $g(x_0)$ and $g'(x_0)$, making use of observations in the neighborhood. Either way, if we want to estimate the function over a wider range of $x$ values, the same pointwise problem can be solved at the different points of evaluation.

Global approaches introduce a coordinate system in a space of functions, which reduces the problem of estimating a function into that of estimating a set of real numbers. Any element $v$ in a $d$-dimensional vector space can be uniquely expressed using a system of independent vectors $\{b_j\}_{j=1}^d$ as $v = \sum_{j=1}^d \theta_j b_j$, where one can think of $\{b_j\}_{j=1}^d$ as a system of coordinates and $(\theta_1, \ldots, \theta_d)$ as the representation of $v$ using that coordinate system. Likewise, given an appropriate set of linearly independent functions $\{\phi_j(x)\}_{j=1}^\infty$ as coordinates, any square integrable real-valued function $g(x)$ has unique coefficients $\{\theta_j\}_{j=1}^\infty$ such that

$$g(x) = \sum_{j=1}^\infty \theta_j \phi_j(x).$$

One can think of $\{\phi_j(x)\}_{j=1}^\infty$ as a system of coordinates and $(\theta_1, \theta_2, \ldots)$ as the representation of $g(x)$ using that coordinate system. This observation allows us to translate the problem of estimating a function into a problem of estimating a sequence of real numbers $\{\theta_j\}_{j=1}^\infty$. Well known bases are polynomial series and Fourier series; these bases are infinitely differentiable everywhere. Other well known bases are polynomial spline bases and wavelet bases.

The literature on nonparametric estimation methods is vast. These notes will focus on local methods, in particular kernel-based methods, which are among the most commonly used (at least by economists!).
3.2 The Local Approach

Examples: kernel regression, local polynomial regression. Early smoothing methods date back to the 1860s in the actuarial literature (see Cleveland (1979, JASA)), where they were used to study life expectancies at different ages.
One possibility is to estimate the conditional mean by

$$\hat E(y|x) = \int y\, \hat f(y|x)\,dy = \int y\, \frac{\hat f(x,y)}{\hat f(x)}\,dy.$$

Or, with repeated data at each point, one could estimate $g(x_0)$ by

$$\hat g(x_0) = \frac{\sum_i y_i\, 1(x_i = x_0)}{\sum_i 1(x_i = x_0)}.$$

A drawback is that we can only estimate this way at points where we have data. We need a way of interpolating or extrapolating between and outside the data observations. We could create cells as in the histogram estimator:

$$\hat g(x_0) = \frac{\sum_{i=1}^n y_i\, 1\!\left(\left|\frac{x_0-x_i}{h_n}\right| < 1\right)}{\sum_{i=1}^n 1\!\left(\left|\frac{x_0-x_i}{h_n}\right| < 1\right)}.$$

If we want $\hat g(x)$ to be smooth, we need to choose a kernel function that goes to 0 smoothly.
Figure 3.1: Repeated data estimator

$$\hat g(x_0) = \frac{\sum_{i=1}^n y_i\, k\!\left(\frac{x_0-x_i}{h_n}\right)}{\sum_{i=1}^n k\!\left(\frac{x_0-x_i}{h_n}\right)} = \sum_{i=1}^n y_i\, w_i(x_0) = \frac{\frac{1}{nh_n}\sum_{i=1}^n y_i\, k\!\left(\frac{x_0-x_i}{h_n}\right)}{\frac{1}{nh_n}\sum_{i=1}^n k\!\left(\frac{x_0-x_i}{h_n}\right)} \qquad \left(\to \frac{g(x_0)f(x_0)}{f(x_0)}\right)$$
where

$$w_i(x_0) = \frac{k\!\left(\frac{x_0-x_i}{h_n}\right)}{\sum_{j=1}^n k\!\left(\frac{x_0-x_j}{h_n}\right)}, \qquad \text{note: } \sum_{i=1}^n w_i(x_0) = 1.$$

We don't need $\int k(s)\,ds = 1$, but we do require that the kernels used in the numerator and denominator integrate to the same value.

Two types of potential problems

(i) When the density at $x_0$ is low, the denominator tends to be close to 0 and we get a division-by-zero problem.
(ii) At boundary points, the estimator has higher-order bias, even with an appropriately chosen $k(\cdot)$. This problem is known as boundary bias.

Interpretation of Local Polynomial Estimators as a Local Regression

Solve this problem at each point of evaluation $x_0$ (which may or may not correspond to points in the data):

$$(\hat a, \hat b_1, \ldots, \hat b_p) = \arg\min_{a, b_1, \ldots, b_p}\ \sum_{i=1}^n\left(y_i - a - b_1(x_i - x_0) - b_2(x_i - x_0)^2 - \cdots - b_p(x_i - x_0)^p\right)^2 k_i, \qquad k_i = k\!\left(\frac{x_0-x_i}{h_n}\right).$$

Here $\hat a$ provides an estimator of $g(x_0)$, $\hat b_1$ of $g'(x_0)$, and $\hat b_2$ of $g''(x_0)/2$. If $p = 0$,

$$\hat a = \frac{\sum_{i=1}^n y_i\, k\!\left(\frac{x_0-x_i}{h_n}\right)}{\sum_{i=1}^n k\!\left(\frac{x_0-x_i}{h_n}\right)} = \sum_{i=1}^n y_i\, w_i(x_0), \qquad \sum_{i=1}^n w_i(x_0) = 1,$$

the standard Nadaraya-Watson kernel regression estimator.
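A minimal sketch of the Nadaraya-Watson estimator with a Gaussian kernel (the test function, seed, and bandwidth are illustrative):

```python
import math
import random

def nw_regression(x0, xs, ys, h):
    """Nadaraya-Watson estimator: kernel-weighted average of the y_i near x0."""
    weights = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)

rng = random.Random(4)
xs = [rng.uniform(0, 1) for _ in range(5_000)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.1) for x in xs]
ghat = nw_regression(0.25, xs, ys, h=0.05)   # true g(0.25) = sin(pi/2) = 1
```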
If $p = 1$ (local linear regression, LLR), the estimator can again be written as $\hat a = \sum_{i=1}^n y_i\, w_i(x_0)$, now with weights

$$w_i(x_0) = \frac{k_i\left[S_2 - (x_i - x_0)S_1\right]}{S_0 S_2 - S_1^2}, \qquad S_j = \sum_{l=1}^n k_l\,(x_l - x_0)^j, \qquad k_i = k\!\left(\frac{x_0-x_i}{h_n}\right),$$

which satisfy both

$$\sum_{i=1}^n w_i(x_0) = 1 \qquad \text{and} \qquad \sum_{i=1}^n w_i(x_0)(x_i - x_0) = 0.$$

The $p = 0$ weights of the standard Nadaraya-Watson kernel estimator do not satisfy this second property. The LLR estimator ($p = 1$) improves on the kernel estimator in two ways:
(a) the asymptotic bias does not depend on the design density of the data (i.e. $f(x)$ does not appear in the bias expression);
(b) the bias has a higher order of convergence at boundary points.

Intuitively, the standard kernel estimator fits a local constant, so when there is no data on the other side of $x_0$, the estimate will tend to be too high or too low: it does not capture the slope. LLR gets the slope.

$$y_i = g(x_i) + \varepsilon_i$$
Figure 3.2: Boundary bias - how LLR improves on kernel regression

$$\hat g_{LLR}(x_0) - g(x_0) = \sum_{i=1}^n w_i(x_0)\,y_i - g(x_0) = \sum_{i=1}^n w_i(x_0)\,\varepsilon_i + \sum_{i=1}^n w_i(x_0)\left(g(x_i) - g(x_0)\right)$$
$$= \underbrace{\sum_{i=1}^n w_i(x_0)\,\varepsilon_i}_{\text{variance part}} + g'(x_0)\underbrace{\sum_{i=1}^n w_i(x_0)(x_i - x_0)}_{=0 \text{ with LLR}} + \sum_{i=1}^n w_i(x_0)\left[g'(\bar x) - g'(x_0)\right](x_i - x_0).$$
If $g'$ satisfies a Lipschitz condition, the last term can be bounded.

3.3 Properties of Nonparametric Regression Estimators

1. Consistency (we will show convergence in mean square)
2. Asymptotic Normality

In showing the asymptotic properties, we will work with the standard kernel regression estimator, because it is simpler to analyze than LLR and the basic techniques are the same:

$$\hat m(x_0) = \frac{\frac{1}{nh_n}\sum_i y_i\, k\!\left(\frac{x_0-x_i}{h_n}\right)}{\frac{1}{nh_n}\sum_i k\!\left(\frac{x_0-x_i}{h_n}\right)}.$$

Assume:
(i) $h_n \to 0$, $nh_n \to \infty$;
(ii) $\int s\,k(s)\,ds = 0$, $\int k(s)\,ds \ne 0$, $\int k(s)\,s^2\,ds < C$;
(iii) $f(x)$ and $g(x)$ are $C^2$ with bounded second derivatives.

Consistency

We already showed that

$$\frac{1}{nh_n}\sum_i k\!\left(\frac{x_0-x_i}{h_n}\right) \xrightarrow{MS} f(x_0)\int k(s)\,ds \qquad \text{(denominator)}.$$

We will now show that

$$\frac{1}{nh_n}\sum_i y_i\, k\!\left(\frac{x_0-x_i}{h_n}\right) \xrightarrow{MS} f(x_0)\,g(x_0)\int k(s)\,ds \qquad \text{(numerator)},$$

then apply the Mann-Wald theorem (the plim of a continuous function is the continuous function of the plim) to conclude that the ratio $\xrightarrow{P} g(x_0)$.

Denominator:
$$\operatorname{Var}\!\left(\frac{1}{nh_n}\sum_i k\!\left(\frac{x_0-x_i}{h_n}\right)\right) \le \frac{1}{nh_n^2}\,Ek^2\!\left(\frac{x_0-x_i}{h_n}\right) = O\!\left(\frac{1}{nh_n}\right).$$

Thus

$$\frac{1}{nh_n}\sum_i k\!\left(\frac{x_0-x_i}{h_n}\right) \xrightarrow{MS} E\frac{1}{h_n}k\!\left(\frac{x_0-x_i}{h_n}\right) = f(x_0)\int k(s)\,ds + O(h_n).$$

Numerator:

$$E\!\left(\frac{1}{nh_n}\sum_i y_i\, k\!\left(\frac{x_0-x_i}{h_n}\right)\right) = \frac{1}{h_n}E\!\left(E(y_i|x_i)\,k\!\left(\frac{x_0-x_i}{h_n}\right)\right) = \frac{1}{h_n}\int g(x_i)\,k\!\left(\frac{x_0-x_i}{h_n}\right)f(x_i)\,dx_i.$$

Let $\phi(x_i) = f(x_i)\,g(x_i)$ and Taylor expand (as before):

$$= \phi(x_0)\int k(s)\,ds + O(h_n), \qquad \phi(x_0) = g(x_0)f(x_0).$$

Also, one can show that

$$\operatorname{Var}\!\left(\frac{1}{nh_n}\sum_i y_i\,k\!\left(\frac{x_0-x_i}{h_n}\right)\right) \le c_n \tag{3.1}$$

with

$$c_n \to 0 \text{ as } n \to \infty. \tag{3.2}$$

Hence,
$$\text{ratio} \xrightarrow{P} \frac{g(x_0)\,f(x_0)\int k(s)\,ds}{f(x_0)\int k(s)\,ds} = g(x_0), \qquad \text{provided } h_n \to 0,\ nh_n \to \infty.$$

Asymptotic Normality

We show

$$\sqrt{nh_n}\left(\hat g(x_0) - g(x_0)\right) \to N\!\left(\lim \sqrt{nh_n}\,h_n^2\left[\frac{1}{2}g''(x_0) + \frac{g'(x_0)f'(x_0)}{f(x_0)}\right]\int u^2k(u)\,du,\ \ f(x_0)^{-1}\sigma^2(x_0)\int k^2(u)\,du\right),$$

where $\sigma^2(x_0) = E(\varepsilon_i^2|x_i = x_0)$ is the conditional variance. Write

$$\sqrt{nh_n}\left(\frac{\sum_i y_i k_i}{\sum_i k_i} - g(x_0)\right) = \sqrt{nh_n}\,\frac{\sum_i(y_i - g(x_0))k_i}{\sum_i k_i}. \tag{3.3}$$

With $y_i = g(x_i) + \varepsilon_i$ and $E(\varepsilon_i|x_i) = 0$,

$$= \underbrace{\sqrt{nh_n}\,\frac{\sum_i \varepsilon_i k_i}{\sum_i k_i}}_{\text{variance part}} + \underbrace{\sqrt{nh_n}\,\frac{\sum_i(g(x_i) - g(x_0))k_i}{\sum_i k_i}}_{\text{bias part}} = \frac{\frac{1}{\sqrt{nh_n}}\sum_i \varepsilon_i k_i}{\frac{1}{nh_n}\sum_i k_i} + \frac{\frac{1}{\sqrt{nh_n}}\sum_i(g(x_i) - g(x_0))k_i}{\frac{1}{nh_n}\sum_i k_i}.$$

Assuming $\int k(s)\,ds = 1$, we showed before that
$$\frac{1}{nh_n}\sum_i k_i \xrightarrow{MS} f(x_0) + O(h_n).$$

We will now analyze the distribution of the numerator,

$$\frac{1}{\sqrt{nh_n}}\sum_i \varepsilon_i k_i = \frac{1}{\sqrt n}\sum_i \frac{1}{\sqrt{h_n}}\,\varepsilon_i\, k\!\left(\frac{x_0-x_i}{h_n}\right),$$

and apply a CLT to get

$$\to N\!\left(0,\ \lim\operatorname{Var}\!\left(\frac{1}{\sqrt{h_n}}\varepsilon_i\,k\!\left(\frac{x_0-x_i}{h_n}\right)\right)\right). \tag{3.4}$$

$$\operatorname{Var}\!\left(\frac{1}{\sqrt{h_n}}\varepsilon_i\,k\!\left(\frac{x_0-x_i}{h_n}\right)\right) = \frac{1}{h_n}E\!\left(\varepsilon_i^2\,k^2\!\left(\frac{x_0-x_i}{h_n}\right)\right) + \text{higher order terms}$$
$$= \frac{1}{h_n}E\!\left(\underbrace{E(\varepsilon_i^2|x_i)}_{=\sigma^2(x_i)}k^2\!\left(\frac{x_0-x_i}{h_n}\right)\right) = \frac{1}{h_n}\int \sigma^2(x_i)\,k^2\!\left(\frac{x_0-x_i}{h_n}\right)f(x_i)\,dx_i = \sigma^2(x_0)\,f(x_0)\int k^2(s)\,ds + O(h_n)$$

by a change of variables. Now, analyze the bias term.
$$\frac{1}{\sqrt{nh_n}}\sum_i(g(x_i) - g(x_0))k_i \approx \sqrt{nh_n}\,\frac{1}{h_n}\int(g(x_i) - g(x_0))\,k\!\left(\frac{x_0-x_i}{h_n}\right)f(x_i)\,dx_i$$
$$= \sqrt{nh_n}\int\left(g(x_0 - sh_n) - g(x_0)\right)k(s)\,f(x_0 - sh_n)\,ds$$
$$= \sqrt{nh_n}\int\left\{g'(x_0)(-sh_n) + \tfrac{1}{2}g''(x_0)s^2h_n^2 + \tfrac{1}{2}\left[g''(\bar x) - g''(x_0)\right]s^2h_n^2\right\}\left\{f(x_0) - sh_n f'(x_0) + \tfrac{s^2h_n^2}{2}f''(\bar x)\right\}k(s)\,ds.$$

If $g''$ satisfies a Lipschitz condition, $|g''(\bar x) - g''(x_0)| \le C|\bar x - x_0|$, then the $[g''(\bar x) - g''(x_0)]$ term will be of higher order. Collecting terms,

$$= -\,g'(x_0)f(x_0)\,h_n\underbrace{\int s\,k(s)\,ds}_{=0} + \left[g'(x_0)f'(x_0) + \tfrac{1}{2}g''(x_0)f(x_0)\right]h_n^2\int s^2k(s)\,ds + \text{higher order terms},$$

so we get

$$\frac{1}{\sqrt{nh_n}}\sum_i(g(x_i) - g(x_0))k_i \to \left[g'(x_0)f'(x_0) + \tfrac{1}{2}g''(x_0)f(x_0)\right]\sqrt{nh_n}\,h_n^2\int s^2k(s)\,ds.$$
For this term to vanish we require $\sqrt{nh_n}\,h_n^2 \to 0$, i.e. $nh_n^5 \to 0$. Putting these results together, we get

$$\sqrt{nh_n}\left(\hat g(x_0) - g(x_0)\right) \to N\!\left(\lim\sqrt{nh_n}\,h_n^2\left[\frac{g'(x_0)f'(x_0)}{f(x_0)} + \frac{1}{2}g''(x_0)\right]\int s^2k(s)\,ds,\ \ f(x_0)^{-1}\sigma^2(x_0)\int k^2(s)\,ds\right),$$

which is what we set out to show.

Remarks: Estimating the asymptotic distribution requires plug-in estimators of $g'$, $g''$, $f$, $f'$, and $\sigma^2(x_0)$ (all the ingredients of the bias and variance):
- $g'$, $g''$: obtain by local linear or local quadratic regression.
- $f'$: the derivative of the density estimator gives an estimator of $f'$:
$$\hat f'(x_0) = \frac{1}{nh_n^2}\sum_{i=1}^n k'\!\left(\frac{x_0-x_i}{h_n}\right).$$
- $f$: obtain by the standard density estimator.
- $\int k^2(s)\,ds$, $\int k(s)\,s\,ds$: constants that depend on the kernel function $k(s)$.
- $\sigma^2(x_0)$: the conditional variance; estimate by $\hat E(\varepsilon_i^2|x_0) = \frac{\sum_i w_i(x_0)\,\hat\varepsilon_i^2}{\sum_i w_i(x_0)}$, where $\hat\varepsilon_i = y_i - \hat m(x_i)$ are the fitted residuals.

We showed the distribution of the kernel estimator. What about the distribution of LLR? Fan (JASA, 1992) showed

$$\sqrt{nh_n}\left(\hat g_{LLR}(x_0) - g(x_0)\right) \to N\!\left(\underbrace{\lim\sqrt{nh_n}\,h_n^2\,\frac{1}{2}g''(x_0)\int s^2k(s)\,ds}_{\text{one less term in the bias expression}},\ \underbrace{f(x_0)^{-1}\sigma^2(x_0)\int k^2(s)\,ds}_{\text{same variance}}\right).$$
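The conditional-variance ingredient $\hat\sigma^2(x_0)$ listed above is just a kernel-weighted average of squared residuals. A sketch, using the true errors in place of fitted residuals purely for illustration (in practice the $\hat\varepsilon_i$ would come from a pilot fit):

```python
import math
import random

def conditional_variance(x0, xs, residuals, h):
    """sigma2_hat(x0) = sum_i w_i(x0) e_i^2 / sum_i w_i(x0), Gaussian kernel weights."""
    w = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in xs]
    return sum(wi * e * e for wi, e in zip(w, residuals)) / sum(w)

rng = random.Random(6)
xs = [rng.uniform(0, 1) for _ in range(10_000)]
errors = [rng.gauss(0, 0.5) for _ in xs]     # homoskedastic: sigma^2(x) = 0.25
s2 = conditional_variance(0.5, xs, errors, h=0.1)
```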
Remarks:

- The bias of LLR does not depend on $f(x_0)$ (the design density of the data). Because of this feature, Fan refers to the estimator as being "design-adaptive".
- Fan showed that, in general, it is better to use an odd-order local polynomial: there is no cost in variance, and the bias of an odd-order fit does not depend on $f(x)$.
- The LLR estimator does not suffer from the boundary bias problem.

How might we choose the bandwidth?

We could choose the bandwidth to minimize the pointwise asymptotic MSE (AMSE):

$$AMSE_{LLR} = \left[h_n^2\,\frac{1}{2}g''(x_0)\int s^2k(s)\,ds\right]^2 + \frac{f(x_0)^{-1}\sigma^2(x_0)\int k^2(s)\,ds}{nh_n},$$

$$\frac{\partial AMSE}{\partial h_n} = 4h_n^3\,\frac{1}{4}g''(x_0)^2\left[\int s^2k(s)\,ds\right]^2 - \frac{f(x_0)^{-1}\sigma^2(x_0)\int k^2(s)\,ds}{nh_n^2} = 0$$

$$h_n^* = \underbrace{\left[\frac{f(x_0)^{-1}\sigma^2(x_0)\int k^2(s)\,ds}{g''(x_0)^2\left[\int s^2k(s)\,ds\right]^2}\right]^{1/5}}_{\text{constant}} n^{-1/5}.$$

Remarks:
(1) Higher variance $\sigma^2(x_0)$ → wider bandwidth.
(2) More variability in the function ($|g''(x_0)|$ large) → narrower bandwidth.
(3) More data → narrower bandwidth. (With kernel regression, i.e. a local polynomial of degree 0, the bias term also depends on $f(x)$, and therefore the optimal bandwidth choice will depend on $f(x)$.)
(4) A difficulty in obtaining the optimal bandwidth is that it requires estimates of $\sigma^2(x_0)$ and $g''(x_0)$ (and, in the case of the usual kernel regression estimator, $f(x_0)$). This is the problem of how to choose a pilot bandwidth. One could assume normality in choosing a pilot for $f(x_0)$ (this is Silverman's rule-of-thumb method). One could also use a fixed bandwidth or nearest neighbors for $\sigma^2(x_0)$, $g''(x_0)$.
Figure 3.3: Functions of different smoothness

(5) One could instead minimize $AMISE = \int E\left(\hat g(x_0) - g(x_0)\right)^2 dx_0$ and pick a global bandwidth.

For further information on choosing the bandwidth, see below and see the book by Fan and Gijbels (1996).
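The local linear estimator of Section 3.2 (the order-1 case) is just a weighted least squares fit at each evaluation point; solving the 2x2 normal equations directly gives both $\hat g(x_0)$ and $\hat g'(x_0)$. A sketch (the linear test function is illustrative — LLR reproduces a linear truth up to noise, even at a boundary point):

```python
import math
import random

def local_linear(x0, xs, ys, h):
    """Local linear fit at x0: weighted least squares of y on (1, x - x0).
    Returns (intercept, slope) = (ghat(x0), ghat'(x0))."""
    k = [math.exp(-0.5 * ((x0 - xi) / h) ** 2) for xi in xs]
    d = [xi - x0 for xi in xs]
    s0 = sum(k)
    s1 = sum(ki * di for ki, di in zip(k, d))
    s2 = sum(ki * di * di for ki, di in zip(k, d))
    t0 = sum(ki * yi for ki, yi in zip(k, ys))
    t1 = sum(ki * di * yi for ki, di, yi in zip(k, d, ys))
    denom = s0 * s2 - s1 * s1
    return (s2 * t0 - s1 * t1) / denom, (s0 * t1 - s1 * t0) / denom

rng = random.Random(5)
xs = [rng.uniform(0, 1) for _ in range(5_000)]
ys = [2.0 + 3.0 * x + rng.gauss(0, 0.1) for x in xs]
ghat, gslope = local_linear(0.0, xs, ys, h=0.1)   # boundary point x0 = 0
```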
Chapter 4: Bandwidth Selection for Density Estimation

4.1 First Generation Methods

Plug-in method (local version)

One could derive the optimal bandwidth at each point of evaluation $x_0$, which would lead to a pointwise (localized) bandwidth estimator. With

$$AMSE(\hat f(x_0)) = E\left(\hat f(x_0) - f(x_0)\right)^2 = \frac{h^4}{4}\,f''(x_0)^2\left[\int k(s)\,s^2\,ds\right]^2 + \frac{1}{nh}\,f(x_0)\int k^2(s)\,ds,$$

minimizing over $h$ gives

$$\hat h(x_0) = \left[\frac{f(x_0)\int k^2(s)\,ds}{\left[\int k(s)\,s^2\,ds\right]^2\,f''(x_0)^2}\right]^{1/5} n^{-1/5}.$$

We need to estimate $f(x_0)$ and $f''(x_0)$ to get the optimal plug-in bandwidth, and the optimal bandwidth would be different for different points of estimation.

Global plug-in method

Because the above method is computationally intensive, one could instead derive an optimal bandwidth for all points of evaluation by minimizing a different criterion — the mean integrated squared error:
$$MISE(\hat f) = \int E\left(\hat f(x_0) - f(x_0)\right)^2 dx_0 \qquad \text{(removes } x_0 \text{ by integration)}$$
$$= \frac{1}{nh}\underbrace{\int f(x_0)\,dx_0}_{=1}\int k^2(s)\,ds + \frac{h^4}{4}\underbrace{\left(\int k(s)\,s^2\,ds\right)^2}_{A}\underbrace{\int f''(x_0)^2\,dx_0}_{B},$$

$$h_{MISE} = \left(\frac{\int k^2(s)\,ds}{AB}\right)^{1/5} n^{-1/5}.$$

Note: $\int f''(x_0)^2\,dx_0$ is a measure of the variation in $f$. If $f$ is highly variable, then $h$ will be small. Here we still need an estimate of $f''(x_0)$, but no longer require an estimate of $f(x_0)$ (it was eliminated by integrating).

Rule-of-thumb Method (Silverman)

If $x$ is normally distributed, then Silverman shows (in his book Density Estimation) that

$$h_{MISE} \approx 1.06\, SD(x)\, n^{-1/5},$$

where SD is the standard deviation. This method is often applied to nonnormal data, but it has a tendency to oversmooth, particularly when the data density is multimodal. Sometimes the standard deviation is replaced by the 75%-25% interquartile range. This is an older method, now available in some software packages, but the performance is not so good; other methods are better.

Global Method #2: Least Squares Cross-validation

The idea is to minimize an estimate of the integrated squared error (ISE):

$$\min_h ISE = \int\left\{\hat f(x_0) - f(x_0)\right\}^2 dx_0 = \underbrace{\int \hat f(x_0)^2\,dx_0}_{\text{term 1}} - \underbrace{2\int \hat f(x_0)\,f(x_0)\,dx_0}_{\text{term 2}} + \underbrace{\int f(x_0)^2\,dx_0}_{\text{term doesn't depend on } h}.$$
$$\text{Term 1} = \frac{1}{n^2h^2}\sum_i\sum_j\int k\!\left(\frac{x_i - x_0}{h}\right)k\!\left(\frac{x_j - x_0}{h}\right)dx_0.$$

Let $s = (x_i - x_0)/h$, so $x_0 = x_i - sh$:

$$= \frac{1}{n^2h}\sum_i\sum_j\int k(s)\,k\!\left(\frac{x_j - x_i}{h} + s\right)ds = \frac{1}{n^2h}\sum_i\sum_j \bar k\!\left(\frac{x_i - x_j}{h}\right),$$

where $\bar k(a) = \int k(a - s)\,k(s)\,ds$ is the convolution of $k$ with itself (the density of the sum of two random variables each with density $k$).

Now examine Term 2, $2\int \hat f(x_0)\,f(x_0)\,dx_0 = 2E_x\hat f(x)$. An unbiased estimator for term 2 is obtained from the leave-one-out estimator:

$$\widehat{\text{Term 2}} = \frac{2}{n(n-1)h}\sum_i\sum_{j\ne i} k\!\left(\frac{x_i - x_j}{h}\right),$$

where $\hat f_{-i}(x_i) = \frac{1}{(n-1)h}\sum_{j\ne i}k\!\left(\frac{x_i - x_j}{h}\right)$ is the estimator in which the $i$th point does not get used. Note that we need to use the leave-one-out estimator to get independence, and that $\hat f_{-i}(x_i) = \frac{n}{n-1}\hat f(x_i) - \frac{k(0)}{(n-1)h}$. Hence

$$\hat h_{CV} = \arg\min_h\ \underbrace{\frac{1}{n^2h}\sum_i\sum_j \bar k\!\left(\frac{x_i - x_j}{h}\right)}_{\text{need to calculate the convolution for this part}} - \frac{2}{n(n-1)h}\sum_i\sum_{j\ne i}k\!\left(\frac{x_i - x_j}{h}\right).$$
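A minimal sketch of the LSCV criterion for a Gaussian kernel, for which the convolution $\bar k = k * k$ is the $N(0,2)$ density (grid, seed, and sample size are illustrative):

```python
import math
import random

def lscv_score(h, data):
    """LSCV criterion for a Gaussian kernel: term 1 uses the convolution
    kernel (the N(0,2) density); term 2 is the leave-one-out estimate."""
    n = len(data)

    def phi(u, s):   # normal density with standard deviation s
        return math.exp(-0.5 * (u / s) ** 2) / (s * math.sqrt(2 * math.pi))

    term1 = sum(phi((xi - xj) / h, math.sqrt(2))
                for xi in data for xj in data) / (n * n * h)
    term2 = 2 * sum(phi((xi - xj) / h, 1.0)
                    for i, xi in enumerate(data)
                    for j, xj in enumerate(data) if j != i) / (n * (n - 1) * h)
    return term1 - term2

rng = random.Random(8)
data = [rng.gauss(0, 1) for _ in range(200)]
grid = [0.1, 0.2, 0.3, 0.4, 0.6, 0.9]
h_cv = min(grid, key=lambda h: lscv_score(h, data))
```

As the notes suggest, the criterion is minimized over a grid of bandwidth values.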
(1) It turns out that $\hat h_{CV}$ converges to the optimal bandwidth only at rate $n^{-1/10}$ (a result due to Stone), which is very slow. However, in Monte Carlo studies LSCV often does better than the rule-of-thumb method.
(2) Marron (1990) found a tendency for local minima near small $h$, which leads to a tendency to undersmooth.
(3) LSCV is usually implemented by searching over a grid of bandwidth values.

Biased Cross-validation

$$AMISE = \frac{1}{nh}\int k^2(s)\,ds + \frac{h^4}{4}\left(\int s^2k(s)\,ds\right)^2\int f''(x)^2\,dx + o\!\left(\frac{1}{nh} + h^4\right).$$

What if we plug in $\hat f''(x)$? Note:

$$\hat f(x) = \frac{1}{nh}\sum_i k\!\left(\frac{x_i - x}{h}\right), \qquad \hat f'(x) = -\frac{1}{nh^2}\sum_i k'\!\left(\frac{x_i - x}{h}\right), \qquad \hat f''(x) = \frac{1}{nh^3}\sum_i k''\!\left(\frac{x_i - x}{h}\right),$$

$$\hat f''(x)^2 = \frac{1}{n^2h^6}\sum_i\sum_j k''\!\left(\frac{x_i - x}{h}\right)k''\!\left(\frac{x_j - x}{h}\right).$$

What is the bias in estimating $\int f''(x)^2\,dx$? One can show

$$E\int \hat f''(x)^2\,dx = \int f''(x)^2\,dx + \frac{1}{nh^5}\int k''(s)^2\,ds + \text{higher order terms}.$$
The biased cross-validation criterion subtracts this bias:

$$h_{BCV} = \arg\min_h\ \frac{1}{nh}\int k^2(s)\,ds + \frac{h^4}{4}\left(\int s^2k(s)\,ds\right)^2\Big[\int \hat f''(x)^2\,dx - \underbrace{\frac{1}{nh^5}\int k''(s)^2\,ds}_{\text{subtract the bias here}}\Big].$$

4.2 Second Generation Methods

Sheather and Jones (1991): Solve-the-equation Plug-in Method

$$h_{SJPI} = \left[\frac{\int k^2(s)\,ds}{\left(\int s^2k(s)\,ds\right)^2\int \hat f''_{g(h)}(x)^2\,dx}\right]^{1/5} n^{-1/5};$$

find the $h$ that solves this equation. We need to plug in an estimate of $f''(x)$, but the bandwidth that is optimal for estimating $f(x)$ is not the same as the optimal bandwidth for estimating $f''$. Writing $R(f'') = \int f''(x)^2\,dx$, we can find an analogue of the AMSE for estimating $R(f'')$ and get

$$g = C_1(k)\left\{\frac{R(f'')}{R(f''')}\right\}^{1/7} C_2(k)\, n^{-1/7},$$

where $R(f''') = \int f'''(x)^2\,dx$ enters into the formula for the optimal bandwidth for $g$, and $C_1(k)$ and $C_2(k)$ are functions of the kernel. Now solve for $g$ as a function of $h$ (using $h \propto n^{-1/5}$ to eliminate $n$):

$$g(h) = C_3(k)\left\{\frac{R(f'')}{R(f''')}\right\}^{1/7} C_4(k)\, h^{5/7}.$$

Now we need an estimate of $f'''$. At this point, SJ suggest using a rule-of-thumb method based on a normal density assumption.

What is the main advantage of this method over the other methods? For most methods,

$$\frac{\hat h - h_{MISE}}{h_{MISE}} = O_p(n^{-p}).$$
Recall that for CV, $p = 1/10$. For SJPI, $p = 5/14$ (close to $1/2$), so the rate of convergence of the bandwidth is much faster. This method is available in some software packages.
Chapter 5: Bandwidth Selection for Nonparametric Regression

Recall that the asymptotic distribution of the LLR estimator was

$$\sqrt{nh_n}\left(\hat g_{LLR}(x_0) - g(x_0)\right) \to N\!\left(\lim\sqrt{nh_n}\,h_n^2\,\frac{1}{2}g''(x_0)\int s^2k(s)\,ds,\ \ f(x_0)^{-1}\sigma^2(x_0)\int k^2(s)\,ds\right).$$

5.1 Plug-in Estimator (local method)

Choose the (data-dependent) bandwidth to minimize the pointwise MSE:

$$\min_{h_n}\ \underbrace{\left[\frac{1}{2}g''(x_0)\int k(s)\,s^2\,ds\right]^2 h_n^4}_{\text{Bias}^2\,=\,C_0h_n^4} + \underbrace{\frac{f(x_0)^{-1}\sigma^2(x_0)\int k^2(s)\,ds}{nh_n}}_{\text{Variance}\,=\,C_1n^{-1}h_n^{-1}},$$

$$4C_0h_n^3 - C_1n^{-1}h_n^{-2} = 0 \;\Rightarrow\; h_n^5 = \frac{C_1}{4C_0\,n} \;\Rightarrow\; h_n^*(x_0) = \left(\frac{C_1}{4C_0}\right)^{1/5} n^{-1/5} = \left[\frac{f(x_0)^{-1}\sigma^2(x_0)\int k^2(s)\,ds}{g''(x_0)^2\left(\int k(s)\,s^2\,ds\right)^2}\right]^{1/5} n^{-1/5},$$

a variable (pointwise) bandwidth.
Remarks:
- $|g''|$ high → smaller bandwidth: we want a smaller bandwidth in regions where the function is highly variable.
- $\sigma^2(x_0)$ high → use a larger bandwidth.

5.2 Global Plug-in Estimator

Minimize a global criterion such as MISE (cf. Fan & Gijbels (1996), Local Polynomial Modelling):

$$h_n^* = \left[\frac{\int \sigma^2(x_0)\,dx_0\int k^2(s)\,ds}{\int g''(x_0)^2\,dx_0\left(\int k(s)\,s^2\,ds\right)^2}\right]^{1/5} n^{-1/5}.$$

5.3 Bandwidth Choice in Nonparametric Regression

$$y_i = m(x_i) + \varepsilon_i, \qquad \min_{\alpha,\ldots,\beta_p}\ \sum_{i=1}^n\left\{y_i - \alpha - \cdots - \beta_p(x_i - x)^p\right\}^2 k\!\left(\frac{x_i - x}{h}\right);$$

how to choose $h$, given the polynomial order $p$?

5.4 Global MISE criterion

For a local polynomial of odd order $p$ (Fan & Gijbels (1996), Local Polynomial Regression), with $k_p$ the equivalent kernel, $R(k_p) = \int k_p(s)^2\,ds$, and $\mu_l(k_p) = \int \mu^l k_p(\mu)\,d\mu$:

$$h_{MISE} = \left[\frac{\left((p+1)!\right)^2 R(k_p)\overbrace{\int\sigma^2(x)\,dx}^{Var(y|x)}}{2(p+1)\,\mu_{p+1}(k_p)^2\int m^{(p+1)}(x)^2 f(x)\,dx}\right]^{\frac{1}{2p+3}} n^{-\frac{1}{2p+3}}.$$

For LLR, $p = 1$, and we get the familiar rate $n^{-1/5}$:

$$h_{MISE} = \left[\frac{R(k)\int\sigma^2(x)\,dx}{\mu_2(k)^2\int m''(x)^2 f(x)\,dx}\right]^{1/5} n^{-1/5}.$$
The unknowns are $\sigma^2(x)$, $m''(x)$, and $f(x)$. Apply the same plug-in method: estimate $\hat m''(x)$, then take the average. We still need $\int\sigma^2(x)\,dx$.

5.5 Blocking method

(1) Divide the support of $x$ into $N$ blocks $X_1, \ldots, X_N$.
(2) Estimate $y = m(x) + \varepsilon$ by a $Q$-degree polynomial within each block, and form

$$\hat\sigma^2(N) = (n - 5N)^{-1}\sum_i\sum_j\left\{y_i - \hat m_j^Q(x_i)\right\}^2 1(x_i \in X_j)$$

(the $n - 5N$ degrees-of-freedom correction corresponds to quartic fits, $Q = 4$, with 5 parameters per block).

5.6 Bandwidth Selection for Nonparametric Regression: Cross-Validation

Recall the LLR estimator defined by

$$\hat a = \arg\min_{a,b}\ \sum_{i=1}^n(y_i - a - b(x_i - x_0))^2\,k\!\left(\frac{x_i - x_0}{h}\right).$$

In matrix form, let

$$y = \begin{pmatrix} y_1 \\ \vdots \\ y_N \end{pmatrix}, \qquad X = \begin{pmatrix} 1 & (x_1 - x_0) \\ \vdots & \vdots \\ 1 & (x_N - x_0) \end{pmatrix}, \qquad W = \operatorname{diag}\left(k\!\left(\frac{x_1 - x_0}{h}\right), \ldots, k\!\left(\frac{x_N - x_0}{h}\right)\right),$$

$$\hat g_{LLR}(x_0) = [1\ \ 0]\,(X'WX)^{-1}X'Wy,$$

the intercept coefficient from a weighted least squares problem, where the weights depend on $x_0$ and $h$.
Stacking the fits at the $N$ sample points, each row of the form $[1\ \ 0]\,(X'WX)^{-1}X'W$ evaluated at $x_0 = x_i$, gives an $N \times N$ smoother matrix $H(h)$ with

$$\hat g_{LLR} = H(h)\,y.$$

Choose $h$ to minimize $E\left((H(h)y - g)'(H(h)y - g)\,\big|\,x\right)$:

$$E\left((H(h)y - g)'(H(h)y - g)\,|\,x\right) = E\left(\left((H(h) - I)g + H(h)\varepsilon\right)'\left((H(h) - I)g + H(h)\varepsilon\right)\,|\,x\right)$$
$$= g'(H(h) - I)'(H(h) - I)g + E\left(\varepsilon' H(h)'H(h)\varepsilon\,|\,x\right) = g'(H(h) - I)'(H(h) - I)g + \sigma_\varepsilon^2\operatorname{tr}\left(H(h)'H(h)\right),$$

using $E(\varepsilon\varepsilon'|x) = \sigma_\varepsilon^2 I$, $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, and the linearity of the trace. We need an estimator for this criterion. Consider what is estimated by the sum of squared fitted residuals, $(y - H(h)y)'(y - H(h)y)$:

$$(y - H(h)y)'(y - H(h)y) = (g + \varepsilon)'(I - H(h))'(I - H(h))(g + \varepsilon),$$

and, taking conditional expectations term by term (the cross terms in $g$ and $\varepsilon$ have mean zero),

$$E\left[(y - H(h)y)'(y - H(h)y)\,|\,x\right] = g'(I - H(h))'(I - H(h))g + \sigma_\varepsilon^2\left[n - 2\operatorname{tr}(H(h)) + \operatorname{tr}(H(h)'H(h))\right].$$

Comparing with the target criterion, the residual sum of squares contains the unwanted term $-2\sigma_\varepsilon^2\operatorname{tr}(H(h))$. If we use the leave-one-out estimator $\hat g_{-i}(x_i)$ instead of $\hat g(x_i)$, the corresponding smoother matrix has zeros on the diagonal, so $\operatorname{tr}(H(h)) = 0$ and the unwanted term disappears.
Hence

$$\hat h_{CV} = \arg\min_{h}\ \sum_{i=1}^n\left(y_i - \hat g_{-i}(x_i)\right)^2.$$
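A minimal sketch of leave-one-out cross-validation, here applied to the Nadaraya-Watson estimator rather than LLR to keep the code short (grid, seed, and test function are illustrative):

```python
import math
import random

def loo_cv_score(h, xs, ys):
    """Leave-one-out CV for Nadaraya-Watson: sum_i (y_i - ghat_{-i}(x_i))^2."""
    total = 0.0
    for i in range(len(xs)):
        num = den = 0.0
        for j in range(len(xs)):
            if j == i:
                continue                      # leave observation i out
            w = math.exp(-0.5 * ((xs[i] - xs[j]) / h) ** 2)
            num += w * ys[j]
            den += w
        total += (ys[i] - num / den) ** 2
    return total

rng = random.Random(9)
xs = [rng.uniform(0, 1) for _ in range(300)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.2) for x in xs]
h_cv = min([0.01, 0.03, 0.06, 0.1, 0.2, 0.5],
           key=lambda h: loo_cv_score(h, xs, ys))
```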
More informationECO Class 6 Nonparametric Econometrics
ECO 523 - Class 6 Nonparametric Econometrics Carolina Caetano Contents 1 Nonparametric instrumental variable regression 1 2 Nonparametric Estimation of Average Treatment Effects 3 2.1 Asymptotic results................................
More informationBoundary Correction Methods in Kernel Density Estimation Tom Alberts C o u(r)a n (t) Institute joint work with R.J. Karunamuni University of Alberta
Boundary Correction Methods in Kernel Density Estimation Tom Alberts C o u(r)a n (t) Institute joint work with R.J. Karunamuni University of Alberta November 29, 2007 Outline Overview of Kernel Density
More informationQuantitative Economics for the Evaluation of the European Policy. Dipartimento di Economia e Management
Quantitative Economics for the Evaluation of the European Policy Dipartimento di Economia e Management Irene Brunetti 1 Davide Fiaschi 2 Angela Parenti 3 9 ottobre 2015 1 ireneb@ec.unipi.it. 2 davide.fiaschi@unipi.it.
More informationAdaptive Kernel Estimation of The Hazard Rate Function
Adaptive Kernel Estimation of The Hazard Rate Function Raid Salha Department of Mathematics, Islamic University of Gaza, Palestine, e-mail: rbsalha@mail.iugaza.edu Abstract In this paper, we generalized
More informationLocal Polynomial Modelling and Its Applications
Local Polynomial Modelling and Its Applications J. Fan Department of Statistics University of North Carolina Chapel Hill, USA and I. Gijbels Institute of Statistics Catholic University oflouvain Louvain-la-Neuve,
More informationIntroduction. Linear Regression. coefficient estimates for the wage equation: E(Y X) = X 1 β X d β d = X β
Introduction - Introduction -2 Introduction Linear Regression E(Y X) = X β +...+X d β d = X β Example: Wage equation Y = log wages, X = schooling (measured in years), labor market experience (measured
More informationEC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix)
1 EC212: Introduction to Econometrics Review Materials (Wooldridge, Appendix) Taisuke Otsu London School of Economics Summer 2018 A.1. Summation operator (Wooldridge, App. A.1) 2 3 Summation operator For
More informationNon-parametric Inference and Resampling
Non-parametric Inference and Resampling Exercises by David Wozabal (Last update 3. Juni 2013) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend
More informationAdaptive Nonparametric Density Estimators
Adaptive Nonparametric Density Estimators by Alan J. Izenman Introduction Theoretical results and practical application of histograms as density estimators usually assume a fixed-partition approach, where
More informationLinear regression COMS 4771
Linear regression COMS 4771 1. Old Faithful and prediction functions Prediction problem: Old Faithful geyser (Yellowstone) Task: Predict time of next eruption. 1 / 40 Statistical model for time between
More informationNADARAYA WATSON ESTIMATE JAN 10, 2006: version 2. Y ik ( x i
NADARAYA WATSON ESTIMATE JAN 0, 2006: version 2 DATA: (x i, Y i, i =,..., n. ESTIMATE E(Y x = m(x by n i= ˆm (x = Y ik ( x i x n i= K ( x i x EXAMPLES OF K: K(u = I{ u c} (uniform or box kernel K(u = u
More informationRegression #3: Properties of OLS Estimator
Regression #3: Properties of OLS Estimator Econ 671 Purdue University Justin L. Tobias (Purdue) Regression #3 1 / 20 Introduction In this lecture, we establish some desirable properties associated with
More informationLocal linear multiple regression with variable. bandwidth in the presence of heteroscedasticity
Local linear multiple regression with variable bandwidth in the presence of heteroscedasticity Azhong Ye 1 Rob J Hyndman 2 Zinai Li 3 23 January 2006 Abstract: We present local linear estimator with variable
More informationOn the Robust Modal Local Polynomial Regression
International Journal of Statistical Sciences ISSN 683 5603 Vol. 9(Special Issue), 2009, pp 27-23 c 2009 Dept. of Statistics, Univ. of Rajshahi, Bangladesh On the Robust Modal Local Polynomial Regression
More information12 - Nonparametric Density Estimation
ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6
More informationMonte Carlo Methods for Statistical Inference: Variance Reduction Techniques
Monte Carlo Methods for Statistical Inference: Variance Reduction Techniques Hung Chen hchen@math.ntu.edu.tw Department of Mathematics National Taiwan University 3rd March 2004 Meet at NS 104 On Wednesday
More informationHistogram Härdle, Müller, Sperlich, Werwatz, 1995, Nonparametric and Semiparametric Models, An Introduction
Härdle, Müller, Sperlich, Werwatz, 1995, Nonparametric and Semiparametric Models, An Introduction Tine Buch-Kromann Construction X 1,..., X n iid r.v. with (unknown) density, f. Aim: Estimate the density
More informationExample continued. Math 425 Intro to Probability Lecture 37. Example continued. Example
continued : Coin tossing Math 425 Intro to Probability Lecture 37 Kenneth Harris kaharri@umich.edu Department of Mathematics University of Michigan April 8, 2009 Consider a Bernoulli trials process with
More informationMS&E 226: Small Data
MS&E 226: Small Data Lecture 6: Bias and variance (v5) Ramesh Johari ramesh.johari@stanford.edu 1 / 49 Our plan today We saw in last lecture that model scoring methods seem to be trading off two different
More informationAn introduction to nonparametric and semi-parametric econometric methods
An introduction to nonparametric and semi-parametric econometric methods Robert Breunig Australian National University robert.breunig@anu.edu.au http://econrsss.anu.edu.au/staff/breunig/course_bb.htm March
More informationSpatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood
Spatially Smoothed Kernel Density Estimation via Generalized Empirical Likelihood Kuangyu Wen & Ximing Wu Texas A&M University Info-Metrics Institute Conference: Recent Innovations in Info-Metrics October
More informationMachine Learning. Lecture 9: Learning Theory. Feng Li.
Machine Learning Lecture 9: Learning Theory Feng Li fli@sdu.edu.cn https://funglee.github.io School of Computer Science and Technology Shandong University Fall 2018 Why Learning Theory How can we tell
More informationOptimal bandwidth selection for differences of nonparametric estimators with an application to the sharp regression discontinuity design
Optimal bandwidth selection for differences of nonparametric estimators with an application to the sharp regression discontinuity design Yoichi Arai Hidehiko Ichimura The Institute for Fiscal Studies Department
More informationPenalized Splines, Mixed Models, and Recent Large-Sample Results
Penalized Splines, Mixed Models, and Recent Large-Sample Results David Ruppert Operations Research & Information Engineering, Cornell University Feb 4, 2011 Collaborators Matt Wand, University of Wollongong
More informationUnbiased Estimation. Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others.
Unbiased Estimation Binomial problem shows general phenomenon. An estimator can be good for some values of θ and bad for others. To compare ˆθ and θ, two estimators of θ: Say ˆθ is better than θ if it
More informationAUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET. Questions AUTOMATIC CONTROL COMMUNICATION SYSTEMS LINKÖPINGS UNIVERSITET
The Problem Identification of Linear and onlinear Dynamical Systems Theme : Curve Fitting Division of Automatic Control Linköping University Sweden Data from Gripen Questions How do the control surface
More informationOptimal Bandwidth Choice for the Regression Discontinuity Estimator
Optimal Bandwidth Choice for the Regression Discontinuity Estimator Guido Imbens and Karthik Kalyanaraman First Draft: June 8 This Draft: September Abstract We investigate the choice of the bandwidth for
More informationPreface. 1 Nonparametric Density Estimation and Testing. 1.1 Introduction. 1.2 Univariate Density Estimation
Preface Nonparametric econometrics has become one of the most important sub-fields in modern econometrics. The primary goal of this lecture note is to introduce various nonparametric and semiparametric
More informationNonparametric Regression. Changliang Zou
Nonparametric Regression Institute of Statistics, Nankai University Email: nk.chlzou@gmail.com Smoothing parameter selection An overall measure of how well m h (x) performs in estimating m(x) over x (0,
More informationLecture 3: Statistical Decision Theory (Part II)
Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical
More informationLog-Density Estimation with Application to Approximate Likelihood Inference
Log-Density Estimation with Application to Approximate Likelihood Inference Martin Hazelton 1 Institute of Fundamental Sciences Massey University 19 November 2015 1 Email: m.hazelton@massey.ac.nz WWPMS,
More informationA nonparametric method of multi-step ahead forecasting in diffusion processes
A nonparametric method of multi-step ahead forecasting in diffusion processes Mariko Yamamura a, Isao Shoji b a School of Pharmacy, Kitasato University, Minato-ku, Tokyo, 108-8641, Japan. b Graduate School
More informationESTIMATORS IN THE CONTEXT OF ACTUARIAL LOSS MODEL A COMPARISON OF TWO NONPARAMETRIC DENSITY MENGJUE TANG A THESIS MATHEMATICS AND STATISTICS
A COMPARISON OF TWO NONPARAMETRIC DENSITY ESTIMATORS IN THE CONTEXT OF ACTUARIAL LOSS MODEL MENGJUE TANG A THESIS IN THE DEPARTMENT OF MATHEMATICS AND STATISTICS PRESENTED IN PARTIAL FULFILLMENT OF THE
More informationChapter 9. Non-Parametric Density Function Estimation
9-1 Density Estimation Version 1.1 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least
More informationNonparametric Estimation of Luminosity Functions
x x Nonparametric Estimation of Luminosity Functions Chad Schafer Department of Statistics, Carnegie Mellon University cschafer@stat.cmu.edu 1 Luminosity Functions The luminosity function gives the number
More informationOn Asymptotic Normality of the Local Polynomial Regression Estimator with Stochastic Bandwidths 1. Carlos Martins-Filho.
On Asymptotic Normality of the Local Polynomial Regression Estimator with Stochastic Bandwidths 1 Carlos Martins-Filho Department of Economics IFPRI University of Colorado 2033 K Street NW Boulder, CO
More informationModel-free prediction intervals for regression and autoregression. Dimitris N. Politis University of California, San Diego
Model-free prediction intervals for regression and autoregression Dimitris N. Politis University of California, San Diego To explain or to predict? Models are indispensable for exploring/utilizing relationships
More informationIntroduction to Regression
Introduction to Regression p. 1/97 Introduction to Regression Chad Schafer cschafer@stat.cmu.edu Carnegie Mellon University Introduction to Regression p. 1/97 Acknowledgement Larry Wasserman, All of Nonparametric
More informationNonparametric Modal Regression
Nonparametric Modal Regression Summary In this article, we propose a new nonparametric modal regression model, which aims to estimate the mode of the conditional density of Y given predictors X. The nonparametric
More informationFunction of Longitudinal Data
New Local Estimation Procedure for Nonparametric Regression Function of Longitudinal Data Weixin Yao and Runze Li Abstract This paper develops a new estimation of nonparametric regression functions for
More informationSection Properties of Rational Expressions
88 Section. - Properties of Rational Expressions Recall that a rational number is any number that can be written as the ratio of two integers where the integer in the denominator cannot be. Rational Numbers:
More informationWeek 9 The Central Limit Theorem and Estimation Concepts
Week 9 and Estimation Concepts Week 9 and Estimation Concepts Week 9 Objectives 1 The Law of Large Numbers and the concept of consistency of averages are introduced. The condition of existence of the population
More informationLecture 02 Linear classification methods I
Lecture 02 Linear classification methods I 22 January 2016 Taylor B. Arnold Yale Statistics STAT 365/665 1/32 Coursewebsite: A copy of the whole course syllabus, including a more detailed description of
More informationBickel Rosenblatt test
University of Latvia 28.05.2011. A classical Let X 1,..., X n be i.i.d. random variables with a continuous probability density function f. Consider a simple hypothesis H 0 : f = f 0 with a significance
More informationFrom Histograms to Multivariate Polynomial Histograms and Shape Estimation. Assoc Prof Inge Koch
From Histograms to Multivariate Polynomial Histograms and Shape Estimation Assoc Prof Inge Koch Statistics, School of Mathematical Sciences University of Adelaide Inge Koch (UNSW, Adelaide) Poly Histograms
More informationA Novel Nonparametric Density Estimator
A Novel Nonparametric Density Estimator Z. I. Botev The University of Queensland Australia Abstract We present a novel nonparametric density estimator and a new data-driven bandwidth selection method with
More informationIntegral approximation by kernel smoothing
Integral approximation by kernel smoothing François Portier Université catholique de Louvain - ISBA August, 29 2014 In collaboration with Bernard Delyon Topic of the talk: Given ϕ : R d R, estimation of
More informationIntroduction to Regression
Introduction to Regression Chad M. Schafer cschafer@stat.cmu.edu Carnegie Mellon University Introduction to Regression p. 1/100 Outline General Concepts of Regression, Bias-Variance Tradeoff Linear Regression
More informationThe Priestley-Chao Estimator - Bias, Variance and Mean-Square Error
The Priestley-Chao Estimator - Bias, Variance and Mean-Square Error Bias, variance and mse properties In the previous section we saw that the eact mean and variance of the Pristley-Chao estimator ˆm()
More informationHigh-dimensional regression
High-dimensional regression Advanced Methods for Data Analysis 36-402/36-608) Spring 2014 1 Back to linear regression 1.1 Shortcomings Suppose that we are given outcome measurements y 1,... y n R, and
More informationA New Method for Varying Adaptive Bandwidth Selection
IEEE TRASACTIOS O SIGAL PROCESSIG, VOL. 47, O. 9, SEPTEMBER 1999 2567 TABLE I SQUARE ROOT MEA SQUARED ERRORS (SRMSE) OF ESTIMATIO USIG THE LPA AD VARIOUS WAVELET METHODS A ew Method for Varying Adaptive
More informationAlternatives to Basis Expansions. Kernels in Density Estimation. Kernels and Bandwidth. Idea Behind Kernel Methods
Alternatives to Basis Expansions Basis expansions require either choice of a discrete set of basis or choice of smoothing penalty and smoothing parameter Both of which impose prior beliefs on data. Alternatives
More informationSUPPLEMENTARY MATERIAL FOR PUBLICATION ONLINE 1
SUPPLEMENTARY MATERIAL FOR PUBLICATION ONLINE 1 B Technical details B.1 Variance of ˆf dec in the ersmooth case We derive the variance in the case where ϕ K (t) = ( 1 t 2) κ I( 1 t 1), i.e. we take K as
More informationNew Local Estimation Procedure for Nonparametric Regression Function of Longitudinal Data
ew Local Estimation Procedure for onparametric Regression Function of Longitudinal Data Weixin Yao and Runze Li The Pennsylvania State University Technical Report Series #0-03 College of Health and Human
More informationDESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA
Statistica Sinica 18(2008), 515-534 DESIGN-ADAPTIVE MINIMAX LOCAL LINEAR REGRESSION FOR LONGITUDINAL/CLUSTERED DATA Kani Chen 1, Jianqing Fan 2 and Zhezhen Jin 3 1 Hong Kong University of Science and Technology,
More informationChapter 9. Non-Parametric Density Function Estimation
9-1 Density Estimation Version 1.2 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least
More informationDensity estimators for the convolution of discrete and continuous random variables
Density estimators for the convolution of discrete and continuous random variables Ursula U Müller Texas A&M University Anton Schick Binghamton University Wolfgang Wefelmeyer Universität zu Köln Abstract
More informationMS&E 226: Small Data. Lecture 6: Bias and variance (v2) Ramesh Johari
MS&E 226: Small Data Lecture 6: Bias and variance (v2) Ramesh Johari ramesh.johari@stanford.edu 1 / 47 Our plan today We saw in last lecture that model scoring methods seem to be trading o two di erent
More informationSimple and Efficient Improvements of Multivariate Local Linear Regression
Journal of Multivariate Analysis Simple and Efficient Improvements of Multivariate Local Linear Regression Ming-Yen Cheng 1 and Liang Peng Abstract This paper studies improvements of multivariate local
More informationBandwidth selection for kernel conditional density
Bandwidth selection for kernel conditional density estimation David M Bashtannyk and Rob J Hyndman 1 10 August 1999 Abstract: We consider bandwidth selection for the kernel estimator of conditional density
More informationIntroduction to Regression
Introduction to Regression Chad M. Schafer May 20, 2015 Outline General Concepts of Regression, Bias-Variance Tradeoff Linear Regression Nonparametric Procedures Cross Validation Local Polynomial Regression
More informationEconometrics I. Lecture 10: Nonparametric Estimation with Kernels. Paul T. Scott NYU Stern. Fall 2018
Econometrics I Lecture 10: Nonparametric Estimation with Kernels Paul T. Scott NYU Stern Fall 2018 Paul T. Scott NYU Stern Econometrics I Fall 2018 1 / 12 Nonparametric Regression: Intuition Let s get
More informationRegression: Lecture 2
Regression: Lecture 2 Niels Richard Hansen April 26, 2012 Contents 1 Linear regression and least squares estimation 1 1.1 Distributional results................................ 3 2 Non-linear effects and
More informationERROR VARIANCE ESTIMATION IN NONPARAMETRIC REGRESSION MODELS
ERROR VARIANCE ESTIMATION IN NONPARAMETRIC REGRESSION MODELS By YOUSEF FAYZ M ALHARBI A thesis submitted to The University of Birmingham for the Degree of DOCTOR OF PHILOSOPHY School of Mathematics The
More informationSEMIPARAMETRIC APPLICATIONS IN ECONOMIC GROWTH. Mustafa Koroglu. A Thesis presented to The University of Guelph
SEMIPARAMETRIC APPLICATIONS IN ECONOMIC GROWTH by Mustafa Koroglu A Thesis presented to The University of Guelph In partial fulfilment of requirements for the degree of Doctor of Philosophy in Economics
More informationCh 2: Simple Linear Regression
Ch 2: Simple Linear Regression 1. Simple Linear Regression Model A simple regression model with a single regressor x is y = β 0 + β 1 x + ɛ, where we assume that the error ɛ is independent random component
More informationIntroduction to Regression
Introduction to Regression David E Jones (slides mostly by Chad M Schafer) June 1, 2016 1 / 102 Outline General Concepts of Regression, Bias-Variance Tradeoff Linear Regression Nonparametric Procedures
More informationNonparametric confidence intervals. for receiver operating characteristic curves
Nonparametric confidence intervals for receiver operating characteristic curves Peter G. Hall 1, Rob J. Hyndman 2, and Yanan Fan 3 5 December 2003 Abstract: We study methods for constructing confidence
More informationAnalysis methods of heavy-tailed data
Institute of Control Sciences Russian Academy of Sciences, Moscow, Russia February, 13-18, 2006, Bamberg, Germany June, 19-23, 2006, Brest, France May, 14-19, 2007, Trondheim, Norway PhD course Chapter
More informationChap 1. Overview of Statistical Learning (HTF, , 2.9) Yongdai Kim Seoul National University
Chap 1. Overview of Statistical Learning (HTF, 2.1-2.6, 2.9) Yongdai Kim Seoul National University 0. Learning vs Statistical learning Learning procedure Construct a claim by observing data or using logics
More informationOn variable bandwidth kernel density estimation
JSM 04 - Section on Nonparametric Statistics On variable bandwidth kernel density estimation Janet Nakarmi Hailin Sang Abstract In this paper we study the ideal variable bandwidth kernel estimator introduced
More informationPointwise convergence rates and central limit theorems for kernel density estimators in linear processes
Pointwise convergence rates and central limit theorems for kernel density estimators in linear processes Anton Schick Binghamton University Wolfgang Wefelmeyer Universität zu Köln Abstract Convergence
More information1 Empirical Likelihood
Empirical Likelihood February 3, 2016 Debdeep Pati 1 Empirical Likelihood Empirical likelihood a nonparametric method without having to assume the form of the underlying distribution. It retains some of
More informationFall 2017 STAT 532 Homework Peter Hoff. 1. Let P be a probability measure on a collection of sets A.
1. Let P be a probability measure on a collection of sets A. (a) For each n N, let H n be a set in A such that H n H n+1. Show that P (H n ) monotonically converges to P ( k=1 H k) as n. (b) For each n
More informationStatistics: Learning models from data
DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial
More informationNonparametric Inference in Cosmology and Astrophysics: Biases and Variants
Nonparametric Inference in Cosmology and Astrophysics: Biases and Variants Christopher R. Genovese Department of Statistics Carnegie Mellon University http://www.stat.cmu.edu/ ~ genovese/ Collaborators:
More informationStat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13)
Stat 5100 Handout #26: Variations on OLS Linear Regression (Ch. 11, 13) 1. Weighted Least Squares (textbook 11.1) Recall regression model Y = β 0 + β 1 X 1 +... + β p 1 X p 1 + ε in matrix form: (Ch. 5,
More information9. Model Selection. statistical models. overview of model selection. information criteria. goodness-of-fit measures
FE661 - Statistical Methods for Financial Engineering 9. Model Selection Jitkomut Songsiri statistical models overview of model selection information criteria goodness-of-fit measures 9-1 Statistical models
More informationLecture Notes 15 Prediction Chapters 13, 22, 20.4.
Lecture Notes 15 Prediction Chapters 13, 22, 20.4. 1 Introduction Prediction is covered in detail in 36-707, 36-701, 36-715, 10/36-702. Here, we will just give an introduction. We observe training data
More information36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1
36. Multisample U-statistics jointly distributed U-statistics Lehmann 6.1 In this topic, we generalize the idea of U-statistics in two different directions. First, we consider single U-statistics for situations
More informationSingle Equation Linear GMM with Serially Correlated Moment Conditions
Single Equation Linear GMM with Serially Correlated Moment Conditions Eric Zivot October 28, 2009 Univariate Time Series Let {y t } be an ergodic-stationary time series with E[y t ]=μ and var(y t )
More information