3 Nonparametric Density Estimation


Example: Income distribution

Source: U.K. Family Expenditure Survey (FES)
- Approximately 7000 British households per year
- For each household many different variables are observed: income, expenditures, age of household members, professions, etc.

Nominal net incomes (136 of 7041 households):

Characterizing distributions

i.i.d. random sample $X_1,\dots,X_n$ with underlying density $f$

Traditional density estimator: the histogram

Histogram for FES income data (year = 1976):

[Figure: histogram of income]

There are some drawbacks when using a histogram for estimating a density:
- How should the binwidth (as well as the starting point) be chosen?
- A histogram is a step function (non-smooth); it is an inefficient estimator of the underlying density $f$.

[Figure: histogram for FES income data in the year 1983 (large binwidth)]

[Figure: histogram for FES income data in the year 1983 (small binwidth)]

3.1 Kernel density estimator: basic properties

Data: i.i.d. sample $X_1,\dots,X_n$ of a continuous random variable $X$

Problem: estimate the density function $f(x)$

Qualitative assumption: $f$ is smooth (at least twice differentiable)

[Figure: histogram of income]

Histogram with binwidth $2h$: intervals $[x_{j-1}, x_j)$ with $x_j - x_{j-1} = 2h$; estimation at the center points $x = (x_{j-1}+x_j)/2$ with

$\hat f_{hist}(x) = \frac{\#\{X_i \in [x_{j-1}, x_j)\}}{2hn} = \frac{1}{nh}\sum_{i=1}^n K\Big(\frac{x - X_i}{h}\Big)$,

where

$K(z) = \begin{cases} 1/2 & \text{if } z \in [-1, 1) \\ 0 & \text{else} \end{cases}$

Moving histogram (= elementary kernel estimator with rectangular kernel): estimation at each point $x$ by

$\hat f_h(x) = \frac{1}{nh}\sum_{i=1}^n K\Big(\frac{x - X_i}{h}\Big)$

$h$ – bandwidth (the original histogram has binwidth $2h$)

Motivation:

$\frac{\#\{X_i \in [x-h, x+h]\}}{2hn} = \frac{P(X_i \in [x-h, x+h])}{2h}\Big(1 + O_P\big(\tfrac{1}{\sqrt{nh}}\big)\Big)$, where $\frac{P(X_i \in [x-h, x+h])}{2h} = f(x) + O(h^2)$.

General definition of a kernel density estimator: replace the rectangular kernel by a smooth, continuously differentiable kernel function $K$ and estimate $f(x)$ by

$\hat f_h(x) = \frac{1}{nh}\sum_{i=1}^n K\Big(\frac{x - X_i}{h}\Big)$ for all $x \in \mathbb{R}$.

The bandwidth $h$ as well as the kernel function $K$ have to be chosen by the econometrician.

Simple properties of kernel density estimators:
- Positivity: $K \ge 0$ implies $\hat f_h \ge 0$
- Smoothness: $K$ continuous, differentiable implies $\hat f_h$ continuous, differentiable
- $K$ density function implies $\hat f_h$ density function
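As a numerical illustration, the following is a minimal sketch of this estimator with a Gaussian kernel; the function names and the simulated toy data are illustrative and not part of the notes.

```python
import numpy as np

def gaussian_kernel(u):
    """Gaussian kernel K(u) = exp(-u^2/2) / sqrt(2*pi)."""
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kde(x_grid, data, h, kernel=gaussian_kernel):
    """Kernel density estimator f_hat_h(x) = (1/(n h)) * sum_i K((x - X_i)/h), evaluated on x_grid."""
    n = len(data)
    u = (np.asarray(x_grid)[:, None] - np.asarray(data)[None, :]) / h   # (x - X_i)/h
    return kernel(u).sum(axis=1) / (n * h)

# toy example: estimate the density of a simulated sample
rng = np.random.default_rng(0)
sample = rng.normal(size=500)
grid = np.linspace(-4.0, 4.0, 201)
f_hat = kde(grid, sample, h=0.4)
```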

Second-order kernel $K$: $K$ is a density function which is symmetric around 0, i.e. $\int K(x)\,dx = 1$ and $\int x K(x)\,dx = 0$.

Important (second-order) kernel functions:

Family of symmetric Beta densities: for $p = 0, 1, 2, \dots$

$K(u; p) = \mathrm{Const}_p\,(1 - u^2)^p$ for $u \in [-1, 1]$ and $0$ else

Resulting kernels for different values of $p$ ($u \in [-1,1]$):
- $p = 0$ Uniform kernel: $K(u) = \frac{1}{2}$
- $p = 1$ Epanechnikov kernel: $K(u) = \frac{3}{4}(1 - u^2)$
- $p = 2$ Quartic/Biweight kernel: $K(u) = \frac{15}{16}(1 - u^2)^2$
- $p = 3$ Triweight kernel: $K(u) = \frac{35}{32}(1 - u^2)^3$

Gaussian kernel: $K(u) = \varphi(u) = \frac{1}{\sqrt{2\pi}}\exp(-u^2/2)$, $u \in \mathbb{R}$

Possible generalization: for $m = 3, 4, 5, \dots$ an $m$-th order kernel function has to satisfy

$\int K(x)\,dx = 1$, $\int x^q K(x)\,dx = 0$ for $q = 1,\dots,m-1$, and $\int x^m K(x)\,dx \ne 0$

Note: Higher order kernels are almost never used in practice. The problem is that for $m > 2$ the above conditions imply that $K(x) < 0$ for some $x \in \mathbb{R}$. This implies that in general the resulting estimate $\hat f_h(x)$ is not a density.

Kernel density estimators for different bandwidths (Gaussian kernel):

[Figure: Family Expenditure Survey (1990), income before housing costs — two panels with different bandwidths $h$]

[Figure: Family Expenditure Survey (1990), income before housing costs — two further panels with different bandwidths $h$]

Kernel estimator with normal-reference bandwidth:

[Figure: Family Expenditure Survey (1990), normal reference bandwidth, income before housing costs]

Kernel density estimator with estimated optimal bandwidth (plug-in):

[Figure: Family Expenditure Survey (1990), Sheather/Jones plug-in bandwidth, income before housing costs]

3.2 The accuracy of kernel density estimators

In the following we will consider the asymptotic behavior of a kernel density estimator as $n \to \infty$ and $h \equiv h_n \to 0$ such that $\frac{1}{nh} \to 0$.

We will need some additional assumptions:
- The underlying density $f$ is twice continuously differentiable.
- $K$ is a continuous second order kernel function with compact support $[-1, 1]$.

Note: A compact support of $K$ is assumed in order to simplify arguments. The asymptotic expansions for bias and variance of the estimator remain valid under weaker assumptions: $K$ is a bounded function which is continuous on its support $S_K \subset \mathbb{R}$, and $\lim_{|y|\to\infty} y^2 K(y) = 0$.

We now first derive the pointwise bias of $\hat f_h(x)$ at an arbitrary point $x$:

$E(\hat f_h(x)) = \frac{1}{n}\sum_{i=1}^n E\Big(\frac{1}{h}K\big(\tfrac{x - X_i}{h}\big)\Big) = E\Big(\frac{1}{h}K\big(\tfrac{x - X_i}{h}\big)\Big) = \int \frac{1}{h} K\Big(\frac{x-u}{h}\Big) f(u)\,du$

and

$\int \frac{1}{h}K\Big(\frac{x-u}{h}\Big) f(u)\,du = \int K(y)\, f(x + yh)\,dy = \int K(y)\Big\{f(x) + f'(x)\,yh + \frac{1}{2!} f''(x)\,y^2 h^2\Big\}dy + o(h^2) = f(x) + \frac{h^2}{2} f''(x) \underbrace{\int K(y) y^2\,dy}_{\nu_2(K)} + o(h^2)$

$\Rightarrow \quad \mathrm{Bias}(\hat f_h(x)) = E(\hat f_h(x)) - f(x) = \frac{h^2\,\nu_2(K)}{2} f''(x) + o(h^2)$

For the variance we obtain:

$\mathrm{Var}(\hat f_h(x)) = \frac{1}{n}\, E\Big(\Big(\frac{1}{h}K\big(\tfrac{x-X_i}{h}\big) - E\big(\tfrac{1}{h}K\big(\tfrac{x-X_i}{h}\big)\big)\Big)^2\Big) = \frac{f(x)}{nh}\underbrace{\int K(y)^2\,dy}_{R(K)} + o\Big(\frac{1}{nh}\Big)$

This implies that the mean squared error is given by

$\mathrm{MSE}(\hat f_h(x)) = \mathrm{Bias}(\hat f_h(x))^2 + \mathrm{Var}(\hat f_h(x)) = \frac{h^4\,\nu_2(K)^2}{4} f''(x)^2 + \frac{f(x)}{nh} R(K) + o\Big(h^4 + \frac{1}{nh}\Big)$

We can immediately infer that as $n \to \infty$ and $h \equiv h_n \to 0$, $\frac{1}{nh} \to 0$, the kernel estimator $\hat f_h(x)$ is a weakly consistent estimator of $f(x)$,

$\hat f_h(x) \;\to^P\; f(x)$

It can even be shown that there is uniform convergence in probability,

$\sup_{x\in\mathbb{R}} |\hat f_h(x) - f(x)| \;\to^P\; 0$

At the same time the central limit theorem implies that

$\sqrt{nh}\,\big(\hat f_h(x) - E(\hat f_h(x))\big) \;\to^D\; N\big(0, f(x)R(K)\big)$,

which can also be written in the form

$\hat f_h(x) \sim AN\Big(E(\hat f_h(x)),\ \frac{f(x)}{nh}R(K)\Big)$

A general measure of the accuracy of a kernel density estimator over all points $x \in \mathbb{R}$ is the mean integrated squared error (MISE):

$\mathrm{MISE}(\hat f_h) = \int E\big(\hat f_h(x) - f(x)\big)^2 dx = \int \mathrm{MSE}(\hat f_h(x))\,dx = \int \mathrm{Bias}(\hat f_h(x))^2\,dx + \int \mathrm{Var}(\hat f_h(x))\,dx = \frac{h^4\,\nu_2(K)^2}{4}\int f''(x)^2\,dx + \frac{R(K)}{nh} + o\Big(h^4 + \frac{1}{nh}\Big)$

The formula shows that the choice of an appropriate bandwidth is crucial for the accuracy of a kernel density estimator:
- The bias decreases as $h$ decreases.
- The variance increases as $h$ decreases.

When ignoring the smaller order terms $o(h^4 + \frac{1}{nh})$, an asymptotically optimal bandwidth balancing squared bias and variance minimizes the asymptotic MISE:

$h_{opt} = \Big\{\frac{R(K)}{n\,\nu_2(K)^2 \int f''(x)^2\,dx}\Big\}^{1/5}$,

and for large $n$ the corresponding minimal value of the asymptotic mean integrated squared error is given by

$\mathrm{MISE}(\hat f_{h_{opt}}) = \min_{h>0} \mathrm{MISE}(\hat f_h) = \frac{5}{4}\Big\{\nu_2(K)^2 R(K)^4 \int f''(x)^2\,dx\Big\}^{1/5} n^{-4/5}$

This implies that when using an optimal bandwidth a kernel density estimator has the rate of convergence $n^{-2/5}$:

$\hat f_{h_{opt}}(x) - f(x) = O_P(n^{-2/5})$,

while the resulting MISE depends on
- the curvature $\int f''(x)^2\,dx$ of the (unknown) true density $f$,
- the choice of the kernel function, i.e. the constant $\nu_2(K)^2 R(K)^4$.

Note: If a kernel density estimator is applied with a bandwidth $h \ne h_{opt}$, then it is common to speak of undersmoothing if $h < h_{opt}$ and of oversmoothing if $h > h_{opt}$.

Theory of optimal kernels: An obvious idea is to choose the kernel function which provides the minimal value of $C(K) = \nu_2(K)^2 R(K)^4$.

This leads to the variational problem of minimizing $C(K)$ with respect to all possible second order kernels $K$. Hodges and Lehmann (1956) showed that the Epanechnikov kernel is optimal in the sense that it provides the minimal value $C_{opt}$.

The efficiency of some kernel $K$ relative to the Epanechnikov kernel is usually defined as $C_{opt}/C(K)$. The following table shows that, although the Epanechnikov kernel is optimal, the efficiency loss when using other popular kernels is very limited. This explains why in practice the biweight or triweight kernels are often preferred to the Epanechnikov kernel, since the latter does not possess continuous first derivatives.

[Table: relative efficiencies $C_{opt}/C(K)$ for the Epanechnikov, Uniform, Biweight, Triweight and Normal kernels]

3.3 Optimal rates of convergence

Consider an estimator $\hat\theta_n \equiv \hat\theta_n(X_1,\dots,X_n)$ of some parameter (vector) $\theta \in \mathbb{R}^d$, $d \ge 1$, of interest. For some $r > 0$, $\hat\theta_n$ possesses the rate of convergence $n^{-r}$ if

$\|\hat\theta_n - \theta\| = O_P(n^{-r})$ and $n^{-r} = O_P(\|\hat\theta_n - \theta\|)$

The rate of convergence tells us how fast the accuracy of the estimator improves as the sample size increases. An important justification for the use of a specific estimator consists in verifying that the estimator achieves the optimal rate of convergence.

Optimal rates of convergence depend on the nature of the estimation problem to be studied. Let $\Omega$ denote the space of all possible values of the unknown parameter $\theta$. Note that the probability distribution $P_\theta$ of an estimator $\hat\theta_n$ will then depend on the true value of $\theta \in \Omega$.

For some $r > 0$, $n^{-r}$ is a lower rate of convergence if there exists a constant $c > 0$ such that for any possible estimator $\hat\theta_n \equiv \hat\theta_n(X_1,\dots,X_n)$

$\liminf_{n\to\infty}\ \sup_{\theta\in\Omega}\ P_\theta\big(\|\hat\theta_n - \theta\| > c\,n^{-r}\big) > 0$

Moreover, $n^{-r}$ is an achievable rate of convergence if there exists an estimator $\hat\theta_n$ such that

$\lim_{c\to\infty}\ \limsup_{n\to\infty}\ \sup_{\theta\in\Omega}\ P_\theta\big(\|\hat\theta_n - \theta\| > c\,n^{-r}\big) = 0$

The rate $n^{-r}$ is called an optimal rate of convergence if it is both a lower and an achievable rate of convergence.

Optimal rates of convergence are usually only of interest in nonparametric settings. In standard parametric problems (there are exceptions!) the optimal rate of convergence is $n^{-1/2}$. This follows from the fact that for such estimators the bias is asymptotically negligible, while the variance is of order $n^{-1}$, i.e. the standard deviation is of order $n^{-1/2}$. In many situations there also exist bounds for the variance of the most efficient estimator (Cramér-Rao lower bound).

The situation is different in nonparametric function estimation. Any estimator then has to balance bias and variance, which means that the rate $n^{-1/2}$ cannot be reached. In nonparametric regression or density estimation we generally have to estimate a $d$-dimensional function $f$, $d \ge 1$, from given data. If the aim is to estimate the $r$-th order derivative of $f$,

$r = 0, 1, 2, \dots$, then for classes of $p$-times continuously differentiable functions with bounded $p$-th derivatives, the optimal rates of convergence are

$\hat f^{(r)}(x) - f^{(r)}(x) = O_P\big(n^{-\frac{p-r}{2p+d}}\big)$

For the problem of estimating a density function $f$ (i.e. $r = 0$) this means:
- If $f$ is twice continuously differentiable, i.e. $p = 2$, then the optimal rate of convergence is $n^{-2/5}$. Hence, using a second-order kernel, a kernel density estimator achieves the optimal rate of convergence.
- If $f$ is four times continuously differentiable, i.e. $p = 4$, then the optimal rate of convergence is $n^{-4/9}$. When using a second-order kernel, the corresponding kernel density estimator still only possesses the rate of convergence $n^{-2/5}$. In this case, optimal rates of convergence can in principle be reached by using a fourth order kernel (often not a good idea in practice).

3.4 Estimating derivatives of f

Based on a smooth kernel it is easy to estimate derivatives $f^{(r)}(x)$, $r = 1, 2, \dots$. If $K$ is $r$-times continuously differentiable, then $\hat f_h(x)$ is also $r$-times continuously differentiable, and an estimate of $f^{(r)}(x)$ is given by

$\hat f_h^{(r)}(x) = \frac{1}{nh^{r+1}}\sum_{i=1}^n K^{(r)}\Big(\frac{x - X_i}{h}\Big)$

Assume a second-order kernel $K$ which is $r$-times continuously differentiable and has the compact support $[-1, 1]$ (e.g. the biweight kernel

if $r = 1$, or the triweight kernel if $r = 2$). If $f$ is at least $(r+2)$-times continuously differentiable, then it is easily verified that

$E\big(\hat f_h^{(r)}(x)\big) = f^{(r)}(x) + \frac{h^2\,\nu_2(K)}{2} f^{(r+2)}(x) + o(h^2)$

and

$\mathrm{Var}\big(\hat f_h^{(r)}(x)\big) = \frac{f(x)}{nh^{2r+1}} R(K^{(r)}) + o\Big(\frac{1}{nh^{2r+1}}\Big)$

$\mathrm{MSE}\big(\hat f_h^{(r)}(x)\big) = h^4\,\frac{\nu_2(K)^2}{4} f^{(r+2)}(x)^2 + \frac{f(x)}{nh^{2r+1}} R(K^{(r)}) + o\Big(h^4 + \frac{1}{nh^{2r+1}}\Big)$

An optimal bandwidth $h_{opt}^{(r)}$ for estimating $f^{(r)}(x)$ is thus very different from an optimal bandwidth $h_{opt}$ for estimating $f(x)$:
- If $r = 1$, then an optimal bandwidth for estimating $f'(x)$ is of order $h_{opt}^{(1)} \sim n^{-1/7}$.
- If $r = 2$, then an optimal bandwidth for estimating $f''(x)$ is of order $h_{opt}^{(2)} \sim n^{-1/9}$.

3.5 Bandwidth selection

Normal reference bandwidth: A straightforward approach is to use a standard family of distributions to assign a value to the term $\int f''(x)^2\,dx$ in the asymptotic expression of $h_{opt}$. In many applications one may expect that the structure of the true density $f$ is not extremely different from the structure of a normal density. A reasonable approximation of an optimal bandwidth for estimating $f$ may then be obtained by referring to the optimal bandwidth of a normal density.

Normal density: $\varphi_{\mu,\sigma}(x) = \frac{1}{\sigma}\varphi\big(\frac{x-\mu}{\sigma}\big)$, where $\varphi$ is the standard normal density, and where $\mu$, $\sigma^2$ are mean and variance of $X_i$. Some calculations lead to

$\int \varphi_{\mu,\sigma}''(x)^2\,dx = \frac{3}{8\sqrt{\pi}\,\sigma^5}$

Normal reference bandwidth:

$h_{NR} = \Big\{\frac{8\sqrt{\pi}\,R(K)}{3\,\nu_2(K)^2\,n}\Big\}^{1/5}\hat\sigma$,

where $\hat\sigma$ denotes a suitable estimate of the standard deviation of $X$.

Cross-validation: We obviously obtain

$\int\big(\hat f_h(x) - f(x)\big)^2 dx = \int \hat f_h(x)^2\,dx - 2\int \hat f_h(x)f(x)\,dx + \int f(x)^2\,dx$

Since $\int f(x)^2\,dx$ does not depend on $h$, minimizing $\mathrm{MISE}(\hat f_h)$ over $h$ is equivalent to minimizing

$E\Big(\int \hat f_h(x)^2\,dx\Big) - 2\,E\Big(\int \hat f_h(x)f(x)\,dx\Big)$

These terms can be estimated by cross-validation:

$CV(h) = \int \hat f_h(x)^2\,dx - \frac{2}{n}\sum_{i=1}^n \hat f_{h,-i}(X_i)$,

where $\hat f_{h,-i}$ denotes the kernel estimator obtained when dropping the $i$-th observation.
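The following sketch implements both selectors for the Gaussian kernel ($R(K) = 1/(2\sqrt{\pi})$, $\nu_2(K) = 1$), reusing the `kde` and `gaussian_kernel` helpers from the earlier sketch; the grid-based approximation of $\int \hat f_h^2$ and the candidate bandwidth range are choices made here for illustration.

```python
import numpy as np

def h_normal_reference(data):
    """h_NR = {8 sqrt(pi) R(K) / (3 nu2(K)^2 n)}^(1/5) * sigma_hat for the Gaussian kernel."""
    n = len(data)
    R_K, nu2_K = 1.0 / (2.0 * np.sqrt(np.pi)), 1.0
    return (8.0 * np.sqrt(np.pi) * R_K / (3.0 * nu2_K ** 2 * n)) ** 0.2 * np.std(data, ddof=1)

def cv_score(h, data, grid):
    """CV(h) = int f_hat_h(x)^2 dx - (2/n) sum_i f_hat_{h,-i}(X_i); integral approximated on a uniform grid."""
    n = len(data)
    f_hat = kde(grid, data, h)
    int_f2 = np.sum(f_hat ** 2) * (grid[1] - grid[0])
    K = gaussian_kernel((data[:, None] - data[None, :]) / h)
    np.fill_diagonal(K, 0.0)                       # leave-one-out: drop the i-th observation
    loo = K.sum(axis=1) / ((n - 1) * h)            # f_hat_{h,-i}(X_i)
    return int_f2 - 2.0 * loo.mean()

# compare the normal reference bandwidth with the CV-selected one on simulated data
rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=0.5, size=400)
grid = np.linspace(data.min() - 1.0, data.max() + 1.0, 500)
h_nr = h_normal_reference(data)
candidates = np.linspace(0.05, 1.0, 40)
h_cv = candidates[np.argmin([cv_score(h, data, grid) for h in candidates])]
```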

In other words, for each $i = 1,\dots,n$ the estimator $\hat f_{h,-i}$ is determined from the reduced sample $X_1,\dots,X_{i-1},X_{i+1},\dots,X_n$.

Under fairly general conditions, minimizing $CV(h)$ over $h$ leads to a consistent estimator $\hat h_{opt,CV}$ of $h_{opt}$. The rate of convergence is slow, however: the relative error $\hat h_{opt,CV}/h_{opt} - 1$ is only of order $O_P(n^{-1/10})$.

Plug-in methods: In the expression

$h_{opt} = \Big\{\frac{R(K)}{n\,\nu_2(K)^2\int f''(x)^2\,dx}\Big\}^{1/5}$

the quantities $R(K)$ and $\nu_2(K)$ can be directly computed from the selected kernel function. The only problem in calculating $h_{opt}$ is the term $\int f''(x)^2\,dx$, which depends on the unknown true density. But of course this integral may be estimated by $\int \hat f_h''(x)^2\,dx$. The theory of kernel derivative estimation implies that consistent estimation of $f''(x)$ requires a bandwidth $\tilde h > h_{opt}$. It obviously does not make much sense to complicate the problem by looking for an optimal $\tilde h$. A general idea adopted by plug-in methods is to look over a reasonable range of bandwidths $h$ and to use corresponding inflated bandwidths $\tilde h := g_n(h) > h$, with $\frac{g_n(h)}{h} \to \infty$ as $n \to \infty$, for estimating the functional. An estimate $\hat h_{opt,PI}$ is then determined by solving

the fixed point problem

$\hat h_{opt,PI} = \Bigg\{\frac{R(K)}{n\,\nu_2(K)^2\int \hat f''_{g_n(\hat h_{opt,PI})}(x)^2\,dx}\Bigg\}^{1/5}$

A solution is obtained iteratively, starting e.g. from a normal reference bandwidth. Gasser, Kneip and Köhler (JASA, 1992) propose $g_n(h) = n^{1/10}h$. Sheather and Jones (JRSSB, 1991) use a more complicated relation based on a reference normal model.

3.6 Pointwise confidence intervals

Recall that asymptotically

$\sqrt{nh}\,\big(\hat f_h(x) - E(\hat f_h(x))\big) \;\to^D\; N\big(0, f(x)R(K)\big)$

This asymptotic normality result allows us to establish pointwise confidence intervals. For given $\alpha > 0$ an approximate $(1-\alpha)$-confidence interval for the variability of $\hat f_h(x)$ is thus given by

$\hat f_h(x) \pm z_{1-\alpha/2}\sqrt{\frac{\hat f_h(x)R(K)}{nh}}$.

Here, $z_{1-\alpha/2}$ denotes the corresponding quantile of the standard normal distribution. If $\alpha = 0.05$, then $z_{1-\alpha/2} = z_{0.975} = 1.96$. Then

$P\Big(E(\hat f_h(x)) \in \Big[\hat f_h(x) \pm z_{1-\alpha/2}\sqrt{\tfrac{\hat f_h(x)R(K)}{nh}}\Big]\Big) \to 1-\alpha$ as $n \to \infty$

But obviously this interval only focuses on random fluctuations of $\hat f_h(x)$; the bias is not taken into account.
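A minimal sketch of these variability intervals, again assuming the Gaussian kernel ($R(K) = 1/(2\sqrt{\pi})$) and reusing the `kde` helper from above; it implements only the interval just displayed, without any bias correction.

```python
import numpy as np
from scipy.stats import norm

def pointwise_ci(x_grid, data, h, alpha=0.05):
    """f_hat_h(x) +/- z_{1-alpha/2} * sqrt(f_hat_h(x) * R(K) / (n h)) for the Gaussian kernel."""
    n = len(data)
    z = norm.ppf(1.0 - alpha / 2.0)
    R_K = 1.0 / (2.0 * np.sqrt(np.pi))
    f_hat = kde(x_grid, data, h)                   # kernel estimate from the earlier sketch
    half = z * np.sqrt(f_hat * R_K / (n * h))
    return f_hat - half, f_hat + half
```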

Recall that $E(\hat f_h(x)) = f(x) + O(h^2)$. When using an approximately optimal bandwidth $h \sim n^{-1/5}$, the probability that the interval contains the true value $f(x)$ may therefore be much smaller than $1-\alpha$.

A possible trick to circumvent the bias problem is to use an undersmoothing bandwidth. Instead of using $h \sim n^{-1/5}$, confidence intervals may be constructed with respect to a bandwidth $\tilde h = h\,n^{-\beta}$, $\beta > 0$. Then $\tilde h^2 = o\big(\frac{1}{\sqrt{n\tilde h}}\big)$ and therefore

$\sqrt{n\tilde h}\,\big(\hat f_{\tilde h}(x) - E(\hat f_{\tilde h}(x))\big) = \sqrt{n\tilde h}\,\big(\hat f_{\tilde h}(x) - f(x)\big) + o(1)$,

which yields

$\sqrt{n\tilde h}\,\big(\hat f_{\tilde h}(x) - f(x)\big) \;\to^D\; N\big(0, f(x)R(K)\big)$.

We can conclude that then

$\hat f_{\tilde h}(x) \pm z_{1-\alpha/2}\sqrt{\frac{\hat f_{\tilde h}(x)R(K)}{n\tilde h}}$

provides an asymptotically valid $(1-\alpha)$-confidence interval. Of course, a suitable choice of $\beta$ is a very difficult problem in any practical application.

Note: There are more sophisticated methods for constructing confidence intervals based on the bootstrap.

3.7 Testing normality

Consider an i.i.d. sample $X_1,\dots,X_n$. Many standard procedures in parametric statistics rely on the assumption that $X_i$ is normally distributed. In practice it is often useful to test the hypothesis of normally distributed observations. Formally this leads to the testing problem:

$H_0: X_i \sim N(\mu, \sigma^2)$

against the alternative

$H_1: X_i$ not normally distributed

Kernel density estimation allows us to define a sensible test of this problem. Consider a kernel estimator $\hat f_h(x)$ based on the Gaussian kernel $K(u) = \varphi(u)$. If the null hypothesis is correct, i.e. $f = \varphi_{\mu,\sigma}$, it can then be shown that for any possible bandwidth $h$

$E(\hat f_h(x)) = \varphi_{\mu,\sqrt{\sigma^2+h^2}}(x)$

This implies that

$\int E\Big(\big(\hat f_h(x) - \varphi_{\mu,\sqrt{\sigma^2+h^2}}(x)\big)^2\Big)dx = \int \mathrm{Var}(\hat f_h(x))\,dx$

Consequently, if $H_0$ is correct, then any difference between $\hat f_h(x)$ and $\varphi_{\mu,\sqrt{\sigma^2+h^2}}(x)$ is only due to random fluctuations (no systematic error, no additional bias). We thus arrive at the following test procedure:
- Estimate mean and variance of $X_i$ by $\hat\mu = \frac{1}{n}\sum_{i=1}^n X_i = \bar X$ and $\hat\sigma^2 = S^2$.
- Determine the normal reference bandwidth $h_{NR}$ and the corresponding kernel density estimator $\hat f_{h_{NR}}$.
- Calculate $D = \int\big(\hat f_{h_{NR}}(x) - \varphi_{\hat\mu,\sqrt{\hat\sigma^2+h_{NR}^2}}(x)\big)^2 dx$.
- Reject $H_0$ if $D$ is too large.

The distribution of $D$ under $H_0$ can be approximated by Monte Carlo simulations.
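A hedged sketch of this test: the statistic $D$ is computed on a grid, and its null distribution is approximated by simulating normal samples with the estimated mean and variance (a parametric-bootstrap-style calibration chosen here for illustration; the notes tabulate critical values instead). It reuses the `kde` and `h_normal_reference` helpers from the earlier sketches.

```python
import numpy as np
from scipy.stats import norm

def test_statistic(data):
    """D = int ( f_hat_{h_NR}(x) - phi_{mu_hat, sqrt(sigma_hat^2 + h_NR^2)}(x) )^2 dx (grid approximation)."""
    mu_hat, sigma2_hat = data.mean(), data.var(ddof=1)
    h = h_normal_reference(data)
    grid = np.linspace(data.min() - 3 * h, data.max() + 3 * h, 400)
    f_hat = kde(grid, data, h)
    f0 = norm.pdf(grid, loc=mu_hat, scale=np.sqrt(sigma2_hat + h ** 2))
    return np.sum((f_hat - f0) ** 2) * (grid[1] - grid[0])

def critical_value(n, mu, sigma, alpha=0.05, n_sim=1000, seed=0):
    """Approximate the (1 - alpha) quantile of D under H0 by simulating normal samples of size n."""
    rng = np.random.default_rng(seed)
    sims = [test_statistic(rng.normal(mu, sigma, size=n)) for _ in range(n_sim)]
    return np.quantile(sims, 1 - alpha)

# reject H0 at level alpha if
#   test_statistic(data) > critical_value(len(data), data.mean(), data.std(ddof=1))
```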

In a first-order approximation this distribution only depends on the sample size $n$ and is independent of the values of $\mu$, $\sigma^2$. The following table presents critical values for a test of level $\alpha = 5\%$.

[Table: critical values of D for different sample sizes n]

Example: In (older) economic literature it is frequently assumed that the income distribution is lognormal. This means that $\log X_i$ follows a normal distribution.

FES data (1990): An application of the kernel-based test to the log-income data yields a value of $D$ exceeding the 5% critical value, so $H_0$ is rejected: the income distribution is not lognormal.

[Figure: Family Expenditure Survey (1990), ln(income) before housing costs — kernel estimate with normal reference bandwidth and fitted normal density N(5.23, ·); the L2 distance between the two curves is indicated]

3.8 Boundary problems

The above calculations implicitly assume that the density is supported on the entire real line. Boundary problems can arise if the support of $f$ is only a subset of $\mathbb{R}$. In order to exemplify this point we will in the following assume that $f$ has the support $[0,\infty)$. Then $f(x) = 0$ if $x < 0$, which means that $X_i \ge 0$ with probability 1. In economic applications this is an important situation, since many variables of interest (e.g. income, wages, working hours, etc.) are positive.

Consider the behavior of a kernel density estimator based on a second order kernel with compact support $[-1, 1]$. Then the interval $[0, h]$ is called the boundary region. Depending on the structure of $f(x)$ for $x \in [0, h]$, the kernel estimator may produce poor estimates in the boundary region.

More precisely, $\hat f_h(0)$ usually underestimates $f(0)$. This is because $\hat f_h$ does not "feel" the boundary, and penalizes for the lack of data on the negative axis. From a theoretical point of view the bias increases, while the variance is still of order $1/(nh)$.
- If $f(0) > 0$, then as $n \to \infty$, $\hat f_h(0) \to^P \frac{1}{2}f(0)$.
- If $f(0) = 0$, then $\hat f_h(0) = O_P(h)$.
- If $f(0) = 0$ and $f'(0) = 0$, then $\hat f_h(0) = O_P(h^2)$.

There are several possibilities to deal with boundary problems.

1) Reflection method: If $f(0) > 0$, then the reflection of data method produces consistent estimates of $f(x)$ in the boundary region. The simple idea is to create a new sample of $2n$ observations by simply adding $-X_1, -X_2, \dots, -X_n$ to the data. The estimator then becomes

$\tilde f_h(x) = \frac{1}{nh}\sum_{i=1}^n\Big(K\big(\tfrac{x - X_i}{h}\big) + K\big(\tfrac{x + X_i}{h}\big)\Big)$

for $x \ge 0$ (and $\tilde f_h(x) = 0$ for $x < 0$). Then
- if $f'(0) \ne 0$, then $\tilde f_h(0) - f(0) = O_P(h)$;
- if $f'(0) = 0$, then $\tilde f_h(0) - f(0) = O_P(h^2)$.

There are more sophisticated reflection methods which even in the general case are able to guarantee that the bias is of order $h^2$ (see e.g. Cowling and Hall, JRSSB, 1996).
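A minimal sketch of the reflection estimator just described, using the `gaussian_kernel` helper from the first sketch (any kernel could be substituted):

```python
import numpy as np

def kde_reflected(x_grid, data, h):
    """f_tilde_h(x) = (1/(n h)) sum_i [K((x - X_i)/h) + K((x + X_i)/h)] for x >= 0, and 0 for x < 0."""
    n = len(data)
    u_minus = (x_grid[:, None] - data[None, :]) / h
    u_plus = (x_grid[:, None] + data[None, :]) / h
    f = (gaussian_kernel(u_minus) + gaussian_kernel(u_plus)).sum(axis=1) / (n * h)
    return np.where(x_grid >= 0.0, f, 0.0)
```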

2) Boundary kernel method: At each point in the boundary region, use a different kernel for estimating the function. These new kernels give up the symmetry property and put more weight on the positive axis (see e.g. Scott (1992): Multivariate Density Estimation, Wiley).

3) Transformation of data: Data transformations have the potential to improve the quality of density estimates (not only at possible boundaries). If $f(0) = 0$, i.e. if the support of $f$ is actually $(0,\infty)$, then such transformations can avoid any boundary problem. The idea is to define a strictly monotonically increasing, differentiable transformation $T: (0,\infty) \to \mathbb{R}$ (in economic applications one will often use $T(x) = \log x$). Let $g$ denote the density of the transformed random variables $Y_i = T(X_i)$, $i = 1,\dots,n$. Then $f$ and $g$ are linked by the relation

$f(x) = g(T(x))\,T'(x)$, $x \in (0,\infty)$.

The density $g$ can be estimated by ordinary kernel estimation from $Y_1 = T(X_1),\dots,Y_n = T(X_n)$,

$\hat g_h(y) = \frac{1}{nh}\sum_{i=1}^n K\Big(\frac{y - Y_i}{h}\Big)$,

and an estimate of $f$ is then given by

$\tilde f_h(x) = \hat g_h(T(x))\,T'(x) = \frac{T'(x)}{nh}\sum_{i=1}^n K\Big(\frac{T(x) - T(X_i)}{h}\Big)$

Using such a transformation may lead to a general improvement in the quality of the density estimation if $g$ is structurally much simpler than $f$ (if e.g. $\int g''(y)^2\,dy \ll \int f''(x)^2\,dx$).

3.9 Multivariate density estimation

Kernel estimators can also be used to estimate multivariate density functions. Consider an i.i.d. sample of random vectors $X_i = (X_{i1}, X_{i2},\dots,X_{id})^T$ with underlying density $f(x)$, $x = (x_1,\dots,x_d)^T$.

Data: random sample $X_1 = (X_{11},\dots,X_{1d})^T,\ \dots,\ X_n = (X_{n1},\dots,X_{nd})^T$

$d$-dimensional kernel estimator (with kernel function $K$ and bandwidths $h_1,\dots,h_d$):

$\hat f_h(x) = \frac{1}{n}\sum_{i=1}^n \frac{1}{h_1\cdots h_d}\,K\Big(\frac{x_1 - X_{i1}}{h_1},\dots,\frac{x_d - X_{id}}{h_d}\Big)$, $\quad x = (x_1,\dots,x_d)^T \in \mathbb{R}^d$

The kernel function $K: \mathbb{R}^d \to \mathbb{R}$ is now a $d$-dimensional density function which is symmetric around 0. Hence, in particular

$\int_{\mathbb{R}^d} K(x_1,x_2,\dots,x_d)\,dx_1\,dx_2\cdots dx_d = 1$

Frequently used kernel functions:

Product kernels: Let $K$ denote a one-dimensional second-order kernel function. A $d$-dimensional kernel function can then be defined as the product of the univariate kernels evaluated at the $d$ coordinates:

$K(x_1,x_2,\dots,x_d) = K(x_1)\cdot K(x_2)\cdots K(x_d)$

Examples: $d$-dimensional Gaussian kernel = density of the $N_d(0, I)$-distribution

Multivariate Epanechnikov kernel:

$K(x_1,\dots,x_d) = \begin{cases} \frac{d+2}{2c_d}\big(1 - \sum_{i=1}^d x_i^2\big) & \text{if } \sum_{i=1}^d x_i^2 \le 1 \\ 0 & \text{else} \end{cases}$

Here, $c_d$ denotes the volume of the $d$-dimensional unit ball: $c_1 = 2$, $c_2 = \pi$, $c_3 = 4\pi/3$, etc.

Smooth kernels (in the case $d = 2$):

$K(x_1,x_2) = \begin{cases} \frac{3}{\pi}\big(1 - \sum_{i=1}^2 x_i^2\big)^2 & \text{if } \sum_{i=1}^2 x_i^2 \le 1 \\ 0 & \text{else} \end{cases}$
$\qquad K(x_1,x_2) = \begin{cases} \frac{4}{\pi}\big(1 - \sum_{i=1}^2 x_i^2\big)^3 & \text{if } \sum_{i=1}^2 x_i^2 \le 1 \\ 0 & \text{else} \end{cases}$

In practice, usually one basic bandwidth $h$ is selected. In order to eliminate the effects of different scalings of the variables, the $d$ bandwidths $h_1,\dots,h_d$ are then determined as $h_j = \hat\sigma_j h$, where

$\hat\sigma_j := \Big(\frac{1}{n}\sum_{i=1}^n (X_{ij} - \bar X_j)^2\Big)^{1/2}$

For a given point $x$ the MSE of $\hat f_h(x)$ is then of order

$\mathrm{MSE}(\hat f_h(x)) = O\Big(h^4 + \frac{1}{nh^d}\Big)$,

which means that an optimal bandwidth is of order $h \sim n^{-\frac{1}{4+d}}$. The corresponding rate of convergence is then $\hat f_h(x) - f(x) = O_P\big(n^{-\frac{2}{4+d}}\big)$.
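A short sketch of a product-kernel estimator with scaled bandwidths $h_j = \hat\sigma_j h$, using the univariate Gaussian kernel in each coordinate as an illustrative choice:

```python
import numpy as np

def kde_multivariate(x, data, h):
    """Product-kernel estimator; x: (m, d) evaluation points, data: (n, d) sample, h: basic bandwidth."""
    n, d = data.shape
    h_j = data.std(axis=0) * h                            # coordinate-wise bandwidths h_j = sigma_hat_j * h
    u = (x[:, None, :] - data[None, :, :]) / h_j          # (m, n, d) array of scaled differences
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)      # univariate Gaussian kernel per coordinate
    return K.prod(axis=2).sum(axis=1) / (n * np.prod(h_j))
```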

Example: FES data (1984); $X_{i1}$ – relative income of household $i$; $X_{i2}$ – age of the household head.

[Figure: kernel estimator of the joint density of $(X_1, X_2)$]

The curse of dimensionality

Kernel estimators are a useful tool for the nonparametric estimation of one-, two-, or three-dimensional densities. The accuracy of estimation, however, decreases rapidly as $d$ increases. In really high-dimensional problems kernel estimators are practically useless. Indeed, this is a substantial problem which emerges in all types of nonparametric function estimation (density estimation as well as regression, autoregression, etc.). One generally speaks of the curse of dimensionality.

The reason is the emptiness of a high-dimensional space $\mathbb{R}^d$. If $d \gg 1$, then even for large sample sizes $n$ there will only exist very few observations which are close by.

As an example, consider the estimation of a $d$-variate standard normal density at the point $x = 0$. This is obviously the center of the distribution, and the density has its maximal value at $x = 0$. First assume that we use a kernel density estimator based on the Epanechnikov kernel with bandwidths $h_1 = h_2 = \dots = h_d = 1$. Of course, these are fairly large bandwidths which lead to a substantial bias.
- If $d = 1$, then $P(|X_i| \le 1) \approx 0.68$, i.e. one will expect that approximately 68% of all observations satisfy $K(X_i/h) > 0$ and thus possess a positive weight when calculating $\hat f_h(0)$ ($h = 1$).
- If $d = 2$, then $P(|X_{i1}| \le 1 \text{ and } |X_{i2}| \le 1) \approx 0.46$. This means that approximately 46% of all observations contribute a positive weight when calculating $\hat f_h(0)$ ($h = 1$).

- If $d = 10$, then $P(|X_{ij}| \le 1 \text{ for all } j = 1,\dots,10) \approx 0.02$. This means that only approximately 2% of all observations contribute a positive weight when calculating $\hat f_h(0)$ ($h = 1$).

Now assume that an optimal bandwidth $h = h_{opt}$ is used. Then the following sample sizes $n$ are necessary in order to keep $E\big(\hat f_{h_{opt}}(0) - f(0)\big)^2$ small relative to $f(0)^2$:

[Table: required sample size n as a function of the dimension d]

4 Nonparametric Regression

We start by considering univariate regression with one single explanatory variable $X \in \mathbb{R}$.

Data: $(Y_i, X_i)$, $i = 1,\dots,n$, where
- $Y_i$ – response variable
- $X_i \in [a, b] \subset \mathbb{R}$ – explanatory variable
- $n$ sufficiently large (e.g. $n \ge 40$)

Nonparametric regression model:

$Y_i = m(X_i) + \epsilon_i$

- $m(X_i) = E(Y_i \mid X = X_i)$ – regression function
- $\epsilon_1, \epsilon_2,\dots$ i.i.d., $E(\epsilon_i) = 0$, $\mathrm{var}(\epsilon_i) = \sigma^2$

Linear regression: $m(X)$ is a straight line, $m(X) = \beta_0 + \beta_1 X$

Possible generalizations: $m(X)$ quadratic or cubic polynomial,

$m(X) = \beta_0 + \beta_1 X + \beta_2 X^2$ or $m(X) = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3$

Many important applications lead to regression functions possessing a complicated structure. Standard models then are too simple and do not provide useful approximations of $m(x)$.

"All models are wrong, but some are useful" (G. Box)

Example: Total expenditure in dependence of age. The data stem from a random sample of British households.

[Figure: income versus age]

Nonparametric regression: There are no specific assumptions about the structure of the regression function. It is only assumed that $m$ is smooth.

An important point in theoretical analysis is the way in which the observations $X_1,\dots,X_n$ have been generated. One distinguishes between fixed and random design.

Fixed design: The observation points $X_1,\dots,X_n$ are fixed (non-stochastic) values. Example: crop yield ($Y$) in dependence of the amount of fertilizer ($X$) used. Most important special case: equidistant design, $X_{i+1} - X_i = \frac{b-a}{n}$.

Random design: The observation points $X_1,\dots,X_n$ are (realizations of) i.i.d. random variables with density $f$. Here $f$ is called the design density. Throughout this chapter it will be assumed that $f(x) > 0$ for all $x \in [a, b]$. Example: sample $(Y_1, X_1),\dots,(Y_n, X_n)$ of income ($Y$) and age ($X$) of $n \approx 7000$ randomly selected British households.

In the case of random design $m(x)$ is the conditional expectation of $Y$ given $X = x$,

$m(x) = E(Y \mid X = x)$,

and $\mathrm{var}(\epsilon_i \mid X_i) = \sigma^2$.

4.1 Basis function expansions

Some frequently used approaches to nonparametric regression rely on expansions of the form

$m(x) \approx \sum_{j=1}^p \beta_j b_j(x)$,

where $b_1(x), b_2(x),\dots$ are suitable basis functions. The functions $b_1, b_2,\dots$ have to be chosen in such a way that for any possible smooth function $m$ the approximation error $\inf_\beta \big\|m - \sum_{j=1}^p \beta_j b_j\big\|$ tends to zero as $p \to \infty$ (approximation theory). Examples are approximations by polynomials, spline functions, wavelets or Fourier expansions (for periodic functions).

Simplest approach: For a fixed value $p$ an estimator $\hat m_p$ is determined by

$\hat m_p(x) = \sum_{j=1}^p \hat\beta_j b_j(x)$,

where the coefficients $\hat\beta_j$ are obtained by ordinary least squares,

$\sum_{i=1}^n\Big(Y_i - \sum_{j=1}^p \hat\beta_j b_j(X_i)\Big)^2 = \min_{\beta_1,\dots,\beta_p}\sum_{i=1}^n\Big(Y_i - \sum_{j=1}^p \beta_j \underbrace{b_j(X_i)}_{X_{ij}}\Big)^2$

The quality of the approximation obviously depends on the choice of $p$, which serves as a smoothing parameter:
- $p$ small: variability of the estimator is small, but there may exist a high systematic error (bias)
- $p$ large: bias is small, but variability of the estimator is high

4.1.1 Polynomial Regression

Approximation theory: Every smooth function can be well approximated by a polynomial of sufficiently high degree.

Approach: Choose $p$ and fit a polynomial of degree $p-1$:

$\sum_{i=1}^n\Big(Y_i - \sum_{j=1}^p \hat\beta_j X_i^{j-1}\Big)^2 = \min$, $\qquad \hat m_p(x) = \hat\beta_1 + \sum_{j=2}^p \hat\beta_j x^{j-1}$

This corresponds to an approximation with basis functions $b_1(x) = 1$, $b_2(x) = x$, $b_3(x) = x^2$, ..., $b_p(x) = x^{p-1}$.

Note: It is only assumed that $m$ is well approximated by a polynomial; there will usually still exist an approximation error (bias $\ne 0$).

Remark: Polynomial regression is not very popular in practice. Reasons are numerical problems in fitting high-order polynomials. Furthermore, high-order polynomials often possess an erratic, difficult-to-interpret behavior at the boundaries.
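A minimal sketch of this OLS polynomial fit; the design matrix has the columns $1, x,\dots,x^{p-1}$ described above, and the function names are illustrative.

```python
import numpy as np

def fit_polynomial(x, y, p):
    """OLS fit of m_hat_p(x) = beta_1 + beta_2 * x + ... + beta_p * x^(p-1)."""
    X = np.vander(x, N=p, increasing=True)       # columns 1, x, ..., x^(p-1)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat

def predict_polynomial(beta_hat, x_new):
    return np.vander(x_new, N=len(beta_hat), increasing=True) @ beta_hat
```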

[Figure: polynomial fits of income on age for p = 2 and p = 3]

[Figure: polynomial fits of income on age for p = 5 and p = 7]

4.1.2 Spline Approximation

A frequently used method consists in a local polynomial approximation using spline functions. A spline is a piecewise polynomial function. Splines are defined with respect to a pre-specified sequence of $q$ knots $a = \tau_1 < \tau_2 < \dots < \tau_q = b$. Different specifications of the knot sequence lead to different splines.

More precisely, for a given knot sequence a spline function $s(x)$ of degree $k$ is defined by the following properties:
- $s(x)$ is a polynomial of degree $k$ on every interval $[\tau_j, \tau_{j+1}]$, i.e. $s(x) = s_0 + s_1 x + s_2 x^2 + \dots + s_k x^k$, $s_0,\dots,s_k \in \mathbb{R}$, for all $x \in [\tau_j, \tau_{j+1}]$
- $s(x)$ is $k-1$ times continuously differentiable at each knot point $x = \tau_j$, $j = 1,\dots,q$.

$s(x)$ is called a linear spline if $k = 1$, a quadratic spline if $k = 2$, and a cubic spline if $k = 3$. In practice, the most frequently used splines are cubic spline functions based on an equidistant sequence of $q$ knots ($\tau_{j+1} - \tau_j = \tau_j - \tau_{j-1}$ for all $j$).

The space of all spline functions of degree $k$ defined with respect to a given knot sequence $a = \tau_1,\dots,\tau_q = b$ is a $p := q + k - 1$ dimensional linear function space $S_{k,\tau_1,\dots,\tau_q}$.

Possible basis functions are

$b_1(x) = 1,\ b_2(x) = x,\ \dots,\ b_k(x) = x^{k-1},\ b_{k+1}(x) = (x - \tau_1)_+^k,\ \dots,\ b_{k+q-1}(x) = (x - \tau_{q-1})_+^k$,

where

$(x - \tau_j)_+^k = \begin{cases} (x - \tau_j)^k & \text{if } x \ge \tau_j \\ 0 & \text{else} \end{cases}$

Each spline function $s \in S_{k,\tau_1,\dots,\tau_q}$ can then be written as

$s(x) = \sum_{j=1}^k \beta_j x^{j-1} + \sum_{j=1}^{q-1} \beta_{j+k}(x - \tau_j)_+^k$ for $x \in [a, b]$

and suitable parameters $\beta_1,\dots,\beta_{k+q-1}$.

The so-called B-spline functions generate an alternative basis. Such B-spline representations are almost always used in practice, since they possess a number of advantages from a numerical point of view. The B-spline basis functions $b_{j,k}$, $j = 1,\dots,q+k-1$, for splines of degree $k$ based on a knot sequence $a = \tau_1,\dots,\tau_q = b$ are calculated by a recursive procedure:

$b_{j,0}(x) = \begin{cases} 1 & \text{if } x \in [\tau_j^*, \tau_{j+1}^*] \\ 0 & \text{else} \end{cases}$, $\quad j = 1,\dots,q+2k-1$,

and

$b_{j,l}(x) = \frac{x - \tau_j^*}{\tau_{l+j}^* - \tau_j^*}\,b_{j,l-1}(x) + \frac{\tau_{l+j+1}^* - x}{\tau_{l+j+1}^* - \tau_{j+1}^*}\,b_{j+1,l-1}(x)$,

for $l = 1,\dots,k$, $j = 1,\dots,q+k-1$, and $x \in [a, b]$. Here $\tau_1^*,\dots,\tau_{2k+q}^*$ denotes the extended knot sequence with $\tau_1^* = \dots = \tau_{k+1}^* = \tau_1$, $\tau_{k+2}^* = \tau_2,\dots,\tau_{k+q}^* = \tau_q$, and $\tau_{k+q+1}^* = \dots = \tau_{2k+q}^* = \tau_q$.

The so-called regression spline (or B-spline) approach to estimating a regression function $m(x)$ is based on fitting a spline function to the data. Frequently, cubic splines ($k = 3$) with equidistant knots are applied. Then $\tau_1 = a$, $\tau_q = b$ and $\tau_{j+1} - \tau_j = \frac{b-a}{q-1}$. In this case only the number of knots (or more precisely $p = q + 2$) is a smoothing parameter which has to be selected by the statistician.

An estimator $\hat m_p(x)$ is then given by

$\hat m_p(x) = \sum_{j=1}^p \hat\beta_j b_{j,k}(x)$,

and the coefficients $\hat\beta_j$ are determined by ordinary least squares. Here, again $p = q + k - 1$. With $Y = (Y_1,\dots,Y_n)^T$ and $X$ denoting the $n \times p$ matrix with elements $b_j(X_i)$, the estimated vector $\hat\beta = (\hat\beta_1,\dots,\hat\beta_p)^T$ of coefficients can be written as

$\hat\beta = (X^TX)^{-1}X^TY$, $\qquad \begin{pmatrix}\hat m_p(X_1)\\ \vdots \\ \hat m_p(X_n)\end{pmatrix} = X\hat\beta = \underbrace{X(X^TX)^{-1}X^T}_{S_p}\,Y$

Remark: Quite generally, the most important nonparametric regression procedures are linear smoothing methods. This means that, in dependence of some smoothing parameter $\lambda$, estimates of the vector $(m(X_1),\dots,m(X_n))^T$ are obtained by multiplying a smoother matrix $S_\lambda$ with $Y$.
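The following sketch fits a cubic regression spline by OLS using the truncated power basis introduced above (in practice a B-spline basis is numerically preferable, as the notes point out); the function names and the placement of knots at the sample range are illustrative choices.

```python
import numpy as np

def spline_design(x, knots, k=3):
    """Truncated power basis: 1, x, ..., x^(k-1), (x - tau_1)^k_+, ..., (x - tau_{q-1})^k_+."""
    cols = [x ** j for j in range(k)]
    cols += [np.clip(x - t, 0.0, None) ** k for t in knots[:-1]]
    return np.column_stack(cols)                 # n x (q + k - 1) design matrix

def fit_regression_spline(x, y, q, k=3):
    """Least-squares spline fit with q equidistant knots on [min(x), max(x)]."""
    knots = np.linspace(x.min(), x.max(), q)
    X = spline_design(x, knots, k)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat, knots
```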

Approximation properties of spline functions

As already mentioned above, nonparametric regression does not assume that $m(x)$ exactly corresponds to a spline function. $\hat m_p$ thus possesses a systematic error. But if the number of knots is large, then splines can approximate any smooth function with high accuracy.

The accuracy of spline approximations: Results of approximation theory imply that for any spline order $k$ and any $\nu$ times continuously differentiable function $m$, $1 \le \nu \le k+1$, we have

$\min_{s \in S_{k,\tau_1,\dots,\tau_q}}\ \max_{x\in[a,b]} |m(x) - s(x)| \;\le\; C_{\nu,k}\,\Big(\max_{j=1,\dots,q}|\tau_{j+1} - \tau_j|\Big)^\nu\,\max_{x\in[a,b]}|m^{(\nu)}(x)|$

for some universal constant $C_{\nu,k}$ which only depends on $\nu$ and $k$.

Let $k = 3$ (cubic spline functions). A cubic spline function satisfying the boundary constraints $s''(\tau_1) = s''(\tau_q) = 0$ is usually called a (cubic) natural spline. Note that for any cubic natural spline the effective number of parameters to be estimated reduces to $q$ (instead of $q + 2$).

Now assume that for some twice continuously differentiable function $m$ only the functional values $m(\tau_1),\dots,m(\tau_q)$ at $\tau_1,\dots,\tau_q$ are known. We then have to interpolate these functional values in order to obtain some suitable reconstruction of $m(x)$ on $[\tau_1, \tau_q]$.

Spline interpolation: For all possible values $m(\tau_1),\dots,m(\tau_q)$ there exists a unique cubic natural spline $s_{m,q}$ interpolating these values, i.e. $s_{m,q}(\tau_j) = m(\tau_j)$ for all $j = 1,\dots,q$. Spline theory now states that $s_{m,q}$ is the smoothest function interpolating these values:

$\int_{\tau_1}^{\tau_q} s_{m,q}''(x)^2\,dx \;\le\; \int_{\tau_1}^{\tau_q} \tilde m''(x)^2\,dx$

for any other twice continuously differentiable function $\tilde m$ with $\tilde m(\tau_j) = m(\tau_j)$ for all $j = 1,\dots,q$.

Literature: C. de Boor, A Practical Guide to Splines, Springer (1978); R. Eubank, Spline Smoothing and Nonparametric Regression, Marcel Dekker (1988)

Mean average squared error

The behavior of nonparametric function estimates is usually evaluated with respect to quadratic risk. To simplify notation, I will in the following write $E_\epsilon$ as well as $\mathrm{var}_\epsilon$ to denote expectation and variance with respect to the r.v. $\epsilon_i$ only. In the case of random design, $E_\epsilon$ and $\mathrm{var}_\epsilon$ thus denote the conditional expectation $E(\cdot \mid X_1,\dots,X_n)$ and variance given the observed $X$-values. For random design, these conditional expectations depend on the observed sample, and thus are random. For fixed design, such expectations are of course fixed values. It will always be assumed that the matrix $X^TX$ is invertible (under our conditions on the design density this holds with probability 1 for random design).

A common measure of accuracy of a spline estimator $\hat m_p$ is the

mean average squared error (MASE):

$\mathrm{MASE}(\hat m_p) := \frac{1}{n}\sum_{i=1}^n E_\epsilon\big(m(X_i) - \hat m_p(X_i)\big)^2 = \underbrace{\frac{1}{n}\sum_{i=1}^n\big(m(X_i) - E_\epsilon(\hat m_p(X_i))\big)^2}_{\mathrm{Bias}^2(\hat m_p)} + \underbrace{\frac{1}{n}\sum_{i=1}^n E_\epsilon\big(\hat m_p(X_i) - E_\epsilon(\hat m_p(X_i))\big)^2}_{Var(\hat m_p)}$

Another frequently used measure is the mean integrated squared error (MISE),

$\mathrm{MISE}(\hat m_p) := \int_a^b E_\epsilon\big(m(x) - \hat m_p(x)\big)^2\,dx$

Equidistant design: $\mathrm{MISE}(\hat m_p) = \mathrm{MASE}(\hat m_p) + O(n^{-1})$

MISE and MASE are generally not asymptotically equivalent in the case of random design:

$\mathrm{MASE}(\hat m_p) = \int_a^b E_\epsilon\big(m(x) - \hat m_p(x)\big)^2 f(x)\,dx + O_P(n^{-1})$.

In the following we have to analyze bias and variance of the estimator. As already mentioned above, nonparametric regression does not assume that $m(x)$ exactly corresponds to a spline function. $\hat m_p$ thus possesses a systematic error (bias),

$\bar m_p(x) := E_\epsilon(\hat m_p(x)) \ne m(x)$.

Let $m = (m(X_1),\dots,m(X_n))^T$. Then

$\bar\beta = E_\epsilon(\hat\beta) = (X^TX)^{-1}X^Tm$

Consequently, $\bar\beta = (\bar\beta_1,\dots,\bar\beta_p)^T$ is a solution of

$\sum_i\Big(m(X_i) - \sum_{j=1}^p \bar\beta_j b_{j,k}(X_i)\Big)^2 = \inf_{\vartheta_1,\dots,\vartheta_p}\sum_i\Big(m(X_i) - \sum_{j=1}^p \vartheta_j b_{j,k}(X_i)\Big)^2 = \inf_{s\in S_{k,\tau_1,\dots,\tau_q}}\sum_i\big(m(X_i) - s(X_i)\big)^2$

$\Rightarrow \quad \bar m_p(x) := \sum_{j=1}^p \bar\beta_j b_j(x) = E_\epsilon(\hat m_p(x))$

is the best approximation of $m(x)$ by spline functions in $S_{k,\tau_1,\dots,\tau_q}$, and the $\hat\beta_j$ estimate the corresponding coefficients $\bar\beta_j$.

By the general approximation properties of cubic splines ($k = 3$) with $q = p - 2$ equidistant knots, we will thus expect that
- if $m$ is twice continuously differentiable, then $\mathrm{Bias}(\hat m_p)^2 = \frac{1}{n}\sum_{i=1}^n\big(m(X_i) - \bar m_p(X_i)\big)^2 = O_P(p^{-4})$,
- if $m$ is four times continuously differentiable, then $\mathrm{Bias}(\hat m_p)^2 = \frac{1}{n}\sum_{i=1}^n\big(m(X_i) - \bar m_p(X_i)\big)^2 = O_P(p^{-8})$.

The average variance of the estimator can be obtained by the usual type of arguments applied in parametric regression.

46 m p = ( m(x 1 ),..., m(x n )) T and ϵ = (ϵ 1,..., ϵ n ) T. Then V ar( ˆm p ) =: 1 ( ) n E ϵ X(X T X) 1 X T Y X(X T X) 1 X T m p 2 2 = 1 ( ) n E ϵ X(X T X) 1 X T ϵ 2 2 = 1 ( ) n E ϵ ϵ T X(X T X) 1 X T ϵ = 1 ( ) n trace X T X) 1 X T E(ϵϵ T )X = 1 ( ) n σ2 trace X T X) 1 X T X = σ 2 p n Remark: For any j l matrix A and any l j matrix B we have the identity trace(ab) = trace(ba) These arguments imply that there is a tradeoff between average squared bias and variance. For cubic splines with equidistant knots and a twice differentiable function m we will expect that Bias( ˆm p ) 2 = O P (p 4 ) Since V ar( ˆm p ) = σ 2 p n an optimal p, balancing bias and variance, will be of order p opt n 1/5. Then MASE( ˆm popt ) = O P (n 4/5 ) Note: For an estimator ˆm based on a valid (!) parametric model we have MASE( ˆm popt ) = O P (n 1 ). Similar results can be obtained for the mean integrated squared error (MISE): If m is twice continuously differentiable, and p opt n 1/5, then ( ) b MISE( ˆm popt ) = E ϵ (m(x) ˆm popt (x)) 2 dx = O P (n 4/5 ) a EconometricsII-Kneip 4 15

[Figure: squared bias, variance, and AMSE of the estimated model as functions of the number of parameters p]

- $\mathrm{Bias}^2(\hat m_p)$ decreases as $p$ increases
- $Var(\hat m_p)$ increases as $p$ increases

Problem: $m$ is unknown, so MASE and $p_{opt}$ cannot be directly computed.

Approach: Determine an estimate $\hat p_{opt}$ of the optimal number $p$ of basis functions by minimizing a suitable error criterion with the following properties:
- For every possible $p$ the corresponding criterion function can be calculated from the data.
- For any $p$ the error criterion provides information about the respective MASE.

Recall: With $\hat{\mathbf m}_p = (\hat m_p(X_1),\dots,\hat m_p(X_n))^T$ we have $\hat{\mathbf m}_p = X\hat\beta = X(X^TX)^{-1}X^TY =: S_pY$ and $\frac{p}{n} = \frac{\mathrm{tr}(S_p)}{n}$. For given $p$, the number of parameters to estimate by the spline method (one also speaks of the degrees of freedom of the smoothing procedure) is equal to $p$. This corresponds to the trace of the smoother matrix $S_p = X(X^TX)^{-1}X^T$.

Most frequently used error criteria:

Cross-validation (CV): For a given value $p$, cross-validation tries to approximate the corresponding prediction error:

$CV(p) = \frac{1}{n}\sum_{i=1}^n\big(Y_i - \hat m_{p,-i}(X_i)\big)^2$

Here, for any $i = 1,\dots,n$, $\hat m_{p,-i}$ is the leave-one-out estimator of $m$ obtained when a spline function is fitted to the $n-1$ observations $(Y_1, X_1),\dots,(Y_{i-1}, X_{i-1}), (Y_{i+1}, X_{i+1}),\dots,(Y_n, X_n)$.

Motivation: We have

$E_\epsilon(CV(p)) = \frac{1}{n}\sum_{i=1}^n E_\epsilon\Big(\big(m(X_i) + \epsilon_i - \hat m_{p,-i}(X_i)\big)^2\Big) = \underbrace{\frac{1}{n}\sum_{i=1}^n E_\epsilon\Big(\big(m(X_i) - \hat m_{p,-i}(X_i)\big)^2\Big)}_{\approx\,\mathrm{MASE}(\hat m_p)} + \underbrace{\frac{2}{n}\sum_{i=1}^n E_\epsilon\Big(\big(m(X_i) - \hat m_{p,-i}(X_i)\big)\epsilon_i\Big)}_{=0} + \sigma^2$

Generalized cross-validation (GCV):

$GCV(p) = \frac{1}{n\big(1 - \frac{p}{n}\big)^2}\sum_{i=1}^n\big(Y_i - \hat m_p(X_i)\big)^2$

Motivation: It is easily verified that with

$ARSS(p) := \frac{1}{n}\sum_{i=1}^n\big(Y_i - \hat m_p(X_i)\big)^2$

we have

$E_\epsilon(ARSS(p)) = \mathrm{MASE}(\hat m_p) - 2\sigma^2\frac{p}{n} + \sigma^2$

If $p \to \infty$ such that $p/n \to 0$, a Taylor expansion yields

$GCV(p) = ARSS(p) + 2\,\frac{p}{n}\underbrace{ARSS(p)}_{=\sigma^2 + o_P(1)} + O_P\Big(\big(\tfrac{p}{n}\big)^2\Big)$

As motivated above, for large $n$, $CV(p)$ as well as $GCV(p)$ can be seen as estimates of $\mathrm{MASE}(\hat m_p) + \sigma^2$. More precisely, as $n \to \infty$, $\frac{p}{n} \to 0$,

$E_\epsilon(CV(p)) = E_\epsilon(GCV(p)) = \mathrm{MASE}(\hat m_p)\,(1 + o_P(1)) + \sigma^2$

There are theoretical results which show that if $\hat p_{opt}$ is determined by minimizing $CV(p)$ or $GCV(p)$, then for large $n$, $\mathrm{MASE}(\hat m_{\hat p_{opt}})$ will be close to $\mathrm{MASE}(\hat m_{p_{opt}})$.

There are more advanced procedures which estimate $p$ as well as a best placement of the knots $\tau_1,\dots,\tau_q$ simultaneously from the data (MARS algorithm).

[Figure: GCV as a function of p]
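A short sketch of GCV-based selection of the number of knots for the regression spline sketched earlier (it reuses `spline_design`); the grid of candidate values of $q$ is an illustrative choice.

```python
import numpy as np

def gcv(x, y, q, k=3):
    """GCV(p) = (1 / (n (1 - p/n)^2)) * sum_i (Y_i - m_hat_p(X_i))^2 for the spline with q knots."""
    n = len(y)
    X = spline_design(x, np.linspace(x.min(), x.max(), q), k)
    p = X.shape[1]
    residuals = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum(residuals ** 2) / (n * (1.0 - p / n) ** 2)

def select_q_by_gcv(x, y, q_grid, k=3):
    """Return the candidate number of knots minimizing GCV."""
    return q_grid[int(np.argmin([gcv(x, y, q, k) for q in q_grid]))]
```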

4.2 Approaches Based on Roughness Penalties

A different approach to spline fitting, which is widely used in practice, is based on the use of a roughness penalty. The basic idea can be described as follows: In order to guarantee a small systematic error, spline functions are defined with respect to a large number of knots ($\frac{p}{n}$ close to 1). Variability of the estimator is controlled by fitting the coefficients subject to a penalty which penalizes roughness (non-smoothness) of the resulting function. A convenient measure of smoothness is $\int_a^b m''(x)^2\,dx$.

Remark: The so-called (cubic) smoothing spline approach relies on cubic splines with knots at each observation point. More precisely, $q = n$ and $\tau_1 = X_1$, $\tau_2 = X_2,\dots,\tau_n = X_n$. The side conditions $s''(a) = 0$ and $s''(b) = 0$ are additionally imposed in order to ensure that the number of coefficients to be estimated is equal to $n$.

In the following we will consider cubic splines ($k = 3$). For a smoothing parameter $\lambda > 0$ (to be selected by the statistician), an estimate $\hat m_\lambda(x) = \sum_j \hat\beta_j b_j(x)$ is determined by

$\frac{1}{n}\sum_i\big(Y_i - \hat m_\lambda(X_i)\big)^2 + \lambda\int_a^b \hat m_\lambda''(x)^2\,dx = \inf_{s\in S_{3,\tau_1,\dots,\tau_q}}\Big\{\frac{1}{n}\sum_i\big(Y_i - s(X_i)\big)^2 + \lambda\int_a^b s''(x)^2\,dx\Big\}$,

or equivalently,

$\frac{1}{n}\sum_i\Big(Y_i - \sum_j\hat\beta_j b_j(X_i)\Big)^2 + \lambda\int_a^b\Big(\sum_j\hat\beta_j b_j''(x)\Big)^2dx = \inf_{\vartheta_1,\dots,\vartheta_p}\Big\{\frac{1}{n}\sum_i\Big(Y_i - \sum_j\vartheta_j b_j(X_i)\Big)^2 + \lambda\int_a^b\Big(\sum_j\vartheta_j b_j''(x)\Big)^2dx\Big\}$.

Let $X$ denote the $n\times p$ matrix with elements $b_j(X_i)$, and let $B$ denote the $p\times p$ matrix with elements $n\int_a^b b_j''(x)b_l''(x)\,dx$, $j, l = 1,\dots,p$. Then the solutions are given by

$\hat\beta = (X^TX + \lambda B)^{-1}X^TY$, $\qquad \hat{\mathbf m}_\lambda = \underbrace{X(X^TX + \lambda B)^{-1}X^T}_{S_\lambda}\,Y$,

where $\hat{\mathbf m}_\lambda = (\hat m_\lambda(X_1),\dots,\hat m_\lambda(X_n))^T$.

[Figure: two example functions illustrating a small versus a large value of $\int m''(x)^2 dx$]
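The following sketch implements this penalized fit for the cubic truncated power basis from the earlier regression-spline sketch (`spline_design`); the penalty matrix $B$ is approximated by a Riemann sum over a fine grid, and the number of knots is an illustrative default. In practice B-splines and exact penalty matrices would be preferred for numerical stability.

```python
import numpy as np

def second_derivatives(x, knots):
    """Second derivatives of the cubic truncated power basis (k = 3): (1)''=0, (x)''=0, (x^2)''=2, ((x-tau)^3_+)''=6(x-tau)_+."""
    cols = [np.zeros_like(x), np.zeros_like(x), 2.0 * np.ones_like(x)]
    cols += [6.0 * np.clip(x - t, 0.0, None) for t in knots[:-1]]
    return np.column_stack(cols)

def fit_penalized_spline(x, y, lam, q=40):
    """beta_hat = (X'X + lambda * B)^(-1) X'Y with B_{jl} = n * int b_j''(t) b_l''(t) dt (numerical integral)."""
    n = len(y)
    knots = np.linspace(x.min(), x.max(), q)
    X = spline_design(x, knots, k=3)
    grid = np.linspace(x.min(), x.max(), 2000)
    D2 = second_derivatives(grid, knots)
    B = n * (grid[1] - grid[0]) * (D2.T @ D2)     # Riemann-sum approximation of the penalty matrix
    beta_hat = np.linalg.solve(X.T @ X + lam * B, X.T @ y)
    return beta_hat, knots
```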

In the following we will concentrate on the situation that $p$ is large compared to $n$ (e.g. $p \approx n$) such that the bias of the spline approximation is negligible. Then only the choice of $\lambda$ is crucial for the quality of the estimator. We will additionally assume that the true regression function $m$ is twice continuously differentiable.

- $\mathrm{Bias}^2(\hat m_\lambda)$ increases as $\lambda$ increases; extreme case: $\lambda = \infty$ gives a straight-line fit.
- $Var(\hat m_\lambda)$ decreases as $\lambda$ increases; the estimated function $\hat m_\lambda$ is the smoother the larger $\lambda$.

An optimal smoothing parameter $\lambda_{opt}$ will again balance bias and variance.

- It can be verified that

$\frac{1}{n}\sum_i \mathrm{var}_\epsilon(\hat m_\lambda(X_i)) = \frac{1}{n}E_\epsilon\big(\epsilon^T S_\lambda^2\epsilon\big) = \frac{\sigma^2}{n}\mathrm{tr}(S_\lambda^2)$

As $n \to \infty$, $n\lambda \to \infty$, we have $\mathrm{tr}(S_\lambda^2) = O_P(\lambda^{-1/4})$, so that the average variance is of order $\frac{1}{n\lambda^{1/4}}$.

- The degrees of freedom of the estimation procedure are defined as $df_\lambda = \mathrm{tr}(S_\lambda)$ (sometimes also $df_\lambda = \mathrm{tr}(S_\lambda^2)$ is considered). These degrees of freedom can be seen as a nonparametric equivalent of the number of parameters to estimate in parametric regression.

Bias and MASE

Let $\bar{\mathbf m}_\lambda = (\bar m_\lambda(X_1),\dots,\bar m_\lambda(X_n))^T = E_\epsilon(S_\lambda Y) = X(X^TX + \lambda B)^{-1}X^Tm$. It is then easily seen that $\bar m_\lambda$ is a solution of

$\frac{1}{n}\sum_i\big(m(X_i) - \bar m_\lambda(X_i)\big)^2 + \lambda\int_a^b \bar m_\lambda''(x)^2\,dx = \inf_{s\in S_{k,\tau_1,\dots,\tau_q}}\Big\{\frac{1}{n}\sum_i\big(m(X_i) - s(X_i)\big)^2 + \lambda\int_a^b s''(x)^2\,dx\Big\}$

If the number of knots is sufficiently large, the bias of a best possible spline approximation $s_{opt}\in S_{3,\tau_1,\dots,\tau_q}$ to $m$ is negligible, $m \approx s_{opt}$. The above relation then implies that for a large number of knots

$\frac{1}{n}\sum_i\big(m(X_i) - \bar m_\lambda(X_i)\big)^2 + \lambda\int_a^b \bar m_\lambda''(x)^2\,dx \;\lesssim\; \lambda\int_a^b m''(x)^2\,dx$

For a twice continuously differentiable function $m$ it can indeed be shown that $\mathrm{Bias}(\hat m_\lambda)^2$ is proportional to $\lambda$. Hence, as $n \to \infty$, $\lambda \to 0$, $n\lambda \to \infty$,

$\mathrm{MASE}(\hat m_\lambda) = O_P\Big(\lambda + \frac{1}{n\lambda^{1/4}}\Big)$

- The above result implies that an optimal smoothing parameter balancing bias and variance will be of order $\lambda_{opt} \sim n^{-4/5}$.
- Then $\mathrm{MASE}(\hat m_{\lambda_{opt}}) = O_P(n^{-4/5})$.

Similar results can be obtained for the mean integrated squared error (MISE):

$\mathrm{MISE}(\hat m_{\lambda_{opt}}) = E_\epsilon\Big(\int_a^b\big(m(x) - \hat m_{\lambda_{opt}}(x)\big)^2\,dx\Big) = O_P(n^{-4/5})$

Again, estimates of $\lambda_{opt}$ may be determined by minimizing $CV(\lambda)$ or $GCV(\lambda)$:

Cross-validation (CV):

$CV(\lambda) = \frac{1}{n}\sum_{i=1}^n\big(Y_i - \hat m_{\lambda,-i}(X_i)\big)^2$

Here, for any $i = 1,\dots,n$, $\hat m_{\lambda,-i}$ is the leave-one-out estimator of $m$ obtained when only the $n-1$ observations $(Y_1, X_1),\dots,(Y_{i-1}, X_{i-1}), (Y_{i+1}, X_{i+1}),\dots,(Y_n, X_n)$ are used.

Generalized cross-validation (GCV):

$GCV(\lambda) = \frac{1}{n\big(1 - \frac{df_\lambda}{n}\big)^2}\sum_{i=1}^n\big(Y_i - \hat m_\lambda(X_i)\big)^2$,

where $df_\lambda = \mathrm{tr}(S_\lambda)$ (= degrees of freedom).

Remark: Under some regularity conditions it can be shown that

$\big|\mathrm{MASE}(\hat m_{\hat\lambda_{opt}}) - \mathrm{MASE}(\hat m_{\lambda_{opt}})\big| = O_P\Big(n^{-1/2}\,\mathrm{MASE}(\hat m_{\lambda_{opt}})^{1/2}\Big)$,

where $\hat\lambda_{opt}$ denotes the smoothing parameter estimated by CV or GCV.

[Figure: smoothing spline fit of income on age with df = 3]

[Figure: smoothing spline fit of income on age with df = 10]

4.3 Estimating the error variance

The magnitude of the variance $\sigma^2$ of the error terms $\epsilon_i$ influences the accuracy of the estimators. For simplicity it will in the following be assumed that the observations $X_i$ are ordered, $X_1 \le X_2 \le \dots \le X_n$, and that $m$ is a smooth, twice continuously differentiable function.

a) Based on a nonparametric estimate $\hat m$ of $m$, a simple estimate of $\sigma^2$ is obtained by averaging squared residuals:

$\hat\sigma^2 := \frac{1}{n}\sum_i\big(Y_i - \hat m(X_i)\big)^2$

b) The method of Rice:

$\hat\sigma^2 := \frac{1}{2(n-1)}\sum_{i=2}^n\big(Y_i - Y_{i-1}\big)^2$

It can be shown that $E_\epsilon(\hat\sigma^2) = \sigma^2 + O_P\big(\frac{1}{n^2}\big)$ and $Var_\epsilon(\hat\sigma^2) = O_P\big(\frac{1}{n}\big)$.

c) The method of Gasser et al.: In a first step pseudo-residuals

$\hat\epsilon_i = \frac{X_{i+1} - X_i}{X_{i+1} - X_{i-1}}\,Y_{i-1} + \frac{X_i - X_{i-1}}{X_{i+1} - X_{i-1}}\,Y_{i+1} - Y_i$

are calculated. Then

$\hat\sigma^2 := \frac{1}{n-2}\sum_{i=2}^{n-1} c_i^2\,\hat\epsilon_i^2$,

where $c_i^2 = \big(a_i^2 + b_i^2 + 1\big)^{-1}$ with $a_i = \frac{X_{i+1} - X_i}{X_{i+1} - X_{i-1}}$ and $b_i = \frac{X_i - X_{i-1}}{X_{i+1} - X_{i-1}}$, so that $E_\epsilon(c_i^2\hat\epsilon_i^2) = \sigma^2$.

Often methods b) or c) are preferred to a). The important point is that the bias of the estimators in b) or c) is much smaller than the bias of the estimator in a). However, all procedures a), b), c) yield consistent estimators of $\sigma^2$.
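A compact sketch of the two difference-based estimators, assuming the data are already sorted by $X$; the normalizing weights $c_i^2$ in the pseudo-residual estimator follow the standard form stated above.

```python
import numpy as np

def rice_variance(y):
    """Rice: sigma_hat^2 = (1/(2(n-1))) * sum_{i=2}^n (Y_i - Y_{i-1})^2, Y sorted by X."""
    d = np.diff(y)
    return np.sum(d ** 2) / (2.0 * (len(y) - 1))

def gasser_variance(x, y):
    """Pseudo-residual estimator; weights c_i^2 = 1/(a_i^2 + b_i^2 + 1) give E(c_i^2 eps_i^2) = sigma^2."""
    a = (x[2:] - x[1:-1]) / (x[2:] - x[:-2])
    b = (x[1:-1] - x[:-2]) / (x[2:] - x[:-2])
    eps = a * y[:-2] + b * y[2:] - y[1:-1]
    c2 = 1.0 / (a ** 2 + b ** 2 + 1.0)
    return np.sum(c2 * eps ** 2) / (len(y) - 2)
```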

Confidence intervals for spline methods

Consider spline fitting based on a roughness penalty with smoothing parameter $\lambda$. Under some suitable regularity conditions it can easily be shown that as $n \to \infty$, $\lambda \to 0$, $n\lambda \to \infty$,

$\frac{\hat m_\lambda(x) - \bar m_\lambda(x)}{\sqrt{\mathrm{var}_\epsilon(\hat m_\lambda(x))}} \;\to^L\; N(0, 1)$

holds for all $x$ (central limit theorem). Here again $\bar m_\lambda(x) = E_\epsilon(\hat m_\lambda(x))$. Note that with

$\mathrm{cov}_\epsilon(\hat\beta) = \sigma^2\underbrace{(X^TX + \lambda B)^{-1}X^TX(X^TX + \lambda B)^{-1}}_{Q_\lambda}$

this implies that with $b(x) = (b_1(x),\dots,b_p(x))^T$

$\mathrm{var}_\epsilon(\hat m_\lambda(x)) = \mathrm{var}_\epsilon\big(b(x)^T\hat\beta\big) = \sigma^2\,b(x)^TQ_\lambda b(x)$.

Based on an estimate $\hat\sigma^2$ of $\sigma^2$ this leads to asymptotically valid $(1-\alpha)$ confidence intervals for $\bar m_\lambda(x)$:

$\hat m_\lambda(x) \pm z_{1-\alpha/2}\sqrt{\hat\sigma^2\,b(x)^TQ_\lambda b(x)}$,

where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$-quantile of the standard normal distribution (e.g. $z_{0.975} = 1.96$). These intervals can be calculated for any point $x$, which yields confidence bands for the function $\bar m_\lambda$.

In the literature one speaks of confidence intervals for the variability of $\hat m_\lambda(x)$ (i.e. error bounds for the random fluctuation due to the error terms $\epsilon_i$). Quite obviously, bias is not taken into account when calculating these intervals.

4.4 The Nadaraya-Watson Kernel Estimator

Idea: approximation of $m(x)$ by a local average of the observations $Y_i$:

$\hat m_h(x) = \sum_{i=1}^n w(x, X_i, h)\,Y_i$

The weight function $w$ is constructed in such a way that the weight of an observation $Y_i$ is the smaller the larger the distance $|x - X_i|$. A smoothing parameter ("bandwidth") $h$ determines the rate of decrease of the weights $w(x, X_i, h)$ as $|x - X_i|$ increases. Kernel estimators calculate weights on the basis of a pre-specified kernel function $K$. Usually $K$ is chosen as a symmetric density function.

Nadaraya-Watson kernel estimator:

$\hat m_h(x) = \sum_{i=1}^n \frac{K\big(\frac{x - X_i}{h}\big)}{\sum_{j=1}^n K\big(\frac{x - X_j}{h}\big)}\,Y_i = \frac{\frac{1}{nh}\sum_{i=1}^n K\big(\frac{x - X_i}{h}\big)Y_i}{\frac{1}{nh}\sum_{j=1}^n K\big(\frac{x - X_j}{h}\big)}$

For every possible bandwidth $h > 0$ the sum of all weights

$w(x, X_i, h) = K\Big(\frac{x - X_i}{h}\Big)\Big/\sum_{j=1}^n K\Big(\frac{x - X_j}{h}\Big)$

is always equal to 1, $\sum_i w(x, X_i, h) = 1$.

Kernel estimators are linear smoothing methods:

$\hat{\mathbf m}_h = (\hat m_h(X_1),\dots,\hat m_h(X_n))^T = S_hY$,

where the elements of the $n\times n$ matrix $S_h$ are given by

$(S_h)_{ij} = \frac{K\big(\frac{X_i - X_j}{h}\big)}{\sum_{l=1}^n K\big(\frac{X_i - X_l}{h}\big)}$, $\qquad \mathrm{tr}(S_h) = O\Big(\frac{1}{h}\Big)$
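A minimal sketch of the Nadaraya-Watson estimator, with a Gaussian kernel as an illustrative choice:

```python
import numpy as np

def nadaraya_watson(x_grid, x, y, h):
    """m_hat_h(x) = sum_i K((x - X_i)/h) * Y_i / sum_j K((x - X_j)/h)."""
    u = (x_grid[:, None] - x[None, :]) / h
    K = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel weights
    return (K @ y) / K.sum(axis=1)
```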


More information

Nonparametric Regression

Nonparametric Regression Nonparametric Regression Econ 674 Purdue University April 8, 2009 Justin L. Tobias (Purdue) Nonparametric Regression April 8, 2009 1 / 31 Consider the univariate nonparametric regression model: where y

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression ST 430/514 Recall: A regression model describes how a dependent variable (or response) Y is affected, on average, by one or more independent variables (or factors, or covariates)

More information

Preface. 1 Nonparametric Density Estimation and Testing. 1.1 Introduction. 1.2 Univariate Density Estimation

Preface. 1 Nonparametric Density Estimation and Testing. 1.1 Introduction. 1.2 Univariate Density Estimation Preface Nonparametric econometrics has become one of the most important sub-fields in modern econometrics. The primary goal of this lecture note is to introduce various nonparametric and semiparametric

More information

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model

Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model Some Theories about Backfitting Algorithm for Varying Coefficient Partially Linear Model 1. Introduction Varying-coefficient partially linear model (Zhang, Lee, and Song, 2002; Xia, Zhang, and Tong, 2004;

More information

12 - Nonparametric Density Estimation

12 - Nonparametric Density Estimation ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6

More information

Log-Density Estimation with Application to Approximate Likelihood Inference

Log-Density Estimation with Application to Approximate Likelihood Inference Log-Density Estimation with Application to Approximate Likelihood Inference Martin Hazelton 1 Institute of Fundamental Sciences Massey University 19 November 2015 1 Email: m.hazelton@massey.ac.nz WWPMS,

More information

Bickel Rosenblatt test

Bickel Rosenblatt test University of Latvia 28.05.2011. A classical Let X 1,..., X n be i.i.d. random variables with a continuous probability density function f. Consider a simple hypothesis H 0 : f = f 0 with a significance

More information

Defect Detection using Nonparametric Regression

Defect Detection using Nonparametric Regression Defect Detection using Nonparametric Regression Siana Halim Industrial Engineering Department-Petra Christian University Siwalankerto 121-131 Surabaya- Indonesia halim@petra.ac.id Abstract: To compare

More information

Statistical inference on Lévy processes

Statistical inference on Lévy processes Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline

More information

Linear Methods for Prediction

Linear Methods for Prediction Chapter 5 Linear Methods for Prediction 5.1 Introduction We now revisit the classification problem and focus on linear methods. Since our prediction Ĝ(x) will always take values in the discrete set G we

More information

1 Piecewise Cubic Interpolation

1 Piecewise Cubic Interpolation Piecewise Cubic Interpolation Typically the problem with piecewise linear interpolation is the interpolant is not differentiable as the interpolation points (it has a kinks at every interpolation point)

More information

Threshold Autoregressions and NonLinear Autoregressions

Threshold Autoregressions and NonLinear Autoregressions Threshold Autoregressions and NonLinear Autoregressions Original Presentation: Central Bank of Chile October 29-31, 2013 Bruce Hansen (University of Wisconsin) Threshold Regression 1 / 47 Threshold Models

More information

CURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University

CURRENT STATUS LINEAR REGRESSION. By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University CURRENT STATUS LINEAR REGRESSION By Piet Groeneboom and Kim Hendrickx Delft University of Technology and Hasselt University We construct n-consistent and asymptotically normal estimates for the finite

More information

A Novel Nonparametric Density Estimator

A Novel Nonparametric Density Estimator A Novel Nonparametric Density Estimator Z. I. Botev The University of Queensland Australia Abstract We present a novel nonparametric density estimator and a new data-driven bandwidth selection method with

More information

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model.

Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model. Minimax Rate of Convergence for an Estimator of the Functional Component in a Semiparametric Multivariate Partially Linear Model By Michael Levine Purdue University Technical Report #14-03 Department of

More information

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo

Sparse Nonparametric Density Estimation in High Dimensions Using the Rodeo Outline in High Dimensions Using the Rodeo Han Liu 1,2 John Lafferty 2,3 Larry Wasserman 1,2 1 Statistics Department, 2 Machine Learning Department, 3 Computer Science Department, Carnegie Mellon University

More information

NADARAYA WATSON ESTIMATE JAN 10, 2006: version 2. Y ik ( x i

NADARAYA WATSON ESTIMATE JAN 10, 2006: version 2. Y ik ( x i NADARAYA WATSON ESTIMATE JAN 0, 2006: version 2 DATA: (x i, Y i, i =,..., n. ESTIMATE E(Y x = m(x by n i= ˆm (x = Y ik ( x i x n i= K ( x i x EXAMPLES OF K: K(u = I{ u c} (uniform or box kernel K(u = u

More information

UNIVERSITÄT POTSDAM Institut für Mathematik

UNIVERSITÄT POTSDAM Institut für Mathematik UNIVERSITÄT POTSDAM Institut für Mathematik Testing the Acceleration Function in Life Time Models Hannelore Liero Matthias Liero Mathematische Statistik und Wahrscheinlichkeitstheorie Universität Potsdam

More information

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University

Integrated Likelihood Estimation in Semiparametric Regression Models. Thomas A. Severini Department of Statistics Northwestern University Integrated Likelihood Estimation in Semiparametric Regression Models Thomas A. Severini Department of Statistics Northwestern University Joint work with Heping He, University of York Introduction Let Y

More information

Alternatives. The D Operator

Alternatives. The D Operator Using Smoothness Alternatives Text: Chapter 5 Some disadvantages of basis expansions Discrete choice of number of basis functions additional variability. Non-hierarchical bases (eg B-splines) make life

More information

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012

Gaussian Processes. Le Song. Machine Learning II: Advanced Topics CSE 8803ML, Spring 2012 Gaussian Processes Le Song Machine Learning II: Advanced Topics CSE 8803ML, Spring 01 Pictorial view of embedding distribution Transform the entire distribution to expected features Feature space Feature

More information

Monte Carlo Studies. The response in a Monte Carlo study is a random variable.

Monte Carlo Studies. The response in a Monte Carlo study is a random variable. Monte Carlo Studies The response in a Monte Carlo study is a random variable. The response in a Monte Carlo study has a variance that comes from the variance of the stochastic elements in the data-generating

More information

Lecture 3: Statistical Decision Theory (Part II)

Lecture 3: Statistical Decision Theory (Part II) Lecture 3: Statistical Decision Theory (Part II) Hao Helen Zhang Hao Helen Zhang Lecture 3: Statistical Decision Theory (Part II) 1 / 27 Outline of This Note Part I: Statistics Decision Theory (Classical

More information

Nonparametric Modal Regression

Nonparametric Modal Regression Nonparametric Modal Regression Summary In this article, we propose a new nonparametric modal regression model, which aims to estimate the mode of the conditional density of Y given predictors X. The nonparametric

More information

On variable bandwidth kernel density estimation

On variable bandwidth kernel density estimation JSM 04 - Section on Nonparametric Statistics On variable bandwidth kernel density estimation Janet Nakarmi Hailin Sang Abstract In this paper we study the ideal variable bandwidth kernel estimator introduced

More information

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1

36. Multisample U-statistics and jointly distributed U-statistics Lehmann 6.1 36. Multisample U-statistics jointly distributed U-statistics Lehmann 6.1 In this topic, we generalize the idea of U-statistics in two different directions. First, we consider single U-statistics for situations

More information

Local linear multiple regression with variable. bandwidth in the presence of heteroscedasticity

Local linear multiple regression with variable. bandwidth in the presence of heteroscedasticity Local linear multiple regression with variable bandwidth in the presence of heteroscedasticity Azhong Ye 1 Rob J Hyndman 2 Zinai Li 3 23 January 2006 Abstract: We present local linear estimator with variable

More information

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2

MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 MA 575 Linear Models: Cedric E. Ginestet, Boston University Non-parametric Inference, Polynomial Regression Week 9, Lecture 2 1 Bootstrapped Bias and CIs Given a multiple regression model with mean and

More information

Optimal global rates of convergence for interpolation problems with random design

Optimal global rates of convergence for interpolation problems with random design Optimal global rates of convergence for interpolation problems with random design Michael Kohler 1 and Adam Krzyżak 2, 1 Fachbereich Mathematik, Technische Universität Darmstadt, Schlossgartenstr. 7, 64289

More information

On the Robust Modal Local Polynomial Regression

On the Robust Modal Local Polynomial Regression International Journal of Statistical Sciences ISSN 683 5603 Vol. 9(Special Issue), 2009, pp 27-23 c 2009 Dept. of Statistics, Univ. of Rajshahi, Bangladesh On the Robust Modal Local Polynomial Regression

More information

Can we do statistical inference in a non-asymptotic way? 1

Can we do statistical inference in a non-asymptotic way? 1 Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.

More information

Lecture 7 Introduction to Statistical Decision Theory

Lecture 7 Introduction to Statistical Decision Theory Lecture 7 Introduction to Statistical Decision Theory I-Hsiang Wang Department of Electrical Engineering National Taiwan University ihwang@ntu.edu.tw December 20, 2016 1 / 55 I-Hsiang Wang IT Lecture 7

More information

Test for Discontinuities in Nonparametric Regression

Test for Discontinuities in Nonparametric Regression Communications of the Korean Statistical Society Vol. 15, No. 5, 2008, pp. 709 717 Test for Discontinuities in Nonparametric Regression Dongryeon Park 1) Abstract The difference of two one-sided kernel

More information

Regression: Lecture 2

Regression: Lecture 2 Regression: Lecture 2 Niels Richard Hansen April 26, 2012 Contents 1 Linear regression and least squares estimation 1 1.1 Distributional results................................ 3 2 Non-linear effects and

More information

Smooth simultaneous confidence bands for cumulative distribution functions

Smooth simultaneous confidence bands for cumulative distribution functions Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 395 407, http://dx.doi.org/10.1080/10485252.2012.759219 Smooth simultaneous confidence bands for cumulative distribution functions Jiangyan Wang

More information

Adaptive Nonparametric Density Estimators

Adaptive Nonparametric Density Estimators Adaptive Nonparametric Density Estimators by Alan J. Izenman Introduction Theoretical results and practical application of histograms as density estimators usually assume a fixed-partition approach, where

More information

Local Modal Regression

Local Modal Regression Local Modal Regression Weixin Yao Department of Statistics, Kansas State University, Manhattan, Kansas 66506, U.S.A. wxyao@ksu.edu Bruce G. Lindsay and Runze Li Department of Statistics, The Pennsylvania

More information

A Modern Look at Classical Multivariate Techniques

A Modern Look at Classical Multivariate Techniques A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico

More information

From Histograms to Multivariate Polynomial Histograms and Shape Estimation. Assoc Prof Inge Koch

From Histograms to Multivariate Polynomial Histograms and Shape Estimation. Assoc Prof Inge Koch From Histograms to Multivariate Polynomial Histograms and Shape Estimation Assoc Prof Inge Koch Statistics, School of Mathematical Sciences University of Adelaide Inge Koch (UNSW, Adelaide) Poly Histograms

More information

Generalized Linear Models

Generalized Linear Models Generalized Linear Models Lecture 3. Hypothesis testing. Goodness of Fit. Model diagnostics GLM (Spring, 2018) Lecture 3 1 / 34 Models Let M(X r ) be a model with design matrix X r (with r columns) r n

More information

Nonparametric Statistics

Nonparametric Statistics Nonparametric Statistics Jessi Cisewski Yale University Astrostatistics Summer School - XI Wednesday, June 3, 2015 1 Overview Many of the standard statistical inference procedures are based on assumptions

More information

Variance Function Estimation in Multivariate Nonparametric Regression

Variance Function Estimation in Multivariate Nonparametric Regression Variance Function Estimation in Multivariate Nonparametric Regression T. Tony Cai 1, Michael Levine Lie Wang 1 Abstract Variance function estimation in multivariate nonparametric regression is considered

More information

Function of Longitudinal Data

Function of Longitudinal Data New Local Estimation Procedure for Nonparametric Regression Function of Longitudinal Data Weixin Yao and Runze Li Abstract This paper develops a new estimation of nonparametric regression functions for

More information

University, Tempe, Arizona, USA b Department of Mathematics and Statistics, University of New. Mexico, Albuquerque, New Mexico, USA

University, Tempe, Arizona, USA b Department of Mathematics and Statistics, University of New. Mexico, Albuquerque, New Mexico, USA This article was downloaded by: [University of New Mexico] On: 27 September 2012, At: 22:13 Publisher: Taylor & Francis Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered

More information

Estimation of cumulative distribution function with spline functions

Estimation of cumulative distribution function with spline functions INTERNATIONAL JOURNAL OF ECONOMICS AND STATISTICS Volume 5, 017 Estimation of cumulative distribution function with functions Akhlitdin Nizamitdinov, Aladdin Shamilov Abstract The estimation of the cumulative

More information

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones

Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones Introduction to machine learning and pattern recognition Lecture 2 Coryn Bailer-Jones http://www.mpia.de/homes/calj/mlpr_mpia2008.html 1 1 Last week... supervised and unsupervised methods need adaptive

More information

A New Method for Varying Adaptive Bandwidth Selection

A New Method for Varying Adaptive Bandwidth Selection IEEE TRASACTIOS O SIGAL PROCESSIG, VOL. 47, O. 9, SEPTEMBER 1999 2567 TABLE I SQUARE ROOT MEA SQUARED ERRORS (SRMSE) OF ESTIMATIO USIG THE LPA AD VARIOUS WAVELET METHODS A ew Method for Varying Adaptive

More information

USING A BIMODAL KERNEL FOR A NONPARAMETRIC REGRESSION SPECIFICATION TEST

USING A BIMODAL KERNEL FOR A NONPARAMETRIC REGRESSION SPECIFICATION TEST Statistica Sinica 25 (2015), 1145-1161 doi:http://dx.doi.org/10.5705/ss.2014.008 USING A BIMODAL KERNEL FOR A NONPARAMETRIC REGRESSION SPECIFICATION TEST Cheolyong Park 1, Tae Yoon Kim 1, Jeongcheol Ha

More information

Histogram Härdle, Müller, Sperlich, Werwatz, 1995, Nonparametric and Semiparametric Models, An Introduction

Histogram Härdle, Müller, Sperlich, Werwatz, 1995, Nonparametric and Semiparametric Models, An Introduction Härdle, Müller, Sperlich, Werwatz, 1995, Nonparametric and Semiparametric Models, An Introduction Tine Buch-Kromann Construction X 1,..., X n iid r.v. with (unknown) density, f. Aim: Estimate the density

More information

Integral approximation by kernel smoothing

Integral approximation by kernel smoothing Integral approximation by kernel smoothing François Portier Université catholique de Louvain - ISBA August, 29 2014 In collaboration with Bernard Delyon Topic of the talk: Given ϕ : R d R, estimation of

More information

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions

Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Supplement to Quantile-Based Nonparametric Inference for First-Price Auctions Vadim Marmer University of British Columbia Artyom Shneyerov CIRANO, CIREQ, and Concordia University August 30, 2010 Abstract

More information

Statistics 3858 : Maximum Likelihood Estimators

Statistics 3858 : Maximum Likelihood Estimators Statistics 3858 : Maximum Likelihood Estimators 1 Method of Maximum Likelihood In this method we construct the so called likelihood function, that is L(θ) = L(θ; X 1, X 2,..., X n ) = f n (X 1, X 2,...,

More information

Introduction to Curve Estimation

Introduction to Curve Estimation Introduction to Curve Estimation Density 0.000 0.002 0.004 0.006 700 800 900 1000 1100 1200 1300 Wilcoxon score Michael E. Tarter & Micheal D. Lock Model-Free Curve Estimation Monographs on Statistics

More information

Nonparametric Function Estimation with Infinite-Order Kernels

Nonparametric Function Estimation with Infinite-Order Kernels Nonparametric Function Estimation with Infinite-Order Kernels Arthur Berg Department of Statistics, University of Florida March 15, 2008 Kernel Density Estimation (IID Case) Let X 1,..., X n iid density

More information

Covariance function estimation in Gaussian process regression

Covariance function estimation in Gaussian process regression Covariance function estimation in Gaussian process regression François Bachoc Department of Statistics and Operations Research, University of Vienna WU Research Seminar - May 2015 François Bachoc Gaussian

More information

Analysis methods of heavy-tailed data

Analysis methods of heavy-tailed data Institute of Control Sciences Russian Academy of Sciences, Moscow, Russia February, 13-18, 2006, Bamberg, Germany June, 19-23, 2006, Brest, France May, 14-19, 2007, Trondheim, Norway PhD course Chapter

More information

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review

STATS 200: Introduction to Statistical Inference. Lecture 29: Course review STATS 200: Introduction to Statistical Inference Lecture 29: Course review Course review We started in Lecture 1 with a fundamental assumption: Data is a realization of a random process. The goal throughout

More information

Least Squares Model Averaging. Bruce E. Hansen University of Wisconsin. January 2006 Revised: August 2006

Least Squares Model Averaging. Bruce E. Hansen University of Wisconsin. January 2006 Revised: August 2006 Least Squares Model Averaging Bruce E. Hansen University of Wisconsin January 2006 Revised: August 2006 Introduction This paper developes a model averaging estimator for linear regression. Model averaging

More information

Nonparametric Density Estimation (Multidimension)

Nonparametric Density Estimation (Multidimension) Nonparametric Density Estimation (Multidimension) Härdle, Müller, Sperlich, Werwarz, 1995, Nonparametric and Semiparametric Models, An Introduction Tine Buch-Kromann February 19, 2007 Setup One-dimensional

More information

Single Index Quantile Regression for Heteroscedastic Data

Single Index Quantile Regression for Heteroscedastic Data Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR

More information

Density estimators for the convolution of discrete and continuous random variables

Density estimators for the convolution of discrete and continuous random variables Density estimators for the convolution of discrete and continuous random variables Ursula U Müller Texas A&M University Anton Schick Binghamton University Wolfgang Wefelmeyer Universität zu Köln Abstract

More information

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing

More information

Direct Learning: Linear Classification. Donglin Zeng, Department of Biostatistics, University of North Carolina

Direct Learning: Linear Classification. Donglin Zeng, Department of Biostatistics, University of North Carolina Direct Learning: Linear Classification Logistic regression models for classification problem We consider two class problem: Y {0, 1}. The Bayes rule for the classification is I(P(Y = 1 X = x) > 1/2) so

More information

Non-Parametric Bootstrap Mean. Squared Error Estimation For M- Quantile Estimators Of Small Area. Averages, Quantiles And Poverty

Non-Parametric Bootstrap Mean. Squared Error Estimation For M- Quantile Estimators Of Small Area. Averages, Quantiles And Poverty Working Paper M11/02 Methodology Non-Parametric Bootstrap Mean Squared Error Estimation For M- Quantile Estimators Of Small Area Averages, Quantiles And Poverty Indicators Stefano Marchetti, Nikos Tzavidis,

More information

Nonparametric Estimation of Luminosity Functions

Nonparametric Estimation of Luminosity Functions x x Nonparametric Estimation of Luminosity Functions Chad Schafer Department of Statistics, Carnegie Mellon University cschafer@stat.cmu.edu 1 Luminosity Functions The luminosity function gives the number

More information

Alternatives to Basis Expansions. Kernels in Density Estimation. Kernels and Bandwidth. Idea Behind Kernel Methods

Alternatives to Basis Expansions. Kernels in Density Estimation. Kernels and Bandwidth. Idea Behind Kernel Methods Alternatives to Basis Expansions Basis expansions require either choice of a discrete set of basis or choice of smoothing penalty and smoothing parameter Both of which impose prior beliefs on data. Alternatives

More information