Nonparametric Regression. Härdle, Müller, Sperlich, Werwatz, 1995, Nonparametric and Semiparametric Models, An Introduction


1 Härdle, Müller, Sperlich, Werwatz, 1995, Nonparametric and Semiparametric Models, An Introduction
Tine Buch-Kromann

2 Univariate Kernel Regression

The relationship between two variables, X and Y, where $m(\cdot)$ is a function: $Y = m(X)$.

Model:
$$y_i = m(x_i) + \varepsilon_i, \quad i = 1, \ldots, n, \qquad (1)$$
$$E(Y \mid X = x) = m(x). \qquad (2)$$

(1): $Y = m(X)$ does not need to hold exactly for the $i$-th observation; error term $\varepsilon$.
(2): The relationship holds on average.

3 Univariate Kernel Regression

Goal: Estimate $m(\cdot)$ on the basis of the observations $(x_i, y_i)$, $i = 1, \ldots, n$.

In a parametric approach: assume $m(x) = \alpha + \beta x$ and then estimate $\alpha$ and $\beta$.
In the nonparametric approach: no prior restrictions on $m(\cdot)$.

4 Conditional Expectation (repetition)

Let X and Y be two random variables with joint pdf $f(x, y)$. Then the conditional expectation of Y given $X = x$ is
$$E(Y \mid X = x) = \int y\, f(y \mid x)\, dy = \int y\, \frac{f(x, y)}{f_X(x)}\, dy = m(x).$$

Note: $m(x)$ might be quite nonlinear.

5 Example

Consider the joint pdf $f(x, y) = x + y$ for $0 \le x \le 1$ and $0 \le y \le 1$, with $f(x, y) = 0$ elsewhere, and marginal pdf
$$f_X(x) = \int_0^1 f(x, y)\, dy = x + \tfrac12 \quad \text{for } 0 \le x \le 1,$$
with $f_X(x) = 0$ elsewhere. Then
$$E(Y \mid X = x) = \int_0^1 y\, \frac{f(x, y)}{f_X(x)}\, dy = \int_0^1 y\, \frac{x + y}{x + \tfrac12}\, dy = \frac{\tfrac{x}{2} + \tfrac13}{x + \tfrac12} = m(x),$$
which is nonlinear.

6 Design

Random design: Observations $(X_i, Y_i)$, $i = 1, \ldots, n$, from a bivariate distribution $f(x, y)$. The density $f_X(x)$ is unknown.

Fixed design: The predictor variable X is controlled, so Y is the only random variable (e.g. the beer example). The density $f_X(x)$ is known.

7 Kernel Regression

Random design: The Nadaraya-Watson estimator
$$\hat m_h(x) = \frac{n^{-1} \sum_{i=1}^n K_h(x - X_i)\, Y_i}{n^{-1} \sum_{j=1}^n K_h(x - X_j)}.$$

Rewrite the Nadaraya-Watson estimator as
$$\hat m_h(x) = \frac1n \sum_{i=1}^n \left( \frac{K_h(x - X_i)}{n^{-1} \sum_{j=1}^n K_h(x - X_j)} \right) Y_i = \frac1n \sum_{i=1}^n W_{hi}(x)\, Y_i,$$
a weighted (local) average of the $Y_i$ (note: $n^{-1} \sum_{i=1}^n W_{hi}(x) = 1$).

$h$ determines the degree of smoothness.

What happens if the denominator of $W_{hi}(x)$ is equal to 0? Then the numerator is also equal to 0, and the estimator is not defined. This happens in regions of sparse data.
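A minimal sketch of the Nadaraya-Watson estimator in Python (the Gaussian kernel and the simulated sample are illustrative choices, not prescribed by the slides); the common factor $1/(nh)$ cancels in the ratio:

```python
import numpy as np

def nadaraya_watson(x_grid, X, Y, h):
    """Nadaraya-Watson estimate m_hat(x) on a grid of points, Gaussian kernel."""
    u = (x_grid[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u**2)                  # kernel weights; constants cancel in the ratio
    num = K @ Y                              # sum_i K_h(x - X_i) Y_i
    den = K.sum(axis=1)                      # sum_j K_h(x - X_j)
    # where the denominator vanishes (sparse data) the estimator is undefined
    return np.where(den > 0, num / np.where(den > 0, den, 1.0), np.nan)

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 200)
Y = np.sin(2 * np.pi * X) + rng.normal(0, 0.3, size=200)
m_hat = nadaraya_watson(np.linspace(0, 1, 101), X, Y, h=0.05)
```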

8 Kernel Regression

9 Kernel Regression

Fixed design: $f_X(x)$ is known. Weights of the form
$$W^{FD}_{hi}(x) = \frac{K_h(x - x_i)}{f_X(x)}.$$
Simpler structure and therefore easier to analyse.

One particular fixed design kernel regression estimator: Gasser-Müller. For the case of ordered design points $x_{(i)}$, $i = 1, \ldots, n$, from $[a, b]$:
$$W^{GM}_{hi}(x) = n \int_{s_{i-1}}^{s_i} K_h(x - u)\, du,$$
where $s_i = \frac{x_{(i)} + x_{(i+1)}}{2}$, $s_0 = a$, $s_n = b$. Note that $n^{-1} \sum_{i=1}^n W^{GM}_{hi}(x) = 1$.

10 Statistical Properties

Is the kernel regression estimator consistent?

Theorem 4.1: Consistency (Nadaraya-Watson)
Assume the univariate random design model and the regularity conditions: $\int |K(u)|\, du < \infty$, $u K(u) \to 0$ for $|u| \to \infty$, $EY^2 < \infty$. Suppose also $h \to 0$, $nh \to \infty$. Then
$$\frac1n \sum_{i=1}^n W_{hi}(x)\, Y_i = \hat m_h(x) \overset{P}{\to} m(x)$$
at every point $x$ with $f_X(x) > 0$ that is a point of continuity of $m(x)$, $f_X(x)$, and $\sigma^2(x) = V(Y \mid X = x)$.

Proof idea: Show that the numerator and the denominator of $\hat m_h(x)$ converge. Then $\hat m_h(x)$ converges (Slutsky's theorem).

11 Statistical Properties

What is the speed of the convergence?

Theorem 4.2: Speed of convergence (Gasser-Müller)
Assume the univariate fixed design model and the conditions: $K(\cdot)$ has support $[-1, 1]$ with $K(-1) = K(1) = 0$, $m(\cdot)$ is twice continuously differentiable, $\max_i |x_i - x_{i-1}| = O(n^{-1})$. Assume $V(\varepsilon_i) = \sigma^2$, $i = 1, \ldots, n$. Then, under $h \to 0$, $nh \to \infty$, it holds that
$$\mathrm{MSE}\left\{ \frac1n \sum_{i=1}^n W^{GM}_{hi}(x)\, Y_i \right\} \approx \frac{1}{nh}\, \sigma^2\, \|K\|_2^2 + \frac{h^4}{4}\, \mu_2^2(K)\, \{m''(x)\}^2.$$

Note: AMSE = variance + bias², i.e. the usual bias-variance trade-off.

12 Statistical Properties

What is the speed of the convergence in the random design?

Theorem 4.3: Speed of convergence (Nadaraya-Watson)
Assume the univariate random design model and the conditions $\int |K(u)|\, du < \infty$, $u K(u) \to 0$ for $|u| \to \infty$, $EY^2 < \infty$. Suppose also $h \to 0$, $nh \to \infty$. Then
$$\mathrm{MSE}(\hat m_h(x)) \approx \frac{1}{nh}\, \frac{\sigma^2(x)}{f_X(x)}\, \|K\|_2^2 + \frac{h^4}{4}\, \mu_2^2(K) \left\{ m''(x) + 2\, \frac{m'(x)\, f_X'(x)}{f_X(x)} \right\}^2$$
at every point $x$ with $f_X(x) > 0$ that is a point of continuity of $m'(x)$, $m''(x)$, $f_X(x)$, $f_X'(x)$, and $\sigma^2(x) = V(Y \mid X = x)$.

Proof idea: Rewrite the Nadaraya-Watson estimator as $\hat m_h(x) = \hat r_h(x) / \hat f_h(x)$.

13 Statistical Properties

Asymptotic MSE:
$$\mathrm{AMSE}(n, h) = \frac{1}{nh}\, C_1 + h^4\, C_2$$
Minimizing with respect to $h$ gives the optimal bandwidth $h_{opt} \sim n^{-1/5}$.

The rate of convergence of the AMSE is of order $O(n^{-4/5})$: slower than the rate obtained by LS estimation in linear regression, but the same as in nonparametric density estimation.
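The optimal bandwidth follows from a one-line minimization of the AMSE expression above:
$$\frac{d}{dh}\, \mathrm{AMSE}(n, h) = -\frac{C_1}{n h^2} + 4 h^3 C_2 = 0 \quad\Longrightarrow\quad h_{opt} = \left( \frac{C_1}{4\, C_2\, n} \right)^{1/5} \sim n^{-1/5},$$
and substituting $h_{opt}$ back gives $\mathrm{AMSE}(n, h_{opt}) = O(n^{-4/5})$.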

14 Local Polynomial Regression

The Nadaraya-Watson estimator is a special case of local polynomial regression: a local constant least squares fit.

To motivate the local polynomial fit, look at the Taylor expansion of the unknown conditional expectation function:
$$m(t) \approx m(x) + m'(x)(t - x) + \cdots + m^{(p)}(x)(t - x)^p\, \frac{1}{p!}$$
for $t$ in a neighborhood of $x$.

Idea: Fit a polynomial in a neighborhood of $x$ (called local polynomial regression):
$$\min_{\beta} \sum_{i=1}^n \underbrace{\{Y_i - \beta_0 - \beta_1 (X_i - x) - \cdots - \beta_p (X_i - x)^p\}^2}_{\text{polynomial}}\ \underbrace{K_h(x - X_i)}_{\text{local}}$$
where $\beta = (\beta_0, \beta_1, \ldots, \beta_p)^T$.

15 Local Polynomial Regression

The result is a weighted least squares estimator
$$\hat\beta(x) = (\mathbf{X}^T \mathbf{W} \mathbf{X})^{-1} \mathbf{X}^T \mathbf{W} \mathbf{Y}$$
where
$$\mathbf{X} = \begin{pmatrix} 1 & X_1 - x & \ldots & (X_1 - x)^p \\ \vdots & \vdots & & \vdots \\ 1 & X_n - x & \ldots & (X_n - x)^p \end{pmatrix}, \quad \mathbf{Y} = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix},$$
and
$$\mathbf{W} = \begin{pmatrix} K_h(x - X_1) & 0 & \ldots & 0 \\ 0 & K_h(x - X_2) & \ldots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \ldots & K_h(x - X_n) \end{pmatrix}.$$

Note: This estimator varies with $x$ (in contrast to parametric least squares).

16 Local Polynomial Regression

Denote the components of $\hat\beta(x)$ by $\hat\beta_0(x), \ldots, \hat\beta_p(x)$. Then the local polynomial estimator of the regression function m is
$$\hat m_{p,h}(x) = \hat\beta_0(x)$$
because $m(x) \approx \beta_0(x)$ (recall the Taylor expansion).

17 Local Polynomial Regression

Local constant estimator (Nadaraya-Watson): $p = 0$. Then
$$\min_{\beta} \sum_{i=1}^n \{Y_i - \beta_0\}^2\, K_h(x - X_i)$$
and
$$\hat m_{0,h}(x) = \frac{\sum_{i=1}^n K_h(x - X_i)\, Y_i}{\sum_{i=1}^n K_h(x - X_i)}.$$

18 Local Polynomial Regression

Local linear estimator: $p = 1$. Then
$$\min_{\beta} \sum_{i=1}^n \{Y_i - \beta_0 - \beta_1 (X_i - x)\}^2\, K_h(x - X_i)$$
and
$$\hat\beta(x) = \begin{pmatrix} S_{h,0}(x) & S_{h,1}(x) \\ S_{h,1}(x) & S_{h,2}(x) \end{pmatrix}^{-1} \begin{pmatrix} T_{h,0}(x) \\ T_{h,1}(x) \end{pmatrix},$$
$$\hat m_{1,h}(x) = \hat\beta_0(x) = \frac{T_{h,0}(x)\, S_{h,2}(x) - T_{h,1}(x)\, S_{h,1}(x)}{S_{h,0}(x)\, S_{h,2}(x) - S_{h,1}^2(x)},$$
where
$$S_{h,j}(x) = \sum_{i=1}^n K_h(x - X_i)(X_i - x)^j, \qquad T_{h,j}(x) = \sum_{i=1}^n K_h(x - X_i)(X_i - x)^j\, Y_i.$$

19 Local Polynomial Regression

Local polynomial estimator: the general formula for a polynomial of order $p$:
$$\min_{\beta} \sum_{i=1}^n \{Y_i - \beta_0 - \beta_1 (X_i - x) - \cdots - \beta_p (X_i - x)^p\}^2\, K_h(x - X_i).$$
Then
$$\hat\beta(x) = \begin{pmatrix} S_{h,0}(x) & S_{h,1}(x) & \ldots & S_{h,p}(x) \\ S_{h,1}(x) & S_{h,2}(x) & \ldots & S_{h,p+1}(x) \\ \vdots & \vdots & & \vdots \\ S_{h,p}(x) & S_{h,p+1}(x) & \ldots & S_{h,2p}(x) \end{pmatrix}^{-1} \begin{pmatrix} T_{h,0}(x) \\ T_{h,1}(x) \\ \vdots \\ T_{h,p}(x) \end{pmatrix}.$$
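A minimal sketch of the weighted least squares solution above (single evaluation point; the quartic kernel is an illustrative choice, and the $\nu!\,\hat\beta_\nu$ rescaling anticipates the derivative estimator discussed on slide 24):

```python
import numpy as np
from math import factorial

def local_polynomial(x, X, Y, h, p=1, nu=0):
    """Local polynomial estimate of m^(nu)(x) at one point x via
    weighted least squares: nu! * beta_hat_nu(x)."""
    u = (X - x) / h
    K = np.where(np.abs(u) <= 1.0, (15 / 16) * (1 - u**2) ** 2, 0.0) / h  # quartic kernel K_h
    D = (X - x)[:, None] ** np.arange(p + 1)[None, :]  # rows (1, X_i - x, ..., (X_i - x)^p)
    A = D.T @ (K[:, None] * D)    # X^T W X, entries S_{h,j}(x)
    b = D.T @ (K * Y)             # X^T W Y, entries T_{h,j}(x)
    beta = np.linalg.solve(A, b)  # singular in regions with too few observations
    return factorial(nu) * beta[nu]
```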

20 Local Constant Regression

n = 7125 observations of net income and food expenditure from the 1973 Family Expenditure Survey of British households.

21 Local Linear Regression

Differences between local constant and local linear regression at the boundaries. Later we will see that the local linear fit is influenced less by outliers.

22 Local Linear Regression

Asymptotic MSE for the local linear regression estimator:
$$\mathrm{AMSE}(\hat m_{1,h}(x)) = \frac{1}{nh}\, \frac{\sigma^2(x)}{f_X(x)}\, \|K\|_2^2 + \frac{h^4}{4}\, \{m''(x)\}^2\, \mu_2^2(K)$$

Note:
The bias is design-independent.
The bias disappears when $m(\cdot)$ is linear.

23 Local Linear Regression

Compare the AMSE of the local constant (LC), the Gasser-Müller (GM) and the local linear (LL) estimators:
$$\mathrm{AMSE}(\hat m_{0,h}(x)) = \frac{1}{nh}\, \frac{\sigma^2(x)}{f_X(x)}\, \|K\|_2^2 + \frac{h^4}{4}\, \mu_2^2(K) \left\{ m''(x) + 2\, \frac{m'(x)\, f_X'(x)}{f_X(x)} \right\}^2$$
$$\mathrm{AMSE}(\hat m_{GM,h}(x)) = \frac{1}{nh}\, \sigma^2\, \|K\|_2^2 + \frac{h^4}{4}\, \mu_2^2(K)\, \{m''(x)\}^2$$
$$\mathrm{AMSE}(\hat m_{1,h}(x)) = \frac{1}{nh}\, \frac{\sigma^2(x)}{f_X(x)}\, \|K\|_2^2 + \frac{h^4}{4}\, \mu_2^2(K)\, \{m''(x)\}^2$$

LC uses one parameter less than LL without reducing the asymptotic variance, but the extra parameter enables LL to reduce the bias.
The LL fit improves the estimation in the boundary region compared to LC.
GM improves the bias compared to LC.

24 Local Linear Regression

Remarks:
LL is a weighted local average of the response variables $Y_i$ (as is LC).
$h$ determines the degree of smoothness of $\hat m_{p,h}$: for $h \to 0$, $\hat m_{p,h} \to Y_i$ at the points $X_i$; when $h \to \infty$ all weights are equal, i.e. we obtain a parametric $p$-th order polynomial fit.

Derivatives of m: The natural approach is to estimate m by $\hat m$ and compute the derivative $\hat m'$. An alternative and more efficient method is the polynomial derivative estimator
$$\hat m^{(\nu)}_{p,h}(x) = \nu!\, \hat\beta_\nu(x)$$
for the $\nu$-th derivative of $m(\cdot)$. Usually the order of the polynomial is $p = \nu + 1$ or $p = \nu + 3$.

25 Nearest-Neighbor Estimator

We now consider other nonparametric smoothing techniques that differ from the kernel method.

Nearest-Neighbor Estimator:
Kernel regression: weighted averages of $Y_i$ in a fixed neighborhood (governed by $h$) around $x$.
k-NN: weighted averages of $Y_i$ in a neighborhood around $x$ of variable width, using the k nearest observations to $x$:
$$\hat m_k(x) = \frac1n \sum_{i=1}^n W_{ki}(x)\, Y_i \quad \text{with} \quad W_{ki}(x) = \begin{cases} n/k & \text{if } i \in J_x \\ 0 & \text{otherwise} \end{cases}$$
with the set of indices
$$J_x = \{i : X_i \text{ is one of the } k \text{ nearest observations to } x\}.$$
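A minimal sketch of the k-NN estimator (distance on the real line; one evaluation point):

```python
import numpy as np

def knn_regression(x, X, Y, k):
    """k-NN estimate at x: (1/n) sum_i W_ki(x) Y_i with W_ki = n/k on J_x,
    i.e. the average of Y_i over the k nearest observations to x."""
    J_x = np.argsort(np.abs(X - x))[:k]   # indices of the k nearest X_i
    return Y[J_x].mean()
```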

26 Nearest-Neighbor Estimator

Remarks:
When we estimate $m(\cdot)$ at a point $x$ with sparse data, the k nearest neighbors are rather far away from $x$.
$k$ is the smoothing parameter; increasing $k$ makes the estimate smoother.

The k-NN estimator is the kernel estimator with uniform kernel $K(u) = \frac12\, I(|u| \le 1)$ and variable bandwidth $R = R(k)$, with $R(k)$ the distance between $x$ and its furthest k-nearest neighbor:
$$\hat m_k(x) = \frac{\sum_{i=1}^n K_R(x - X_i)\, Y_i}{\sum_{i=1}^n K_R(x - X_i)}.$$
The k-NN estimator can be generalised by considering other kernel functions.

27 Nearest-Neighbor Estimator

Bias and variance: Let $k \to \infty$, $k/n \to 0$ and $n \to \infty$. Then
$$E\{\hat m_k(x)\} - m(x) \approx \frac{\mu_2(K)}{8\, f_X(x)^2} \left\{ m''(x) + 2\, \frac{m'(x)\, f_X'(x)}{f_X(x)} \right\} \left( \frac{k}{n} \right)^2,$$
$$V\{\hat m_k(x)\} \approx 2\, \|K\|_2^2\, \frac{\sigma^2(x)}{k}.$$

Remark:
The variance does not depend on $f_X(x)$ (unlike LC) because k-NN always averages over k observations.
In terms of MSE, k-NN is approximately identical to LC with bandwidth $h$ when $k = 2 n h f_X(x)$.

28 Nearest-Neighbor Estimator

29 Median Smoothing

Median Smoothing: Goal: Estimate the conditional median function
$$\hat m(x) = \mathrm{med}\{Y_i : i \in J_x\}$$
where
$$J_x = \{i : X_i \text{ is one of the } k \text{ nearest observations to } x\},$$
i.e. the median of the $Y_i$'s for which $X_i$ is one of the k nearest neighbors of $x$.

$\mathrm{med}(Y \mid X = x)$ is more robust to outliers than $E(Y \mid X = x)$.
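The same neighborhood logic gives a minimal median smoother (a sketch; ties in distance are resolved by the sort order):

```python
import numpy as np

def median_smoother(x, X, Y, k):
    """Conditional median estimate: med{Y_i : i in J_x}."""
    J_x = np.argsort(np.abs(X - x))[:k]   # the k nearest neighbors of x
    return np.median(Y[J_x])
```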

30 Median Smoothing

31 Spline Smoothing

Spline Smoothing: Consider the residual sum of squares (RSS) as a criterion for the goodness of fit of $m(\cdot)$:
$$\mathrm{RSS} = \sum_{i=1}^n \{Y_i - m(X_i)\}^2.$$
Minimizing the RSS alone interpolates the data: $m(X_i) = Y_i$, $i = 1, \ldots, n$.
Solution: Add a stabilizer that penalizes non-smoothness of $m(\cdot)$:
$$\|m''\|_2^2 = \int \{m''(x)\}^2\, dx.$$

32 Spline Smoothing

Minimization problem:
$$\hat m_\lambda = \arg\min_m S_\lambda(m), \qquad S_\lambda(m) = \sum_{i=1}^n \{Y_i - m(X_i)\}^2 + \lambda\, \|m''\|_2^2.$$
For the class of all twice differentiable functions on $[a, b] = [X_{(1)}, X_{(n)}]$, the unique minimizer is the cubic spline, which consists of cubic polynomials
$$p_i(x) = \alpha_i + \beta_i x + \gamma_i x^2 + \delta_i x^3, \quad i = 1, \ldots, n - 1,$$
between $X_{(i)}$ and $X_{(i+1)}$.

$\lambda$ controls the weight given to the stabilizer:
High $\lambda$: more weight to $\|m''\|_2^2$, smoother estimate.
$\lambda \to 0$: $\hat m_\lambda(\cdot)$ interpolates.
$\lambda \to \infty$: $\hat m_\lambda(\cdot)$ is a linear function.

33 Spline Smoothing

The estimator must be twice continuously differentiable:
$p_i(X_{(i)}) = p_{i-1}(X_{(i)})$ (no jumps in the function),
$p_i'(X_{(i)}) = p_{i-1}'(X_{(i)})$ (no jumps in the first derivative),
$p_i''(X_{(i)}) = p_{i-1}''(X_{(i)})$ (no jumps in the second derivative),
$p_1''(X_{(1)}) = p_{n-1}''(X_{(n)}) = 0$ (boundary condition).
These restrictions and the conditions for minimizing $S_\lambda(m)$ with respect to the coefficients of the $p_i$ define a system of linear equations which can be solved in $O(n)$ calculations.

34 Spline Smoothing

The spline smoother is a linear estimator in $Y_i$:
$$\hat m_\lambda(x) = \frac1n \sum_{i=1}^n W_{\lambda,i}(x)\, Y_i.$$
The spline smoother is asymptotically equivalent to a kernel smoother (under certain conditions) with the spline kernel
$$K_S(u) = \frac12 \exp\left( -\frac{|u|}{\sqrt 2} \right) \sin\left( \frac{|u|}{\sqrt 2} + \frac{\pi}{4} \right)$$
and local bandwidth $h(X_i) = \lambda^{1/4}\, n^{-1/4}\, f(X_i)^{-1/4}$.
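A minimal sketch of cubic spline smoothing using SciPy (make_smoothing_spline solves exactly the penalized problem on slide 32; it requires SciPy >= 1.10 and strictly increasing abscissae; the simulated data are illustrative):

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline  # SciPy >= 1.10

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 1, 100))       # strictly increasing abscissae required
Y = np.sin(2 * np.pi * X) + rng.normal(0, 0.2, size=100)

# cubic smoothing spline minimizing sum_i (Y_i - m(X_i))^2 + lam * ||m''||_2^2
m_hat = make_smoothing_spline(X, Y, lam=1e-4)
fitted = m_hat(np.linspace(0, 1, 200))
```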

35 Spline Smoothing

36 Orthogonal Series

Suppose that $m(\cdot)$ can be represented by a Fourier series
$$m(x) = \sum_{j=0}^\infty \beta_j\, \varphi_j(x)$$
where $\{\varphi_j(x)\}_{j=0}^\infty$ is a known basis of functions and $\{\beta_j\}_{j=0}^\infty$ are the unknown Fourier coefficients.

Goal: Estimate the unknown Fourier coefficients. We cannot estimate an infinite number of coefficients from a finite number of observations. Therefore choose the number N of terms to be included in the Fourier series representation.

37 Orthogonal Series

Estimation proceeds in three steps:
Select a basis of functions,
Select N < n (integer),
Estimate the N unknown coefficients.

N is the smoothing parameter:
Large N: many terms in the Fourier series, interpolation.
Small N: few terms, smooth estimates.
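A minimal sketch of an orthogonal series estimate with a cosine basis on [0,1] (the basis is my choice; estimating $\beta_j$ by the sample mean of $Y \varphi_j(X)$ targets the Fourier coefficient only when the design density is uniform on [0,1]):

```python
import numpy as np

def phi(j, x):
    """Cosine basis on [0,1]: phi_0 = 1, phi_j(x) = sqrt(2) cos(pi j x)."""
    return np.ones_like(x) if j == 0 else np.sqrt(2.0) * np.cos(np.pi * j * x)

def series_regression(x_grid, X, Y, N):
    """Orthogonal series estimate with N + 1 terms."""
    beta = [np.mean(Y * phi(j, X)) for j in range(N + 1)]   # beta_j ~ E{Y phi_j(X)}
    return sum(b * phi(j, x_grid) for j, b in enumerate(beta))
```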

38 Orthogonal Series

39 Smoothing Parameter Selection

How can we choose the smoothing parameter of the kernel regression estimators? Which criteria do we have that measure how close the estimate is to the true curve?

40 Smoothing Parameter Selection

Mean Squared Error (MSE):
$$\mathrm{MSE}(\hat m_h(x)) = E[\{\hat m_h(x) - m(x)\}^2]$$
$m(x)$: unknown constant. $\hat m_h(x)$: random variable, i.e. E is taken with respect to this rv. $\hat m_h(x)$ is a function of $\{X_i, Y_i\}_{i=1}^n$, therefore
$$\mathrm{MSE}(\hat m_h(x)) = \int \cdots \int \{\hat m_h(x) - m(x)\}^2\, f(x_1, \ldots, x_n, y_1, \ldots, y_n)\, dx_1 \cdots dx_n\, dy_1 \cdots dy_n.$$
A local measure. The AMSE-optimal bandwidth is a complicated function of $m''(x)$ and $\sigma^2(x)$.

41 Smoothing Parameter Selection

Integrated Squared Error (ISE):
$$\mathrm{ISE}(\hat m_h) = \int \{\hat m_h(x) - m(x)\}^2\, w(x)\, f_X(x)\, dx$$
A global measure. ISE is a rv: different samples give different values of $\hat m_h(x)$ and hence of ISE.
Weight function $w(x)$: used to assign less weight to observations in regions of sparse data (to reduce variance) or in the tails (to reduce boundary effects).

42 Smoothing Parameter Selection

Mean Integrated Squared Error (MISE):
$$\mathrm{MISE}(\hat m_h) = E\{\mathrm{ISE}(\hat m_h)\} = \int \cdots \int \left[ \int \{\hat m_h(x) - m(x)\}^2\, w(x)\, f_X(x)\, dx \right] f(x_1, \ldots, x_n, y_1, \ldots, y_n)\, dx_1 \cdots dx_n\, dy_1 \cdots dy_n$$
The expected value of ISE.

43 Smoothing Parameter Selection

Averaged Squared Error (ASE):
$$\mathrm{ASE}(\hat m_h) = \frac1n \sum_{j=1}^n \{\hat m_h(X_j) - m(X_j)\}^2\, w(X_j)$$
A global measure and a rv; a discrete approximation to ISE.

44 Smoothing Parameter Selection

Mean Averaged Squared Error (MASE):
$$\mathrm{MASE}(\hat m_h) = E\{\mathrm{ASE} \mid X_1 = x_1, \ldots, X_n = x_n\} = E\left[ \frac1n \sum_{j=1}^n \{\hat m_h(X_j) - m(X_j)\}^2\, w(X_j) \,\Big|\, X_1 = x_1, \ldots, X_n = x_n \right]$$
The conditional expectation of ASE. If we view $X_1, \ldots, X_n$ as rvs, then MASE is a rv. Asymptotically, MASE agrees with the MISE (AMISE).

45 Smoothing Parameter Selection

Which measure should we use to derive a bandwidth selection method? A natural choice is MISE/AMISE, as used in the density case, but here there are more unknown quantities than in the density case.

Bandwidth selection: cross-validation, penalty terms, plug-in methods (Fan & Gijbels, 1996).

For the NW estimator, ASE, ISE and MISE lead asymptotically to the same smoothing. Therefore use the easiest: ASE (the discrete ISE).

46 Smoothing Parameter Selection

Averaged Squared Error:
$$\mathrm{ASE}(\hat m_h) = \frac1n \sum_{i=1}^n \{\hat m_h(X_i) - m(X_i)\}^2\, w(X_i)$$
The conditional expectation, MASE:
$$\mathrm{MASE}(h) = E\{\mathrm{ASE}(h) \mid X_1 = x_1, \ldots, X_n = x_n\} = \frac1n \sum_{i=1}^n E\left[ \{\hat m_h(X_i) - m(X_i)\}^2 \mid X_1 = x_1, \ldots, X_n = x_n \right] w(X_i)$$

47 Smoothing Parameter Selection

Bias² (thin solid line), variance (dashed line), MASE (thick line). Simulated data set: $m(x) = \{\sin(2\pi x^3)\}^3$, $Y_i = m(X_i) + \varepsilon_i$, $X_i \sim U(0, 1)$, $\varepsilon_i \sim N(0, 0.01)$.

48 Smoothing Parameter Selection

However, MASE and ASE contain $m(\cdot)$, which is unknown.

Resubstitution estimate: Replace $m(\cdot)$ with the observations Y:
$$p(h) = \frac1n \sum_{i=1}^n \{Y_i - \hat m_h(X_i)\}^2\, w(X_i)$$
The weighted residual sum of squares (RSS).
Problem: $Y_i$ is used in $\hat m_h(X_i)$ to predict itself. Therefore $p(h)$ can be made arbitrarily small by letting $h \to 0$ (interpolation).

49 Smoothing Parameter Selection

Cross-validation: The leave-one-out estimator
$$\hat m_{h,-i}(X_i) = \frac{\sum_{j \ne i} K_h(X_i - X_j)\, Y_j}{\sum_{j \ne i} K_h(X_i - X_j)}$$
Cross-validation function:
$$CV(h) = \frac1n \sum_{i=1}^n \{Y_i - \hat m_{h,-i}(X_i)\}^2\, w(X_i)$$
Choose the $h$ that minimizes $CV(h)$.
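A minimal sketch of CV bandwidth selection for the Nadaraya-Watson estimator (Gaussian kernel, weight function $w \equiv 1$; both are illustrative choices):

```python
import numpy as np

def cv_bandwidth(X, Y, h_grid):
    """Return the h in h_grid minimizing the leave-one-out CV criterion."""
    D = X[:, None] - X[None, :]
    cv_values = []
    for h in h_grid:
        K = np.exp(-0.5 * (D / h) ** 2)
        np.fill_diagonal(K, 0.0)             # leave observation i out
        m_loo = (K @ Y) / K.sum(axis=1)      # m_hat_{h,-i}(X_i)
        cv_values.append(np.mean((Y - m_loo) ** 2))
    return h_grid[int(np.argmin(cv_values))]
```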

50 Smoothing Parameter Selection

Left: Local constant regression estimator (Nadaraya-Watson) with cross-validation bandwidth ($\hat h_{cv} = 0.15$), quartic kernel. Right: Local linear regression estimator with cross-validation bandwidth ($\hat h_{cv} = 0.56$); more stable in regions with sparse data, and it outperforms the LC estimator at the boundaries.

51 Plug-in method (Fan & Gijbels, 1996)

Bandwidth selection methods:
Constant (global) bandwidth.
Variable bandwidth: reduction of bias in peaked regions and of variance in flat regions.
Local variable bandwidth $h(x_0)$: varies with the location point $x_0$.
Global variable bandwidth $h(X_i)$: varies with the data points.

Focus on constant and local variable bandwidths.

52 Plug-in method (Fan & Gijbels, 1996)

Local bandwidth: The optimal local bandwidth is obtained by minimizing the conditional MSE
$$[\mathrm{Bias}\{\hat m_\nu(x_0) \mid \mathbf{X}\}]^2 + V\{\hat m_\nu(x_0) \mid \mathbf{X}\}$$
where $\hat m_\nu(\cdot)$ estimates the $\nu$-th derivative of $m(\cdot)$. Use the general expressions for bias and variance to obtain the (local) AMSE-optimal bandwidth
$$h^{LOC}_{opt}(x_0) = C_{\nu,p}(K) \left[ \frac{\sigma^2(x_0)}{\{m^{(p+1)}(x_0)\}^2\, f(x_0)} \right]^{1/(2p+3)} n^{-1/(2p+3)}$$
where $C_{\nu,p}(K)$ is just a constant:
$$C_{\nu,p}(K) = \left[ \frac{\{(p+1)!\}^2\, (2\nu+1) \int K_\nu^2(t)\, dt}{2(p+1-\nu)\, \{\int t^{p+1} K_\nu(t)\, dt\}^2} \right]^{1/(2p+3)}$$

53 Plug-in method (Fan & Gijbels, 1996)

Constant bandwidth: The optimal global bandwidth is obtained by minimizing the conditional weighted MISE
$$\int \left( [\mathrm{Bias}\{\hat m_\nu(x) \mid \mathbf{X}\}]^2 + V\{\hat m_\nu(x) \mid \mathbf{X}\} \right) w(x)\, dx$$
where $w(\cdot) \ge 0$ is some weight function. Use (again) the asymptotic expressions of bias and variance to obtain the optimal bandwidth
$$h^{GLO}_{opt} = C_{\nu,p}(K) \left[ \frac{\int \sigma^2(x)\, w(x)/f(x)\, dx}{\int \{m^{(p+1)}(x)\}^2\, w(x)\, dx} \right]^{1/(2p+3)} n^{-1/(2p+3)}$$

54 Plug-in method (Fan & Gijbels, 1996)

The optimal bandwidths $h^{LOC}_{opt}$ and $h^{GLO}_{opt}$,
$$h^{LOC}_{opt}(x_0) = C_{\nu,p}(K) \left[ \frac{\sigma^2(x_0)}{\{m^{(p+1)}(x_0)\}^2\, f(x_0)} \right]^{1/(2p+3)} n^{-1/(2p+3)}, \qquad h^{GLO}_{opt} = C_{\nu,p}(K) \left[ \frac{\int \sigma^2(x)\, w(x)/f(x)\, dx}{\int \{m^{(p+1)}(x)\}^2\, w(x)\, dx} \right]^{1/(2p+3)} n^{-1/(2p+3)},$$
depend on unknown quantities:
The design density $f(\cdot)$,
The conditional variance $\sigma^2(\cdot) = V(Y \mid X = \cdot)$,
The derivative function $m^{(p+1)}(\cdot)$.

Solution: Substitute these by pilot estimates: the plug-in bandwidth.

55 Plug-in method (Fan & Gijbels, 1996)

Rule of thumb for constant bandwidth selection: Find pilot estimates of $f(\cdot)$, $\sigma^2(\cdot)$ and $m^{(p+1)}(\cdot)$ by fitting a polynomial of order $p + 3$ globally to $m(x)$ (a parametric fit):
$$\check m(x) = \check\alpha_0 + \cdots + \check\alpha_{p+3}\, x^{p+3}$$
$\check\sigma^2$: the standardized RSS from this fit.
$\check m^{(p+1)}$: the $(p+1)$-th derivative of the fitted polynomial, a quadratic function.

56 Plug-in method (Fan & Gijbels, 1996)

We want to estimate $m^{(\nu)}(\cdot)$, and we take $w(x) = f(x)\, w_0(x)$. Substitute the pilot estimates into the optimal constant bandwidth.

Rule-of-thumb bandwidth:
$$\check h_{ROT} = C_{\nu,p}(K) \left[ \frac{\check\sigma^2 \int w_0(x)\, dx}{\sum_{i=1}^n \{\check m^{(p+1)}(X_i)\}^2\, w_0(X_i)} \right]^{1/(2p+3)}$$
where
$$C_{\nu,p}(K) = \left[ \frac{\{(p+1)!\}^2\, (2\nu+1) \int K_\nu^2(t)\, dt}{2(p+1-\nu)\, \{\int t^{p+1} K_\nu(t)\, dt\}^2} \right]^{1/(2p+3)}$$
is a constant (tabulated for different kernels and different values of $\nu$ and $p$).
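A rough sketch of the rule of thumb for local linear estimation of m itself ($p = 1$, $\nu = 0$, so the pilot fit is a global quartic). The default constant 0.776 is the value commonly quoted for the Gaussian kernel and should be treated as an assumption here, as is the choice of $w_0$ as the indicator of an inner quantile range:

```python
import numpy as np

def rot_bandwidth(X, Y, C=0.776, inner=(0.05, 0.95)):
    """Rule-of-thumb bandwidth, p = 1, nu = 0 (quartic pilot fit)."""
    n = len(X)
    coef = np.polyfit(X, Y, 4)                   # global parametric pilot fit, order p + 3
    resid = Y - np.polyval(coef, X)
    sigma2 = resid @ resid / (n - 5)             # standardized RSS
    m2 = np.polyval(np.polyder(coef, 2), X)      # pilot m''(X_i), a quadratic function
    lo, hi = np.quantile(X, inner)               # w_0 = indicator of [lo, hi]
    inside = (X >= lo) & (X <= hi)
    denom = np.sum(m2[inside] ** 2)              # sum of m''(X_i)^2 w_0(X_i)
    return C * (sigma2 * (hi - lo) / denom) ** (1 / 5)   # 1/(2p+3) = 1/5
```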

57 Smoothing Parameter Selection

Local linear regression ($p = 1$) and derivative estimation ($p = 2$) with the rule-of-thumb bandwidth $\check h$.

58 Confidence Regions

How close is the smoothed curve to the true curve?

Theorem 4.5: Asymptotic distribution of the Nadaraya-Watson estimator
Suppose m and $f_X$ are twice differentiable, $\int |K(u)|^{2+\kappa}\, du < \infty$ for some $\kappa > 0$, $x$ is a continuity point of $\sigma^2(x)$ and $E(|Y|^{2+\kappa} \mid X = x)$, and $f_X(x) > 0$. Take $h = c\, n^{-1/5}$. Then
$$n^{2/5}\, \{\hat m_h(x) - m(x)\} \overset{L}{\to} N(b_x, v_x^2)$$
where
$$b_x = c^2\, \mu_2(K) \left\{ \frac{m''(x)}{2} + \frac{m'(x)\, f_X'(x)}{f_X(x)} \right\}, \qquad v_x^2 = \frac{\sigma^2(x)\, \|K\|_2^2}{c\, f_X(x)}.$$

59 Confidence Regions

Suppose the bias is of negligible magnitude compared to the variance, e.g. h sufficiently small. Approximate confidence intervals:
$$\left[ \hat m_h(x) - z_{1-\frac{\alpha}{2}} \sqrt{\frac{\|K\|_2^2\, \hat\sigma^2(x)}{n h\, \hat f_h(x)}},\; \hat m_h(x) + z_{1-\frac{\alpha}{2}} \sqrt{\frac{\|K\|_2^2\, \hat\sigma^2(x)}{n h\, \hat f_h(x)}} \right]$$
where $z_{1-\frac{\alpha}{2}}$ is the $(1 - \frac{\alpha}{2})$-quantile of the standard normal distribution and the estimate of $\sigma^2(x)$ is
$$\hat\sigma^2(x) = \frac1n \sum_{i=1}^n W_{hi}(x)\, \{Y_i - \hat m_h(x)\}^2$$
with $W_{hi}$ the weights from the Nadaraya-Watson estimator.
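A minimal sketch of the interval computation (Gaussian kernel, for which $\|K\|_2^2 = 1/(2\sqrt\pi)$; bias is ignored as stated above):

```python
import numpy as np
from scipy.stats import norm

def nw_confidence_band(x_grid, X, Y, h, alpha=0.05):
    """Pointwise (1 - alpha) intervals for the Nadaraya-Watson estimate."""
    n = len(X)
    u = (x_grid[:, None] - X[None, :]) / h
    K = np.exp(-0.5 * u**2) / (np.sqrt(2 * np.pi) * h)   # K_h(x - X_i)
    f_hat = K.mean(axis=1)                               # kernel density estimate f_hat_h(x)
    m_hat = (K @ Y) / (n * f_hat)                        # NW estimate
    W = K / f_hat[:, None]                               # NW weights W_hi(x)
    sigma2 = np.mean(W * (Y[None, :] - m_hat[:, None]) ** 2, axis=1)
    K2 = 1.0 / (2.0 * np.sqrt(np.pi))                    # ||K||_2^2 for the Gaussian kernel
    half = norm.ppf(1 - alpha / 2) * np.sqrt(K2 * sigma2 / (n * h * f_hat))
    return m_hat - half, m_hat + half
```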

60 Confidence Regions The Nadaraya-Watson kernel regression and 95% confidence intervals, h = 0.2.

61 Hypothesis Testing

What would we like to test?
Is there indeed an impact of X on Y?
Is the estimated function $m(\cdot)$ significantly different from a traditional parametrization (e.g. linear or log-linear models)?

The optimal rates of convergence differ between nonparametric estimation and nonparametric testing. Therefore, the choice of smoothing parameter is an issue to be discussed separately.

62 Hypothesis Testing

Specifying the hypothesis: $H_0$: the hypothesis, $H_1$: the alternative.

Nonparametric regression:
$H_0$: $m(x) \equiv 0$ (X has no impact on Y; assume $EY = 0$, otherwise use $\tilde Y_i = Y_i - \bar Y$).
$H_1$: $m(x) \ne 0$.

Estimate $m(x)$ by the Nadaraya-Watson estimator: $\hat m(\cdot) = \hat m_h(\cdot)$.

63 Hypothesis Testing

A measure for the deviation from zero:
$$T_1 = n \sqrt h \int \{\hat m(x) - 0\}^2\, w(x)\, dx$$
where $w(x)$ is a weight function (trimming the boundaries or regions of sparse data). If $w(x) = f_X(x)\, \tilde w(x)$, then the empirical version of $T_1$ is
$$T_2 = \sqrt h \sum_{i=1}^n \{\hat m(X_i) - 0\}^2\, \tilde w(X_i).$$

64 Hypothesis Testing

Under $H_0$: $T_1 \to 0$ and $T_2 \to 0$ as $n \to \infty$.
Under $H_1$: $T_1 \to \infty$ and $T_2 \to \infty$ as $n \to \infty$.

Under $H_0$, $\hat m(\cdot)$ does not have any bias (Th. 4.3, speed of convergence for the NW estimator). Under $H_0$, $T_1$ and $T_2$ converge to $N(0, V)$ where
$$V = 2 \int \frac{\sigma^4(x)\, w(x)}{f_X^2(x)}\, dx \int (K \star K)^2(x)\, dx.$$

65 Hypothesis Testing

General hypothesis:
$H_0$: $m(x) = m_\theta(x)$ (a specific parametric model),
$H_1$: $m(x) \ne m_\theta(x)$.

Test statistic, where $\hat\theta$ is a consistent estimator:
$$\sqrt h \sum_{i=1}^n \{\hat m(X_i) - m_{\hat\theta}(X_i)\}^2\, \tilde w(X_i)$$
Problem: $m_{\hat\theta}(\cdot)$ is asymptotically unbiased and converges at rate $\sqrt n$, whereas $\hat m(x)$ has a kernel smoothing bias and converges at rate $\sqrt{nh}$.

66 Hypothesis Testing

Solution: Introduce an artificial bias by replacing $m_{\hat\theta}(\cdot)$ with
$$\hat m_{\hat\theta}(x) = \frac{\sum_{i=1}^n K_h(x - X_i)\, m_{\hat\theta}(X_i)}{\sum_{i=1}^n K_h(x - X_i)}$$
in the test statistic
$$T = \sqrt h \sum_{i=1}^n \{\hat m(X_i) - \hat m_{\hat\theta}(X_i)\}^2\, \tilde w(X_i).$$
Now the biases of $\hat m(\cdot)$ and $\hat m_{\hat\theta}(\cdot)$ cancel out, and the same holds for the convergence rates.

67 Hypothesis Testing

We reject $H_0$ if T is large, but what is large?

Theorem 4.7: Asymptotic distribution of T
Assume the conditions of Th. 4.3, and further that $\hat\theta$ is a $\sqrt n$-consistent estimator for $\theta$ and $h = O(n^{-1/5})$. Then
$$T - \frac{1}{\sqrt h}\, \|K\|_2^2 \int \sigma^2(x)\, w(x)\, dx \overset{D}{\to} N\left( 0,\; 2 \int \sigma^4(x)\, w^2(x)\, dx \int (K \star K)^2\, dx \right).$$

68 Hypothesis Testing

Problem: Slow convergence of T towards the normal distribution.
Solution: Bootstrap: simulate the distribution of the test statistic under the hypothesis and determine the critical values based on the simulated distribution.

The most popular method is the wild bootstrap: resample from the residuals $\hat\varepsilon_i$, $i = 1, \ldots, n$, under $H_0$. Each bootstrap residual $\varepsilon_i^*$ is drawn from a distribution that matches the first three moments of $\hat\varepsilon_i$.

69 Hypothesis Testing

Wild bootstrap:
1. Estimate $m_{\hat\theta}(\cdot)$ under $H_0$ and construct $\hat\varepsilon_i = Y_i - m_{\hat\theta}(X_i)$.
2. For each $X_i$, draw a bootstrap residual $\varepsilon_i^*$ so that $E(\varepsilon_i^*) = 0$, $E(\varepsilon_i^{*2}) = \hat\varepsilon_i^2$, $E(\varepsilon_i^{*3}) = \hat\varepsilon_i^3$.
3. Generate a bootstrap sample $\{Y_i^*, X_i\}_{i=1,\ldots,n}$ by setting $Y_i^* = m_{\hat\theta}(X_i) + \varepsilon_i^*$.
4. From this sample, calculate the bootstrap test statistic $T^*$.
5. Repeat steps 2.-4. $n_{boot}$ times and use the $n_{boot}$ generated test statistics $T^*$ to determine the quantiles of the test statistic under $H_0$. This gives approximate critical values for your test statistic T.
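The slides only state the three moment conditions in step 2; a common choice satisfying them is Mammen's two-point distribution, sketched below:

```python
import numpy as np

def wild_bootstrap_residuals(eps_hat, rng):
    """Draw eps*_i = a_i * eps_hat_i with Mammen's two-point law:
    E(a) = 0, E(a^2) = 1, E(a^3) = 1, so eps*_i matches the first
    three moments of eps_hat_i."""
    s5 = np.sqrt(5.0)
    a_vals = np.array([(1 - s5) / 2, (1 + s5) / 2])
    p_low = (s5 + 1) / (2 * s5)                  # P(a = (1 - sqrt 5)/2)
    a = rng.choice(a_vals, size=len(eps_hat), p=[p_low, 1 - p_low])
    return a * eps_hat
```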

70 Multivariate Kernel Regression

How does Y depend on a vector of variables X? We want to estimate
$$E(Y \mid X) = E(Y \mid X_1, \ldots, X_d) = m(X)$$
where $X = (X_1, \ldots, X_d)^T$. We have
$$E(Y \mid X = x) = \int y\, f(y \mid x)\, dy = \frac{\int y\, f(y, x)\, dy}{f_X(x)}.$$

71 Multivariate Kernel Regression

Multivariate Nadaraya-Watson estimator (local constant estimator): Replace the unknown densities by kernel density estimates, giving
$$\hat m_H(x) = \frac{\sum_{i=1}^n K_H(X_i - x)\, Y_i}{\sum_{i=1}^n K_H(X_i - x)},$$
a weighted sum of the observed $Y_i$.
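A minimal sketch of the multivariate Nadaraya-Watson estimator with a diagonal bandwidth matrix and a Gaussian product kernel (both are illustrative choices):

```python
import numpy as np

def nadaraya_watson_mv(x, X, Y, h):
    """Multivariate NW estimate at a point x (d-vector); X is (n, d),
    h a scalar or length-d vector of bandwidths (H = diag(h))."""
    u = (X - x) / h                              # (n, d) scaled differences
    K = np.exp(-0.5 * np.sum(u**2, axis=1))      # product kernel; constants cancel
    return (K @ Y) / K.sum()
```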

72 Multivariate Kernel Regression

Local linear estimator: The minimization problem
$$\min_{\beta_0, \beta_1} \sum_{i=1}^n \left\{ Y_i - \beta_0 - \beta_1^T (X_i - x) \right\}^2 K_H(X_i - x)$$
Solution:
$$\hat\beta = (\hat\beta_0, \hat\beta_1^T)^T = (\mathbf{X}^T \mathbf{W} \mathbf{X})^{-1} \mathbf{X}^T \mathbf{W} \mathbf{Y}$$
where
$$\mathbf{X} = \begin{pmatrix} 1 & (X_1 - x)^T \\ \vdots & \vdots \\ 1 & (X_n - x)^T \end{pmatrix}, \quad \mathbf{Y} = \begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}, \quad \mathbf{W} = \mathrm{diag}(K_H(X_1 - x), \ldots, K_H(X_n - x)).$$
$\hat\beta_0$ estimates the regression function: $\hat m_{1,H}(x) = \hat\beta_0(x)$.
$\hat\beta_1$ estimates the partial derivatives with respect to the components of $x$.

73 Multivariate Kernel Regression

Statistical properties of the Nadaraya-Watson estimator:

Theorem 4.8: Asymptotic bias and variance
The conditional asymptotic bias and variance of the multivariate Nadaraya-Watson kernel regression estimator are
$$\mathrm{Bias}(\hat m_H \mid X_1, \ldots, X_n) \approx \mu_2(K) \left[ \frac{\nabla m(x)^T\, H H^T\, \nabla f_X(x)}{f_X(x)} + \frac12\, \mathrm{tr}\{H^T \mathcal{H}_m(x)\, H\} \right]$$
$$V(\hat m_H \mid X_1, \ldots, X_n) \approx \frac{1}{n \det(H)}\, \|K\|_2^2\, \frac{\sigma^2(x)}{f_X(x)}$$
in the interior of the support of $f_X$, where $\nabla m$ and $\nabla f_X$ denote gradients and $\mathcal{H}_m$ the Hessian of m.

74 Multivariate Kernel Regression

Statistical properties of the local linear estimator:

Theorem 4.9: Asymptotic bias and variance
The conditional asymptotic bias and variance of the multivariate local linear kernel regression estimator are
$$\mathrm{Bias}(\hat m_{1,H} \mid X_1, \ldots, X_n) \approx \frac12\, \mu_2(K)\, \mathrm{tr}\{H^T \mathcal{H}_m(x)\, H\}$$
$$V(\hat m_{1,H} \mid X_1, \ldots, X_n) \approx \frac{1}{n \det(H)}\, \|K\|_2^2\, \frac{\sigma^2(x)}{f_X(x)}$$
in the interior of the support of $f_X$.

75 Multivariate Kernel Regression

Practical aspects, d = 2: Comparison of the Nadaraya-Watson and the local linear two-dimensional estimates for simulated data. 500 design points uniformly distributed in $[0, 1] \times [0, 1]$, $m(x) = \sin(2\pi x_1) + x_2$, error term $N(0, \frac14)$. Bandwidth $h_1 = h_2 = 0.3$.

76 Multivariate Kernel Regression

Curse of dimensionality: Nonparametric regression estimators are based on the idea of local (weighted) averaging. In higher dimensions the observations are sparsely distributed even for large sample sizes, and consequently estimators based on local averaging perform unsatisfactorily. The effect can be explained by looking at the AMSE with $H = h\, I$:
$$\mathrm{AMSE}(n, h) = \frac{1}{n h^d}\, C_1 + h^4\, C_2.$$
The optimal bandwidth is $h_{opt} \sim n^{-1/(4+d)}$, and the rate of convergence of the AMSE is $n^{-4/(4+d)}$: the speed of convergence decreases dramatically for higher dimensions d.
