A Jump-Preserving Curve Fitting Procedure Based On Local Piecewise-Linear Kernel Estimation


Peihua Qiu, School of Statistics, University of Minnesota, 313 Ford Hall, 224 Church St SE, Minneapolis, MN 55455

Abstract

It is known that the fitted regression function based on conventional local smoothing procedures is not statistically consistent at jump positions of the true regression function. In this article, a curve-fitting procedure based on local piecewise-linear kernel estimation is suggested. In a neighborhood of a given point, a piecewise-linear function with a possible jump at the given point is fitted by the weighted least squares procedure with the weights determined by a kernel function. The fitted value of the regression function at this point is then defined by one of the two estimators provided by the two fitted lines (the left and right lines) with the smaller value of the weighted residual sum of squares. It is proved that the fitted curve by this procedure is consistent in the entire design space. In other words, this procedure is jump-preserving. Several numerical examples are presented to evaluate its performance in small-to-moderate sample size cases.

Key Words: Jump-preserving curve fitting; Local piecewise-linear kernel estimation; Local smoothing; Nonparametric regression; Strong consistency.

1 Introduction

Regression analysis provides a tool to build functional relationships between dependent and independent variables. In some applications, regression models with jumps in the regression functions appear to be more appropriate to describe the data. For example, it was confirmed by several statisticians that the annual volume of the Nile river had a jump around year 1899 (Cobb, 1978).

The December sea-level pressure in Bombay, India was found to have a jump discontinuity around year 1960 (Shea et al. 1994). Some physiological parameters can likewise jump after physical or chemical shocks. As an example, the percentage of time a rat is in rapid-eye-movement state in each five-minute interval will most probably have an abrupt change after the lighting condition is suddenly changed (Qiu et al. 1999). The objective of this article is to provide a methodology to fit regression curves with jumps preserved.

Suppose that the regression model concerned is

$$ y_i = f(x_i) + \varepsilon_i, \quad \text{for } i = 1, 2, \ldots, n, \qquad (1.1) $$

where $0 < x_1 < x_2 < \cdots < x_n < 1$ are design points and the $\varepsilon_i$ are i.i.d. random errors with mean 0 and variance $\sigma^2$. The regression function $f(\cdot)$ is continuous in $[0, 1]$ except at positions $0 < s_1 < s_2 < \cdots < s_m < 1$, where $f(\cdot)$ has jumps with magnitudes $d_j \neq 0$ for $j = 1, 2, \ldots, m$. Figure 1.1 below presents a case when $m = 2$.

It is known that the fitted curve by the conventional local smoothing procedures is not statistically consistent at positions where $f(\cdot)$ has jumps. For example, the local linear kernel smoother is based on the following minimization procedure (cf. Fan and Gijbels 1996):

$$ \min_{a_0, a_1} \sum_{i=1}^{n} \{y_i - [a_0 + a_1(x_i - x)]\}^2 K\!\left(\frac{x_i - x}{h}\right), \qquad (1.2) $$

where $K(\cdot)$ is a kernel function with support $[-1/2, 1/2]$ and $h$ is a bandwidth parameter. The solution of (1.2) for $a_0$ is then defined as the local linear kernel estimator of $f(x)$. In Figure 1.1, the solid curve denotes the true regression function. It has two jumps at $x = 0.3$ and $x = 0.7$. The dashed curve denotes the conventional fit by the local linear kernel smoothing procedure. It can be seen that blurring is present in the curve fitting around the two jumps. As a comparison, the fitted curve by the procedure suggested in this paper is represented by the dotted curve. The two jumps are preserved well by our procedure. More explanation of this plot is given in Section 4.

A major reason for the local linear kernel smoothing procedure (1.2) not to preserve jumps is that it uses a local continuous function (a linear function) to approximate the true regression function in a neighborhood of a given point $x$ even if there is a jump at $x$. A natural idea to overcome this limitation is to fit a local piecewise-linear function at $x$ as follows:

Figure 1.1: Small dots denote noisy data. The solid curve represents the true regression model. The dashed and dotted curves denote the conventional fit by the local linear kernel smoothing procedure and the fit by the procedure suggested in this paper.

$$ \min_{a_{l,0}, a_{l,1};\, a_{r,0}, a_{r,1}} \sum_{i=1}^{n} \big\{y_i - [a_{l,0} + a_{l,1}(x_i - x)] - [(a_{r,0} - a_{l,0}) I(x_i - x) + (a_{r,1} - a_{l,1})(x_i - x) I(x_i - x)]\big\}^2 K\!\left(\frac{x_i - x}{h}\right), \qquad (1.3) $$

where $I(\cdot)$ is an indicator function defined by $I(a) = 1$ if $a \geq 0$ and $I(a) = 0$ otherwise. The minimization procedure (1.3) fits a piecewise-linear function

$$ a_{l,0} + a_{l,1}(u - x) + (a_{r,0} - a_{l,0}) I(u - x) + (a_{r,1} - a_{l,1})(u - x) I(u - x) $$

in $u \in [x - h/2, x + h/2]$ with a possible jump at $x$. This is equivalent to fitting two different lines $a_{l,0} + a_{l,1}(u - x)$ and $a_{r,0} + a_{r,1}(u - x)$ in $[x - h/2, x)$ and $[x, x + h/2]$, respectively. Let $\{\hat{a}_{l,j}(x), \hat{a}_{r,j}(x), j = 0, 1\}$ denote the solution of (1.3). Then $\hat{a}_{l,0}(x)$ and $\hat{a}_{r,0}(x)$ are estimated from observations in $[x - h/2, x)$ and $[x, x + h/2]$, respectively. Thus they are good estimators of $f_-(x)$ and $f_+(x)$, the left and right limits of $f(\cdot)$ at $x$, in the case when $x$ is a jump point. When there is no jump in $[x - h/2, x + h/2]$, both of them estimate $f(x)$ well. In the case when $x$ itself is not a jump point but a jump point exists in its neighborhood $[x - h/2, x + h/2]$, only one of $\hat{a}_{l,0}(x)$ and $\hat{a}_{r,0}(x)$ provides a good estimator of $f(x)$. Therefore we need to choose one of them as an estimator of $f(x)$ in such a case. By combining all these considerations, we define

$$ \hat{f}(x) = \hat{a}_{l,0}(x)\, I^*\!\big(\mathrm{RSS}_r(x) - \mathrm{RSS}_l(x)\big) + \hat{a}_{r,0}(x)\, I^*\!\big(\mathrm{RSS}_l(x) - \mathrm{RSS}_r(x)\big) \qquad (1.4) $$

as an estimator of $f(x)$ for $x \in [h/2, 1 - h/2]$, where $I^*(a)$ is defined by $I^*(a) = 1$ if $a > 0$, $1/2$ if $a = 0$ and $0$ if $a < 0$; $\mathrm{RSS}_l(x)$ and $\mathrm{RSS}_r(x)$ are the weighted residual sums of squares (RSS) with

respect to observations in $[x - h/2, x)$ and $[x, x + h/2]$, respectively. That is,

$$ \mathrm{RSS}_l(x) = \sum_{x_i < x} \{y_i - \hat{a}_{l,0}(x) - \hat{a}_{l,1}(x)(x_i - x)\}^2 K\!\left(\frac{x_i - x}{h}\right); $$
$$ \mathrm{RSS}_r(x) = \sum_{x_i \geq x} \{y_i - \hat{a}_{r,0}(x) - \hat{a}_{r,1}(x)(x_i - x)\}^2 K\!\left(\frac{x_i - x}{h}\right). $$

Basically, $\hat{f}(x)$ is defined by the one of $\hat{a}_{l,0}(x)$ and $\hat{a}_{r,0}(x)$ with the smaller RSS value.

In the literature, there are several existing procedures to fit regression curves with jumps preserved. McDonald and Owen (1986) proposed an algorithm based on three local ordinary least squares estimates of the regression function, corresponding to the observations on the right, left and both sides of a given point, respectively. They then constructed their split linear fit as a weighted average of these three estimates, with weights determined by the goodness-of-fit values of the estimates. Hall and Titterington (1992) suggested an alternative but simpler method by establishing some relations among three local linear smoothers and using them to detect the jumps. The regression curve was then fitted as usual in regions separated by the detected jumps. Our procedure is different from these two procedures in that we put the problem of fitting regression curves with jumps preserved in the same framework as that of local linear kernel estimation, except that a local piecewise-linear function is fitted at a given point in our procedure, making the curve estimator (1.4) easier to use.

Most other jump-preserving curve fitting procedures in the literature consist of two steps: (i) detecting possible jumps under the assumption that the number of jumps is known (it is often assumed to be 1) and (ii) fitting the regression curve as usual in design subintervals separated by the detected jump points. Various jump detectors are based on one-sided constant kernel smoothing (Müller 1992, Qiu et al. 1991, Wu and Chu 1993), one-sided linear kernel smoothing (Loader 1996), local least squares estimation (Qiu and Yandell 1998), wavelet transformation (Wang 1995), semiparametric modeling (Eubank and Speckman 1994) and smoothing spline modeling (Koo 1997, Shiau et al. 1986). The case when the number of jumps is unknown is considered by several authors, including Qiu (1994) and Wu and Chu (1993). They first estimated the number of jumps and jump positions by performing a series of hypothesis tests and then fitted the regression curve in subintervals separated by the detected jump points. Compared with the above mentioned methods, the method presented in this paper automatically accommodates the jumps in fitting the regression curve without knowing the number of jumps and without performing any hypothesis tests.
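To make the construction above concrete, here is a minimal Python sketch (not the author's code) of the estimator in (1.3)-(1.4) at a single point $x$. The Epanechnikov-type kernel, the helper name `fhat` and the tie-handling are illustrative assumptions; any kernel supported on $[-1/2, 1/2]$ could be substituted.

```python
import numpy as np

def epanechnikov(u):
    # Kernel with support [-1/2, 1/2]; one choice mentioned later in the paper.
    return np.where(np.abs(u) <= 0.5, 1.5 * (1.0 - 4.0 * u ** 2), 0.0)

def fhat(x, xs, ys, h, kernel=epanechnikov):
    """Jump-preserving estimate of f(x) in the spirit of (1.3)-(1.4):
    fit one line to the data in [x - h/2, x) and another to [x, x + h/2]
    by kernel-weighted least squares, then keep the intercept of the side
    with the smaller weighted residual sum of squares."""
    u = (xs - x) / h
    k = kernel(u)
    w_l = np.where(u < 0.0, k, 0.0)    # weights K_l((x_i - x)/h)
    w_r = np.where(u >= 0.0, k, 0.0)   # weights K_r((x_i - x)/h)

    def one_side(w):
        if np.count_nonzero(w) < 2:                  # not enough points on this side
            return np.nan, np.inf
        X = np.column_stack([np.ones_like(xs), xs - x])
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], ys * sw, rcond=None)
        rss = np.sum(w * (ys - X @ beta) ** 2)       # weighted RSS_l(x) or RSS_r(x)
        return beta[0], rss                          # intercept estimates f at x

    a_l, rss_l = one_side(w_l)
    a_r, rss_r = one_side(w_r)
    if rss_l < rss_r:
        return a_l
    if rss_r < rss_l:
        return a_r
    return 0.5 * (a_l + a_r)                         # I*(0) = 1/2: average on a tie
```

Evaluating `fhat` over a grid of $x$ values in $[h/2, 1 - h/2]$ traces out the fitted curve; points in the boundary regions need the modification described in Section 2.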

Tis paper is organized as follows In next section, we discuss te jump-preserving curve fitting procedure (14) in some detail Properties of te fitted curve are discussed in Section 3 In Section 4, we present some numerical examples concerning te goodness-of-fit and bandwidt selection Te procedure is applied to a real-life dataset in Section 5 Section 6 contains some concluding remarks 2 Te Jump-Preserving Curve Fitting Procedure First we notice tat te minimization procedure (13) is equivalent to te combination of: and min a r,0,a r,1 min a l,0,a l,1 {y i a l,0 a l,1 (x i x)} 2 K l ( x i x ) (21) {y i a r,0 I(x i x) a r,1 (x i x)i(x i x)} 2 K r ( x i x ), (22) were K l ( ) is defined by K l (x) = K(x) if x [ 1/2, 0) and 0 oterwise and K r ( ) is defined by K r (x) = K(x) if x [0, 1/2] and 0 oterwise Clearly, (21) is equivalent to te local linear kernel smooting procedure to fit f (x) by te observations in [x /2, x), te left alf of [x /2, x + /2], and (22) is equivalent to te local linear kernel smooting procedure to fit f + (x) by te observations in [x, x + /2], te rigt alf of [x /2, x + /2] Te subscripts l and r in notations {a l,j, a r,j, j = 0, 1}, K l ( ) and K r ( ) represent left and rigt, respectively, wic are also used in oter notation defined below Solutions of (21) and (22) can be written as: â l,0 (x) = â l,1 (x) = â r,0 (x) = â r,1 (x) = y i K l ( x i x ) w l,2 w l,1 (x i x) w l,0 w l,2 w 2 l,1 y i K l ( x i x ) w l,0(x i x) w l,1 w l,0 w l,2 w 2 l,1 y i K r ( x i x ) w r,2 w r,1 (x i x) w r,0 w r,2 w 2 r,1 y i K r ( x i x ) w r,0(x i x) w r,1 w r,0 w r,2 w 2 r,1 were w l,j = n K l ( x i x )(x i x) j and w r,j = n K r ( x i x )(x i x) j for j = 0, 1, 2 5

Figure 2.1 presents $\hat{a}_{l,0}(\cdot)$, $\hat{a}_{r,0}(\cdot)$ and $\hat{f}(\cdot)$ by the dotted, dashed and solid curves in the case of Figure 1.1, except that the noise in the data has been ignored by setting $\sigma = 0$. It can be seen that blurring occurs in $[x_0, x_0 + h/2]$ if $\hat{a}_{l,0}(\cdot)$ is used to fit $f(\cdot)$ and point $x_0$ is a jump point. Similarly, blurring occurs in $[x_0 - h/2, x_0)$ if $\hat{a}_{r,0}(\cdot)$ is used to fit $f(\cdot)$ and point $x_0$ is a jump point. Our estimator $\hat{f}(\cdot)$, however, can preserve the jumps well because $\hat{f}(\cdot)$ is defined as $\hat{a}_{l,0}(\cdot)$ in $[x_0 - h/2, x_0)$ and as $\hat{a}_{r,0}(\cdot)$ in $[x_0, x_0 + h/2]$ when $x_0$ is a jump point.

Figure 2.1: The dotted, dashed and solid curves denote $\hat{a}_{l,0}(\cdot)$, $\hat{a}_{r,0}(\cdot)$ and $\hat{f}(\cdot)$ in the case of Figure 1.1, except that the noise in the data has been ignored by setting $\sigma = 0$.

When $x$ is in the boundary regions $[0, h/2)$ and $(1 - h/2, 1]$, the estimator of $f(x)$ is not defined by (1.4). In such a case there are several possible approaches to estimate $f(x)$ if no jumps exist in $[0, h)$ and $(1 - h, 1]$. For example, $\hat{f}(x)$ could be defined by the conventional local linear kernel estimator constructed from observations in $[0, x + h/2]$ or $[x - h/2, 1]$, depending on whether $x \in [0, h/2)$ or $x \in (1 - h/2, 1]$. In the following sections, we define $\hat{f}(x) = \hat{a}_{r,0}(x)$ when $x \in [0, h/2)$ and $\hat{f}(x) = \hat{a}_{l,0}(x)$ when $x \in (1 - h/2, 1]$ for simplicity. If there are jump points in $[0, h)$ (or $(1 - h, 1]$), however, estimation of $f(x)$ in the boundary region $[0, h/2)$ (or $(1 - h/2, 1]$) is still an open problem.

In the literature, there are several existing data-driven bandwidth selection procedures, such as the plug-in procedures, the cross-validation procedure, Mallows' C_p criterion and Akaike's information criterion (cf., e.g., Chu and Marron 1991; Loader 1999). Since the exact expressions for the mean and variance of the jump-preserving estimator $\hat{f}(\cdot)$ in (1.4) are not available at this moment, the plug-in procedures are not considered here. In the numerical examples presented in Sections 4 and 5, we determine the bandwidth by the cross-validation procedure. That is, the

optimal $h$ is chosen by minimizing the following cross-validation criterion:

$$ \mathrm{CV}(h) = \frac{1}{n} \sum_{i=1}^{n} \big(y_i - \hat{f}_{-i}(x_i)\big)^2, \qquad (2.3) $$

where $\hat{f}_{-i}(x)$ is the leave-one-out estimator of $f(x)$ with bandwidth $h$. Namely, the observation $(x_i, y_i)$ is left out in constructing $\hat{f}_{-i}(x)$, for $i = 1, 2, \ldots, n$. A numerical example in Section 4 shows that the bandwidth chosen based upon (2.3) performs well.

3 Strong Consistency

The conventional local smoothing estimators of $f(\cdot)$, such as the one from (1.2), are not statistically consistent at jump positions. In this section we establish the almost sure consistency of the jump-preserving estimator $\hat{f}(\cdot)$, which says that $\hat{f}(\cdot)$ converges almost surely to the true regression function in the entire design space $[0, 1]$ under some regularity conditions. That is, $\hat{f}(\cdot)$ is jump-preserving. First we have the following result for $\hat{a}_{l,0}(\cdot)$ and $\hat{a}_{r,0}(\cdot)$.

Theorem 3.1 Suppose that $f(\cdot)$ has a continuous second-order derivative in $[0, 1]$; $\max_{1 \leq i \leq n+1}(x_i - x_{i-1}) = O(1/n)$, where $x_0 = 0$ and $x_{n+1} = 1$; the kernel function $K(\cdot)$ is Lipschitz (1) continuous; and the bandwidth $h_n = O(n^{-1/5})$. Then

$$ \frac{n^{2/5}}{\log n \, \log\log n} \, \|\hat{a}_{l,0} - f\|_{[h_n/2,\, 1]} = o(1), \quad \text{a.s.}, \qquad (3.1) $$
$$ \frac{n^{2/5}}{\log n \, \log\log n} \, \|\hat{a}_{r,0} - f\|_{[0,\, 1 - h_n/2]} = o(1), \quad \text{a.s.}, \qquad (3.2) $$

where $\|g\|_{[a,b]}$ denotes $\max_{a \leq x \leq b} |g(x)|$.

Theorem 3.1 establishes the almost sure uniform consistency of $\hat{a}_{l,0}(\cdot)$ and $\hat{a}_{r,0}(\cdot)$ when $f(\cdot)$ is continuous in the design space $[0, 1]$. Its proof is given in Appendix A. When $f(\cdot)$ has jumps in $[0, 1]$ as specified by model (1.1), Theorem 3.1 also gives almost sure consistency of $\hat{f}(\cdot)$ in the continuous regions $D_1 := [0, 1] \setminus \bigcup_{j=1}^{m} (s_j - h_n/2, s_j + h_n/2)$, since $\|\hat{f} - f\|_{D_1} \leq \max(\|\hat{a}_{l,0} - f\|_{D_1}, \|\hat{a}_{r,0} - f\|_{D_1})$ by (1.4). In the neighborhoods of the jump points, $D_2 := \bigcup_{j=1}^{m} (s_j - h_n/2, s_j + h_n/2)$, we have the following result.

Theorem 3.2 Suppose that $x$ is a given point in $(0, 1)$; $\max_{1 \leq i \leq n+1}(x_i - x_{i-1}) = O(1/n)$, where $x_0 = 0$ and $x_{n+1} = 1$; the kernel function $K(\cdot)$ is Lipschitz (1) continuous; $\lim_{n \to \infty} h_n = 0$ and

$\lim_{n \to \infty} n h_n = \infty$. If $f(\cdot)$ has a continuous first-order derivative in $[x, x + h_n/2]$, then

$$ \mathrm{RSS}_r(x) = v_{r,0} \sigma^2 n h_n + o(n h_n), \quad \text{a.s.} \qquad (3.3) $$

If $f(\cdot)$ has a jump in $[x, x + h_n/2]$ at $x^* := x + \tau h_n$ with magnitude $d$, where $0 \leq \tau \leq 1/2$, and $f(\cdot)$ has a continuous first-order derivative in $[x, x + h_n/2]$ except at $x^*$, at which $f(\cdot)$ has a right (when $\tau = 0$) or left (when $\tau = 1/2$) or both (when $0 < \tau < 1/2$) first-order derivatives $f'_+(x^*)$ and $f'_-(x^*)$, then

$$ \mathrm{RSS}_r(x) = (v_{r,0} \sigma^2 + d^2 C_\tau^2)\, n h_n + o(n h_n), \quad \text{a.s.}, \qquad (3.4) $$

where

$$ C_\tau^2 = \frac{1}{(v_{r,0} v_{r,2} - v_{r,1}^2)^2} \int_0^{\tau} \left[ \int_{\tau}^{1/2} (v_{r,2} - v_{r,1} x) K_r(x)\,dx + u \int_{\tau}^{1/2} (v_{r,0} x - v_{r,1}) K_r(x)\,dx \right]^2 K_r(u)\,du $$
$$ \qquad + \frac{1}{(v_{r,0} v_{r,2} - v_{r,1}^2)^2} \int_{\tau}^{1/2} \left[ \int_0^{\tau} (v_{r,2} - v_{r,1} x) K_r(x)\,dx + u \int_0^{\tau} (v_{r,0} x - v_{r,1}) K_r(x)\,dx \right]^2 K_r(u)\,du $$

and $v_{r,j} = \int_0^{1/2} x^j K_r(x)\,dx$ for $j = 0, 1, 2$.

Similar results could be derived for $\mathrm{RSS}_l(x)$. It can be checked that $C_\tau^2$ is positive when $\tau \in (0, 1/2)$ and 0 when $\tau = 0$ or $1/2$. If the kernel function $K(\cdot)$ is chosen to be the Epanechnikov function defined by $K(x) = 1.5(1 - 4x^2)$ when $x \in [-1/2, 1/2]$ and 0 otherwise (cf. Section 3.2.6, Fan and Gijbels 1996), then $C_\tau^2$ as a function of $\tau$ is displayed in Figure 3.1.

Figure 3.1: $C_\tau^2$ as a function of $\tau$ when $K(\cdot)$ is chosen to be the Epanechnikov function.

By (3.3) and (3.4), if there is a jump in $[x - h_n/2, x + h_n/2]$, a neighborhood of a given point $x$, and this jump point is located on the right side of $x$, then $\mathrm{RSS}_l(x) < \mathrm{RSS}_r(x)$, a.s., when $n$ is

large enough. Consequently, $\hat{f}(x) = \hat{a}_{l,0}(x)$, a.s., when $n$ is large enough. On the other hand, if the jump point is located on the left side of $x$, then $\mathrm{RSS}_l(x) > \mathrm{RSS}_r(x)$, a.s., and $\hat{f}(x) = \hat{a}_{r,0}(x)$, a.s., when $n$ is large enough. By combining this fact and (3.1)-(3.2) in Theorem 3.1, we have the following results.

Theorem 3.3 Suppose that $f(\cdot)$ has a continuous second-order derivative in $[0, 1]$ except at the jump positions $\{s_j, j = 1, 2, \ldots, m\}$, where $f(\cdot)$ has left and right second-order derivatives; $\max_{1 \leq i \leq n+1}(x_i - x_{i-1}) = O(1/n)$, where $x_0 = 0$ and $x_{n+1} = 1$; the kernel function $K(\cdot)$ is Lipschitz (1) continuous; and the bandwidth $h_n = O(n^{-1/5})$. Then

(i) $\dfrac{n^{2/5}}{\log n \, \log\log n} \|\hat{f} - f\|_{D_1} = o(1)$, a.s.;

(ii) for each $x \in D_2$, $\dfrac{n^{2/5}}{\log n \, \log\log n} \big(\hat{f}(x) - f(x)\big) = o(1)$, a.s.;

(iii) for any small number $0 < \delta < 1/4$, $\dfrac{n^{2/5}}{\log n \, \log\log n} \|\hat{f} - f\|_{D_{2,\delta}} = o(1)$, a.s.,

where $D_{2,\delta} = \bigcup_{j=1}^{m} \big\{[s_j - (1/2 - \delta)h_n,\, s_j - \delta h_n] \cup [s_j + \delta h_n,\, s_j + (1/2 - \delta)h_n]\big\}$.

Theorem 3.3 says that $\hat{f}(\cdot)$ is uniformly consistent in the continuous regions $D_1$ with rate $o(n^{-2/5} \log n \log\log n)$. In the neighborhoods of the jump points, it is consistent pointwise with the same rate. Because $C_\tau^2$ has a positive lower bound when $\tau \in [\delta, 1/2 - \delta]$ for any given number $0 < \delta < 1/4$, $\hat{f}(\cdot)$ is also uniformly consistent with rate $o(n^{-2/5} \log n \log\log n)$ in $D_{2,\delta}$, which equals $D_2 \setminus D_\delta$, where $D_\delta = \bigcup_{j=1}^{m} \big[(s_j - h_n/2,\, s_j - (1/2 - \delta)h_n) \cup (s_j - \delta h_n,\, s_j + \delta h_n) \cup (s_j + (1/2 - \delta)h_n,\, s_j + h_n/2)\big]$.

4 Simulation Study

We present some simulation results regarding bandwidth selection and the numerical performance of the jump-preserving curve fitting procedure (1.4) in this section. Let us revisit the example of Figure 1.1 first. The true regression function in this example is $f(x) = -3x + 2$ when $x \in [0, 0.3)$; $f(x) = -3x + 3 - \sin((x - 0.3)\pi/0.2)$ when $x \in [0.3, 0.7)$; and $f(x) = 0.5x + 1.55$ when $x \in [0.7, 1]$. It

has two jump points at $x = 0.3$ and $x = 0.7$. Both jump magnitudes are equal to 1. Observations are generated from model (1.1) with $\varepsilon_i \sim N(0, \sigma^2)$ for $i = 1, 2, \ldots, n$. The bandwidth $h$ used in procedure (1.4) is assumed to have the form $h = k/n$, where $k$ is an odd integer, for convenience. Without confusion, $k$ is sometimes called the bandwidth in this section.

Figure 4.1 presents the MSE values of the fitted curve by the jump-preserving procedure (1.4) with several $k$ values when $n = 200$ and $\sigma = 0.2$. To remove some randomness in the results, all MSE values presented in this section are actually averages over 1000 replications. It can be seen from the plot that the MSE value first decreases and then increases when $k$ increases. The bandwidth $k$ works as a tuning parameter to balance underfit and overfit, as in the conventional local smoothing procedures. The best bandwidth in this case is $k = 29$, which makes the MSE reach its minimum. The dotted curve in Figure 1.1 shows one realization of the fitted curve with the best bandwidth $k = 29$. The dashed curve shows the conventional local linear kernel estimator with the same bandwidth.

Figure 4.1: MSE values of the fitted curve by the jump-preserving procedure (1.4) with several $k$ values when $n = 200$ and $\sigma = 0.2$.

We then perform simulations with several different $n$ and $\sigma$ values. The optimal bandwidths and the corresponding MSE values are presented in Figures 4.2(a) and 4.2(c), respectively. From the plots, it can be seen that (1) the optimal $k$ increases when the sample size $n$ increases or $\sigma$ increases and (2) the corresponding MSE value decreases when $n$ increases or $\sigma$ decreases. The first finding suggests that the bandwidth should be chosen larger when the sample size is larger or the data are noisier, which is intuitively reasonable. The second finding might reflect the consistency of the fitted curve.
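A rough Monte Carlo version of the Figure 4.1 experiment can be scripted as follows. This is only a sketch under stated assumptions: it reuses the `fhat` helper from the earlier sketch, uses an equally spaced design, uses fewer replications than the 1000 used in the paper, and takes the true regression function `f_true` as an input rather than hard-coding it.

```python
import numpy as np

def mse_over_bandwidths(f_true, n=200, sigma=0.2, ks=range(21, 52, 2), reps=100, seed=0):
    """For each odd k, use bandwidth h = k/n, fit the curve with `fhat` at every
    design point, and average the mean squared error over the replications."""
    rng = np.random.default_rng(seed)
    xs = (np.arange(1, n + 1) - 0.5) / n              # equally spaced design points in (0, 1)
    fx = f_true(xs)
    results = {}
    for k in ks:
        h = k / n
        keep = (xs >= h / 2) & (xs <= 1 - h / 2)      # interior points where (1.4) applies
        mses = []
        for _ in range(reps):
            ys = fx + rng.normal(0.0, sigma, size=n)  # model (1.1) with N(0, sigma^2) errors
            fit = np.array([fhat(x, xs, ys, h) for x in xs])
            mses.append(np.mean((fit[keep] - fx[keep]) ** 2))
        results[k] = float(np.mean(mses))
    return results
```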

Figure 4.2: (a) The optimal bandwidths by the MSE criterion; (b) the optimal bandwidths by the CV criterion; (c) the corresponding MSE values when the bandwidths in plot (a) are used; (d) the corresponding CV values when the bandwidths in plot (b) are used. (Each panel plots results for $\sigma = 0.05, 0.1, 0.2, 0.5$ against sample sizes from 200 to 1000.)

As a comparison, the optimal bandwidths by the cross-validation procedure are presented in Figure 4.2(b). The corresponding CV values (defined by equation (2.3)) are shown in Figure 4.2(d). By comparing Figures 4.2(a) and 4.2(b), it can be seen that the bandwidths selected by the cross-validation procedure are close to the optimal bandwidths based on the MSE criterion.

From Figure 1.1, it can be seen that blurring occurs around the jump points if $f(\cdot)$ is estimated by the conventional local linear kernel estimator. The jump-preserving estimator (1.4) preserves the jumps quite well, which is further confirmed by Figure 4.3. In Figure 4.3(a), the solid curve denotes the true regression model and the dotted curve denotes the averaged estimator by the jump-preserving procedure, which is calculated from 1000 replications. The lower and upper dashed

curves represent the 2.5 and 97.5 percentiles of these 1000 replications. We can see that the two sharp jumps are preserved well by the procedure (1.4). As a comparison, the averaged estimator and the corresponding percentiles by the conventional local linear kernel smoothing procedure with the same bandwidth are presented in Figure 4.3(b). It can be seen that the two jumps are blurred.

Figure 4.3: The solid curve denotes the true regression model; the dotted curve denotes the averaged estimator, which is calculated from 1000 replications. The lower and upper dashed curves represent the 2.5 and 97.5 percentiles of these 1000 replications. (a) Results from the jump-preserving procedure (1.4); (b) results from the conventional local linear kernel smoothing procedure.

5 An Application

In this section, we apply the jump-preserving curve fitting procedure (1.4) to a sea-level pressure dataset. In Figure 5.1, small dots denote the December sea-level pressures during 1921-1992 observed by the Bombay weather station in India. Meteorologists (cf. Shea et al. 1994) noticed a jump around year 1960 in this dataset, and the existence of this jump was confirmed by Qiu and Yandell (1998) with their local polynomial jump detection algorithm. In Figure 5.1, the solid curve denotes the fitted regression curve by our jump-preserving curve fitting procedure (1.4). In the procedure, the bandwidth is chosen to be $k = 25$, which is determined by the cross-validation procedure (2.3). As indicated by the plot, the jump around year 1960 is preserved well by our procedure.
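For data such as the pressure series, the cross-validation choice of $k$ in (2.3) could be scripted along the following lines. Again this is only a sketch: it reuses the `fhat` helper from the earlier sketch, and the grid of odd $k$ values is an illustrative assumption rather than anything taken from the paper.

```python
import numpy as np

def cv_score(k, xs, ys):
    """Leave-one-out cross-validation criterion CV(h) of (2.3) for h = k/n."""
    n = len(xs)
    h = k / n
    errs = []
    for i in range(n):
        mask = np.arange(n) != i                 # leave the i-th observation out
        pred = fhat(xs[i], xs[mask], ys[mask], h)
        if np.isfinite(pred):
            errs.append((ys[i] - pred) ** 2)
    return float(np.mean(errs))

def select_bandwidth(xs, ys, ks=range(11, 62, 2)):
    """Return the odd k (hence h = k/n) minimizing the CV criterion."""
    scores = {k: cv_score(k, xs, ys) for k in ks}
    return min(scores, key=scores.get)
```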

Figure 5.1: Small dots denote the December sea-level pressures during 1921-1992 observed by the Bombay weather station in India. The solid curve is the jump-preserving estimator from the procedure (1.4).

6 Concluding Remarks

We have presented a jump-preserving curve fitting procedure which automatically accommodates possible jumps of the regression curve without knowing the number of jumps. The fitted curve is proved to be statistically consistent in the entire design space. Numerical examples show that it works reasonably well in applications.

The following issues related to this topic need further investigation. First, the procedure (1.4) works well in the boundary regions $[0, h/2)$ and $(1 - h/2, 1]$ only under the condition that there are no jumps in $[0, h)$ and $(1 - h, 1]$. This condition can always be satisfied when the sample size is large. When the sample size is small, however, this condition may not be true in some cases, and it is still an open problem to fit $f(\cdot)$ when jumps exist in the boundary regions. Second, the plug-in procedures to choose the bandwidth of a local smoother are often based on the bias-variance trade-off of the fitted regression model. Exact expressions for the mean and variance of the jump-preserving procedure (1.4) are not available yet, which needs further research.

Acknowledgement: The author would like to thank Mr. Alexandre Lambert of the Institut de Statistique at Université catholique de Louvain in Belgium for pointing out a mistake in the expression of $C_\tau^2$ appearing in (3.4).

Appendix

A Proof of Theorem 3.1

We only prove equation (3.1) here. Equation (3.2) can be proved similarly. First of all,

$$ E(\hat{a}_{l,0}(x)) = \sum_{i=1}^{n} f(x_i) K_l\!\left(\frac{x_i - x}{h_n}\right) \frac{w_{l,2} - w_{l,1}(x_i - x)}{w_{l,0} w_{l,2} - w_{l,1}^2}. \qquad (A.1) $$

We notice that the summation on the right-hand side of (A.1) is only over those $x_i$ in $[x - h_n/2, x)$. By Taylor's expansion,

$$ f(x_i) = f(x) + f'(x)(x_i - x) + \frac{1}{2} f''(x)(x_i - x)^2 + o(h_n^2), \qquad (A.2) $$

where $x_i \in [x - h_n/2, x)$. By combining (A.1) and (A.2), we have

$$ E(\hat{a}_{l,0}(x)) = f(x) + f''(x) \frac{w_{l,2}^2 - w_{l,1} w_{l,3}}{2(w_{l,0} w_{l,2} - w_{l,1}^2)} + o(h_n^2), \qquad (A.3) $$

where $w_{l,3} = \sum_{i=1}^{n} K_l\!\left(\frac{x_i - x}{h_n}\right)(x_i - x)^3$. Furthermore, it can be checked that

$$ \frac{w_{l,0}}{n h_n} = v_{l,0} + o(1), \quad \frac{w_{l,1}}{n h_n^2} = v_{l,1} + o(1), \quad \frac{w_{l,2}}{n h_n^3} = v_{l,2} + o(1), \quad \frac{w_{l,3}}{n h_n^4} = v_{l,3} + o(1), \qquad (A.4) $$

where $v_{l,j} = \int_{-1/2}^{0} x^j K_l(x)\,dx$ for $j = 0, 1, 2, 3$. By combining (A.3) and (A.4), we have

$$ E(\hat{a}_{l,0}(x)) = f(x) + f''(x) \frac{v_{l,2}^2 - v_{l,1} v_{l,3}}{2(v_{l,0} v_{l,2} - v_{l,1}^2)} h_n^2 + o(h_n^2). $$

Therefore

$$ E(\hat{a}_{l,0}(x)) - f(x) = f''(x) \frac{v_{l,2}^2 - v_{l,1} v_{l,3}}{2(v_{l,0} v_{l,2} - v_{l,1}^2)} h_n^2 + o(h_n^2). \qquad (A.5) $$

Now let

$$ \tilde{\varepsilon}_i = \varepsilon_i I(|\varepsilon_i| \leq i^{1/2}), \quad i = 1, 2, \ldots, n, $$
$$ g_n(x) = \sum_{i=1}^{n} K_l\!\left(\frac{x_i - x}{h_n}\right) \frac{w_{l,2} - w_{l,1}(x_i - x)}{w_{l,0} w_{l,2} - w_{l,1}^2} \, \varepsilon_i, $$
$$ \tilde{g}_n(x) = \sum_{i=1}^{n} K_l\!\left(\frac{x_i - x}{h_n}\right) \frac{w_{l,2} - w_{l,1}(x_i - x)}{w_{l,0} w_{l,2} - w_{l,1}^2} \, \tilde{\varepsilon}_i =: \sum_{i=1}^{n} \tilde{g}_n(i). $$

For any $\epsilon > 0$,

$$ P\!\left( \frac{n^{2/5}}{\log n \, \log\log n} \big[\tilde{g}_n(x) - E(\tilde{g}_n(x))\big] > \epsilon \right) \leq \exp\!\big(-\epsilon \log n (\log\log n)^{1/2}\big) \, E\!\left( \prod_{i=1}^{n} \exp\!\Big( n^{2/5} (\log\log n)^{-1/2} \big[\tilde{g}_n(i) - E(\tilde{g}_n(i))\big] \Big) \right) $$
$$ \leq n^{-\epsilon (\log\log n)^{1/2}} \exp\!\left( \frac{n^{4/5}}{\log\log n} \sum_{i=1}^{n} \mathrm{Var}(\tilde{g}_n(i)) \right), $$

by an application of Chebyshev's inequality of the exponential form. Now

$$ \sum_{i=1}^{n} \mathrm{Var}(\tilde{g}_n(i)) \leq \sigma^2 \sum_{i=1}^{n} K_l^2\!\left(\frac{x_i - x}{h_n}\right) \left[ \frac{w_{l,2} - w_{l,1}(x_i - x)}{w_{l,0} w_{l,2} - w_{l,1}^2} \right]^2 = \frac{\sigma^2}{n h_n} \int_{-1/2}^{0} K_l^2(x) \left( \frac{v_{l,2} - v_{l,1} x}{v_{l,0} v_{l,2} - v_{l,1}^2} \right)^2 dx = \frac{\sigma^2}{n h_n} C_{l,1}(K), $$

where $C_{l,1}(K)$ is a constant. So for all $x \in [h_n/2, 1]$,

$$ P\!\left( \frac{n^{2/5}}{\log n \, \log\log n} \big[\tilde{g}_n(x) - E(\tilde{g}_n(x))\big] > \epsilon \right) = O\!\big(n^{-\epsilon (\log\log n)^{1/2}}\big). \qquad (A.6) $$

We now define $D_n = \{x : |x| \leq n^{1/\delta} + 1, x \in R\}$ for some $\delta > 0$. Let $E_n$ be a set such that, for any $x \in D_n$, there exists some $Z(x) \in E_n$ such that $|x - Z(x)| < n^{-2}$, and $E_n$ has at most $N_n = [2n^2(n^{1/\delta} + 1)] + 1$ elements, where $[x]$ denotes the integral part of $x$. Then

$$ \frac{n^{2/5}}{\log n \, \log\log n} \|\tilde{g}_n - E(\tilde{g}_n)\|_{[h_n/2,\, 1] \cap D_n} \leq S_{1n} + S_{2n} + S_{3n}, $$

where

$$ S_{1n} = \sup_{x \in [h_n/2, 1] \cap D_n} \frac{n^{2/5}}{\log n \, \log\log n} \big|\tilde{g}_n(x) - \tilde{g}_n(Z(x))\big|, $$
$$ S_{2n} = \sup_{x \in [h_n/2, 1] \cap D_n} \frac{n^{2/5}}{\log n \, \log\log n} \big|\tilde{g}_n(Z(x)) - E(\tilde{g}_n(Z(x)))\big|, $$
$$ S_{3n} = \sup_{x \in [h_n/2, 1] \cap D_n} \frac{n^{2/5}}{\log n \, \log\log n} \big|E(\tilde{g}_n(Z(x))) - E(\tilde{g}_n(x))\big|. $$

From (A.6), $P(S_{2n} > \epsilon) = O\big(N_n n^{-\epsilon (\log\log n)^{1/2}}\big)$. By the Borel-Cantelli Lemma,

$$ \lim_{n \to \infty} S_{2n} = 0, \quad \text{a.s.} \qquad (A.7) $$

Now

$$ S_{1n} = \sup_{x \in [h_n/2, 1] \cap D_n} \frac{n^{2/5}}{\log n \, \log\log n} \left| \sum_{i=1}^{n} \left[ K_l\!\left(\frac{x_i - x}{h_n}\right) \frac{w_{l,2} - w_{l,1}(x_i - x)}{w_{l,0} w_{l,2} - w_{l,1}^2} - K_l\!\left(\frac{x_i - Z(x)}{h_n}\right) \frac{w_{l,2} - w_{l,1}(x_i - Z(x))}{w_{l,0} w_{l,2} - w_{l,1}^2} \right] \tilde{\varepsilon}_i \right| $$
$$ \leq n^{1/2} \frac{1}{n h_n} \sup_{x \in [h_n/2, 1] \cap D_n} \frac{n^{2/5}}{\log n \, \log\log n} \sum_{i=1}^{n} \left| K_l\!\left(\frac{x_i - x}{h_n}\right) \frac{v_{l,2} - v_{l,1}(x_i - x)/h_n}{v_{l,0} v_{l,2} - v_{l,1}^2} - K_l\!\left(\frac{x_i - Z(x)}{h_n}\right) \frac{v_{l,2} - v_{l,1}(x_i - Z(x))/h_n}{v_{l,0} v_{l,2} - v_{l,1}^2} \right| $$
$$ \leq C_{l,2}(K) \, \frac{n^{2/5 + 1/2}}{\log n \, \log\log n} \cdot \frac{n^{-2}}{h_n}, $$

where $C_{l,2}(K)$ is a constant. In the last inequality above, we have used the Lipschitz (1) property of $K_l(\cdot)$. Therefore

$$ \lim_{n \to \infty} S_{1n} = 0, \quad \text{a.s.} \qquad (A.8) $$

Similarly,

$$ \lim_{n \to \infty} S_{3n} = 0. \qquad (A.9) $$

By combining (A.7)-(A.9), we have

$$ \frac{n^{2/5}}{\log n \, \log\log n} \|\tilde{g}_n - E(\tilde{g}_n)\|_{[h_n/2,\, 1] \cap D_n} = o(1), \quad \text{a.s.} \qquad (A.10) $$

Now,

$$ \|g_n - E(g_n)\|_{[h_n/2, 1]} \leq \|g_n - \tilde{g}_n\|_{[h_n/2, 1]} + \|\tilde{g}_n - E(\tilde{g}_n)\|_{[h_n/2, 1]} + \|E(\tilde{g}_n) - E(g_n)\|_{[h_n/2, 1]}. $$

Since $E(\varepsilon_1^2) < \infty$, there exists a full set $\Omega_0$ such that for each $\omega \in \Omega_0$ there exists a finite positive integer $N_\omega$ such that for $n \geq N_\omega$, $\tilde{\varepsilon}_n(\omega) = \varepsilon_n(\omega)$. So for all $n \geq N_\omega$,

$$ |g_n(x) - \tilde{g}_n(x)| \leq \frac{1}{n h_n} \sum_{i=1}^{N_\omega} \left| K_l\!\left(\frac{x_i - x}{h_n}\right) \frac{v_{l,2} - v_{l,1}(x_i - x)/h_n}{v_{l,0} v_{l,2} - v_{l,1}^2} \right| |\varepsilon_i - \tilde{\varepsilon}_i| \leq \frac{C(N_\omega)}{n h_n}, $$

where $C(N_\omega)$ is a constant. Therefore,

$$ \frac{n^{2/5}}{\log n \, \log\log n} \|g_n - \tilde{g}_n\|_{[h_n/2, 1]} = o(1), \quad \text{a.s.} \qquad (A.11) $$

Similarly,

$$ \frac{n^{2/5}}{\log n \, \log\log n} \|E(\tilde{g}_n) - E(g_n)\|_{[h_n/2, 1]} = o(1). \qquad (A.12) $$

By (A.10)-(A.12), we have

$$ \frac{n^{2/5}}{\log n \, \log\log n} \|g_n - E(g_n)\|_{[h_n/2, 1]} = o(1), \quad \text{a.s.} \qquad (A.13) $$

By (A.5) and (A.13), we get equation (3.1).

B Proof of Theorem 3.2

By the definition of $\mathrm{RSS}_r(x)$,

$$ \mathrm{RSS}_r(x) = \sum_{i=1}^{n} [y_i - \hat{a}_{r,0}(x) - \hat{a}_{r,1}(x)(x_i - x)]^2 K_r\!\left(\frac{x_i - x}{h_n}\right) = \sum_{i=1}^{n} [\varepsilon_i + f(x_i) - \hat{a}_{r,0}(x) - \hat{a}_{r,1}(x)(x_i - x)]^2 K_r\!\left(\frac{x_i - x}{h_n}\right) $$
$$ = \sum_{i=1}^{n} \varepsilon_i^2 K_r\!\left(\frac{x_i - x}{h_n}\right) + 2 \sum_{i=1}^{n} \varepsilon_i [f(x_i) - \hat{a}_{r,0}(x) - \hat{a}_{r,1}(x)(x_i - x)] K_r\!\left(\frac{x_i - x}{h_n}\right) + \sum_{i=1}^{n} [f(x_i) - \hat{a}_{r,0}(x) - \hat{a}_{r,1}(x)(x_i - x)]^2 K_r\!\left(\frac{x_i - x}{h_n}\right) $$
$$ =: I_1 + I_2 + I_3. $$

Let us first prove equation (3.3) under the condition that $f(\cdot)$ has a continuous first-order derivative in $[x, x + h_n/2]$. By similar arguments to those in Appendix A,

$$ I_1 = v_{r,0} \sigma^2 n h_n + o(n h_n), \quad \text{a.s.} \qquad (B.1) $$

Now

$$ I_2 = 2 \sum_{i=1}^{n} \varepsilon_i [f(x) + f'(x)(x_i - x) - \hat{a}_{r,0}(x) - \hat{a}_{r,1}(x)(x_i - x) + o(h_n)] K_r\!\left(\frac{x_i - x}{h_n}\right) $$
$$ = 2 \big(f(x) - \hat{a}_{r,0}(x)\big) \sum_{i=1}^{n} \varepsilon_i K_r\!\left(\frac{x_i - x}{h_n}\right) + 2 \big(f'(x) - \hat{a}_{r,1}(x)\big) \sum_{i=1}^{n} \varepsilon_i K_r\!\left(\frac{x_i - x}{h_n}\right)(x_i - x) + o(n h_n) $$
$$ = o(n h_n) + o(1/h_n) \cdot O(n h_n) \cdot O(h_n) + o(n h_n) = o(n h_n). \qquad (B.2) $$

In the third equation above, we have used the results that $f(x) - \hat{a}_{r,0}(x) = o(1)$, a.s., and $f'(x) - \hat{a}_{r,1}(x) = o(1/h_n)$, a.s., where the first result is from Theorem 3.1 and the second result can be

derived by similar arguments to those in Appendix A. It can be similarly checked that

$$ I_3 = o(n h_n), \quad \text{a.s.} \qquad (B.3) $$

By combining (B.1)-(B.3), we get equation (3.3).

Next we prove equation (3.4) under the condition that $f(\cdot)$ has a jump in $[x, x + h_n/2]$ at $x^* = x + \tau h_n$, where $0 \leq \tau \leq 1/2$ is a constant. First,

$$ \hat{a}_{r,0}(x) = \sum_{i=1}^{n} y_i K_r\!\left(\frac{x_i - x}{h_n}\right) \frac{w_{r,2} - w_{r,1}(x_i - x)}{w_{r,0} w_{r,2} - w_{r,1}^2} $$
$$ = \sum_{x_i < x^*} f(x_i) K_r\!\left(\frac{x_i - x}{h_n}\right) \frac{w_{r,2} - w_{r,1}(x_i - x)}{w_{r,0} w_{r,2} - w_{r,1}^2} + \sum_{x_i \geq x^*} f(x_i) K_r\!\left(\frac{x_i - x}{h_n}\right) \frac{w_{r,2} - w_{r,1}(x_i - x)}{w_{r,0} w_{r,2} - w_{r,1}^2} + \sum_{i=1}^{n} \varepsilon_i K_r\!\left(\frac{x_i - x}{h_n}\right) \frac{w_{r,2} - w_{r,1}(x_i - x)}{w_{r,0} w_{r,2} - w_{r,1}^2} $$
$$ = \sum_{x_i < x^*} \big(f_-(x^*) + o(1)\big) K_r\!\left(\frac{x_i - x}{h_n}\right) \frac{w_{r,2} - w_{r,1}(x_i - x)}{w_{r,0} w_{r,2} - w_{r,1}^2} + \sum_{x_i \geq x^*} \big(f_-(x^*) + d + o(1)\big) K_r\!\left(\frac{x_i - x}{h_n}\right) \frac{w_{r,2} - w_{r,1}(x_i - x)}{w_{r,0} w_{r,2} - w_{r,1}^2} + o(1), \quad \text{a.s.} $$
$$ = f_-(x^*) + d \, \frac{\int_{\tau}^{1/2} K_r(x)(v_{r,2} - v_{r,1} x)\,dx}{v_{r,0} v_{r,2} - v_{r,1}^2} + o(1), \quad \text{a.s.} \qquad (B.4) $$

In the last equation above, we have used (A.4). Similarly, we can check that

$$ \hat{a}_{r,1}(x) = \frac{d}{h_n} \, \frac{\int_{\tau}^{1/2} K_r(x)(v_{r,0} x - v_{r,1})\,dx}{v_{r,0} v_{r,2} - v_{r,1}^2} + o(1/h_n), \quad \text{a.s.} \qquad (B.5) $$

Then

$$ I_2 = 2 \sum_{x_i < x^*} \varepsilon_i \left[ f(x_i) - f_-(x^*) - d \, \frac{\int_{\tau}^{1/2} K_r(x)(v_{r,2} - v_{r,1} x)\,dx}{v_{r,0} v_{r,2} - v_{r,1}^2} \right] K_r\!\left(\frac{x_i - x}{h_n}\right) + 2 \sum_{x_i \geq x^*} \varepsilon_i \left[ f(x_i) - f_-(x^*) - d - d \, \frac{\int_{\tau}^{1/2} K_r(x)(v_{r,2} - v_{r,1} x)\,dx}{v_{r,0} v_{r,2} - v_{r,1}^2} \right] K_r\!\left(\frac{x_i - x}{h_n}\right) $$
$$ \qquad - 2 \frac{d}{h_n} \, \frac{\int_{\tau}^{1/2} K_r(x)(v_{r,0} x - v_{r,1})\,dx}{v_{r,0} v_{r,2} - v_{r,1}^2} \sum_{i=1}^{n} \varepsilon_i (x_i - x) K_r\!\left(\frac{x_i - x}{h_n}\right) + o(n h_n) $$
$$ = -2 d \, \frac{\int_{\tau}^{1/2} K_r(x)(v_{r,2} - v_{r,1} x)\,dx}{v_{r,0} v_{r,2} - v_{r,1}^2} \sum_{x_i < x^*} \varepsilon_i K_r\!\left(\frac{x_i - x}{h_n}\right) + 2 d \left( 1 - \frac{\int_{\tau}^{1/2} K_r(x)(v_{r,2} - v_{r,1} x)\,dx}{v_{r,0} v_{r,2} - v_{r,1}^2} \right) \sum_{x_i \geq x^*} \varepsilon_i K_r\!\left(\frac{x_i - x}{h_n}\right) + o(n h_n), \quad \text{a.s.} $$
$$ = o(n h_n), \quad \text{a.s.} \qquad (B.6) $$

and

$$ I_3 = \sum_{x_i < x^*} \left[ f(x_i) - f_-(x^*) - d \, \frac{\int_{\tau}^{1/2} K_r(x)(v_{r,2} - v_{r,1} x)\,dx}{v_{r,0} v_{r,2} - v_{r,1}^2} - \frac{d}{h_n} \, \frac{\int_{\tau}^{1/2} K_r(x)(v_{r,0} x - v_{r,1})\,dx}{v_{r,0} v_{r,2} - v_{r,1}^2} (x_i - x) \right]^2 K_r\!\left(\frac{x_i - x}{h_n}\right) $$
$$ \qquad + \sum_{x_i \geq x^*} \left[ f(x_i) - f_-(x^*) - d - d \, \frac{\int_{\tau}^{1/2} K_r(x)(v_{r,2} - v_{r,1} x)\,dx}{v_{r,0} v_{r,2} - v_{r,1}^2} - \frac{d}{h_n} \, \frac{\int_{\tau}^{1/2} K_r(x)(v_{r,0} x - v_{r,1})\,dx}{v_{r,0} v_{r,2} - v_{r,1}^2} (x_i - x) \right]^2 K_r\!\left(\frac{x_i - x}{h_n}\right) + o(n h_n), \quad \text{a.s.} $$
$$ = n h_n \int_0^{\tau} \left[ d \, \frac{\int_{\tau}^{1/2} K_r(x)(v_{r,2} - v_{r,1} x)\,dx}{v_{r,0} v_{r,2} - v_{r,1}^2} + d \, \frac{\int_{\tau}^{1/2} K_r(x)(v_{r,0} x - v_{r,1})\,dx}{v_{r,0} v_{r,2} - v_{r,1}^2} \, u \right]^2 K_r(u)\,du $$
$$ \qquad + n h_n \int_{\tau}^{1/2} \left[ d \, \frac{\int_0^{\tau} K_r(x)(v_{r,2} - v_{r,1} x)\,dx}{v_{r,0} v_{r,2} - v_{r,1}^2} + d \, \frac{\int_0^{\tau} K_r(x)(v_{r,0} x - v_{r,1})\,dx}{v_{r,0} v_{r,2} - v_{r,1}^2} \, u \right]^2 K_r(u)\,du + o(n h_n), \quad \text{a.s.} $$
$$ = d^2 C_\tau^2 n h_n + o(n h_n), \quad \text{a.s.} \qquad (B.7) $$

By combining (B.1), (B.6) and (B.7), we get equation (3.4).
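As a numerical sanity check on the expression for $C_\tau^2$ used in (3.4) and (B.7), it can be evaluated for the Epanechnikov kernel. The sketch below (illustrative only) computes the quantity directly from the defining integrals; by the statements in Section 3, its values should vanish at $\tau = 0$ and $\tau = 1/2$ and trace the shape of Figure 3.1 in between.

```python
import numpy as np
from scipy.integrate import quad

def K_r(x):
    # Right half of the Epanechnikov kernel: K(x) = 1.5(1 - 4x^2) on [0, 1/2].
    return 1.5 * (1.0 - 4.0 * x ** 2) if 0.0 <= x <= 0.5 else 0.0

v = [quad(lambda x, j=j: x ** j * K_r(x), 0.0, 0.5)[0] for j in range(3)]  # v_{r,0}, v_{r,1}, v_{r,2}
denom = v[0] * v[2] - v[1] ** 2

def C2(tau):
    """C_tau^2 from Theorem 3.2: weighted squared distance between the unit step
    at tau and its kernel-weighted local linear approximation under K_r."""
    A = quad(lambda x: (v[2] - v[1] * x) * K_r(x), tau, 0.5)[0] / denom   # intercept of the fit
    B = quad(lambda x: (v[0] * x - v[1]) * K_r(x), tau, 0.5)[0] / denom   # slope of the fit
    left = quad(lambda u: (A + B * u) ** 2 * K_r(u), 0.0, tau)[0]
    right = quad(lambda u: (1.0 - A - B * u) ** 2 * K_r(u), tau, 0.5)[0]
    return left + right

for tau in (0.0, 0.1, 0.25, 0.4, 0.5):
    print(tau, round(C2(tau), 4))
```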

References

Chu, C.K., and Marron, J.S. (1991), Choosing a kernel regression estimator, Statistical Science 6, 404-436.

Cobb, G.W. (1978), The problem of the Nile: conditional solution to a changepoint problem, Biometrika 65, 243-251.

Eubank, R.L., and Speckman, P.L. (1994), Nonparametric estimation of functions with jump discontinuities, IMS Lecture Notes, vol. 23, Change-Point Problems (E. Carlstein, H.G. Müller and D. Siegmund, eds.), 130-144.

Fan, J., and Gijbels, I. (1996), Local Polynomial Modelling and Its Applications, Chapman & Hall: London.

Hall, P., and Titterington, M. (1992), Edge-preserving and peak-preserving smoothing, Technometrics 34, 429-440.

Hastie, T., and Loader, C. (1993), Local regression: automatic kernel carpentry, Statistical Science 8, 120-143.

Koo, J.Y. (1997), Spline estimation of discontinuous regression functions, Journal of Computational and Graphical Statistics 6, 266-284.

Loader, C.R. (1996), Change point estimation using nonparametric regression, The Annals of Statistics 24, 1667-1678.

Loader, C.R. (1999), Bandwidth selection: classical or plug-in?, The Annals of Statistics 27, 415-438.

McDonald, J.A., and Owen, A.B. (1986), Smoothing with split linear fits, Technometrics 28, 195-208.

Müller, H.G. (1992), Change-points in nonparametric regression analysis, The Annals of Statistics 20, 737-761.

Qiu, P. (1994), Estimation of the number of jumps of the jump regression functions, Communications in Statistics - Theory and Methods 23, 2141-2155.

Qiu, P., Asano, Ch., and Li, X. (1991), Estimation of jump regression functions, Bulletin of Informatics and Cybernetics 24, 197-212.

Qiu, P., Chappell, R., Obermeyer, W., and Benca, R. (1999), Modelling daily and subdaily cycles in rat sleep data, Biometrics 55, 930-935.

Qiu, P., and Yandell, B. (1998), A local polynomial jump detection algorithm in nonparametric regression, Technometrics 40, 141-152.

Shea, D.J., Worley, S.J., Stern, I.R., and Hoar, T.J. (1994), An introduction to atmospheric and oceanographic data, NCAR/TN-404+IA, Climate and Global Dynamics Division, National Center for Atmospheric Research, Boulder, Colorado.

Shiau, J.H., Wahba, G., and Johnson, D.R. (1986), Partial spline models for the inclusion of tropopause and frontal boundary information in otherwise smooth two- and three-dimensional objective analysis, Journal of Atmospheric and Oceanic Technology 3, 714-725.

Wang, Y. (1995), Jump and sharp cusp detection by wavelets, Biometrika 82, 385-397.

Wu, J.S., and Chu, C.K. (1993), Kernel type estimators of jump points and values of a regression function, The Annals of Statistics 21, 1545-1566.