Root-Unroot Methods for Nonparametric Density Estimation and Poisson Random-Effects Models
University of Pennsylvania ScholarlyCommons, Statistics Papers, Wharton Faculty Research, 2002.

Root-Unroot Methods for Nonparametric Density Estimation and Poisson Random-Effects Models

Lawrence D. Brown, University of Pennsylvania
Ren Zhang
Linda H. Zhao, University of Pennsylvania

Recommended Citation: Brown, L. D., Zhang, R., & Zhao, L. H. (2002). Root-Unroot Methods for Nonparametric Density Estimation and Poisson Random-Effects Models. This technical report is posted at ScholarlyCommons.
Root-Unroot Methods for Nonparametric Density Estimation and Poisson Random-Effects Models (1)

Lawrence D. Brown
University of Pennsylvania Statistics Department
Philadelphia, PA (lbrown@wharton.upenn.edu)

Joint work with: Ren Zhang, Linda Zhao

1. For departmental Statistics seminar, 1/16/2002. Revision and expansion of an invited talk delivered at the Joint Statistical Meetings in Atlanta, GA, August 2001.
Outline of Presentation

0. Preamble (motivation).
1. Description of the root-unroot method for density estimation. (Really, a methodette.)
2. Motivation via Poissonization and variance stabilization.
3. Properties of the root-unroot step.
4. Some simulation pictures.
5. Comments about signal-to-noise ratio.
6. An empirical example from a telephone call-in service center.
7. A two-way random-effects analysis of call arrival rates.
Motivation

Nonparametric regression and nonparametric density estimation are siblings. (Maybe even non-identical twins.) But I like working better with the regression formulation. (I feel a better sense of rapport and understanding!) So my goal is to convert a density problem into a regression one, so I can work with it as such.
Description of the Methodette

1. DATA: {X_1, ..., X_n} i.i.d. from a density f on [0, 1].
2. BIN: Create K (equal-width) bins. Let N_k = # of {X_i} in the k-th bin, Bin_k; T_k = center of Bin_k.
3. ROOT: Calculate

   Y_k = sqrt( (K/n) (N_k + 1/4) ).

4. ESTIMATE: Apply your favorite nonparametric regression estimator to {T_k, Y_k}. This produces an estimate ĥ(t) of h(t) = sqrt(f(t)), since E(Y_k) ≈ sqrt(f(T_k)) = h(T_k). If helpful, capitalize on Var(Y_k) ≈ K/(4n) for all k.
5. UN-ROOT: Calculate (ĥ(t))² and renormalize to be a density. For equal-width bins this gives

   f̂(T_k) = K (ĥ(T_k))²_+ / Σ_{j=1..K} (ĥ(T_j))²_+.
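The five steps above can be sketched in code. This is a minimal illustration, not the slides' implementation: the simple moving-average smoother and the Beta(2, 5) sample are placeholder assumptions (the slides use a smoothing spline or a wavelet procedure in the ESTIMATE step).

```python
import numpy as np

def root_unroot_density(x, K=32, smoother=None):
    """Root-unroot density estimate on [0, 1]: bin, root, smooth, un-root.

    smoother: any regression smoother mapping (t, y) -> fitted values at t;
    defaults to a crude moving average (a placeholder assumption).
    """
    n = len(x)
    # 2. BIN: K equal-width bins with centers t and counts N_k.
    counts, edges = np.histogram(x, bins=K, range=(0.0, 1.0))
    t = (edges[:-1] + edges[1:]) / 2
    # 3. ROOT: Y_k = sqrt((K/n)(N_k + 1/4)), so Var(Y_k) is roughly K/(4n).
    y = np.sqrt((K / n) * (counts + 0.25))
    # 4. ESTIMATE: smooth (t, Y) to get h_hat, an estimate of sqrt(f).
    if smoother is None:
        w = 5
        kernel = np.ones(w) / w
        h_hat = np.convolve(y, kernel, mode="same")
        # correct the edge effects of the 'same' convolution
        h_hat = h_hat / np.convolve(np.ones_like(y), kernel, mode="same")
    else:
        h_hat = smoother(t, y)
    # 5. UN-ROOT: square and renormalize so the estimate integrates to 1.
    f_hat = np.maximum(h_hat, 0.0) ** 2
    f_hat = K * f_hat / f_hat.sum()
    return t, f_hat

rng = np.random.default_rng(0)
x = rng.beta(2, 5, size=2000)        # sample from a Beta(2, 5) density on [0, 1]
t, f_hat = root_unroot_density(x, K=32)
print(round(f_hat.sum() / 32, 6))    # Riemann sum of the estimate: 1.0
```

Any regression smoother can be passed in for step 4; the renormalization in step 5 guarantees a proper density regardless of the smoother used.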
Data Example

1. DATA: Histogram of the data: n = 128. (It's accidental that n is a power of 2; this is irrelevant to the methodette.) The data are a random sample from the two-humps density in Wahba (1983).
2. & 3. BIN AND ROOT: [Figure: scatterplot of sqrt(N_k + 1/4) versus T_k.]
4. ESTIMATE: [Figure: smoothing spline fit to the scatterplot, with the fitted lambda, R-square of 0.94, and sum-of-squares error shown.] This fit has about 28 df for error, and consequently about 4 df for the model.
5. UN-ROOT: [Figure: the resulting estimate of f(x).]
How Well Did We Do? (1) [Figure: plot of estimate and true density. True density = thick curve; estimate = narrow curve.]
How Well Did We Do? (2) [Figure: comparison of estimators. Thin curve: root-unroot estimator; medium curve: kernel density estimator; heavy curve: true density.]

Both estimators capture the qualitative two-humps feature. Neither estimator provides a tremendously good fit. (More on this later.) Fair and revealing Monte Carlo comparisons are difficult to construct; the comparisons depend much more on the appropriateness of the respective density and regression estimators than on the integrity of direct density estimation versus the root-unroot regression paradigm. More, later.
Poissonization: Motivation

Suppose n = N, a Poisson(λ) random variable. Then N_k ~ Poisson(λ_k) with λ_k = λ f̄(T_k)/K, where f̄(T_k) = average of f over Bin_k. These N_k are independent.

We then wish to make inference about f̄(T_k) = K λ_k / λ. N = Σ_k N_k is ancillary to the ratios λ_k / λ.

A natural way to proceed is to estimate λ_k (nonparametrically) by λ̂_k, and then estimate f̄(T_k) by K λ̂_k / Σ_j λ̂_j.
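A quick Monte Carlo illustration of the Poissonization device (our own illustration, not a computation from the slides): when the sample size is Poisson, the bin counts become independent Poisson variables, so their variances equal their means and the cross-bin correlations vanish. The bin probabilities below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 200.0
p = np.array([0.1, 0.2, 0.3, 0.4])              # bin probabilities of some f
# Draw N ~ Poisson(lam), then split N into bins multinomially.
counts = np.array([rng.multinomial(rng.poisson(lam), p)
                   for _ in range(20_000)])
print(np.round(counts.mean(axis=0) / lam, 2))   # close to p
print(np.round(counts.var(axis=0) / lam, 2))    # also close to p (var = mean)
# Under a fixed n the bin counts would be negatively correlated;
# under Poissonization the correlation is close to 0.
print(round(float(np.corrcoef(counts[:, 0], counts[:, 1])[0, 1]), 3))
```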
Variance Stabilization:

An asymptotically unbiased, second-order mean-stabilizing, first-order variance-stabilizing transformation for the independent counts N_k is

Y_k = sqrt(N_k + 1/4).

These variables are approximately normal, with mean ≈ sqrt(λ_k) and variance ≈ 1/4, as well as independent.

Variance stabilization and Poissonization each have extensive statistical pedigrees. Density estimation schemes closely related to the above have also often been proposed. See Fan & Gijbels (1996) and Vidakovic (1999) for two recent versions, each of which is somewhat different from the above. Nussbaum (1996), Nussbaum and Klemela (1999), Carter (2001) and Brown, Low and Zhang (2001) all contain asymptotic equivalence theorems formally justifying (sometimes complicated) versions of the above scheme, under suitable regularity conditions.
Properties of the Root-Unroot Step

Stabilizes the mean: [Figure: E sqrt(Poisson(λ) + c) as a function of λ, for c = 0 and c = 1/4. Also shown is the line Y = sqrt(λ).]
Detail from Previous Plot: [Figure: µ(λ) = E sqrt(Poisson(λ) + c) for c = 0 and c = 1/4. Also shown is the line Y = sqrt(λ). Lowest curve is for c = 0; highest curve is for c = 1/4.]
Stabilizes the variance: [Figure: σ²(λ) = Var( sqrt(Poisson(λ) + c) ) for c = 0 and c = 1/4. The nominal (and limiting) value is 1/4. Upper curve is for c = 0; other curve is for c = 1/4.]

CONCLUSION: It is suitable to use the root-unroot method with values of n/K as small as 3 or 4. As n increases it seems suitable to let n/K increase slowly.
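The conclusion above is easy to check by simulation (an illustration of the variance plot, with arbitrarily chosen means): the variance of sqrt(N + 1/4) is already close to the nominal 1/4 for Poisson means as small as 3 or 4.

```python
import numpy as np

rng = np.random.default_rng(1)
var_at = {}
for lam in [3, 4, 10, 25]:
    draws = rng.poisson(lam, size=200_000)
    # Variance of the root-transformed counts; nominal value is 1/4.
    var_at[lam] = np.sqrt(draws + 0.25).var()
    print(lam, round(var_at[lam], 3))
```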
Notes:

Apparently, Bartlett (1936) was the first to propose the transformation Y = sqrt(Poisson) as the variance-stabilizing transformation. He then proposed using it in a homoscedastic linear model.

Anscombe (1948) proposed improving the variance-stabilizing properties by instead using Y = sqrt(Poisson + 3/8). (He credits this result to A. H. L. Johnson.) He apparently thought that this transformation was also optimal with respect to its mean. (See his equation (2.1).) In fact this transformation is slightly more biased than our Y = sqrt(Poisson + 1/4), as is shown by asymptotic analysis or by the plot below.

I do not know who should be credited as the first to propose Y = sqrt(Poisson + 1/4) as the asymptotically unbiased, nearly variance-stabilizing transformation.

On the grounds that bias is more important here than variance, we prefer to use Y = sqrt(Poisson + 1/4). This transformation has such nice properties that it doesn't seem worthwhile to try to make further improvements at the expense of complicating the formula.
Plots for Anscombe's Transformation: [Figure: µ(λ) = E sqrt(Poisson(λ) + c) for c = 1/4 and c = 3/8. Also shown is the line Y = sqrt(λ). Lower curve is for c = 1/4; higher curve is for c = 3/8.] Revealed in the plot is the limiting bias of 1/8 for Anscombe's E sqrt(Poisson(λ) + 3/8).
[Figure: σ²(λ) = Var( sqrt(Poisson(λ) + c) ) for c = 1/4 and c = 3/8. The nominal (and limiting) value is 1/4. Upper curve is for c = 1/4; other curve is for c = 3/8.]
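The bias comparison between c = 1/4 and Anscombe's c = 3/8 can be computed exactly by summing the Poisson series (a small illustration; the choice λ = 30 and the truncation point are ours).

```python
from math import exp, sqrt

def mean_root(lam, c, tail=400):
    """E sqrt(N + c) for N ~ Poisson(lam), by truncated series summation."""
    p = exp(-lam)                  # P(N = 0)
    total = p * sqrt(c)
    for n in range(1, tail):
        p *= lam / n               # Poisson pmf recursion p_n = p_{n-1} * lam/n
        total += p * sqrt(n + c)
    return total

lam = 30.0
bias_quarter = mean_root(lam, 0.25) - sqrt(lam)    # c = 1/4: nearly unbiased
bias_anscombe = mean_root(lam, 0.375) - sqrt(lam)  # c = 3/8: slightly biased up
print(round(bias_quarter, 5), round(bias_anscombe, 5))
```

The c = 3/8 bias behaves like (3/8 - 1/4) / (2 sqrt(λ)), while the c = 1/4 bias is an order of magnitude smaller, consistent with the plot above.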
Our transformation does pretty well with higher moments too:

[Figure: Skewness = E((Poisson − µ_λ)³) / σ_λ³ as a function of λ. Lower curve is for c = 0; other curve is for c = 1/4. The nominal (and limiting) value is 0.]
[Figure: Kurtosis = E((Poisson − µ_λ)⁴) / σ_λ⁴ as a function of λ. The more variable curve is for c = 0; other curve is for c = 1/4. The nominal (and limiting) value is 3.]
Digression 1: Confidence Intervals for Single Poisson Variables (1)

The transformation sqrt(N + 1/4) produces good individual confidence intervals. Here is a plot of the coverage of the nominal 95% interval

( (sqrt(N + 1/4) − 1.96/2)², (sqrt(N + 1/4) + 1.96/2)² )

for λ, where N ~ Poisson(λ).

1. A digression because this is about one Poisson observation, but the principal goal here is density estimation and related inference.
[Figure: coverage of the 95% root-unroot interval as a function of λ.]
A naïve 95% confidence interval could be formed by exploiting the fact that N is the MLE of Var(N) = λ. This interval would be

( N − 1.96 sqrt(N), N + 1.96 sqrt(N) ).

Note that the 95% root-unroot interval expands to

( N + 1.21 − 1.96 sqrt(N + 1/4), N + 1.21 + 1.96 sqrt(N + 1/4) )

so long as N ≥ 1, so that the lower endpoint is nonnegative. Thus, for N ≥ 4 the two intervals have essentially the same length, but the root-unroot interval is (always) shifted to the right. Here is a comparison of the coverage of the two intervals:
[Figure: coverage of the 95% root-unroot interval and of the 95% conventional Wald interval as a function of λ.]

End of Digression.
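The two coverage curves can be computed exactly by summing the Poisson pmf over the counts whose interval covers λ. This is a sketch following the interval formulas above; the λ values in the loop are arbitrary.

```python
from math import exp, lgamma, log, sqrt

z = 1.96

def ru_interval(n):
    # root-unroot 95% interval: (sqrt(n + 1/4) -/+ z/2)^2, clipped at 0
    r = sqrt(n + 0.25)
    return max(r - z / 2, 0.0) ** 2, (r + z / 2) ** 2

def wald_interval(n):
    # naive 95% Wald interval: n -/+ z * sqrt(n)
    half = z * sqrt(n)
    return n - half, n + half

def coverage(lam, interval, n_max=500):
    # exact coverage: sum the Poisson(lam) pmf over covering counts
    return sum(exp(-lam + n * log(lam) - lgamma(n + 1))
               for n in range(n_max)
               if interval(n)[0] <= lam <= interval(n)[1])

for lam in [2, 5, 10, 20]:
    print(lam, round(coverage(lam, ru_interval), 3),
          round(coverage(lam, wald_interval), 3))

cov_ru_10 = coverage(10, ru_interval)
cov_wald_10 = coverage(10, wald_interval)
```

At moderate λ the root-unroot interval sits closer to the nominal 95% level, while the Wald interval undercovers, which is the pattern the slide's plot displays.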
An Asymptotic Comparison

Fair and revealing Monte Carlo comparisons are difficult to construct; the comparisons depend much more on the appropriateness of the respective density and regression estimators than on the integrity of direct density estimation versus the root-unroot regression paradigm. Here is an asymptotic comparison to give an idea of the situation.

Consider a standard kernel estimator, with kernel W. For the ordinary density problem this is

f~(t) = (1/n) Σ_i (1/d) W( (X_i − t)/d ).
For the root-unroot method this uses the formula

ĥ(t) = (1/K) Σ_k (1/d) W( (T_k − t)/d ) Y_k,

and then un-roots according to f̂(t) = (ĥ(t))². (Asymptotically ∫ (ĥ(t))² dt ≈ 1, very nearly, so we need not carry out the renormalization step.)

Take W = Unif(−1/2, 1/2) for numerical simplicity, and d = C/n^(1/5). Define the IMSE risk for an estimator f* as

R = ∫ (f*(t) − f(t))² dt ≈ (1/K) Σ_k (f*(T_k) − f(T_k))².

Note that R can be decomposed as R = Bias² + Var.
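Both formulas can be sketched directly with W = Unif(−1/2, 1/2). The Beta(2, 2) sample, the bandwidth constant C = 1, and K = 64 are illustrative assumptions, not choices from the slides.

```python
import numpy as np

def kernel_density(x, t_grid, d):
    # ordinary kernel estimate: f~(t) = (1/n) sum_i (1/d) W((X_i - t)/d)
    u = (x[None, :] - t_grid[:, None]) / d
    return (np.abs(u) <= 0.5).mean(axis=1) / d

def root_unroot_kernel(x, t_grid, K, d):
    # root-unroot version: smooth Y_k = sqrt((K/n)(N_k + 1/4)), then square
    n = len(x)
    counts, edges = np.histogram(x, bins=K, range=(0.0, 1.0))
    tk = (edges[:-1] + edges[1:]) / 2
    y = np.sqrt((K / n) * (counts + 0.25))
    u = (tk[None, :] - t_grid[:, None]) / d
    h_hat = ((np.abs(u) <= 0.5) * y).sum(axis=1) / (K * d)
    return h_hat ** 2          # un-root; renormalization omitted, as above

rng = np.random.default_rng(3)
x = rng.beta(2, 2, size=5000)      # true density f(t) = 6 t (1 - t)
d = 5000 ** -0.2                   # d = C / n^(1/5) with C = 1
t_grid = np.linspace(0.1, 0.9, 17)
f_dens = kernel_density(x, t_grid, d)
f_ru = root_unroot_kernel(x, t_grid, K=64, d=d)
print(round(float(f_dens[8]), 3), round(float(f_ru[8]), 3))  # both near f(0.5) = 1.5
```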
Then, for the density estimator

R_dens ≈ n^(−4/5) [ (C⁴/4) Ψ_dens + 1/C ],   with Ψ_dens ∝ ∫ (f'')²,

and for the root-unroot estimator

R_r−u ≈ n^(−4/5) [ (C⁴/4) Ψ_R−U + 1/C ],   with Ψ_R−U ∝ ∫ ( f'' − (f')²/(2f) )²,

since ĥ² − h² = (ĥ − h)(ĥ + h) ≈ 2h(ĥ − h), so that

bias(ĥ²) ≈ 2h bias(ĥ) ∝ 2h h'' = f'' − (f')²/(2f).

Clearly, it is possible that R_dens < R_r−u (if f is linear); and also that R_dens > R_r−u (if h = sqrt(f) is linear).
Here is a table showing the results for some simple situations, each density of the form .95 f + .05: [Table of the bias term Ψ (Ψ_dens and Ψ_R−U) and of the Oracle Risks R**_dens and R**_R−U for three simple density functions, built from f = sin πx, 30x²(1−x)², and x(1−x); the numeric entries are given on the slide.] The Oracle Risk is

R** = inf_C { n^(4/5) R } = (5/4) Ψ^(1/5).

Conclusion: The root-unroot estimator is very slightly superior to the original density estimator when f is smooth in the region where it's small. It is considerably inferior when f is linear in that region.
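The Oracle Risk formula is easy to verify numerically: minimizing (C⁴/4)Ψ + 1/C over C gives C* = Ψ^(−1/5) and minimum value (5/4)Ψ^(1/5). The Ψ value below is arbitrary, chosen only for illustration.

```python
import numpy as np

psi = 2.7
# brute-force minimization of the bracketed risk factor over a fine C grid
C = np.linspace(0.05, 5.0, 200_000)
risk = (C**4 / 4) * psi + 1.0 / C
C_star = C[np.argmin(risk)]
print(round(float(C_star), 3), round(float(risk.min()), 4))
```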
The previous considerations suggest that it might be more sensible to model the problem in terms of h = sqrt(f), and to judge the quality of an estimator according to its squared Hellinger risk

R_H = ∫ ( sqrt(f*(t)) − sqrt(f(t)) )² dt ≈ (1/K) Σ_k ( sqrt(f*(T_k)) − sqrt(f(T_k)) )².

(The above general conclusion remains valid.)
Simulations

The root-unroot scheme enables reduction of the problem to one effectively having K = 2^m equally spaced, homoscedastic and independent observations. This is exactly suitable for wavelet analyses. Here are plots showing the result of using the root-unroot scheme along with the Block J-S wavelet procedure in Cai and Silverman (1998+). The functions exhibited here are hard-to-fit forms adapted from Donoho and Johnstone (1995). ("Heavisine" and "Blocks")
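The test curves can be generated from the standard Donoho-Johnstone formulas. The shift-and-normalize step that turns each curve into a density on [0, 1] is our own illustration, not necessarily the slides' exact rescaling.

```python
import numpy as np

def heavisine(t):
    # standard Donoho-Johnstone "Heavisine" formula
    return 4 * np.sin(4 * np.pi * t) - np.sign(t - 0.3) - np.sign(0.72 - t)

def blocks(t):
    # standard Donoho-Johnstone "Blocks": a sum of shifted step functions
    pos = np.array([0.1, 0.13, 0.15, 0.23, 0.25, 0.40,
                    0.44, 0.65, 0.76, 0.78, 0.81])
    hgt = np.array([4, -5, 3, -4, 5, -4.2, 2.1, 4.3, -3.1, 2.1, -4.2])
    return np.sum(hgt * (1 + np.sign(t[:, None] - pos)) / 2, axis=1)

def as_density(g, t):
    v = g(t) - g(t).min() + 0.1   # shift so the curve is strictly positive
    return v / v.mean()           # average value 1, so the integral is ~ 1

K = 4096
t = (np.arange(K) + 0.5) / K      # K equally spaced bin centers on [0, 1]
f_heavi = as_density(heavisine, t)
f_blocks = as_density(blocks, t)
print(round(float(f_heavi.mean()), 6), round(float(f_blocks.mean()), 6))
```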
[Figure: Heavisine, n = … obs, K = 4096 bins (basic wavelet is s10); binned estimate and true density.]
Here is the same plot, also showing a local linear fit estimator. [Figure: n = … obs, K = 4096 bins; binned estimate, true density, and locfit.]
More Heavisine plots. [Figures: … obs, K = 512 bins (basic wavelet is s10); binned estimate and true density, with and without the locfit curve.]
Here are Heavisine plots with fewer observations. Note that the quality of fit is much poorer. [Figures: 6000 obs, K = 1024 bins; binned estimate and true density, with and without the locfit curve.]
BLOCKS: [Figures: n = … obs, K = 4096 (s8); binned estimate and true density, with and without the locfit curve.] Note: Use of smaller K, K = 512, barely affects the plots.
BLOCKS: [Figures: n = 6000 obs, K = 1024 (s8); binned estimate and true density, with and without the locfit curve.]
Signal-to-Noise Ratio

Question: Why are the plots for n = 6000 so poor? Why aren't the plots for the larger n nearly perfect? (That's the situation with the nonparametric regression data analyzed in Donoho et al. (1994, 1995, +) for these curves.)

Answer: Because density estimation is intrinsically much harder than the nonparametric regression situations of those articles.

Detail: Here is a scale-free (in Y) Difficulty of Estimation index:

DE = C × (Average local S.D. of the error) / (Average intensity of the signal),

where C denotes a scale-free (in Y) measure of the local complexity of the signal (see below). For a homoscedastic regression problem with signal g and q observations each having S.D. = σ this is
DE = C (σ/√q) / ( ∫ (g(x) − ḡ)² dx )^(1/2) = C / ( √q SNR ),

where SNR denotes the conventional signal-to-noise ratio,

SNR = ( ∫ (g(x) − ḡ)² dx )^(1/2) / σ.

A suitable form for C is

C⁴ = ∫ (g'')² / ∫ (g − ḡ)².

The root-ed density problem is a nonparametric regression with signal h = sqrt(f), and there are K observed values of Y_k, each having σ² = K/(4n). We get
DE_rooted = C_h sqrt(K/(4n)) / ( √K ( ∫ (h(x) − h̄)² )^(1/2) ) = C_h / ( 2 √n ( ∫ (h(x) − h̄)² )^(1/2) ).

(Note that K cancels out of this expression, as it should, so long as n/K is not very small.) This is the DE for the root-ed problem. Unrooting makes the problem twice as hard, since

(h + ε)² ≈ h² + 2hε = f + 2hε.

The complexity factor also changes when comparing the rooted and unrooted problems, but the change is small; thus C_h / C_f = 1 ± (a little). Hence we should ascribe to the density problem the difficulty

DE_orig prob = C_h / ( √n ( ∫ (h(x) − h̄)² )^(1/2) ).
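The difficulty index can be sketched numerically. The formula follows our reading of the garbled slide, so treat the constants as illustrative; the smooth density below is an arbitrary choice. The key structural point, that DE scales as 1/√n, holds regardless.

```python
import numpy as np

def de_orig(f_vals, t, n):
    """Difficulty index for the density problem, via the rooted signal h."""
    h = np.sqrt(f_vals)                  # rooted signal h = sqrt(f)
    dt = t[1] - t[0]
    h2 = np.gradient(np.gradient(h, dt), dt)          # numerical h''
    # C^4 = integral (h'')^2 / integral (h - h_bar)^2  (the dt factors cancel)
    c4 = (h2**2).sum() / ((h - h.mean())**2).sum()
    norm_h = np.sqrt(((h - h.mean())**2).sum() * dt)  # ||h - h_bar||
    return c4 ** 0.25 / (np.sqrt(n) * norm_h)

t = np.linspace(0.0, 1.0, 2001)
f = 1 + 0.5 * np.sin(2 * np.pi * t)      # a smooth density on [0, 1]
de_10k = de_orig(f, t, 10_000)
de_40k = de_orig(f, t, 40_000)
print(round(float(de_10k / de_40k), 3))  # 2.0: four times the data, half the DE
```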
Here is a table showing DE_orig for our density problem as compared to the DEs for the regression setup of Donoho et al., in which SNR = 7 and n is the same. [Table: DE for Heavisine and Blocks, density versus regression; DE_reg is for SNR = 7.]

According to this measure, the density problem is from 20 to 30 times harder than the regression one at the same sample size. THUS it needs from 400 to 900 times as many observations to provide a comparable quality of estimate.
Empirical Example From a Bank Call-in Center in Israel

The data are records of all telephone calls made during 1999 to a telephone assistance and service center operated by a bank in Israel. Today I'll present just an analysis of the time of day that these calls arrive at the center. We look only at telephone business hours, 7am to midnight on regular weekdays (Sun. through Thurs. in Israel), excluding holidays.

The first plot is for REGULAR customers (n = 58,500). The second is for INTERNET customers (n = 14,30), who call a special number seeking assistance with on-line banking. (Differences for the days of the week and the seasons of the year did exist, but were relatively minor. These are ignored in the current analysis, but will be considered in the analysis in the last section of the talk.)

An advantage of the root-unroot methodology is that it reduces the problem to a homoscedastic regression in which better-understood tools lead to 95% confidence (variance) interval bands in addition to estimates. The following plots were produced via a root-unroot scheme, using the free-knots nonparametric regression spline methodology of Zhao (1999, rev. 2001).
Arrival time of REGULAR calls: [Figure.]
Arrival time of INTERNET calls: [Figure.]

Note that the arrival patterns of regular and internet customers are significantly different. (The internet calls arrive more uniformly across the day, but with a statistically significant, noticeable late-night local mode at 10-11pm.)
More informationSingle Index Quantile Regression for Heteroscedastic Data
Single Index Quantile Regression for Heteroscedastic Data E. Christou M. G. Akritas Department of Statistics The Pennsylvania State University SMAC, November 6, 2015 E. Christou, M. G. Akritas (PSU) SIQR
More informationLecture 4: Multivariate Regression, Part 2
Lecture 4: Multivariate Regression, Part 2 Gauss-Markov Assumptions 1) Linear in Parameters: Y X X X i 0 1 1 2 2 k k 2) Random Sampling: we have a random sample from the population that follows the above
More informationAsymptotic Equivalence and Adaptive Estimation for Robust Nonparametric Regression
Asymptotic Equivalence and Adaptive Estimation for Robust Nonparametric Regression T. Tony Cai 1 and Harrison H. Zhou 2 University of Pennsylvania and Yale University Abstract Asymptotic equivalence theory
More informationAdaptive Wavelet Estimation: A Block Thresholding and Oracle Inequality Approach
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 1999 Adaptive Wavelet Estimation: A Block Thresholding and Oracle Inequality Approach T. Tony Cai University of Pennsylvania
More informationA Classroom Approach to Illustrate Transformation and Bootstrap Confidence Interval Techniques Using the Poisson Distribution
International Journal of Statistics and Probability; Vol. 6, No. 2; March 2017 ISSN 1927-7032 E-ISSN 1927-7040 Published by Canadian Center of Science and Education A Classroom Approach to Illustrate Transformation
More informationMinimax Estimation of a nonlinear functional on a structured high-dimensional model
Minimax Estimation of a nonlinear functional on a structured high-dimensional model Eric Tchetgen Tchetgen Professor of Biostatistics and Epidemiologic Methods, Harvard U. (Minimax ) 1 / 38 Outline Heuristics
More informationFinding an upper limit in the presence of an unknown background
PHYSICAL REVIEW D 66, 032005 2002 Finding an upper limit in the presence of an unnown bacground S. Yellin* Department of Physics, University of California, Santa Barbara, Santa Barbara, California 93106
More informationIntroduction to Regression
Introduction to Regression p. 1/97 Introduction to Regression Chad Schafer cschafer@stat.cmu.edu Carnegie Mellon University Introduction to Regression p. 1/97 Acknowledgement Larry Wasserman, All of Nonparametric
More informationHypothesis Testing. ) the hypothesis that suggests no change from previous experience
Hypothesis Testing Definitions Hypothesis a claim about something Null hypothesis ( H 0 ) the hypothesis that suggests no change from previous experience Alternative hypothesis ( H 1 ) the hypothesis that
More informationSection 8.1: Interval Estimation
Section 8.1: Interval Estimation Discrete-Event Simulation: A First Course c 2006 Pearson Ed., Inc. 0-13-142917-5 Discrete-Event Simulation: A First Course Section 8.1: Interval Estimation 1/ 35 Section
More informationChapter 9. Non-Parametric Density Function Estimation
9-1 Density Estimation Version 1.1 Chapter 9 Non-Parametric Density Function Estimation 9.1. Introduction We have discussed several estimation techniques: method of moments, maximum likelihood, and least
More informationHow Accurate is My Forecast?
How Accurate is My Forecast? Tao Hong, PhD Utilities Business Unit, SAS 15 May 2012 PLEASE STAND BY Today s event will begin at 11:00am EDT The audio portion of the presentation will be heard through your
More informationRobustness to Parametric Assumptions in Missing Data Models
Robustness to Parametric Assumptions in Missing Data Models Bryan Graham NYU Keisuke Hirano University of Arizona April 2011 Motivation Motivation We consider the classic missing data problem. In practice
More informationCan we do statistical inference in a non-asymptotic way? 1
Can we do statistical inference in a non-asymptotic way? 1 Guang Cheng 2 Statistics@Purdue www.science.purdue.edu/bigdata/ ONR Review Meeting@Duke Oct 11, 2017 1 Acknowledge NSF, ONR and Simons Foundation.
More informationBusiness Statistics. Lecture 10: Course Review
Business Statistics Lecture 10: Course Review 1 Descriptive Statistics for Continuous Data Numerical Summaries Location: mean, median Spread or variability: variance, standard deviation, range, percentiles,
More informationTwo Correlated Proportions Non- Inferiority, Superiority, and Equivalence Tests
Chapter 59 Two Correlated Proportions on- Inferiority, Superiority, and Equivalence Tests Introduction This chapter documents three closely related procedures: non-inferiority tests, superiority (by a
More informationSmooth nonparametric estimation of a quantile function under right censoring using beta kernels
Smooth nonparametric estimation of a quantile function under right censoring using beta kernels Chanseok Park 1 Department of Mathematical Sciences, Clemson University, Clemson, SC 29634 Short Title: Smooth
More informationStatistical inference on Lévy processes
Alberto Coca Cabrero University of Cambridge - CCA Supervisors: Dr. Richard Nickl and Professor L.C.G.Rogers Funded by Fundación Mutua Madrileña and EPSRC MASDOC/CCA student workshop 2013 26th March Outline
More informationNonparametric Methods
Nonparametric Methods Michael R. Roberts Department of Finance The Wharton School University of Pennsylvania July 28, 2009 Michael R. Roberts Nonparametric Methods 1/42 Overview Great for data analysis
More informationEconometrics I. Lecture 10: Nonparametric Estimation with Kernels. Paul T. Scott NYU Stern. Fall 2018
Econometrics I Lecture 10: Nonparametric Estimation with Kernels Paul T. Scott NYU Stern Fall 2018 Paul T. Scott NYU Stern Econometrics I Fall 2018 1 / 12 Nonparametric Regression: Intuition Let s get
More informationResearch Statement. Harrison H. Zhou Cornell University
Research Statement Harrison H. Zhou Cornell University My research interests and contributions are in the areas of model selection, asymptotic decision theory, nonparametric function estimation and machine
More informationAdaptive Online Gradient Descent
University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 6-4-2007 Adaptive Online Gradient Descent Peter Bartlett Elad Hazan Alexander Rakhlin University of Pennsylvania Follow
More informationADJUSTED POWER ESTIMATES IN. Ji Zhang. Biostatistics and Research Data Systems. Merck Research Laboratories. Rahway, NJ
ADJUSTED POWER ESTIMATES IN MONTE CARLO EXPERIMENTS Ji Zhang Biostatistics and Research Data Systems Merck Research Laboratories Rahway, NJ 07065-0914 and Dennis D. Boos Department of Statistics, North
More information11 Survival Analysis and Empirical Likelihood
11 Survival Analysis and Empirical Likelihood The first paper of empirical likelihood is actually about confidence intervals with the Kaplan-Meier estimator (Thomas and Grunkmeier 1979), i.e. deals with
More informationDiscussant: Lawrence D Brown* Statistics Department, Wharton, Univ. of Penn.
Discussion of Estimation and Accuracy After Model Selection By Bradley Efron Discussant: Lawrence D Brown* Statistics Department, Wharton, Univ. of Penn. lbrown@wharton.upenn.edu JSM, Boston, Aug. 4, 2014
More informationWhy Do Statisticians Treat Predictors as Fixed? A Conspiracy Theory
Why Do Statisticians Treat Predictors as Fixed? A Conspiracy Theory Andreas Buja joint with the PoSI Group: Richard Berk, Lawrence Brown, Linda Zhao, Kai Zhang Ed George, Mikhail Traskin, Emil Pitkin,
More informationO Combining cross-validation and plug-in methods - for kernel density bandwidth selection O
O Combining cross-validation and plug-in methods - for kernel density selection O Carlos Tenreiro CMUC and DMUC, University of Coimbra PhD Program UC UP February 18, 2011 1 Overview The nonparametric problem
More informationOverall Plan of Simulation and Modeling I. Chapters
Overall Plan of Simulation and Modeling I Chapters Introduction to Simulation Discrete Simulation Analytical Modeling Modeling Paradigms Input Modeling Random Number Generation Output Analysis Continuous
More informationChapter 11. Regression with a Binary Dependent Variable
Chapter 11 Regression with a Binary Dependent Variable 2 Regression with a Binary Dependent Variable (SW Chapter 11) So far the dependent variable (Y) has been continuous: district-wide average test score
More informationLeast Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error Distributions
Journal of Modern Applied Statistical Methods Volume 8 Issue 1 Article 13 5-1-2009 Least Absolute Value vs. Least Squares Estimation and Inference Procedures in Regression Models with Asymmetric Error
More informationMIT Spring 2015
Assessing Goodness Of Fit MIT 8.443 Dr. Kempthorne Spring 205 Outline 2 Poisson Distribution Counts of events that occur at constant rate Counts in disjoint intervals/regions are independent If intervals/regions
More informationCOS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION
COS513: FOUNDATIONS OF PROBABILISTIC MODELS LECTURE 9: LINEAR REGRESSION SEAN GERRISH AND CHONG WANG 1. WAYS OF ORGANIZING MODELS In probabilistic modeling, there are several ways of organizing models:
More informationNonparametric Inference In Functional Data
Nonparametric Inference In Functional Data Zuofeng Shang Purdue University Joint work with Guang Cheng from Purdue Univ. An Example Consider the functional linear model: Y = α + where 1 0 X(t)β(t)dt +
More information12 - Nonparametric Density Estimation
ST 697 Fall 2017 1/49 12 - Nonparametric Density Estimation ST 697 Fall 2017 University of Alabama Density Review ST 697 Fall 2017 2/49 Continuous Random Variables ST 697 Fall 2017 3/49 1.0 0.8 F(x) 0.6
More information41903: Introduction to Nonparametrics
41903: Notes 5 Introduction Nonparametrics fundamentally about fitting flexible models: want model that is flexible enough to accommodate important patterns but not so flexible it overspecializes to specific
More informationNonparametric Principal Components Regression
Int. Statistical Inst.: Proc. 58th World Statistical Congress, 2011, Dublin (Session CPS031) p.4574 Nonparametric Principal Components Regression Barrios, Erniel University of the Philippines Diliman,
More informationSimple and Honest Confidence Intervals in Nonparametric Regression
Simple and Honest Confidence Intervals in Nonparametric Regression Timothy B. Armstrong Yale University Michal Kolesár Princeton University June, 206 Abstract We consider the problem of constructing honest
More information