Discussion Paper No. 28

Asymptotic Property of Wrapped Cauchy Kernel Density Estimation on the Circle

Yasuhito Tsuruta    Masahiko Sagae

Graduate School of Human and Socio-Environment Studies, Kanazawa University
School of Economics, Kanazawa University

2016/6/10

Abstract

We discuss theoretical properties of the wrapped Cauchy (WC) kernel. We show that the WC kernel attains the AMISE convergence rate O(n^{-2/3}) and satisfies asymptotic normality. This rate is worse than the rate O(n^{-4/5}) of the von Mises (VM) kernel. However, numerical experiments show that the WC kernel behaves better than the VM kernel when the true density is multimodal and/or heavy-tailed.

1 Introduction

Directional data are represented by an angle θ ∈ [0, 2π), where θ is identified with θ + 2mπ for m ∈ Z, or equivalently by a unit vector x = (cos θ, sin θ)^T. Examples of directional data include wind directions and electric power demand over a 24-hour period. For earlier work on kernel density estimation on the circle, see Hall et al. (1987), Taylor (2008), and Di Marzio et al. (2009, 2011).

Di Marzio et al. (2009, 2011) introduced moments of sin-order p for kernels on the circle, an idea analogous to moments of order p on the real line. The class of kernels of sin-order 2 includes the von Mises (VM) kernel, the wrapped Cauchy (WC) kernel, and the wrapped normal kernel. Di Marzio et al. (2009, 2011) derived the asymptotic mean integrated squared error (AMISE) from the definition of kernels of sin-order p, and showed that the convergence rate of the VM kernel is O(n^{-4/5}). This rate equals the convergence rate of second-order kernels on the real line. However, the convergence rate of the AMISE cannot be derived without specifying the kernel. We show that the WC kernel has the optimal AMISE rate O(n^{-2/3}) and satisfies asymptotic normality. Di Marzio et al. (2011) also gave another expression for the asymptotic normality of circular kernel estimators.

The WC kernel is inferior to the VM kernel with respect to the asymptotic MISE. However, our numerical experiments show that the WC kernel is superior to the VM kernel when the true density is multimodal and/or heavy-tailed.

2 Properties of the AMISE of circular kernel density estimation

Following Di Marzio et al. (2009, 2011), this section briefly reviews the definition of a circular kernel function and the properties of its AMISE.

2.1 Definitions

Definition 1. (Kernel function) Let K_κ(θ) be a circular kernel function with concentration parameter κ (κ is a smoothing parameter playing the role of the inverse of a bandwidth on the real line), where K_κ(θ): [0, 2π) → R is such that

(a) it admits a uniformly convergent Fourier series

\[ K_\kappa(\theta) = \frac{1}{2\pi}\Bigl\{1 + 2\sum_{j=1}^{\infty} \gamma_j(\kappa)\cos(j\theta)\Bigr\}, \qquad \theta \in [0, 2\pi), \]

where each γ_j(κ) is a strictly monotone function of κ;

(b) ∫_0^{2π} K_κ(θ) dθ = 1, and if K_κ(θ) takes negative values, there exists 0 < M < ∞ such that ∫_0^{2π} |K_κ(θ)| dθ ≤ M for all κ > 0;

(c) lim_{κ→∞} ∫_δ^{2π−δ} |K_κ(θ)| dθ = 0 for all 0 < δ < π.

Definition 2. (Sin-order moments of a circular kernel) Let η_j(K_κ) := ∫_0^{2π} sin^j(θ) K_κ(θ) dθ. The kernel K_κ is of sin-order p when η_0(K_κ) = 1, η_j(K_κ) = 0 for 0 < j < p, and η_p(K_κ) ≠ 0.

For even j = 2s (s = 1, 2, ...), η_j(K_κ) can be expressed through the Fourier coefficients γ_{2s}(κ):

\[ \eta_j(K_\kappa) = \frac{1}{2^{j}}\binom{j}{j/2} + \frac{1}{2^{j-1}}\sum_{s=1}^{j/2} (-1)^{s} \binom{j}{j/2+s}\,\gamma_{2s}(\kappa). \tag{1} \]

Definition 3. (Kernel density estimator) Let Θ_1, ..., Θ_n be a random sample from an unknown circular density f(θ). Given a circular kernel K_κ, the kernel estimator of f is defined as

\[ \hat f(\theta; \kappa) := \frac{1}{n}\sum_{i=1}^{n} K_\kappa(\theta - \Theta_i). \]

2.2 Theoretical properties

Theorem 1. (AMISE) Under the following conditions:

(A) κ increases as n → ∞, and lim_{n→∞} γ_j(κ) = 1 for each j ∈ Z^+,

(B) lim_{n→∞} n^{-1} \sum_{j=1}^{\infty} γ_j^2(κ) = 0,

(C) f'' is continuous and square-integrable,

a kernel of sin-order 2 satisfies

\[ \mathrm{AMISE}[\hat f(\cdot;\kappa)] = \frac{\eta_2^2(K_\kappa)}{4}\,R(f'') + \frac{R(K_\kappa)}{n}, \tag{2} \]

where R(g) := ∫_0^{2π} {g(θ)}^2 dθ, η_2(K_κ) = (1 − γ_2(κ))/2, and R(K_κ) = (1 + 2 Σ_{j=1}^∞ γ_j^2(κ))/(2π). The two terms on the right-hand side of (2) are the integrated squared bias (ISB) and the integrated variance (IV), respectively.

The convergence rate of the AMISE cannot be derived without choosing a specific kernel, since (2) depends on the kernel function K_κ. Di Marzio et al. (2009, 2011) derived the AMISE of the VM kernel. The VM kernel is defined as

\[ K_\kappa(\theta) := \frac{1}{2\pi I_0(\kappa)} \exp\{\kappa \cos\theta\}, \qquad 0 < \kappa < \infty, \]

where I_p(κ) denotes the modified Bessel function of the first kind of order p. The characteristic function of the VM kernel is

\[ \varphi_p = \frac{I_p(\kappa)}{I_0(\kappa)}, \qquad p = 0, \pm 1, \pm 2, \ldots. \]
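To make Definition 3 concrete, here is a minimal sketch (our own, not part of the paper) of the circular kernel estimator with the VM kernel just defined; the function names are ours:

```python
import numpy as np

def vm_kernel(theta, kappa):
    """von Mises kernel K_kappa(theta) = exp(kappa * cos(theta)) / (2*pi*I_0(kappa))."""
    return np.exp(kappa * np.cos(theta)) / (2.0 * np.pi * np.i0(kappa))

def kde_circular(theta_grid, samples, kernel, conc):
    """Definition 3: f_hat(theta) = (1/n) * sum_i K(theta - Theta_i)."""
    diffs = np.asarray(theta_grid)[:, None] - np.asarray(samples)[None, :]
    return kernel(diffs, conc).mean(axis=1)
```

Because the VM kernel is itself a density on the circle, the resulting estimate is nonnegative and integrates to one over [0, 2π) for any sample.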

Proposition 1. (AMISE of the VM kernel) Noting that γ_j(κ) = I_j(κ)/I_0(κ), the AMISE of the VM kernel is given by

\[ \mathrm{AMISE}_{\mathrm{VM}}[\hat f(\cdot;\kappa)] = \frac{1}{16}\Bigl\{1 - \frac{I_2(\kappa)}{I_0(\kappa)}\Bigr\}^2 R(f'') + \frac{1 + 2\sum_{j=1}^{\infty}\{I_j(\kappa)/I_0(\kappa)\}^2}{2n\pi}. \tag{3} \]

When κ is sufficiently large, the asymptotic form of (3) reduces to

\[ \mathrm{AMISE}_{\mathrm{VM}}[\hat f(\cdot;\kappa)] = \frac{R(f'')}{4\kappa^2} + \frac{\kappa^{1/2}}{2n\pi^{1/2}}. \tag{4} \]

Minimizing (4) yields the optimal κ:

\[ \kappa^{*} = \bigl(2\pi^{1/2} R(f'')\, n\bigr)^{2/5}. \tag{5} \]

From (4) and (5) we obtain AMISE_VM = O(n^{-4/5}). Note that O(n^{-4/5}) equals the convergence rate of second-order kernels on the real line.

3 Properties of the wrapped Cauchy kernel

The WC kernel is defined as

\[ K_\rho(\theta) = \frac{1}{2\pi}\,\frac{1-\rho^2}{1+\rho^2-2\rho\cos(\theta)}, \qquad 0 < \rho \le 1, \]

where ρ is the concentration parameter. The characteristic function of the WC kernel is

\[ \varphi_p = \rho^{|p|}, \qquad p = 0, \pm 1, \pm 2, \ldots. \]

Theorem 2. (AMISE of the WC kernel) The AMISE of the WC kernel is given by

\[ \mathrm{AMISE}_{\mathrm{WC}}[\hat f(\cdot;\rho)] = \frac{\{1-\rho^2\}^2 R(f'')}{16} + \frac{1}{n\pi(1-\rho^2)}. \tag{6} \]

Setting h := 1 − ρ^2 with 0 ≤ h < 1, (6) becomes

\[ \mathrm{AMISE}_{\mathrm{WC}}[\hat f(\cdot;h)] = \frac{h^2 R(f'')}{16} + \frac{1}{n\pi h}. \tag{7} \]

See Appendix A for the details. In the same way as in Proposition 1, we obtain the optimal h:

\[ h^{*} = \Bigl(\frac{8}{\pi R(f'')\, n}\Bigr)^{1/3}, \tag{8} \]

under the condition n > 8(πR(f''))^{-1}. From (7) and (8), the optimal convergence rate of AMISE_WC as n becomes sufficiently large is

\[ \mathrm{AMISE}_{\mathrm{WC}} = O(n^{-2/3}). \tag{9} \]

Since (8) exceeds one when n ≤ 8(πR(f''))^{-1}, in practice we should use as the concentration parameter the value ρ* that minimizes (6):

\[ \rho^{*} = \operatorname*{arg\,min}_{0 < \rho \le 1} \mathrm{AMISE}_{\mathrm{WC}}[\hat f(\cdot;\rho)]. \tag{10} \]

We write f̂_ρ for f̂_h with h = 1 − ρ^2.

Theorem 3. (Asymptotic normality) Let Θ_1, Θ_2, ..., Θ_n be i.i.d. with density f(θ), and let h = cn^{−α} with 0 ≤ h < 1. If α > 1/3, then

\[ \sqrt{nh}\,[\hat f_h(\theta) - f(\theta)] \xrightarrow{d} N(0,\, f(\theta)/\pi), \qquad n \to \infty. \tag{11} \]

See Appendix B for the details.

The optimal convergence rate O(n^{-2/3}) of the AMISE of the WC kernel differs from the rate O(n^{-4/5}) of the VM kernel, although both are kernels of sin-order 2. On the real line, the Cauchy kernel has the optimal convergence rate O(n^{-2/3}) (Davis, 1975). The rate O(n^{-2/3}) of the WC kernel corresponds to the rates of the order-0 kernel family, such as the histogram and the Cauchy kernel. We consider that the correspondence between the characteristic functions of the WC kernel and the Cauchy kernel explains the correspondence between the two rates. Mardia and Jupp (1999, p. 48, (3.5.59)) note that the characteristic function of the WC kernel corresponds to that of the Cauchy kernel, φ(p) = e^{−a|p|}. The inversion theorems, however, differ: the inversion for the WC kernel is a Fourier series expansion, while that for the Cauchy kernel is a Fourier transform.
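A quick numerical sanity check of (4)-(5) and (7)-(9) (our own sketch; `Rfpp` stands for R(f'')): each optimal parameter is a stationary point of its AMISE, and the minimized AMISE values scale as n^{-4/5} for the VM kernel and n^{-2/3} for the WC kernel.

```python
import numpy as np

def amise_vm(kappa, n, Rfpp):
    """Large-kappa AMISE of the VM kernel, equation (4)."""
    return Rfpp / (4.0 * kappa**2) + np.sqrt(kappa) / (2.0 * n * np.sqrt(np.pi))

def kappa_opt(n, Rfpp):
    """Optimal VM concentration parameter, equation (5)."""
    return (2.0 * np.sqrt(np.pi) * Rfpp * n) ** (2.0 / 5.0)

def amise_wc(h, n, Rfpp):
    """AMISE of the WC kernel in terms of h = 1 - rho^2, equation (7)."""
    return h**2 * Rfpp / 16.0 + 1.0 / (n * np.pi * h)

def h_opt(n, Rfpp):
    """Optimal WC smoothing parameter, equation (8)."""
    return (8.0 / (np.pi * Rfpp * n)) ** (1.0 / 3.0)
```

For any fixed R(f''), `amise_vm(kappa_opt(n, R), n, R) * n**0.8` and `amise_wc(h_opt(n, R), n, R) * n**(2/3)` are constant in n, which exhibits the two rates. For small n, `h_opt` can exceed one, which is exactly why the constrained minimizer (10) is used in practice.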

4 Simulation

The optimal concentration parameter depends on R(f''). A plug-in rule estimates it by replacing R(f'') with an estimator R̂(f''). The simplest plug-in rule assumes that the true density f is the von Mises density f_VM and uses R̂(f''_VM(·; κ̂)) as the estimator of R(f''), where κ is the concentration parameter of f_VM and κ̂ is its maximum likelihood estimator. The estimator R̂(f''_VM(·; κ̂)) has the closed form

\[ \hat R\bigl(f''_{\mathrm{VM}}(\cdot;\hat\kappa)\bigr) = \frac{\hat\kappa\,[\,3\hat\kappa\, I_2(2\hat\kappa) + 2 I_1(2\hat\kappa)\,]}{8\pi I_0^2(\hat\kappa)}. \tag{12} \]

The plug-in procedure is as follows:

i) Estimate κ̂ from the sample and calculate R̂(f''_VM(·; κ̂)).

ii) WC kernel: estimate ρ* by minimizing (6) with R̂(f''_VM(·; κ̂)) substituted for R(f'').

iii) VM kernel: estimate κ* by substituting R̂(f''_VM(·; κ̂)) for R(f'') in (5).

Our simulation proceeds as follows.

1. The WC kernel:

(a) Let the true density be a mixture of von Mises densities:

\[ f_{\mathrm{MVM}}(\theta) = \tfrac{1}{2} f_{\mathrm{VM}_1}(\theta; \mu_1, \kappa_1) + \tfrac{1}{2} f_{\mathrm{VM}_2}(\theta; \mu_2, \kappa_2), \tag{13} \]

with μ_1 = π/2, μ_2 = 3π/2, and κ_1 = κ_2 = κ. Generate a random sample of size n distributed as (13).

(b) Estimate ρ* by the plug-in rule.

(c) Let ISE = ∫_0^{2π} {f̂(θ; ρ*) − f(θ)}^2 dθ, and let ISE(f̂(·; ρ*)) denote its numerical integration. Calculate ISE(f̂(·; ρ*)).

(d) Repeat (a)-(c) 1000 times and compute MISE(f̂(·; ρ*)) = Σ_{i=1}^{1000} ISE_i(f̂(·; ρ*))/1000.

2. The VM kernel: with the same steps (a)-(d), compute MISE(f̂(·; κ*)).
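The closed form (12) can be checked numerically (our own sketch; `bessel_i` is a hand-rolled series helper, not from the paper): it should match a direct spectral computation of ∫ f''_VM(θ)^2 dθ.

```python
import numpy as np
from math import factorial

def bessel_i(v, x, terms=60):
    """Modified Bessel function of the first kind I_v(x), integer order v >= 0,
    via its power-series expansion (adequate for moderate x)."""
    return sum((x / 2.0) ** (2 * k + v) / (factorial(k) * factorial(k + v))
               for k in range(terms))

def r_fpp_vonmises(kappa):
    """Closed form (12) for R(f'') when f is von Mises with concentration kappa."""
    num = kappa * (3.0 * kappa * bessel_i(2, 2.0 * kappa) + 2.0 * bessel_i(1, 2.0 * kappa))
    return num / (8.0 * np.pi * bessel_i(0, kappa) ** 2)
```

A spectral (FFT-based) second derivative of f_VM on a uniform grid over [0, 2π), squared and integrated by a Riemann sum, reproduces (12) to near machine precision.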

3. Calculate MISE(f̂(·; κ*)) − MISE(f̂(·; ρ*)), rounded at the fourth decimal place.

Table 1: MISE(f̂(·; κ*)) − MISE(f̂(·; ρ*)). n is the sample size and κ is the concentration parameter of the true density f.

 κ \ n      10      50     100     200     500    1000
 0.3     0.011   0.002   0.001   0       0       0
 0.5     0.012   0.002   0.001   0       0       0
 0.7     0.011   0.002   0.001   0       0       0
 1       0.012   0.001   0       0       0       0
 2       0.007  -0.004  -0.005  -0.005  -0.005  -0.005
 5      -0.017  -0.023  -0.025  -0.024  -0.024  -0.023
 10     -0.032  -0.036  -0.037  -0.037  -0.037  -0.037
 15     -0.037  -0.044  -0.043  -0.042  -0.042  -0.042
 20     -0.039  -0.045  -0.045  -0.045  -0.048  -0.046

The results in Table 1 indicate that the WC kernel is superior to the VM kernel when n is not sufficiently large (n ≤ 100) and f is multimodal and/or heavy-tailed. The difference between the WC kernel and the VM kernel tends to shrink as n grows.

5 Conclusion

This paper shows that the WC kernel has the AMISE convergence rate O(n^{-2/3}) and satisfies asymptotic normality, whereas the convergence rate of the VM kernel is O(n^{-4/5}). In other words, the rates of the WC kernel and the VM kernel differ even though both are kernels of sin-order 2. The convergence rate of the AMISE of the WC kernel coincides with the rate of the Cauchy kernel on the real line, since the characteristic function of the WC kernel corresponds to that of the Cauchy kernel. The WC kernel is not better than the VM kernel with respect to the AMISE convergence rate. However, our simulation results show that the WC kernel is superior to the VM kernel when the true density f is multimodal and/or heavy-tailed.

6 Appendix

6.1 Appendix A: proof of Theorem 2

The WC kernel has γ_j(ρ) = ρ^j and η_2(K_ρ) = (1 − ρ^2)/2. Since sin(u) ≈ u for small values of u, we use the expansion f(θ + u) ≈ f(θ) + f'(θ) sin(u) + f''(θ) sin^2(u)/2 + O(sin^3(u)). For the expectation,

\[
\begin{aligned}
E_f[K_\rho(\theta - Y)] &= \int K_\rho(\theta - y) f(y)\,dy = \int K_\rho(u) f(\theta + u)\,du \\
&= \int K_\rho(u)\bigl[f(\theta) + f'(\theta)\sin(u) + f''(\theta)\sin^2(u)/2 + O(\sin^3(u))\bigr]\,du \\
&= f(\theta) + \tfrac{1}{2}\eta_2(K_\rho) f''(\theta) + o(1-\rho^2) \\
&= f(\theta) + \tfrac{1}{4}(1-\rho^2) f''(\theta) + o(1-\rho^2), \tag{14}
\end{aligned}
\]

where the f'(θ) term vanishes because η_1(K_ρ) = 0. For the roughness,

\[ R(K_\rho) = \frac{1 + 2\sum_{j=1}^{\infty} (\rho^{j})^2}{2\pi} = \frac{1 + \frac{2\rho^2}{1-\rho^2}}{2\pi} = \frac{1+\rho^2}{2\pi(1-\rho^2)} = \frac{1}{\pi(1-\rho^2)} - \frac{1}{2\pi}. \tag{15} \]

It follows from (15) that

\[
\begin{aligned}
n^{-1}\operatorname{Var}_f[K_\rho(\theta - Y)] &= n^{-1}\bigl\{E_f[K_\rho^2(\theta - Y)] - E_f[K_\rho(\theta - Y)]^2\bigr\} \\
&= n^{-1}\Bigl[\frac{1}{\pi(1-\rho^2)} - \frac{1}{2\pi}\Bigr]\{f(\theta) + o(1)\} - n^{-1}[f(\theta) + o(1)]^2 \\
&= \frac{f(\theta)}{\pi n(1-\rho^2)} + o\bigl(n^{-1}(1-\rho^2)^{-1}\bigr). \tag{16}
\end{aligned}
\]

Integrating the squared bias from (14) and the variance from (16) over [0, 2π) yields (6).
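The closed form (15) for the roughness R(K_ρ) can be verified numerically (our own sketch; the function names are ours):

```python
import numpy as np

def wc_kernel(theta, rho):
    """Wrapped Cauchy kernel K_rho(theta) = (1 - rho^2) / (2*pi*(1 + rho^2 - 2*rho*cos(theta)))."""
    return (1.0 - rho**2) / (2.0 * np.pi * (1.0 + rho**2 - 2.0 * rho * np.cos(theta)))

def wc_roughness(rho):
    """Equation (15): R(K_rho) = 1/(pi*(1 - rho^2)) - 1/(2*pi)."""
    return 1.0 / (np.pi * (1.0 - rho**2)) - 1.0 / (2.0 * np.pi)
```

A Riemann sum of K_ρ² over a fine uniform grid on [0, 2π) reproduces the closed form essentially to machine precision, since the integrand is smooth and periodic.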

6.2 Appendix B: proof of Theorem 3

Put h = 1 − ρ^2 and write (14) and (16) as

\[ E_f[K_h(\theta - \Theta_1)] = f(\theta) + \tfrac{1}{4} h f''(\theta) + o(h), \tag{17} \]

\[ \operatorname{Var}_f[K_h(\theta - \Theta_1)] = \frac{f(\theta)}{\pi h} + o(h^{-1}). \tag{18} \]

Decompose

\[ \sqrt{nh}\,[\hat f_h(\theta) - f(\theta)] = \sqrt{nh}\,\{\hat f_h(\theta) - E[\hat f_h(\theta)]\} + \sqrt{nh}\,\mathrm{Bias}[\hat f_h(\theta)]. \tag{19} \]

The first term of (19) equals

\[ \sqrt{nh}\,\{\hat f_h(\theta) - E_f[\hat f_h(\theta)]\} = n^{-1/2} \sum_{i=1}^{n} \bigl\{ h^{1/2} K_h(\theta - \Theta_i) - E_f[h^{1/2} K_h(\theta - \Theta_1)] \bigr\}. \tag{20} \]

Since

\[ E_f[h^{1/2} K_h(\theta - \Theta_1)] = h^{1/2} E_f[K_h(\theta - \Theta_1)] \approx h^{1/2} \Bigl[ f(\theta) + \frac{f''(\theta)}{4}\, h \Bigr], \tag{21} \]

it is shown that 0 ≤ E_f[h^{1/2} K_h(θ − Θ_1)] < ∞ from (21). From (18), we obtain

\[ \operatorname{Var}_f[h^{1/2} K_h(\theta - \Theta_1)] = h \operatorname{Var}_f[K_h(\theta - \Theta_1)] = h \Bigl[ \frac{f(\theta)}{\pi h} + o(h^{-1}) \Bigr] = \frac{f(\theta)}{\pi} + o(1) \to \frac{f(\theta)}{\pi}, \tag{22} \]

so Var_f[h^{1/2} K_h(θ − Θ_1)] < ∞. Since (20) satisfies the Lindeberg condition (Feller, 1968, p. 244) by (21) and (22), we obtain

\[ \sqrt{nh}\,\{\hat f_h(\theta) - E_f[\hat f_h(\theta)]\} \xrightarrow{d} N(0,\, f(\theta)/\pi), \qquad n \to \infty. \tag{23} \]
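The convergence (23) can be illustrated by a small Monte Carlo experiment (our own sketch, not part of the proof). Taking f uniform on the circle makes the bias vanish exactly, so the standardized estimator should have variance close to f(θ)/π = 1/(2π²):

```python
import numpy as np

def wc_kernel(theta, rho):
    """Wrapped Cauchy kernel."""
    return (1.0 - rho**2) / (2.0 * np.pi * (1.0 + rho**2 - 2.0 * rho * np.cos(theta)))

def standardized_errors(n, reps, alpha=0.4, theta0=1.0, seed=0):
    """Draw `reps` samples of size n from the uniform circular density f = 1/(2*pi),
    estimate f(theta0) with the WC kernel at h = n^(-alpha) (alpha > 1/3), and return
    sqrt(n*h) * (f_hat - f), whose limit distribution is N(0, f/pi) by (11)."""
    rng = np.random.default_rng(seed)
    h = n ** (-alpha)
    rho = np.sqrt(1.0 - h)
    samples = rng.uniform(0.0, 2.0 * np.pi, size=(reps, n))
    fhat = wc_kernel(theta0 - samples, rho).mean(axis=1)
    return np.sqrt(n * h) * (fhat - 1.0 / (2.0 * np.pi))
```

With n = 500 and 2000 replications, the empirical mean is near zero and the empirical variance is within a few percent of the asymptotic value 1/(2π²), up to the o(1) finite-sample correction visible in (22).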

The order of the second term of (19) is

\[ \sqrt{nh}\,\mathrm{Bias}[\hat f_h] = \sqrt{nh}\,O(h) = O(\sqrt{nh^3}). \tag{24} \]

With h = cn^{−α}, we obtain \sqrt{nh^3} = c^{3/2}\, n^{1/2} n^{-3\alpha/2} = O(n^{(1-3\alpha)/2}). When α > 1/3 is chosen,

\[ \sqrt{nh}\,\mathrm{Bias}[\hat f_h(\theta)] = O(\sqrt{nh^3}) = o(1). \tag{25} \]

For α > 1/3 and n → ∞, (23) and (25) together complete the proof of Theorem 3.

References

[1] Davis, K. B. (1975). Mean square error properties of density estimates. The Annals of Statistics 3, 1025-1030.

[2] Di Marzio, M., Panzera, A. and Taylor, C. C. (2009). Local polynomial regression for circular predictors. Statistics & Probability Letters 79, 2066-2075.

[3] Di Marzio, M., Panzera, A. and Taylor, C. C. (2011). Kernel density estimation on the torus. Journal of Statistical Planning and Inference 141, 2156-2173.

[4] Feller, W. (1968). An Introduction to Probability Theory and Its Applications, Vol. I, 3rd ed. John Wiley & Sons.

[5] Garcia-Portugues, E. (2013). Exact risk improvement of bandwidth selectors for kernel density estimation with directional data. Electronic Journal of Statistics 7, 1655-1685.

[6] Hall, P., Watson, G. S. and Cabrera, J. (1987). Kernel density estimation with spherical data. Biometrika 74, 751-762.

[7] Mardia, K. V. and Jupp, P. E. (1999). Directional Statistics. John Wiley & Sons.

[8] Scott, D. W. (1992). Multivariate Density Estimation: Theory, Practice, and Visualization. John Wiley & Sons.

[9] Taylor, C. C. (2008). Automatic bandwidth selection for circular density estimation. Computational Statistics & Data Analysis 52, 3493-3500.