Kernel density estimation for heavy-tailed distributions using the Champernowne transformation
Buch-Larsen, Nielsen, Guillén, Bolancé: "Kernel density estimation for heavy-tailed distributions using the Champernowne transformation", Statistics, Vol. 39, No. 6, December 2005, pp. 503-518
Presented by Tine Buch-Kromann
Limitations of the kernel density estimator
[Figure: simulated lognormal data set; true lognormal density vs. kernel density estimate, x from 0 to 20]
Limitations of the kernel density estimator
[Figure: same comparison zoomed in on the tail (density axis 0 to 0.10); the kernel density estimate misses the heavy tail]
Motivation
Combine the advantages of nonparametric and parametric statistics.
Nonparametric statistics:
+ No assumptions about the shape of the distribution.
- The estimate of the distribution is uncertain when data are scarce.
Parametric statistics:
+ The estimated distribution converges to the true distribution faster than a nonparametric estimate.
- The distributional assumption might be wrong.
Characteristics
[Diagram: a sliding scale from "parametric model" (few data) to "nonparametric model" (a lot of data), with the proposed method in between]
When data are scarce, the method should be close to a parametric model; as the amount of data increases, the method should become more nonparametric.
The Champernowne distribution
The Champernowne cdf is defined for x ≥ 0 and has the form
    T_{α,M}(x) = x^α / (x^α + M^α),   x ∈ R₊
with parameters α > 0 and M > 0, and density
    t_{α,M}(x) = α M^α x^(α−1) / (x^α + M^α)²,   x ∈ R₊
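The cdf and density above can be sketched directly; a minimal NumPy version (function names are mine, not from the paper):

```python
import numpy as np

def champernowne_cdf(x, alpha, M):
    """Champernowne cdf T_{alpha,M}(x) = x^alpha / (x^alpha + M^alpha), x >= 0."""
    x = np.asarray(x, dtype=float)
    return x**alpha / (x**alpha + M**alpha)

def champernowne_pdf(x, alpha, M):
    """Champernowne density t_{alpha,M}(x) = alpha M^alpha x^(alpha-1) / (x^alpha + M^alpha)^2."""
    x = np.asarray(x, dtype=float)
    return alpha * M**alpha * x**(alpha - 1) / (x**alpha + M**alpha)**2
```

Note that T_{α,M}(M) = 0.5 for any α, which is what later motivates estimating M by the empirical median.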
The Champernowne distribution
The Champernowne distribution converges to a Pareto distribution in the tail:
    t_{α,M}(x) ~ α M^α x^(−(α+1))   as x → ∞
Notice that the Champernowne distribution is defined on [0, ∞), in contrast to the Pareto distribution
    G(x) = 1 − (M/x)^α   with density   g(x) = α M^α / x^(α+1),   α > 0, M > 0,
which is defined only for x ∈ [M, ∞). This makes the Pareto distribution inappropriate as an underlying parametric distribution.
The Champernowne distribution
The tail of the Champernowne distribution is advantageous because it is heavy. However, the shape near 0 is unfortunately quite inflexible:
    t_{α,M}(0) = ∞ for α < 1,   = 1/M for α = 1,   = 0 for α > 1
The Champernowne distribution
The effect of the parameter α. For α < α′:
    T_{α,M}(x) > T_{α′,M}(x)   if 0 ≤ x < M
    T_{α,M}(x) = T_{α′,M}(x)   if x = M
    T_{α,M}(x) < T_{α′,M}(x)   if M < x < ∞
α is not a scale parameter, but it has some properties similar to a scale parameter s:
α > 1: Increasing α gives a steeper cdf at M. (Scale-parameter effect: the density narrows, the mode moves to the right, and the tail becomes lighter.)
α < 1: Increasing α gives a less steep shape of the density near 0.
The Champernowne distribution
The effect of the parameter M. For M < M′:
    T_{α,M}(x) > T_{α,M′}(x)   for all x ∈ R₊
Increasing M decreases the cdf pointwise.
α > 1: The mode of the density moves to the right and becomes lower.
The Modified Champernowne distribution
The Champernowne distribution is heavy tailed, but its shape near 0 is inflexible and depends on α, which also determines the tail. The modified Champernowne cdf is defined for x ≥ 0 and has the form
    T_{α,M,c}(x) = ((x + c)^α − c^α) / ((x + c)^α + (M + c)^α − 2c^α),   x ∈ R₊
with parameters α > 0, M > 0 and c ≥ 0, and density
    t_{α,M,c}(x) = α (x + c)^(α−1) ((M + c)^α − c^α) / ((x + c)^α + (M + c)^α − 2c^α)²,   x ∈ R₊
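The modified cdf and density translate directly into code; a sketch (again with names of my choosing), which reduces to the plain Champernowne distribution when c = 0:

```python
import numpy as np

def mod_champernowne_cdf(x, alpha, M, c):
    """Modified Champernowne cdf T_{alpha,M,c}(x), defined for x >= 0."""
    x = np.asarray(x, dtype=float)
    num = (x + c)**alpha - c**alpha
    den = (x + c)**alpha + (M + c)**alpha - 2 * c**alpha
    return num / den

def mod_champernowne_pdf(x, alpha, M, c):
    """Modified Champernowne density t_{alpha,M,c}(x)."""
    x = np.asarray(x, dtype=float)
    num = alpha * (x + c)**(alpha - 1) * ((M + c)**alpha - c**alpha)
    den = ((x + c)**alpha + (M + c)**alpha - 2 * c**alpha)**2
    return num / den
```

The median property survives the modification: T_{α,M,c}(M) = 0.5 for any α and c, since at x = M the numerator is exactly half the denominator.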
The Modified Champernowne distribution
The modified Champernowne distribution converges to a Pareto distribution in the tail:
    t_{α,M,c}(x) ~ α ((M + c)^α − c^α) x^(−(α+1))   as x → ∞
Note that the modified Champernowne distribution is defined on [0, ∞), in contrast to the Pareto distribution
    G(x) = 1 − (((M + c)^α − c^α)^(1/α) / x)^α   with density   g(x) = α ((M + c)^α − c^α) / x^(α+1),
which is defined only for x ∈ [((M + c)^α − c^α)^(1/α), ∞).
The Modified Champernowne distribution
The effect of the parameter c. For c < c′ and α > 1:
    T_{α,M,c}(x) < T_{α,M,c′}(x)   if 0 ≤ x < M
    T_{α,M,c}(x) = T_{α,M,c′}(x)   if x = M
    T_{α,M,c}(x) > T_{α,M,c′}(x)   if M < x < ∞
For α < 1 the inequalities reverse:
    T_{α,M,c}(x) > T_{α,M,c′}(x)   if 0 ≤ x < M
    T_{α,M,c}(x) = T_{α,M,c′}(x)   if x = M
    T_{α,M,c}(x) < T_{α,M,c′}(x)   if M < x < ∞
The Modified Champernowne distribution
When α ≠ 1, c has some scale-parameter properties:
c changes the density in the tail: for α < 1, increasing c gives lighter tails; the opposite holds for α > 1.
c changes the density at 0: positive c gives a positive finite density at 0.
c moves the mode: for α > 1, increasing c shifts the mode to the left.
When α = 1, c has no effect.
Parameter estimation
Almost maximum-likelihood parameters. Notice that T_{α,M,c}(M) = 0.5, so estimate M by the empirical median:
- sub-optimal parameters (but close to the optimal ones),
- simplifies the computations,
- robust estimator, especially for heavy-tailed distributions.
Estimate (α, c) by maximizing the log-likelihood function
    l(α, c) = N log α + N log((M + c)^α − c^α) + (α − 1) Σ_{i=1}^N log(x_i + c) − 2 Σ_{i=1}^N log((x_i + c)^α + (M + c)^α − 2c^α)
For fixed M the likelihood function is concave and has a maximum.
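The almost-ML procedure can be sketched as follows: fix M at the empirical median and maximize the log-likelihood above over (α, c). For simplicity the sketch uses a coarse grid search rather than the numerical optimizer one would use in practice (grid ranges are my assumption, not from the paper):

```python
import numpy as np

def fit_mod_champernowne(x, alphas=np.linspace(0.2, 5.0, 49),
                         cs=np.linspace(0.0, 2.0, 21)):
    """Almost-ML fit of the modified Champernowne distribution.

    M is the empirical median; (alpha, c) maximize the profile
    log-likelihood on a grid (a sketch; a proper optimizer would
    replace the grid search).  Returns (alpha_hat, M_hat, c_hat).
    """
    x = np.asarray(x, dtype=float)
    M = float(np.median(x))
    N = len(x)
    best, best_ll = None, -np.inf
    for a in alphas:
        for c in cs:
            ll = (N * np.log(a)
                  + N * np.log((M + c)**a - c**a)
                  + (a - 1) * np.sum(np.log(x + c))
                  - 2 * np.sum(np.log((x + c)**a + (M + c)**a - 2 * c**a)))
            if ll > best_ll:
                best_ll, best = ll, (a, M, c)
    return best
```

Since T_{α,M}(x) = u/(1 + u) with u = (x/M)^α, a plain Champernowne sample can be drawn by inverting the cdf, which gives a quick sanity check of the fit.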
The semiparametric transformation kernel density estimator
Step 1: Original data
[Figure: original data with the true density and the fitted modified Champernowne density, x from 0 to 20]
Data set: (X_1, ..., X_n) with unknown cdf F(x) and density f(x).
Parameter estimation: estimate (α, M, c) of the modified Champernowne distribution, giving the cdf T(x).
The semiparametric transformation kernel density estimator
Step 2: Transformed data
[Figure: density of the transformed data on [0, 1]]
Transformation: transform (X_1, ..., X_n) into (Z_1, ..., Z_n) using Z_i = T(X_i).
The semiparametric transformation kernel density estimator
Step 3: Kernel density estimation on the transformed data
[Figure: kernel density estimate on [0, 1], no boundary correction]
Correction: compute a correction estimator by means of a kernel density estimator
    f̂_t(z) = (1/n) Σ_{i=1}^n K_b(z − Z_i)
where K_b is the Epanechnikov kernel function and b = 0.2 is the bandwidth.
The semiparametric transformation kernel density estimator
Step 3 (continued): Boundary correction
[Figure: kernel density estimates on [0, 1] with and without boundary correction]
Correction: compute a correction estimator by means of a boundary-corrected kernel density estimator
    f̂_t(z) = 1/(n k(z)) Σ_{i=1}^n K_b(z − Z_i)
where k(z) is the boundary correction.
ĝ(z): the final estimator on the transformed axis.
The semiparametric transformation kernel density estimator
Step 4: Original data
[Figure: true density, modified Champernowne fit and the KMCE on the original axis, x from 0 to 10]
Inverse transformation: the final estimator of (X_1, ..., X_n) on the original axis is obtained by an inverse transformation, such that
    f̂(x) = f̂_t(T(x)) / (T^(−1))′(T(x))
Summarized formula:
    f̂(x) = 1/(n k(T(x))) Σ_{i=1}^n K_b(T(x) − T(X_i)) T′(x)
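The four steps above can be sketched end to end. This is a minimal illustration, not the authors' implementation: it takes the fitted (α, M, c) as given, uses the Epanechnikov kernel, and applies the simple "local renormalization" boundary correction (dividing by the kernel mass that falls inside [0, 1]); the paper may use a different correction:

```python
import numpy as np

def epan(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) on [-1, 1]."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def epan_int(a, b):
    """Integral of the Epanechnikov kernel from a to b (clipped to [-1, 1])."""
    a, b = np.clip(a, -1, 1), np.clip(b, -1, 1)
    F = lambda u: 0.75 * (u - u**3 / 3)
    return F(b) - F(a)

def kmce_density(x, data, alpha, M, c, b=0.2):
    """Semiparametric transformation kernel density estimate at points x.

    Step 2: transform the data with the modified Champernowne cdf T.
    Step 3: boundary-corrected Epanechnikov KDE on [0, 1].
    Step 4: back-transform by multiplying with T'(x).
    """
    x, data = np.asarray(x, float), np.asarray(data, float)
    T = lambda s: ((s + c)**alpha - c**alpha) / \
                  ((s + c)**alpha + (M + c)**alpha - 2 * c**alpha)
    # T'(x) is the modified Champernowne density
    Tp = alpha * (x + c)**(alpha - 1) * ((M + c)**alpha - c**alpha) / \
         ((x + c)**alpha + (M + c)**alpha - 2 * c**alpha)**2
    z, Z = T(x), T(data)
    # boundary correction k(z): kernel mass falling inside [0, 1]
    k = epan_int(-z / b, (1 - z) / b)
    fhat_t = epan((z[:, None] - Z[None, :]) / b).sum(axis=1) / (len(data) * b * k)
    return fhat_t * Tp
```

On a simulated Champernowne sample the estimate is nonnegative and integrates to roughly one over a grid covering most of the mass, which is a useful smoke test for the transformation bookkeeping.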
Asymptotic theory
Let X_1, ..., X_n be iid variables with density f, and let f̂(x) be the transformation kernel density estimator of f,
    f̂(x) = (1/n) Σ_{i=1}^n K_b(T(x) − T(X_i)) T′(x)
where T(·) is the transformation function. Then the bias and variance of f̂(x) are given by
    E[f̂(x)] − f(x) = ½ µ₂(K) b² ((f(x)/T′(x))′ (1/T′(x)))′ + o(b²)
    V[f̂(x)] = (1/(nb)) R(K) T′(x) f(x) + o(1/(nb))
as n → ∞, where µ₂(K) = ∫ u² K(u) du and R(K) = ∫ K²(u) du.
Simulation study
Simulation study setup:
Simulated from four distributions: lognormal, mixture of lognormal and Pareto, Weibull, truncated logistic.
Number of observations: n ∈ {50, 100, 500, 1000}
2000 repetitions.
Epanechnikov kernel function.
Bandwidth selection: Silverman's rule of thumb.
Simulation study
Distributions:
[Figure: panels showing the simulated densities on [0, 10] — Lognormal; Lognormal(0.7)-Pareto(0.3) mixture; Lognormal(0.3)-Pareto(0.7) mixture; Weibull; Normal; Truncated logistic]
Simulation study
Error measures:
    L₁ = ∫₀^∞ |f̂(x) − f(x)| dx
    L₂ = (∫₀^∞ (f̂(x) − f(x))² dx)^(1/2)
L₁ and L₂ measure the errors near 0 and in the tail equally.
Simulation study
    WISE = (∫₀^∞ (f̂(x) − f(x))² x² dx)^(1/2)
E (mean excess functions):
    E = ∫₀^∞ (ê(x) − e(x))² f(x) dx = ∫₀^∞ (∫ₓ^∞ u (f(u) − f̂(u)) du)² f(x) dx
WISE and E are error measures that emphasize the tail of the distribution.
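The grid-based error measures are straightforward to compute; a sketch assuming the square-root convention written above for L₂ and WISE, with a hand-rolled trapezoidal rule (np.trapz was removed in NumPy 2.0):

```python
import numpy as np

def _trapz(y, x):
    """Trapezoidal rule on a grid."""
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def l1_error(fhat, f, x):
    """L1 = integral of |fhat - f| over the grid x."""
    return _trapz(np.abs(fhat - f), x)

def l2_error(fhat, f, x):
    """L2 = square root of the integrated squared error."""
    return np.sqrt(_trapz((fhat - f)**2, x))

def wise(fhat, f, x):
    """Weighted integrated squared error: the x^2 weight emphasizes the tail."""
    return np.sqrt(_trapz((fhat - f)**2 * x**2, x))
```

A constant offset on [0, 1] makes the difference visible: L₁ and L₂ both equal the offset, while WISE shrinks it by the factor (∫₀¹ x² dx)^(1/2) = 1/√3.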
Simulation study
Benchmark estimators:
BGN: transformation kernel density estimator with the shifted power transformation
    y = (x + λ₁)^(λ₂) / λ₂   if λ₂ ≠ 0
    y = ln(x + λ₁)           if λ₂ = 0
Aim: make the transformed data symmetric.
CHL: transformation kernel density estimator with the Möbius-like transformation
    y = (x^α − R^α) / (x^α + R^α)
— the Champernowne transformation with another parameter-estimation method.
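The BGN benchmark's shifted power transformation is simple to write down (the procedure for choosing λ₁ and λ₂ to symmetrize the data is not shown here):

```python
import numpy as np

def shifted_power(x, lam1, lam2):
    """Shifted power transformation: y = (x + lam1)^lam2 / lam2 for lam2 != 0,
    and y = log(x + lam1) for lam2 = 0 (the limiting case up to a constant)."""
    x = np.asarray(x, dtype=float)
    if lam2 == 0:
        return np.log(x + lam1)
    return (x + lam1)**lam2 / lam2
```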
Simulation study Results:
Application: Automobile claims
Spanish automobile accidents: bodily injury claims from 1997. Data divided into two age groups: young drivers (less than 30 years) and old drivers (30 years and above).
Young: 1061 obs. in the interval [1; 126000] with mean value 402.7.
Old: 4061 obs. in the interval [1; 17000] with mean value 243.1.
Application: Automobile claims
Young: estimated modified Champernowne parameters: α̂₁ = 1.116, M̂₁ = 66, ĉ₁ = 0.000
Old: estimated modified Champernowne parameters: α̂₂ = 1.145, M̂₂ = 68, ĉ₂ = 0.000
Bandwidths: b₁ = 0.172 and b₂ = 0.134
Notice that α̂₁ < α̂₂, i.e. young drivers have a heavier tail.
Application: Automobile claims
Spanish automobile claims on the transformed axis
[Figure: kernel density estimates of the transformed claims on [0, 1], one panel for drivers <30 years old and one for drivers >30 years old]
Application: Automobile claims
Spanish automobile claims and the resulting KSCE estimate
[Figure: six panels — small claims (0 to 2000), moderately sized claims (2000 to 14000) and extreme claims (20000 to 100000), each shown for drivers <30 years old and >30 years old]
Application: Automobile claims
Quotient between the KSCE estimates of <30 years old and >30 years old
[Figure: the ratio KSCE_young / KSCE_old for small, moderately sized and extreme claims]
Conclusion: young drivers have a heavier tail than old drivers.
Application: Employer's liability
Employer's liability claims: 2522 claims (Irish insurance company).
Estimated modified Champernowne parameters: α̂ = 1.955, M̂ = 32379, ĉ = 64759
Estimated Champernowne parameters (c = 0): α̂ = 0.954, M̂ = 32379
What is the effect of not including c?
Application: Employer's liability
[Figure: estimates for the employer's liability claims]
Application: Employer's liability
Conclusion: the estimates are nearly identical for small and moderate claims, whereas the KCE (c = 0) overestimates the tail. This shows the importance of the modified Champernowne distribution.
Conclusion
Estimating loss distributions.
Introduced the semiparametric transformation kernel density estimator:
- based on a parametric estimator that is subsequently corrected with a nonparametric estimator;
- with a lot of information it is close to a nonparametric estimator;
- with little information it is close to a parametric estimator.
Introduced the Champernowne distribution (heavy-tailed).
Generalized it to the modified Champernowne distribution (flexible and heavy-tailed).