Estimation of the functional Weibulltail coefficient


 Ethan Rogers
 5 months ago
 Views:
Transcription
1 1/ 29 Estimation of the functional Weibulltail coefficient Stéphane Girard Inria Grenoble RhôneAlpes & LJK, France June 2016 joint work with Laurent Gardes, Université de Strasbourg.
2 2/ 29 1 Weibulltail distributions 2 Estimation of extreme conditional quantiles 3 Estimation of the functional Weibulltail coefficient 4 Illustration on simulations
3 Notations 3/ 29 Let (X i, Y i ), i = 1,..., n, be iid copies of a random pair (X, Y ) E R where E is an arbitrary space associated with a semimetric (or pseudometric) d, see [3], Definition 3.2. The conditional survival function of Y given X = x E is denoted by F (y x) := P(Y > y X = x) and is supposed to be continuous and strictly decreasing with respect to y. The associated conditional cumulative hazard function is defined by H(y x) := log F (y x). The conditional quantile is given by q(α x) := F 1 (α x) = H 1 (log(1/α) x), for all α (0, 1).
4 Conditional Weibulltail distributions 4/ 29 (A.1) H(. x) is supposed to be regularly varying with index 1/θ(x), i.e. H(ty x) lim y H(y x) = t1/θ(x), for all t > 0. In this situation, θ(.) is a positive function of the covariate x E referred to as the functional Weibull tailcoefficient. From [1], H 1 (. x) is regularly varying with index θ(x). Thus, there exists a slowlyvarying function l(. x) such that q(e y x) = H 1 (y x) = y θ(x) l(y x). Recall that the slowlyvarying function l(. x) is such that for all t > 0. l(ty x) lim = 1, (1) y l(y x)
5 Additional assumptions 5/ 29 (A.2) l(. x) is a normalised slowlyvarying function. In such a case, the Karamata representation (see [1]) of the slowlyvarying function can be written as { y } ε(u x) l(y x) = c(x) exp du, u where c(x) > 0 and ε(u x) 0 as u. The function ε(. x) plays an important role in extremevalue theory since it drives the speed of convergence in (1) and more generally the bias of extremevalue estimators. Therefore, it may be of interest to specify how it converges to 0: (A.3) ε(. x) is regularly varying with index ρ(x) 0. 1
6 Examples of (unconditional) Weibulltail distributions 6/ 29 Distribution θ l(y) ε(y) ρ Gaussian 1/2 N (µ, σ 2 ) σ 2σ 2 2 log y y + O(1/y) 1 log y 4 y 1 Gamma 1 Γ(α 1, λ) 1 β + α 1 β log y y + O(1/y) (1 α) log y y 1 Weibull 1/α λ 0 W(α, λ)
7 Two goals 7/ 29 Starting from an iid sample (X i, Y i ), i = 1,..., n, Estimate the extreme conditional quantiles defined as P(Y > q(α n, x) X = x) = α n, when α n 0 as n. Estimate the functional Weibulltail coefficient θ(x).
8 8/ 29 1 Weibulltail distributions 2 Estimation of extreme conditional quantiles 3 Estimation of the functional Weibulltail coefficient 4 Illustration on simulations
9 Principle 9/ 29 First, F (y x) is estimated by a kernel method. For all (x, y) E R, let ˆ F n (y x) = / n n K(d(x, X i )/h n )I{Y i > y} K(d(x, X i )/h n ), i=1 i=1 where h n is a nonrandom sequence such that h n 0 as n, K is assumed to have a support included in [0, 1] such that C 1 K(t) C 2 for all t [0, 1] and 0 < C 1 < C 2 <. It is assumed without loss of generality that K integrates to one. K is called a type I kernel, see [3], Definition 4.1. Second, q(α x) is estimated via the generalized inverse of ˆ Fn (. x): for all α (0, 1). ˆq n (α x) = ˆ F n (α x) = inf{y, ˆ F n (y x) α},
10 10/ 29 Notations: B(x, h n ) the ball of center x and radius h n, ϕ x (h n ) := P(X B(x, h n )) the small ball probability of X, µ (τ) x (h n ) := E{K τ (d(x, X )/h n )} the τth moment, Λ n (x) = (nα n (µ (1) x (h n )) 2 /µ (2) x (h n )) 1/2. It is easily shown that for all τ > 0 0 < C τ 1 ϕ x (h n ) µ (τ) x (h n ) C τ 2 ϕ x (h n ), and thus Λ n (x) is of order (nα n ϕ x (h n )) 1/2.
11 Regularity assumption 11/ 29 Since the considered estimator involves a smoothing in the x direction, it is necessary to assess the regularity of the conditional survival function with respect to x. To this end, the oscillations are controlled by F (x, α, ζ, h) := sup F (q(β x) x ) (x,β) B(x,h) [α,ζ] F (q(β x) x) 1 = sup F (q(β x) x ) 1 β, where (α, ζ) (0, 1) 2. (x,β) B(x,h) [α,ζ]
12 Asymptotic normality 12/ 29 Theorem 1 Suppose (A.1), (A.2) hold. Let 0 < τ J < < τ 1 1 where J is a positive integer. x E such that ϕ x (h n ) > 0 where h n 0 as n. If α n 0 and there exists η > 0 such that nϕ x (h n )α n, nϕ x (h n )α n ( F ) 2 (x, (1 η)τ J α n, (1 + η)α n, h n ) 0, then, the random vector { log(1/α n )Λ 1 n (x) ( )} ˆqn (τ j α n x) q(τ j α n x) 1 j=1,...,j is asymptotically Gaussian, centered, with covariance matrix θ 2 (x)σ where Σ j,j = τ 1 j j for (j, j ) {1,..., J} 2.
13 Conditions on the sequences 13/ 29 nϕ x (h n )α n : Necessary and sufficient condition for the almost sure presence of at least one point in the region B(x, h n ) [q(α n x), + ) of E R. nϕ x (h n )α n ( F ) 2 (x, (1 η)τ J α n, (1 + η)α n, h n ) 0: The biais induced by the smoothing is negligible compared to the standarddeviation.
14 14/ 29 1 Weibulltail distributions 2 Estimation of extreme conditional quantiles 3 Estimation of the functional Weibulltail coefficient 4 Illustration on simulations
15 Principle 15/ 29 We propose a family of estimators of θ(x) based on some properties of the logspacings of the conditional quantiles. Recall that q(e y x) = H 1 (y x) = y θ(x) l(y x). Let α (0, 1) small enough and τ (0, 1), log q(τα x) log q(α x) = log H 1 ( log(τα) x) log H 1 ( log(α) x) ( ) l( log(τα) x) = θ(x)(log 2 (τα) log 2 (α)) + log l( log(α) x) θ(x)(log 2 (τα) log 2 (α)) θ(x) log(1/τ) log(1/α), where log 2 (.) := log log(1/.),
16 16/ 29 Hence, for a decreasing sequence 0 < τ J < < τ 1 1, where J is a positive integer, and for all (twice differentiable) function φ : R J R satisfying the shift and location invariance condition φ(ηz) = ηφ(z), φ(ηu + z) = φ(z), for all η R \ {0}, z R J and where u = (1,..., 1) t R J, one has: θ(x) log(1/α) φ(log q(τ 1α x),..., log q(τ J α x)). φ(log(1/τ 1 ),..., log(1/τ J )) Thus, the estimation of θ(x) relies on the estimation of conditional quantiles q(. x): ˆθ n (x) = log(1/α n ) φ(log ˆq n(τ 1 α n x),..., log ˆq n (τ J α n x)), φ(log(1/τ 1 ),..., log(1/τ J )) with α n 0 as n.
17 Asymptotic normality 17/ 29 Theorem 2 Ssuppose (A.1) (A.3) hold. Let x E such that ϕ x (h n ) > 0 where h n 0 as n. If α n 0, nϕx (h n )α n ε(log(1/α n ) x) λ R and there exists η > 0 such that nϕ x (h n )α n and nϕx (h n )α n { F (x, (1 η)τ J α n, (1 + η)α n, h n ) 1/log(1/α n )} 0, then, Λ 1 d n (x)(ˆθ n (x) θ(x)) N (µ φ, θ 2 (x)v φ ) where µ φ = λv t log φ(v), V φ = ( log φ(v)) t Σ ( log φ(v)) and v = (log(1/τ 1 ),..., log(1/τ J )) t do not depend on (X, Y ) distribution.
18 Corollary Suppose (A.1) (A.3) hold. Let x E such that ϕ x (h n ) > 0 and yε(y x) as y. Assume there exist L θ, L c et L ε such that 1 θ(x) 1 θ(x ) L θ d(x, x ), log c(x) log c(x ) L c d(x, x ), sup ε(u x) ε(u x ) L ε d(x, x ), u [1,ȳ n(x)] where ȳ n (x) := sup{h(q(α n x) x ), x B(x, h n )}. Suppose ϕ 1 x (1/y)(log y) 1+ξ ρ(x) 0 (2) for some ξ > 0 as y. Then, letting λ > 0, α n = n 1+ξ and h n = ϕ 1 x ( λ(1 ξ) 2ρ(x) n ξ (ε(log n x)) 2), Theorem 2 yields Λ 1 d n (x)(ˆθ n (x) θ(x)) N (µ φ, θ 2 (x)v φ ). The key assumption (2) holds in the finite dimensional setting or for fractaltype and some exponentialtype processes, see [3], Chapter / 29
19 Example 19/ 29 ( J ) 1/p Let us focus on the functions φ (p) (z) = j=2 β j(z j z 1 ) p, where z = (z 1,..., z J ) t R J, p N \ {0} and for all j {2,..., J}, β j R. The corresponding estimator of θ writes: ( J ˆθ n (p) j=2 (x) = log(1/α n ) β j [log ˆq n (τ j α n x) log ˆq n (τ 1 α n x)] p ) 1/p J j=2 β. j[log(τ 1 /τ j )] p As a consequence of Theorem 2, the associated asymptotic mean and variance of (x) are given for an arbitrary vector β by µ = λ and ˆθ (p) n V (p) = (η(p) ) t AΣA t η (p) (η (p) ) t Avv t A t η (p), where A is a given matrix and η (p) = (β j (v j v 1 ), j = 2,..., J) t. The asymptotic bias µ does not depend neither on p and nor on the weights {β j, j = 2,..., J}. It is possible to minimize V (p) with respect to η (p).
20 20/ 29 Proposition (p) The asymptotic variance of ˆθ n (x) is minimal for η (p) proportional to η opt = (AΣA t ) 1 Av and is given by and is independent of p. V opt = 1 (Av) t (AΣA t ) 1 Av, Moreover, for a fixed value of J, it is possible to minimize numerically the optimal variance V opt with respect to parameters 0 < τ J < < τ 1 1. The resulting values of V opt are displayed in the table below: J V opt τ 1 τ 2 τ 3 τ 4 τ
21 21/ 29 1 Weibulltail distributions 2 Estimation of extreme conditional quantiles 3 Estimation of the functional Weibulltail coefficient 4 Illustration on simulations
22 Framework 22/ 29 E is a subset of L 2 ([0, 1]) made of trigonometric functions ψ z : [0, 1] [0, 1], ψ z (t) = cos(2πzt) with different periods indexed by z [1/10, 1/2]. Two semimetrics are considered: d 1 (ψ z, ψ z ) = ψz 2 2 ψ z 2 2, d 2 (ψ z, ψ z ) = ψ z ψ z 2, for all (z, z ) [1/10, 1/2] 2, where ψ z 2 2 = 1 0 ψ 2 z (t)dt = 1 2 ( 1 + sin(4πz) ). 4πz The semimetric d 2 is built on the classical L 2 norm while d 1 measures some spacing between the periods of the trigonometric functions.
23 Simulated data 23/ 29 N = 100 copies of a n = 1000 samples from a random pair (X, Y ) defined as follows: The covariate X is chosen randomly on E by considering X = ψ Z where Z is a uniform random variable on [1/10, 1/2]. For a given function x E, the generalized inverse of the conditional hazard function H(. x) is given for y 0 by with H (y x) = y θ(x) ( 1 γy ρ(x)), θ(x) = (18/5 x /50) 1 5/18, ρ(x) = 50/(60 x ) 5/2, γ = 1/10.
24 Four simulated random functions X (.) 24/ 29
25 Estimators 25/ 29 The previous estimator with optimal weights is used. Here, we limit ourselves to J = 5 and p {1, 3}. A modified biquadratic kernel is adopted (type I kernel): K(u) = 10 ( 3 ( 1 u 2 ) ) I{ u 1} h n and α n are selected simultaneously thanks to a datadriven procedure. For a fixed x, let {Z 1 (x, h n ),..., Z mn (x, h n )} be the m n random values Y i for which X i B(x, h n ). The idea is to select the sequences h n and α n such that the rescaled logspacings i log(m n /i)(log Z mn i+1,m n (x, h n ) log Z mn i,m n (x, h n )), i = 1,..., m n α n, are approximately Exp(θ(x)) distributed. The optimal sequences are obtained by minimizing a KolmogorovSmirnov distance. Comparison with the nonconditional estimator proposed in [2]: ˆθ NCE n = kn i=1 (log Y n i+1,n log Y n kn+1,n) ( log 2 (n/i) log 2 (n/k n ) ). kn i=1
26 Influence of the exponent p 26/ 29 Let e i,l,p be the relative error obtained on the ith replication using the semimetric d l and the estimator ˆθ (p). Left: histogram of e,1,3 e,1,1 (semimetric d 1 ), right: histogram of e,2,3 e,2,1 (semimetric d 2 ). Both histograms are nearly centered, small influence of p.
27 Influence of the semimetric 27/ 29 Recall that e i,l,p is the relative error obtained on the ith replication using the semimetric d l and the estimator ˆθ (p). Left: histogram of e,2,1 e,1,1 (p = 1), right: histogram of e,2,3 e,1,3 (p = 3). Both histograms are skewed to the right, the semimetric d 1 yields better result than d 2.
28 Comparison with the nonconditional estimator 28/ 29 Recall that e i,l,p is the relative error obtained on the ith replication using the semimetric d l and the estimator ˆθ (p). We moreover denote by e i the relative error obtained on the ith replication using the nonconditional NCE estimator ˆθ n. Left: histogram of e e,1,1 (p = 1), right: histogram of e e,1,3 (p = 3). Both histograms are skewed to the right, the conditional estimator yields better results than the unconditional one.
29 References 29/ 29 1 Bingham, N.H., Goldie, C.M., Teugels, J.L. (1987) Regular Variation, Cambridge University Press. 2 Daouia, A., Gardes, L., Girard, S. (2013). On kernel smoothing for extremal quantile regression. Bernoulli, 19, Ferraty, F., Vieu, P. (2006). Nonparametric functional data analysis, Springer. 4 Gardes, L., Girard, S. (2006). Comparison of Weibull tailcoefficient estimators. REVSTAT  Statistical Journal, 4(2), Gardes, L., Girard, S. (2012). Functional kernel estimators of large conditional quantiles. Electronic Journal of Statistics, 6, Gardes, L., Girard, S. (2016). On the estimation of the functional Weibull tailcoefficient. Journal of Multivariate Analysis, 146, Gardes, L., Girard, S., Lekina, A. (2010). Functional nonparametric estimation of conditional extreme quantiles. Journal of Multivariate Analysis, 101,