Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula

Journal of Multivariate Analysis 102 (2011) 1315–1319
Journal homepage: www.elsevier.com/locate/jmva

John H.J. Einmahl, Ramon van den Akker
Department of Econometrics & OR and CentER, Tilburg University, PO Box 90153, NL-5000 LE Tilburg, The Netherlands

Article history: Received 9 November 2010. Available online 6 May 2011.
AMS subject classifications: 62G05, 62G20
Keywords: Copula; Estimation of marginals; Superefficient estimation

Abstract

We consider the problem of estimating the marginals in the case where there is knowledge on the copula. If the copula is smooth, it is known that it is possible to improve on the empirical distribution functions: optimal estimators still have a rate of convergence $n^{-1/2}$, but a smaller asymptotic variance. In this paper we show that for non-smooth copulas it is sometimes possible to construct superefficient estimators of the marginals: we construct both a copula and, exploiting the information our copula provides, estimators of the marginals with the rate of convergence $\log n/n$.

© 2011 Elsevier Inc. All rights reserved.

1. Introduction

Suppose one observes a random sample from a bivariate distribution. By Sklar's theorem (see, e.g., [5]) the distribution function is determined by its copula and the marginal distributions. In semiparametric copula models, it is assumed that the copula depends on a Euclidean parameter and, apart from (absolute) continuity, no assumptions are imposed on the marginals. The study of efficient estimation for semiparametric copula models originated in [3,2], which focused on efficient estimation of the copula parameter. [3] also noted that exploiting the knowledge on the copula may help to improve on the marginal empirical distribution functions.
Following the setup in [3], [1] and [7, Chapter 5] provide efficient estimators of the marginals, incorporating the information the copula provides, with the standard rate of convergence $n^{-1/2}$ and a limiting distribution that has less spread than the limiting distribution of the empirical distribution functions. In those models smoothness assumptions on the copula are imposed. This paper shows that in the absence of smoothness, superefficient estimation of the marginals is possible. To this end, we construct, in Section 2, a specific copula. In Section 3, we construct an estimator of the marginals that exploits the information our copula provides, and show that its rate of convergence is $\log n/n$. Our copula is a best copula in the sense that $\log n/n$ is the best possible rate of convergence.

Corresponding author. E-mail addresses: j.h.j.einmahl@uvt.nl (J.H.J. Einmahl), r.vdakker@uvt.nl (R. van den Akker). doi:10.1016/j.jmva.2011.04.015

2. The copula

In this section we define our copula. To this end we introduce independent Bernoulli variables $(B_k)_{k \in \mathbb{N}}$ with success probability 1/2, and define Bernoulli variables $(\tilde B_k)_{k \in \mathbb{N}}$ by $\tilde B_k = B_k$ for $k$ odd and $\tilde B_k = 1 - B_k$ for $k$ even. Using these Bernoulli sequences, we introduce the random pair $(U, V)$ by

$$U = \sum_{k=1}^{\infty} \frac{B_k}{2^k}, \qquad V = \sum_{k=1}^{\infty} \frac{\tilde B_k}{2^k}.$$
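As a rough illustration of the construction of $(U, V)$, the following sketch draws pairs by truncating both binary expansions after a finite number of digits. The function name `sample_pair` and the truncation level `bits` are assumptions made for this example only and do not appear in the paper.

```python
import random

def sample_pair(rng, bits=40):
    """Draw one (U, V) pair, truncating both binary expansions after `bits` digits."""
    u = 0.0
    v = 0.0
    for k in range(1, bits + 1):
        b = rng.getrandbits(1)                 # B_k ~ Bernoulli(1/2)
        b_tilde = b if k % 2 == 1 else 1 - b   # keep odd digits, flip even digits
        u += b * 2.0 ** -k
        v += b_tilde * 2.0 ** -k
    return u, v

rng = random.Random(0)  # fixed seed, only so that the sketch is reproducible
pairs = [sample_pair(rng) for _ in range(1000)]
```

Because the first binary digit is shared, every sampled pair has both coordinates on the same side of 1/2. The second digit is flipped, so within each half the two coordinates fall in opposite quarters.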

Hence $V$ is a one-to-one function of $U$ and the inverse is the same function. Note that $U$ and $V$ are uniformly distributed on $[0, 1]$. The joint distribution of $(U, V)$ thus defines a copula, which we will denote by $C$. This copula can be interpreted as an infinite shuffle of min (see [4] for shuffles of min).

We provide a second construction of $C$ that might be more intuitive and allows us to introduce notation that is needed in the remainder of the paper. Define, for $k \in \mathbb{N}$ and $p, q = 1, \dots, 2^k$, the sets

$$A^{(k)}_{p,q} = [(p-1)2^{-k},\, p2^{-k}) \times [(q-1)2^{-k},\, q2^{-k}).$$

Next, we define, for $k \in \mathbb{N}$ and $p = 1, \dots, 2^k$, indices $q^{(k)}(p)$ as follows. For $k = 1$ we set $q^{(1)}(1) = 1$ and $q^{(1)}(2) = 2$. For $k \ge 2$ we set, for $p = 1, \dots, 2^{k-1}$,

$$q^{(k)}(2p) = \begin{cases} 2q^{(k-1)}(p), & k \text{ odd}; \\ 2q^{(k-1)}(p) - 1, & k \text{ even}, \end{cases} \qquad \text{and} \qquad q^{(k)}(2p-1) = \begin{cases} 2q^{(k-1)}(p) - 1, & k \text{ odd}; \\ 2q^{(k-1)}(p), & k \text{ even}. \end{cases}$$

Next we introduce, for $k \in \mathbb{N}$,

$$S_k = \bigcup_{p=1}^{2^k} A^{(k)}_{p,\,q^{(k)}(p)};$$

see Fig. 1 for an illustration.

Fig. 1. The support $S_k$ of the copula $C_k$ for $k = 1, 2, 3$.

Now we are able to introduce, for $k \in \mathbb{N}$, random variables $(U^{(k)}, V^{(k)})$ that are uniformly distributed on $S_k$ (the density equals $2^k$). Note that $U^{(k)}$ and $V^{(k)}$ are uniformly distributed on $[0, 1]$, so the law of $(U^{(k)}, V^{(k)})$ defines a copula $C_k$. It is easy to see that $C_k \to C$ pointwise, as $k \to \infty$. In particular, we have, for all $k, m \in \mathbb{N}$ and all $p, q = 1, \dots, 2^k$,

$$P\{(U^{(k)}, V^{(k)}) \in A^{(k)}_{p,q}\} = P\{(U^{(k+m)}, V^{(k+m)}) \in A^{(k)}_{p,q}\} = P\{(U, V) \in A^{(k)}_{p,q}\},$$

and this probability equals $2^{-k}$ in the case $q = q^{(k)}(p)$ and 0 in the case $q \neq q^{(k)}(p)$.

3. The estimator and its limiting behavior

Available is a random sample $(X_1, Y_1), \dots, (X_n, Y_n)$ from a bivariate distribution function $H$ which has $C$, as defined in Section 2, as copula. By Sklar's theorem we have, for all $(x, y) \in \mathbb{R}^2$, $H(x, y) = C(F(x), G(y))$, where $F$ and $G$ are the marginal distribution functions of $X_1$ and $Y_1$, respectively. The only assumption we impose on $F$ and $G$ is that they belong to $\mathcal{F}$, the set of continuous distribution functions on the real line.
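Returning to the second construction in Section 2, the recursion for the indices $q^{(k)}(p)$ can be sketched in a few lines. The function name `q_index` is an assumption made for this illustration; the recursion itself follows the case distinction for $k$ odd and $k$ even stated in Section 2.

```python
def q_index(k, p):
    """Return q^(k)(p): the row index of the level-k square of S_k in column p (1-based)."""
    if k == 1:
        return p                          # q^(1)(1) = 1, q^(1)(2) = 2
    parent = q_index(k - 1, (p + 1) // 2)
    right_child = (p % 2 == 0)            # p = 2p' (right half) versus p = 2p' - 1 (left half)
    if k % 2 == 1:                        # k odd: the right child lies in the upper sub-row
        return 2 * parent if right_child else 2 * parent - 1
    return 2 * parent - 1 if right_child else 2 * parent  # k even: order reversed

for k in (1, 2, 3):
    print(k, [q_index(k, p) for p in range(1, 2 ** k + 1)])
```

For each level the map $p \mapsto q^{(k)}(p)$ is a permutation of $\{1, \dots, 2^k\}$ that is its own inverse, which mirrors the statement that $V$ is a one-to-one function of $U$ whose inverse is the same function.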
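We introduce our estimator of $F$ via its quantile function. First, we define $\hat Q_n(u)$ on the set $\{p2^{-k} \mid p = 0, \dots, 2^k,\ k \ge 1\}$. Set $\hat Q_n(0) = X_{1:n}$, $\hat Q_n(1) = X_{n:n}$, and define $\hat Q_n(p2^{-k})$ for $k \in \mathbb{N}$ and $p \in \{1, \dots, 2^k - 1\}$ odd, recursively by (we adopt the usual convention $\max \emptyset = -\infty$):

$$\hat Q_n\Big(\frac{p}{2^k}\Big) = \max\Big\{ \hat Q_n\Big(\frac{p-1}{2^k}\Big),\ \max_{i \in I_p^k} X_i \Big\},$$

where, for $k$ odd,

$$I_p^k = \Big\{ i \in \{1, \dots, n\} \;\Big|\; X_i \in \Big(\hat Q_n\Big(\frac{p-1}{2^k}\Big),\, \hat Q_n\Big(\frac{p+1}{2^k}\Big)\Big),\ \max_{j:\, X_j \in (\hat Q_n(\frac{p-1}{2^k}),\, X_i]} Y_j < \min_{j:\, X_j \in (X_i,\, \hat Q_n(\frac{p+1}{2^k})]} Y_j \Big\},$$

and, for $k$ even,

$$I_p^k = \Big\{ i \in \{1, \dots, n\} \;\Big|\; X_i \in \Big(\hat Q_n\Big(\frac{p-1}{2^k}\Big),\, \hat Q_n\Big(\frac{p+1}{2^k}\Big)\Big),\ \min_{j:\, X_j \in (\hat Q_n(\frac{p-1}{2^k}),\, X_i]} Y_j > \max_{j:\, X_j \in (X_i,\, \hat Q_n(\frac{p+1}{2^k})]} Y_j \Big\}.$$

Next, we extend the domain to $[0, 1]$ by $\hat Q_n(u) = \sup\{\hat Q_n(p2^{-k}) \mid p2^{-k} \le u\}$. As estimator of $F$ we take the distribution function associated with $\hat Q_n$. We denote this estimator by $\hat F_n$. Note that $\hat F_n$ can be written as

$$\hat F_n(x) = \sum_{i=1}^{n} p_i\, 1_{(-\infty, x]}(X_i),$$

where the probability masses $p_i$ only depend on the observations via the ranks $(R^X_j, R^Y_j)$ of $(X_j, Y_j)$, $j = 1, \dots, n$. The following theorem is the main result of this paper.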
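The recursion for the quantile function at dyadic points can be sketched as follows. This is a minimal sketch under an assumed reading of the definition, not the authors' implementation: candidate split points are the observations lying strictly between the two neighbouring quantile values; for $k$ odd a candidate is accepted when every $Y$ value to its left (including its own) lies below every $Y$ value to its right, with the inequality reversed for $k$ even; and the value falls back to the left neighbour when no candidate is accepted. The helper names `partner` and `dyadic_quantiles` are hypothetical.

```python
from fractions import Fraction

def partner(u, bits=40):
    """Hypothetical helper: flip the even-numbered binary digits of u in [0, 1)."""
    v = 0.0
    for k in range(1, bits + 1):
        u *= 2
        b = int(u >= 1.0)
        u -= b
        v += (b if k % 2 == 1 else 1 - b) * 2.0 ** -k
    return v

def dyadic_quantiles(x, y, depth):
    """Quantile estimates at p / 2**k, k <= depth, under the assumed reading above."""
    q = {Fraction(0): min(x), Fraction(1): max(x)}
    for k in range(1, depth + 1):
        for p in range(1, 2 ** k, 2):              # odd p only; even p exist already
            lo = q[Fraction(p - 1, 2 ** k)]
            hi = q[Fraction(p + 1, 2 ** k)]
            best = lo                               # fallback if no candidate qualifies
            for xi in x:
                if not (lo < xi < hi):
                    continue
                left = [yj for xj, yj in zip(x, y) if lo < xj <= xi]
                right = [yj for xj, yj in zip(x, y) if xi < xj <= hi]
                if k % 2 == 1:
                    accepted = max(left) < min(right)   # left square south of right
                else:
                    accepted = min(left) > max(right)   # left square north of right
                if accepted:
                    best = max(best, xi)
            q[Fraction(p, 2 ** k)] = best
    return q

x = [(2 * i + 1) / 16 for i in range(8)]   # one observation in every eighth of [0, 1]
y = [partner(u) for u in x]
q = dyadic_quantiles(x, y, depth=2)
```

On this small deterministic sample the estimates at 1/4, 1/2 and 3/4 coincide with the largest observation below each point, even though only the ordering of the $Y$ values is used to locate the splits.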

Fig. 2. Realization of $\hat F_n$ (solid) and $F_n^{\mathrm{edf}}$ (dashed) for $n = 100$, $F = \Phi$ (dotted), and $G \in \mathcal{F}$.

Theorem 3.1. For $F, G \in \mathcal{F}$ we have ($\|\cdot\|$ denotes the sup-norm):

$$\frac{1}{2} \le \liminf_{n \to \infty} \frac{n}{\log n} \|\hat F_n - F\| \le \limsup_{n \to \infty} \frac{n}{\log n} \|\hat F_n - F\| \le 4 \quad \text{a.s.} \tag{1}$$

The theorem demonstrates that $\hat F_n$ is superefficient, i.e. the rate of convergence is $\log n/n$ instead of the usual rate $n^{-1/2}$.

Remark 1. In the proof of Theorem 3.1 we exploit that any estimator $\tilde F_n$ of $F$ that concentrates on $X_1, \dots, X_n$ satisfies

$$\liminf_{n \to \infty} \frac{n}{\log n} \|\tilde F_n - F\| \ge \frac{1}{2} \quad \text{a.s.} \tag{2}$$

This property implies that our estimator $\hat F_n$ achieves the best attainable rate of convergence $\log n/n$. As the bound (2) does not depend on the copula, our copula $C$ can be interpreted as a best one (in terms of rate of convergence).

Remark 2. A natural question is whether $Z_n = \big((n/\log n)(\hat F_n(x) - F(x))\big)_{x \in \mathbb{R}}$, seen as an element of $\ell^\infty(\mathbb{R})$, weakly converges (if so, the limit determines the limiting distribution of $(n/\log n)\|\hat F_n - F\|$ by an application of the continuous mapping theorem). The answer is negative. For $F = I$, where $I$ denotes the distribution function of the Uniform[0, 1] distribution, the argument is as follows (the general case easily follows from the uniform case). Since $\hat F_n$ concentrates on the observations and, as we exploit in the proof of Theorem 3.1, the maximal spacing $\Delta_n$ of $n$ i.i.d. draws from the Uniform[0, 1] distribution satisfies $(n/\log n)\Delta_n \to 1$ a.s., we have, for any $\eta \in (0, 1)$, $\epsilon \in (0, 1/2)$ and any finite partition $\bigcup_{i=1}^{k} T_i$ of $[0, 1]$,

$$\lim_{n \to \infty} P\Big\{ \sup_i \sup_{u, u' \in T_i} \frac{n}{\log n} \big| \hat F_n(u) - \hat F_n(u') - (u - u') \big| > \epsilon \Big\} = 1 > \eta,$$

which shows that $Z_n$ is not tight.

As an illustration, Fig. 2 presents a realization of our estimator and the empirical distribution function $F_n^{\mathrm{edf}}$ for $n = 100$, $F = \Phi$, the standard normal distribution function, and $G \in \mathcal{F}$, and Fig. 3 presents the centered versions of the estimates.

Proof of Theorem 3.1. Introduce $U_i = F(X_i)$ and $V_i = G(Y_i)$, and recall that monotone transformations of the marginals do not change the copula. Let $\hat F_n^U$ denote the distribution function resulting from computing $\hat F_n$ from $(U_i, V_i)_{i=1}^{n}$ instead of $(X_i, Y_i)_{i=1}^{n}$.
As $\hat F_n(x) = \hat F_n^U(F(x))$ a.s. we have $\|\hat F_n - F\| = \|\hat F_n^U - I\|$ a.s., which shows that it suffices to prove (1) for $F = G = I$. To stress that we consider uniform marginals we denote the observations by $(U_i, V_i)$ in the remainder of the proof. As the probability of a tie in $(U_i)_{i=1}^{n}$ or $(V_i)_{i=1}^{n}$ equals zero, we throughout work on the event that there are no ties.

Let $\Delta_n = \max_{i=1,\dots,n+1} (U_{i:n} - U_{i-1:n})$, with $U_{0:n} = 0$ and $U_{n+1:n} = 1$, denote the maximal spacing of $U_1, \dots, U_n$. Observe that any estimator $\tilde F_n$ of $I$ of the form $\tilde F_n(u) = \sum_{i=1}^{n} p_i 1_{[0,u]}(U_i)$ satisfies $\|\tilde F_n - I\| \ge \Delta_n/2$. Observe that $\|\hat F_n - I\| = \|\hat Q_n - I\|$. As it is well-known (see, e.g., [6]) that $(n/\log n)\Delta_n \to 1$ a.s., we see that the theorem holds once we establish the bound $\|\hat Q_n - I\| \le 4\Delta_n$. As $|\hat Q_n(0) - 0| \le \Delta_n$ and $|\hat Q_n(1) - 1| \le \Delta_n$ we have to prove

$$|\hat Q_n(u) - u| \le 4\Delta_n, \quad \text{for all } u \in (0, 1). \tag{3}$$
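The lower bound $\|\tilde F_n - I\| \ge \Delta_n/2$ for step functions that jump only at the observations can be checked numerically. The sketch below is illustrative only, and the function names `max_spacing` and `sup_distance_to_uniform` are assumptions. On the widest gap between consecutive order statistics the step function is constant, so its distance to the identity must reach at least half the gap length somewhere on that gap.

```python
def max_spacing(points):
    """Maximal spacing of points in [0, 1], with 0 and 1 added as boundary points."""
    grid = [0.0] + sorted(points) + [1.0]
    return max(b - a for a, b in zip(grid, grid[1:]))

def sup_distance_to_uniform(points, masses):
    """sup over u in [0, 1] of |F(u) - u| for F(u) = sum of masses at points <= u."""
    pairs = sorted(zip(points, masses))
    grid = [0.0] + [u for u, _ in pairs] + [1.0]
    level = 0.0      # value of the step function on the current interval
    worst = 0.0
    for i in range(len(grid) - 1):
        if i > 0:
            level += pairs[i - 1][1]   # jump at the i-th order statistic
        # the function is constant on [grid[i], grid[i + 1]), so check both ends
        worst = max(worst, abs(level - grid[i]), abs(level - grid[i + 1]))
    return worst

points = [0.125, 0.25, 0.75, 0.875]
delta = max_spacing(points)                                  # widest gap is (0.25, 0.75)
edf_dist = sup_distance_to_uniform(points, [0.25] * 4)       # empirical distribution function
other_dist = sup_distance_to_uniform(points, [0.1, 0.2, 0.3, 0.4])
```

For this configuration the equal-mass choice attains the bound exactly, while the second set of masses does worse; no choice of masses can push the distance below half the maximal spacing.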

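Fig. 3. Realization of $\hat F_n - F$ (solid) and $F_n^{\mathrm{edf}} - F$ (dashed) for $n = 100$, $F = \Phi$, and $G \in \mathcal{F}$.

Denote $\mathcal{U}_n = \{U_1, \dots, U_n\}$ and introduce the random variable

$$K_n = \max\Big\{ k \in \mathbb{N} \;\Big|\; \forall p = 1, \dots, 2^{k+1}: \Big(\frac{p-1}{2^{k+1}}, \frac{p}{2^{k+1}}\Big] \cap \mathcal{U}_n \neq \emptyset \Big\}.$$

In the case $K_n = -\infty$ we have $\Delta_n \ge 1/4$ and (3) trivially holds, so we only need to consider $K_n \ge 1$. We will prove, for $k = 1, \dots, K_n$ and $p = 1, \dots, 2^k - 1$ odd,

$$\hat Q_n\Big(\frac{p}{2^k}\Big) = \max_{i=1,\dots,n} \{ U_i \mid U_i < p2^{-k} \}. \tag{4}$$

Before we prove (4) we show that (4) implies (3). From (4) it is immediate that (3) holds for $u \in \{p2^{-K_n} \mid p = 1, \dots, 2^{K_n} - 1\}$; to be precise, we have, for $p = 1, \dots, 2^{K_n} - 1$,

$$\Big| \hat Q_n\Big(\frac{p}{2^{K_n}}\Big) - \frac{p}{2^{K_n}} \Big| \le \Delta_n.$$

Let $K' = K_n + 1$ and note that the intervals $((p-1)2^{-K'}, p2^{-K'}]$ and $(p2^{-K'}, (p+1)2^{-K'}]$ both contain at least one observation. The definition of $\hat Q_n$ and (4) now yield $\hat Q_n(p2^{-K'}) \in [(p-1)2^{-K'}, (p+1)2^{-K'})$ and the definition of $K_n$ implies $\Delta_n \ge 2^{-(K'+1)}$. A combination of these observations immediately yields

$$\Big| \hat Q_n\Big(\frac{p}{2^{K'}}\Big) - \frac{p}{2^{K'}} \Big| \le 2^{-K'} \le 2\Delta_n,$$

which shows that (3) holds for all $u \in \{p2^{-K'} \mid p = 1, \dots, 2^{K'} - 1\}$. Finally, we consider $u \in (0, 1)$ with $u2^{K'} \notin \mathbb{N}$. Let $p$ be such that $u \in (p2^{-K'}, (p+1)2^{-K'})$. We easily obtain the bound

$$-4\Delta_n \le \hat Q_n\Big(\frac{p}{2^{K'}}\Big) - \frac{p}{2^{K'}} - \frac{1}{2^{K'}} \le \hat Q_n(u) - u \le \hat Q_n\Big(\frac{p+1}{2^{K'}}\Big) - \frac{p+1}{2^{K'}} + \frac{1}{2^{K'}} \le 4\Delta_n.$$

We conclude that (3) indeed holds.

We conclude the proof by establishing (4). We start with $k = p = 1$. Since the squares $A^{(1)}_{1,1}$ and $A^{(1)}_{2,2}$ both contain at least two observations and $A^{(1)}_{2,2}$ is north to $A^{(1)}_{1,1}$, it follows from the definition of $\hat Q_n(1/2)$ that $\hat Q_n(1/2) \ge \max_i \{U_i \mid U_i < 1/2\}$. As the square $A^{(2)}_{3,4}$ is north to $A^{(2)}_{4,3}$ and both squares contain at least one observation it is also immediate that $\hat Q_n(1/2) < \min_i \{U_i \mid U_i \ge 1/2\}$. Hence (4) indeed holds for $k = p = 1$.

Suppose that we have shown (4) to hold for $k = 1, \dots, \tilde K - 1$, with $\tilde K \le K_n$. We show that (4) also holds for $k = \tilde K$. We have to discuss the cases $\tilde K$ even and $\tilde K$ odd separately. As the arguments are similar, we only discuss the case $\tilde K$ odd. For $p$ odd we obtain from the induction hypothesis that all observations that are relevant for $\hat Q_n(p2^{-\tilde K})$, i.e. the observations $U_i$ that belong to the interval $(\hat Q_n((p-1)2^{-\tilde K}), \hat Q_n((p+1)2^{-\tilde K})]$, correspond to observations $(U_i, V_i)$ that fall in the sets $A^{(\tilde K)}_{p,\,q^{(\tilde K)}(p)}$ and $A^{(\tilde K)}_{p+1,\,q^{(\tilde K)}(p+1)}$. As $\tilde K \le K_n$ both squares contain at least one observation. As $\tilde K$ is odd, $A^{(\tilde K)}_{p+1,\,q^{(\tilde K)}(p+1)}$ is north to $A^{(\tilde K)}_{p,\,q^{(\tilde K)}(p)}$. It follows that $\hat Q_n(p2^{-\tilde K}) \ge \max_i \{U_i \mid U_i < p2^{-\tilde K}\}$. The mass that $C$ assigns to the set $A^{(\tilde K)}_{p+1,\,q^{(\tilde K)}(p+1)}$ concentrates in the two subsets $A^{(\tilde K+1)}_{2p+1,\,q^{(\tilde K+1)}(2p+1)}$ and $A^{(\tilde K+1)}_{2(p+1),\,q^{(\tilde K+1)}(2(p+1))}$, and both sets contain at least one observation. As $\tilde K + 1$ is even, the set $A^{(\tilde K+1)}_{2(p+1),\,q^{(\tilde K+1)}(2(p+1))}$ is south to $A^{(\tilde K+1)}_{2p+1,\,q^{(\tilde K+1)}(2p+1)}$. This easily yields $\hat Q_n(p2^{-\tilde K}) < \min_i \{U_i \mid U_i \ge p2^{-\tilde K}\}$. We conclude that (4) holds for $k = \tilde K$ as well, which concludes the induction argument.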
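The resolution level used in the proof, the largest $k$ such that every dyadic interval of length $2^{-(k+1)}$ contains an observation, is easy to compute for a given sample. The sketch below is an illustration only; the function name `resolution_level`, the cap `max_level`, and the use of `None` to encode the case where no level qualifies are assumptions of this example.

```python
import math

def resolution_level(us, max_level=60):
    """Largest k in 1..max_level such that every interval ((p-1)/2**(k+1), p/2**(k+1)]
    contains a point of `us`; returns None if no such k exists."""
    level = None
    for k in range(1, max_level + 1):
        cells = 2 ** (k + 1)
        # a point u in ((p-1)/cells, p/cells] has ceil(u * cells) == p
        occupied = {math.ceil(u * cells) for u in us if 0.0 < u <= 1.0}
        if len(occupied) < cells:
            break          # a finer grid cannot be fully occupied either
        level = k
    return level

sample = [(2 * i + 1) / 16 for i in range(8)]   # midpoints of the eight intervals of length 1/8
K = resolution_level(sample)
```

For this sample every eighth of the unit interval is occupied but not every sixteenth, so the level is 2, and the maximal spacing 1/8 indeed exceeds $2^{-(K+2)} = 1/16$, in line with the inequality $\Delta_n \ge 2^{-(K'+1)}$ used in the proof.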

References

[1] X. Chen, Y. Fan, V. Tsyrennikov, Efficient estimation of semiparametric multivariate copula models, Journal of the American Statistical Association 101 (2006) 1228–1240.
[2] C. Genest, B.J.M. Werker, Conditions for the asymptotic semiparametric efficiency of an omnibus estimator of dependence parameters in copula models, in: C.M. Cuadras, J. Fortiana, J.A. Rodríguez-Lallena (Eds.), Distributions with Given Marginals and Statistical Modeling, Kluwer, Dordrecht, 2002, pp. 103–112.
[3] C.A.J. Klaassen, J.A. Wellner, Efficient estimation in the bivariate normal copula model: normal margins are least favourable, Bernoulli 3 (1997) 55–77.
[4] P. Mikusiński, H. Sherwood, M. Taylor, Shuffles of min, Stochastica XIII (1992) 61–74.
[5] R. Nelsen, An Introduction to Copulas, 1st ed., Springer-Verlag, New York, 1999.
[6] E. Slud, Entropy and maximal spacings for random partitions, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 41 (1978) 341–352.
[7] R. van den Akker, Integer-valued time series, Ph.D. Thesis, CentER Dissertation Series 197, Tilburg University, 2007. Available at: http://arno.uvt.nl/show.cgi?did=306632.