Journal of Multivariate Analysis. Superefficient estimation of the marginals by exploiting knowledge on the copula


Journal of Multivariate Analysis 102 (2011) 1315–1319
Journal homepage: www.elsevier.com/locate/jmva

Superefficient estimation of the marginals by exploiting knowledge on the copula

John H.J. Einmahl, Ramon van den Akker
Department of Econometrics & OR and CentER, Tilburg University, PO Box 90153, NL-5000 LE Tilburg, The Netherlands

Article info
Article history: Received 9 November 2010. Available online 6 May 2011.
AMS subject classifications: 62G05, 62G20
Keywords: Copula; Estimation of marginals; Superefficient estimation

Abstract
We consider the problem of estimating the marginals in the case where there is knowledge on the copula. If the copula is smooth, it is known that it is possible to improve on the empirical distribution functions: optimal estimators still have a rate of convergence $n^{-1/2}$, but a smaller asymptotic variance. In this paper we show that for non-smooth copulas it is sometimes possible to construct superefficient estimators of the marginals: we construct both a copula and, exploiting the information our copula provides, estimators of the marginals with the rate of convergence $\log n / n$.
© 2011 Elsevier Inc. All rights reserved.

1. Introduction

Suppose one observes a random sample from a bivariate distribution. By Sklar's theorem (see, e.g., [5]) the distribution function is determined by its copula and the marginal distributions. In semiparametric copula models, it is assumed that the copula depends on a Euclidean parameter and, apart from (absolute) continuity, no assumptions are imposed on the marginals. The study of efficient estimation for semiparametric copula models originated in [3,2], which focused on efficient estimation of the copula parameter. [3] also noted that exploiting the knowledge on the copula may help to improve on the marginal empirical distribution functions.
Following the setup in [3], [1] and [7, Chapter 5] provide efficient estimators of the marginals, incorporating the information the copula provides, with the standard rate of convergence $n^{-1/2}$ and a limiting distribution that has less spread than the limiting distribution of the empirical distribution functions. In those models smoothness assumptions on the copula are imposed. This paper shows that in the absence of smoothness, superefficient estimation of the marginals is possible. To this end, we construct, in Section 2, a specific copula. In Section 3, we construct an estimator of the marginals that exploits the information our copula provides, and show that its rate of convergence is $\log n / n$. Our copula is a best copula in the sense that $\log n / n$ is the best possible rate of convergence.

Corresponding author. E-mail addresses: j.h.j.einmahl@uvt.nl (J.H.J. Einmahl), r.vdakker@uvt.nl (R. van den Akker).
doi:10.1016/j.jmva.2011.04.015

2. The copula

In this section we define our copula. To this end we introduce independent Bernoulli variables $(B_k)_{k \in \mathbb{N}}$ with success probability 1/2, and define Bernoulli variables $(\tilde B_k)_{k \in \mathbb{N}}$ by $\tilde B_k = B_k$ for $k$ odd and $\tilde B_k = 1 - B_k$ for $k$ even. Using these Bernoulli sequences, we introduce the random pair $(U, V)$ by

$$U = \sum_{k=1}^{\infty} \frac{B_k}{2^k}, \quad \text{and} \quad V = \sum_{k=1}^{\infty} \frac{\tilde B_k}{2^k}.$$
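The Bernoulli construction of $(U, V)$ can be illustrated with a short simulation. The following is only a sketch under an assumption that is not part of the paper: the binary expansions are truncated after `num_bits` digits (40 here, so that all floating-point arithmetic is exact). The function names `sample_uv` and `flip_even_bits` are hypothetical.

```python
import random


def sample_uv(num_bits=40, rng=random):
    """Draw one (U, V) pair, truncating both binary expansions after num_bits digits."""
    u = v = 0.0
    for k in range(1, num_bits + 1):
        b = rng.getrandbits(1)                 # B_k, fair Bernoulli bit
        b_mod = b if k % 2 == 1 else 1 - b     # modified bit: kept for k odd, flipped for k even
        u += b / 2 ** k
        v += b_mod / 2 ** k
    return u, v


def flip_even_bits(x, num_bits=40):
    """Deterministic map taking U to V: flip every even-indexed binary digit of x."""
    y = 0.0
    for k in range(1, num_bits + 1):
        x *= 2
        bit = int(x)                           # k-th binary digit of the original x
        x -= bit
        y += (bit if k % 2 == 1 else 1 - bit) / 2 ** k
    return y
```

Applying `flip_even_bits` to the first coordinate of a sampled pair returns the second coordinate, and applying it again returns the first, which is one way to see that the map from $U$ to $V$ is its own inverse.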

Hence $V$ is a one-to-one function of $U$ and the inverse is the same function. Note that $U$ and $V$ are uniformly distributed on $[0, 1]$. The joint distribution of $(U, V)$ thus defines a copula, which we will denote by $C$. This copula can be interpreted as an infinite shuffle of min (see [4] for shuffles of min).

We provide a second construction of $C$ that might be more intuitive and allows us to introduce notation that is needed in the remainder of the paper. Define, for $k \in \mathbb{N}$ and $p, q = 1, \dots, 2^k$, the sets

$$A^{(k)}_{p,q} = [(p-1)2^{-k}, p2^{-k}) \times [(q-1)2^{-k}, q2^{-k}).$$

Next, we define, for $k \in \mathbb{N}$ and $p = 1, \dots, 2^k$, indices $q^{(k)}(p)$ as follows. For $k = 1$ we set $q^{(1)}(1) = 1$ and $q^{(1)}(2) = 2$. For $k \ge 2$ we set, for $p = 1, \dots, 2^{k-1}$,

$$q^{(k)}(2p) = \begin{cases} 2q^{(k-1)}(p), & k \text{ odd}; \\ 2q^{(k-1)}(p) - 1, & k \text{ even}, \end{cases} \quad \text{and} \quad q^{(k)}(2p-1) = \begin{cases} 2q^{(k-1)}(p) - 1, & k \text{ odd}; \\ 2q^{(k-1)}(p), & k \text{ even}. \end{cases}$$

Next we introduce, for $k \in \mathbb{N}$, $S_k = \bigcup_{p=1}^{2^k} A^{(k)}_{p, q^{(k)}(p)}$; see Fig. 1 for an illustration.

Fig. 1. The support $S_k$ of the copula $C_k$ for $k = 1, 2, 3$.

Now we are able to introduce, for $k \in \mathbb{N}$, random variables $(U^{(k)}, V^{(k)})$ that are uniformly distributed on $S_k$ (the density equals $2^k$). Note that $U^{(k)}$ and $V^{(k)}$ are uniformly distributed on $[0, 1]$, so the law of $(U^{(k)}, V^{(k)})$ defines a copula $C_k$. It is easy to see that $C_k \to C$ pointwise, as $k \to \infty$. In particular, we have, for all $k, m \in \mathbb{N}$ and all $p, q = 1, \dots, 2^k$,

$$P\{(U^{(k)}, V^{(k)}) \in A^{(k)}_{p,q}\} = P\{(U^{(k+m)}, V^{(k+m)}) \in A^{(k)}_{p,q}\} = P\{(U, V) \in A^{(k)}_{p,q}\},$$

and this probability equals $2^{-k}$ in the case $q = q^{(k)}(p)$ and 0 in the case $q \ne q^{(k)}(p)$.

3. The estimator and its limiting behavior

Available is a random sample $(X_1, Y_1), \dots, (X_n, Y_n)$ from a bivariate distribution function $H$ which has $C$, as defined in Section 2, as copula. By Sklar's theorem we have, for all $(x, y) \in \mathbb{R}^2$, $H(x, y) = C(F(x), G(y))$, where $F$ and $G$ are the marginal distribution functions of $X_1$ and $Y_1$, respectively. The only assumption we impose on $F$ and $G$ is that they belong to $\mathcal{F}$, the set of continuous distribution functions on the real line.
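The recursive definition of the indices $q^{(k)}(p)$ from Section 2 can be tabulated directly. The sketch below is illustrative only; the function name `q_indices` and the list-based storage (position `p - 1` holds $q^{(k)}(p)$) are assumptions made for this example.

```python
def q_indices(k):
    """Return [q^(k)(1), ..., q^(k)(2^k)]; list position p - 1 holds q^(k)(p)."""
    q = [1, 2]  # level k = 1: q(1) = 1, q(2) = 2
    for level in range(2, k + 1):
        new_q = [0] * (2 ** level)
        for p in range(1, 2 ** (level - 1) + 1):
            parent = q[p - 1]  # q^(level-1)(p)
            if level % 2 == 1:
                # level odd: right child cell goes to the upper half of the parent square
                new_q[2 * p - 1] = 2 * parent        # q(2p)
                new_q[2 * p - 2] = 2 * parent - 1    # q(2p - 1)
            else:
                # level even: the order is reversed
                new_q[2 * p - 1] = 2 * parent - 1    # q(2p)
                new_q[2 * p - 2] = 2 * parent        # q(2p - 1)
        q = new_q
    return q


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, q_indices(k))
```

For each level the resulting list is a permutation of $1, \dots, 2^k$ that is its own inverse, in line with the statement that $V$ is a one-to-one function of $U$ whose inverse is the same function.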
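We introduce our estimator of $F$ via its quantile function. First, we define $Q_n(u)$ on the set $\{p2^{-k} \mid p = 0, \dots, 2^k,\ k \ge 1\}$. Set $Q_n(0) = X_{1:n}$, $Q_n(1) = X_{n:n}$, and define $Q_n(p2^{-k})$ for $k \in \mathbb{N}$ and $p \in \{1, \dots, 2^k - 1\}$ odd, recursively by (we adopt the usual convention $\max \emptyset = -\infty$):

$$Q_n\left(\frac{p}{2^k}\right) = \max\left\{ Q_n\left(\frac{p-1}{2^k}\right),\ \max_{i \in I_k^p} X_i \right\},$$

where the inner maximum runs only over those $i \in I_k^p$ that satisfy

$$\max_{j:\, X_j \in \left(Q_n\left(\frac{p-1}{2^k}\right),\, X_i\right]} Y_j < \min_{j:\, X_j \in \left(X_i,\, Q_n\left(\frac{p+1}{2^k}\right)\right]} Y_j \quad \text{for } k \text{ odd},$$

$$\min_{j:\, X_j \in \left(Q_n\left(\frac{p-1}{2^k}\right),\, X_i\right]} Y_j > \max_{j:\, X_j \in \left(X_i,\, Q_n\left(\frac{p+1}{2^k}\right)\right]} Y_j \quad \text{for } k \text{ even},$$

with

$$I_k^p = \left\{ i \in \{1, \dots, n\} \;\middle|\; X_i \in \left( Q_n\left(\frac{p-1}{2^k}\right),\, Q_n\left(\frac{p+1}{2^k}\right) \right] \right\}.$$

Next, we extend the domain to $[0, 1]$ by $Q_n(u) = \sup\{Q_n(p2^{-k}) \mid p2^{-k} \le u\}$. As estimator of $F$ we take the distribution function associated with $Q_n$. We denote this estimator by $\hat F_n$. Note that $\hat F_n$ can be written as $\hat F_n(x) = \sum_{i=1}^{n} p_i 1_{(-\infty, x]}(X_i)$, where the probability masses $p_i$ only depend on the observations via the ranks $(R_j^X, R_j^Y)$ of $(X_j, Y_j)$, $j = 1, \dots, n$. The following theorem is the main result of this paper.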

Fig. 2. Realization of $\hat F_n$ (solid) and $F_n^{\mathrm{edf}}$ (dashed) for $n = 100$, $F = \Phi$ (dotted), and $G \equiv F$.

Theorem 3.1. For $F, G \in \mathcal{F}$ we have ($\|\cdot\|$ denotes the sup-norm):

$$\frac{1}{2} \le \liminf_{n \to \infty} \frac{n}{\log n} \|\hat F_n - F\| \le \limsup_{n \to \infty} \frac{n}{\log n} \|\hat F_n - F\| \le 4 \quad \text{a.s.} \qquad (1)$$

The theorem demonstrates that $\hat F_n$ is superefficient, i.e. the rate of convergence is $\log n / n$ instead of the usual rate $n^{-1/2}$.

Remark 1. In the proof of Theorem 3.1 we exploit that any estimator $\tilde F_n$ of $F$ that concentrates on $X_1, \dots, X_n$ satisfies

$$\liminf_{n \to \infty} \frac{n}{\log n} \|\tilde F_n - F\| \ge \frac{1}{2} \quad \text{a.s.} \qquad (2)$$

This property implies that our estimator $\hat F_n$ achieves the best attainable rate of convergence $\log n / n$. As the bound (2) does not depend on the copula, our copula $C$ can be interpreted as a best one (in terms of rate of convergence).

Remark 2. A natural question is whether $Z_n = \big((n / \log n)(\hat F_n(x) - F(x))\big)_{x \in \mathbb{R}}$, seen as an element of $\ell^\infty(\mathbb{R})$, weakly converges (if so, the limit determines the limiting distribution of $(n / \log n)\|\hat F_n - F\|$ by an application of the continuous mapping theorem). The answer is negative. For $F = I$, where $I$ denotes the distribution function of the Uniform[0, 1] distribution, the argument is as follows (the general case easily follows from the uniform case). Since $\hat F_n$ concentrates on the observations and, as we exploit in the proof of Theorem 3.1, the maximal spacing $\Delta_n$ of $n$ i.i.d. draws from the Uniform[0, 1] distribution satisfies $(n / \log n)\Delta_n \to 1$ a.s., we have, for any $\eta \in (0, 1)$, $\epsilon \in (0, 1/2)$ and any finite partition $\bigcup_{i=1}^{k} T_i$ of $[0, 1]$,

$$\lim_{n \to \infty} P\left\{ \sup_i \sup_{u, u' \in T_i} \frac{n}{\log n} \left| \hat F_n(u) - \hat F_n(u') - (u - u') \right| > \epsilon \right\} = 1 > \eta,$$

which shows that $Z_n$ is not tight.

As an illustration, Fig. 2 presents a realization of our estimator and the empirical distribution function $F_n^{\mathrm{edf}}$ for $n = 100$, $F = \Phi$, the standard normal distribution function, and $G \equiv F$, and Fig. 3 presents the centered versions of the estimates.

Proof of Theorem 3.1. Introduce $U_i = F(X_i)$ and $V_i = G(Y_i)$, and recall that monotone transformations of the marginals do not change the copula. Let $\hat F_n^U$ denote the distribution function resulting from computing $\hat F_n$ from $(U_i, V_i)_{i=1}^{n}$ instead of $(X_i, Y_i)_{i=1}^{n}$.
As $\hat F_n(x) = \hat F_n^U(F(x))$ a.s. we have $\|\hat F_n - F\| = \|\hat F_n^U - I\|$ a.s., which shows that it suffices to prove (1) for $F = G = I$. To stress that we consider uniform marginals we denote the observations by $(U_i, V_i)$ in the remainder of the proof. As the probability of a tie in $(U_i)_{i=1}^{n}$ or $(V_i)_{i=1}^{n}$ equals zero, we throughout work on the event that there are no ties.

Let $\Delta_n = \max_{i=1,\dots,n+1} (U_{i:n} - U_{i-1:n})$, with $U_{0:n} = 0$ and $U_{n+1:n} = 1$, denote the maximal spacing of $U_1, \dots, U_n$. Observe that any estimator $\tilde F_n$ of $I$ of the form $\tilde F_n(u) = \sum_{i=1}^{n} p_i 1_{[0,u]}(U_i)$ satisfies $\|\tilde F_n - I\| \ge \Delta_n / 2$. Observe that $\|\hat F_n - I\| = \|Q_n - I\|$. As it is well known (see, e.g., [6]) that $(n / \log n)\Delta_n \to 1$ a.s., we see that the theorem holds once we establish the bound $\|Q_n - I\| \le 4\Delta_n$. As $Q_n(0) \ge 0$ and $Q_n(1) \le 1$ we have to prove

$$|Q_n(u) - u| \le 4\Delta_n, \quad \text{for all } u \in (0, 1). \qquad (3)$$
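The maximal spacing that drives both bounds is easy to compute numerically. The following sketch is an illustration, not part of the proof: it computes the maximal spacing of a uniform sample, with the boundary points 0 and 1 added as in the definition above, and rescales it by $n / \log n$. The function names and the use of Python's `random.Random` generator are assumptions of this example.

```python
import math
import random


def max_spacing(points):
    """Maximal spacing of points in [0, 1], with 0 and 1 added as boundary points."""
    xs = [0.0] + sorted(points) + [1.0]
    return max(b - a for a, b in zip(xs, xs[1:]))


def scaled_max_spacing(n, rng):
    """Return n * (maximal spacing) / log(n) for n uniform draws from rng."""
    sample = [rng.random() for _ in range(n)]
    return n * max_spacing(sample) / math.log(n)


if __name__ == "__main__":
    rng = random.Random(2011)  # arbitrary seed, chosen for reproducibility
    for n in (10 ** 2, 10 ** 3, 10 ** 4, 10 ** 5):
        print(n, round(scaled_max_spacing(n, rng), 3))
```

The almost sure limit is approached slowly, so for moderate $n$ the printed ratios may still differ noticeably from 1.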

Fig. 3. Realization of $\hat F_n - F$ (solid) and $F_n^{\mathrm{edf}} - F$ (dashed) for $n = 100$, $F = \Phi$, and $G \equiv F$.

Denote $\mathcal{U}_n = \{U_1, \dots, U_n\}$ and introduce the random variable

$$K_n = \max\left\{ k \in \mathbb{N} \;\middle|\; \forall p = 1, \dots, 2^{k+1}: \left( \frac{p-1}{2^{k+1}}, \frac{p}{2^{k+1}} \right] \cap \mathcal{U}_n \ne \emptyset \right\}.$$

In the case $K_n = -\infty$ we have $\Delta_n \ge 1/4$ and (3) trivially holds, so we only need to consider $K_n \ge 1$. We will prove, for $k = 1, \dots, K_n$ and $p = 1, \dots, 2^k - 1$ odd,

$$Q_n\left(\frac{p}{2^k}\right) = \max_{i=1,\dots,n} \{U_i \mid U_i < p2^{-k}\}. \qquad (4)$$

Before we prove (4) we show that (4) implies (3). From (4) it is immediate that (3) holds for $u \in \{p2^{-K_n} \mid p = 1, \dots, 2^{K_n} - 1\}$; to be precise, we have, for $p = 1, \dots, 2^{K_n} - 1$,

$$\left| Q_n\left(\frac{p}{2^{K_n}}\right) - \frac{p}{2^{K_n}} \right| \le \Delta_n.$$

Let $K' = K_n + 1$ and note that the intervals $((p-1)2^{-K'}, p2^{-K'}]$ and $(p2^{-K'}, (p+1)2^{-K'}]$ both contain at least one observation. The definition of $Q_n$ and (4) now yield $Q_n(p2^{-K'}) \in [(p-1)2^{-K'}, (p+1)2^{-K'})$ and the definition of $K_n$ implies $\Delta_n \ge 2^{-(K'+1)}$. A combination of these observations immediately yields

$$\left| Q_n\left(\frac{p}{2^{K'}}\right) - \frac{p}{2^{K'}} \right| \le 2^{-K'} \le 2\Delta_n,$$

which shows that (3) holds for all $u \in \{p2^{-K'} \mid p = 1, \dots, 2^{K'} - 1\}$. Finally, we consider $u \in (0, 1)$ with $u2^{K'} \notin \mathbb{N}$. Let $p$ be such that $u \in (p2^{-K'}, (p+1)2^{-K'})$. We easily obtain the bound

$$-4\Delta_n \le Q_n\left(\frac{p}{2^{K'}}\right) - \frac{p}{2^{K'}} - \frac{1}{2^{K'}} \le Q_n(u) - u \le Q_n\left(\frac{p+1}{2^{K'}}\right) - \frac{p+1}{2^{K'}} + \frac{1}{2^{K'}} \le 4\Delta_n.$$

We conclude that (3) indeed holds.

We conclude the proof by establishing (4). We start with $k = p = 1$. Since the squares $A^{(1)}_{1,1}$ and $A^{(1)}_{2,2}$ both contain at least two observations and $A^{(1)}_{2,2}$ is north of $A^{(1)}_{1,1}$, it follows from the definition of $Q_n(1/2)$ that $Q_n(1/2) \ge \max_i \{U_i \mid U_i < 1/2\}$. As the square $A^{(2)}_{3,4}$ is north of $A^{(2)}_{4,3}$ and both squares contain at least one observation it is also immediate that $Q_n(1/2) < \min_i \{U_i \mid U_i \ge 1/2\}$. Hence (4) indeed holds for $k = p = 1$. Suppose that we have shown (4) to hold for $k = 1, \dots, K - 1$, with $K \le K_n$. We show that (4) also holds for $k = K$. We have to discuss the cases $K$ even and $K$ odd separately. As the arguments are similar, we only discuss the case $K$ odd. For $p$ odd we obtain from the induction hypothesis that all observations that are relevant for $Q_n(p2^{-K})$, i.e.
the observations $U_i$ that belong to the interval $(Q_n((p-1)2^{-K}), Q_n((p+1)2^{-K})]$, correspond to observations $(U_i, V_i)$ that fall in the sets $A^{(K)}_{p, q^{(K)}(p)}$ and $A^{(K)}_{p+1, q^{(K)}(p+1)}$. As $K \le K_n$ both squares contain at least one observation. As $K$ is odd $A^{(K)}_{p+1, q^{(K)}(p+1)}$ is north of $A^{(K)}_{p, q^{(K)}(p)}$. It follows that $Q_n(p2^{-K}) \ge \max_i \{U_i \mid U_i < p2^{-K}\}$. The mass that $C$ assigns to the set $A^{(K)}_{p+1, q^{(K)}(p+1)}$ concentrates in the two subsets $A^{(K+1)}_{2p+1, q^{(K+1)}(2p+1)}$ and $A^{(K+1)}_{2(p+1), q^{(K+1)}(2(p+1))}$, and both sets contain at least one observation. As $K + 1$ is even the set $A^{(K+1)}_{2(p+1), q^{(K+1)}(2(p+1))}$ is south of $A^{(K+1)}_{2p+1, q^{(K+1)}(2p+1)}$. This easily yields $Q_n(p2^{-K}) < \min_i \{U_i \mid U_i \ge p2^{-K}\}$. We conclude that (4) holds for $k = K$ as well, which concludes the induction argument.
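The random level $K_n$ used in the proof is the finest dyadic resolution at which every cell of width $2^{-(k+1)}$ still contains an observation. As a hedged illustration, it can be computed as follows; the function name `k_n` and the choice to return `None` when no level $k \ge 1$ qualifies (the case written as $K_n = -\infty$ in the proof) are assumptions of this sketch.

```python
import math


def k_n(points):
    """Largest k >= 1 such that every interval ((p-1)/2^(k+1), p/2^(k+1)],
    p = 1, ..., 2^(k+1), contains at least one point; None if no such k exists."""
    best = None
    k = 1
    while True:
        m = 2 ** (k + 1)
        # ceil(u * m) is the index p of the interval ((p-1)/m, p/m] containing u
        occupied = {math.ceil(u * m) for u in points if 0.0 < u <= 1.0}
        if len(occupied) < m:  # some interval is empty; finer levels fail as well
            return best
        best = k
        k += 1


if __name__ == "__main__":
    print(k_n([0.1, 0.3, 0.6, 0.9]))  # every quarter is occupied, not every eighth
    print(k_n([0.1, 0.2]))            # some quarter is empty
```

Because the dyadic intervals are nested, an empty cell at one level forces an empty cell at every finer level, so stopping at the first failing level is enough.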

References

[1] X. Chen, Y. Fan, V. Tsyrennikov, Efficient estimation of semiparametric multivariate copula models, Journal of the American Statistical Association 101 (2006) 1228–1240.
[2] C. Genest, B.J.M. Werker, Conditions for the asymptotic semiparametric efficiency of an omnibus estimator of dependence parameters in copula models, in: C.M. Cuadras, J. Fortiana, J.A. Rodríguez-Lallena (Eds.), Distributions with Given Marginals and Statistical Modeling, Kluwer, Dordrecht, 2002, pp. 103–112.
[3] C.A.J. Klaassen, J.A. Wellner, Efficient estimation in the bivariate normal copula model: normal margins are least favourable, Bernoulli 3 (1997) 55–77.
[4] P. Mikusinski, H. Sherwood, M. Taylor, Shuffles of min, Stochastica XIII (1992) 61–74.
[5] R. Nelsen, An Introduction to Copulas, 1st ed., Springer-Verlag, New York, 1999.
[6] E. Slud, Entropy and maximal spacings for random partitions, Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 41 (1978) 341–352.
[7] R. van den Akker, Integer-valued time series, Ph.D. Thesis, CentER Dissertation Series 197, Tilburg University, 2007. Available at: http://arno.uvt.nl/show.cgi?did=306632.