SEMI-PARAMETRIC ESTIMATION OF BIVARIATE DEPENDENCE UNDER MIXED MARGINALS


SEMI-PARAMETRIC ESTIMATION OF BIVARIATE DEPENDENCE UNDER MIXED MARGINALS

By

Wenmei Huang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

2011

ABSTRACT

SEMI-PARAMETRIC ESTIMATION OF BIVARIATE DEPENDENCE UNDER MIXED MARGINALS

By

Wenmei Huang

Copulas are a flexible way of modeling dependence in a set of variables in which the association between variables can be elicited separately from the marginal distributions. A semi-parametric approach for estimating the dependence structure of bivariate distributions derived from copulas is investigated when the associated marginals are mixed, that is, consisting of both discrete and continuous components. A semi-parametric likelihood approach is proposed for obtaining the estimator of the dependence parameter when the marginals are unknown. The consistency and asymptotic normality of the estimator are established as the sample size tends to infinity. For constructing confidence intervals in practice, an estimator of the asymptotic variance is proposed and its properties are investigated via simulation. Extensions to higher dimensions are discussed. Several simulation studies and real data examples are provided to investigate the application of the developed methodology of inference. This work generalizes prior results on the estimation of dependence when the marginals are continuous, obtained by Genest et al. [11].

To my professors and my loving family.

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my dissertation advisor, Prof. Sarat C. Dass, for his patient guidance, encouragement and support during my graduate study at Michigan State University, and for his precious advice for my professional career. He helped me learn not only how to do research in statistics, but also how to write statistical papers in proper English. The education I received under his guidance will be the most valuable in my future career. I am grateful to Prof. Tapabrata Maiti and Prof. Chae Young Lim of the Department of Statistics and Probability, and Prof. Zhengfang Zhou of the Department of Mathematics, for serving on my thesis committee. I accumulated practical experience from the consulting work at the Center for Statistical Training and Consulting. Prof. Connie Page, Prof. Dennis Gilliland, and Prof. Sandra Herman shared with me their experience and knowledge in dealing with practical problems, and guided me through my projects. I would like to thank Prof. James Stapleton for his help and encouragement with respect to my teaching and study. He has been very supportive of me, as well as of all his students. I thank him for offering to review my thesis. I would like to thank Prof. Anil K. Jain of the Department of Computer Science and Engineering, Michigan State University, for letting me use the biometrics data in the PRIP lab in my research. I would like to thank Dr. Yongfang Zhu for discussions on course work and research. I also would like to thank all the professors of the Department of Statistics and Probability and all my great friends at Michigan State University for making my life there so unique and wonderful. Finally, I especially thank my loving family for their encouragement and support.

TABLE OF CONTENTS

List of Tables
List of Figures
1 INTRODUCTION
2 A MODEL FOR BIVARIATE DISTRIBUTIONS
   2.1 Joint Distributions with Mixed Marginals
   2.2 Semi-Parametric Estimation of θ
3 ASYMPTOTIC PROPERTIES OF SEMI-PARAMETRIC ESTIMATOR IN BIVARIATE DISTRIBUTIONS
   3.1 Statement of the Main Theorem
   3.2 Consistency of θ̂_n
   3.3 Asymptotic Behavior of B_n
   3.4 Asymptotic Behavior of A_n
       Asymptotic normality of R_n^cc
       Asymptotic normality of R_n^cd and R_n^dc
       Asymptotic normality of R_n^dd
       Asymptotic normality of R_n
   3.5 Proof of the Main Result
4 A VARIANCE ESTIMATOR OF BIVARIATE DISTRIBUTIONS
5 EXTENSIONS TO HIGHER DIMENSIONS
6 JOINT DISTRIBUTIONS USING t-COPULA
   Joint Distributions Using t-Copula
   Estimation of R and ν
       Estimation of R for fixed ν
       Selection of the Degrees of Freedom, ν
   Summary
7 SIMULATION AND REAL DATA RESULTS
   Simulation Results
   Application to Multimodal Fusion in Biometric Recognition
       Introduction
       Application in Biometric Fusion
8 SUMMARY AND CONCLUSION
Bibliography

LIST OF TABLES

4.1 Relationship among (X_j, Y_j), R_X(j), and R_Y(j)
Simulation results for Experiment 1 with ν_0 = … and ρ = …. The absolute bias and relative absolute bias of the estimator ρ̂_n are provided, together with the empirical coverage of the approximate 95% confidence interval for ρ based on the asymptotic normality of ρ̂_n
L_1 distances for paired values of ν_0 corresponding to two values of ρ_0, 0.20 and …
Simulation results for Experiment 2 with ν_0 = … and ρ_0 = …
Simulation results for Experiment 3 with ν_0 = 10. The two correlation values considered are ρ_0 = 0.2 and ρ_0 = …
Simulation results for Experiment 4
Results of the estimation procedure for R and ν based on the NIST database

LIST OF FIGURES

7.1 Density curves for the t distribution with degrees of freedom ν = 3, 5, 10, 15, 20, 25 and the normal distribution. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation
Some examples of biometric traits: (a) fingerprint, (b) iris scan, (c) face scan, (d) signature, (e) voice, (f) hand geometry, (g) retina, and (h) ear
Histograms of matching scores corresponding to genuine scores for Matcher 1. Continuous (respectively, generalized) density estimates are given by the dashed lines (respectively, solid lines)
Histograms of matching scores corresponding to genuine scores for Matcher 2. Continuous (respectively, generalized) density estimates are given by the dashed lines (respectively, solid lines)
Histograms of matching scores corresponding to impostor scores for Matcher 1. Continuous (respectively, generalized) density estimates are given by the dashed lines (respectively, solid lines). The spike corresponds to discrete components. Note how the generalized density estimator performs better than the continuous estimator (which assumes no discrete components)
Histograms of matching scores corresponding to impostor scores for Matcher 2. Continuous (respectively, generalized) density estimates are given by the dashed lines (respectively, solid lines)
Performance of copula fusion on the MSU-Multimodal database
Performance of copula fusion on the NIST database

CHAPTER 1

INTRODUCTION

Copulas are functions that join, or couple, multivariate distribution functions to their one-dimensional marginal distribution functions. Essentially, a copula is a function $C$ from $[0,1]^D$ ($D \ge 2$) to $[0,1]$ with the following properties:

1. For every $(u_1, \ldots, u_D) \in [0,1]^D$,
$$C(u_1, \ldots, u_D) = 0 \quad \text{if at least one } u_k = 0, \; k = 1, \ldots, D, \qquad (1.1)$$
and
$$C(u_1, \ldots, u_D) = u_k \quad \text{if } u_j = 1 \text{ for all } j \neq k, \; k = 1, \ldots, D. \qquad (1.2)$$

2. For every $(u_1, \ldots, u_D)$ and $(v_1, \ldots, v_D)$ in $[0,1]^D$ such that $u_k \le v_k$ for $k = 1, \ldots, D$, which define the $D$-box $V = [u_1, v_1] \times [u_2, v_2] \times \cdots \times [u_D, v_D] \subseteq [0,1]^D$,
$$\sum_{c} \mathrm{sgn}(c)\, C(c) \ge 0, \qquad (1.3)$$
where the sum is taken over all vertices $c$ of $V$ and $\mathrm{sgn}(c)$ is given by
$$\mathrm{sgn}(c) = \begin{cases} 1, & \text{if } c_k = u_k \text{ for an even number of } k\text{'s}, \\ -1, & \text{if } c_k = u_k \text{ for an odd number of } k\text{'s}. \end{cases}$$

Alternatively, copulas are multivariate distribution functions whose one-dimensional marginals are uniform on the interval $[0,1]$.
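To make these defining properties concrete, the following minimal Python sketch (not taken from the dissertation; the Gaussian copula, the parameter value ρ = 0.5, and the chosen box are illustrative assumptions) numerically checks (1.1), (1.2) and the rectangle inequality (1.3) for D = 2, building the copula from scipy's bivariate normal CDF.

```python
# Minimal numerical check of copula properties (1.1)-(1.3) for D = 2,
# using a Gaussian copula as an illustrative example (not the only choice).
import numpy as np
from scipy.stats import norm, multivariate_normal

def gaussian_copula_cdf(u, v, rho=0.5):
    """C(u, v) = Phi_2(Phi^{-1}(u), Phi^{-1}(v); rho) for the Gaussian copula."""
    u = np.clip(u, 1e-12, 1 - 1e-12)   # avoid +/- infinity at the boundary
    v = np.clip(v, 1e-12, 1 - 1e-12)
    cov = [[1.0, rho], [rho, 1.0]]
    return multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf([norm.ppf(u), norm.ppf(v)])

# (1.1): C(u, 0) = C(0, v) = 0 (up to numerical tolerance)
print(gaussian_copula_cdf(0.0, 0.7), gaussian_copula_cdf(0.3, 0.0))

# (1.2): C(u, 1) = u and C(1, v) = v
print(gaussian_copula_cdf(0.3, 1.0), gaussian_copula_cdf(1.0, 0.7))

# (1.3): rectangle (2-increasing) inequality over the box [u1, v1] x [u2, v2]
u1, v1, u2, v2 = 0.2, 0.6, 0.3, 0.9
volume = (gaussian_copula_cdf(v1, v2) - gaussian_copula_cdf(u1, v2)
          - gaussian_copula_cdf(v1, u2) + gaussian_copula_cdf(u1, u2))
print(volume >= 0)   # True: the C-volume of every box is non-negative
```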

Since their introduction by Sklar [46], copulas have proved to be a useful tool for analyzing multivariate dependence structures, owing to many unique and interesting features. Let $F_k(x_k) = P(X_k \le x_k)$ for $k = 1, 2, \ldots, D$ denote $D$ continuous distribution functions on the real line, and let $F$ denote a joint distribution function on $\mathbb{R}^D$ whose $k$-th marginal distribution corresponds to $F_k$. According to Sklar's Theorem [46], there exists a unique function $C(u_1, \ldots, u_D)$ from $[0,1]^D$ to $[0,1]$ satisfying
$$F(x_1, \ldots, x_D) = C(F_1(x_1), \ldots, F_D(x_D)), \qquad (1.4)$$
where $x_1, \ldots, x_D$ are $D$ real numbers. The function $C$ is known as a $D$-copula function that couples the one-dimensional distribution functions $F_k$, $k = 1, 2, \ldots, D$, to obtain $F$. If not all marginals are continuous, $C$ is uniquely determined only on $\mathrm{Ran}\,F_1 \times \cdots \times \mathrm{Ran}\,F_D$, where $\mathrm{Ran}\,F_k$ is the range of $F_k$. Equation (1.4) can also be used to construct $D$-dimensional distribution functions $F$ whose marginals are the pre-specified distributions $F_k$, $k = 1, \ldots, D$: choose a copula function $C$ and define the distribution function $F$ as in (1.4). It follows that $F$ is a $D$-dimensional distribution function with marginals $F_k$, $k = 1, \ldots, D$.

Investigating the dependence structure of multivariate distributions has long been an important area of research; see, for example, [26], [27], and [35]. One of the central problems in statistics concerns testing the hypothesis that random variables are independent. Prior to the recent explosion of copula theory and applications, the only models available in many fields to represent dependence were classical multivariate models, such as the multivariate Gaussian model. These models entail rigid assumptions on the marginal and joint behavior of the variables, and their usefulness is therefore limited.

Although the term copula was first used by Sklar [46] in 1959 and traces of copula theory can be found in Hoeffding's work during the 1940s, the study of copulas and their application in statistics is a rather modern phenomenon. Earlier efforts have addressed different statistical aspects of specialized as well as general copula-based joint distributions. Inference procedures for bivariate Archimedean copulas have been developed and discussed by Genest et al. [12].

Demarta et al. [8] studied properties related to t-copulas, whereas Genest et al. [11], [13] gave a goodness-of-fit procedure for general copulas. Estimation techniques and properties of the estimators have been well studied, with applications ranging from statistics to mathematical finance and financial risk management; see, for example, Shih et al. [44], Embrechts et al. [9], Cherubini et al. [5], Frey et al. [10] and Chen et al. [4].

Many multivariate models for dependence can be generated by parametric families of copulas, $\{C_\theta : \theta \in \Theta\}$, typically indexed by a real or vector-valued parameter $\theta$. Examples of such systems are given in [23], [30], [22], and others. The recent monograph by Hutchinson and Lai [17], which includes an extensive bibliography, constitutes a handy reference to this expanding literature. Copula-based models are natural in situations where learning about the association between the variables is important, since the effect of the dependence structure is easily separated from that of the marginals. In such situations, there is typically enough data to obtain nonparametric estimators of the marginal distributions, but insufficient information to afford nonparametric estimation of the structure of the association. In such cases, it is convenient to adopt a parametric form for the dependence function while keeping the marginals unspecified.

To estimate the dependence parameter $\theta$, two strategies can be adopted, depending on the circumstances: (i) if valid parametric models are already available for the marginals, then it is straightforward in principle to write down a likelihood function for the data; this makes the estimation of $\theta$ margin-dependent, because the estimators of the parameters involved in the marginals are indirectly affected by the copula. This is the parametric approach. (ii) When nonparametric estimators are contemplated for the marginals, however, inference about the dependence parameter $\theta$ is margin-free. This is the semi-parametric approach. Clayton [6], Hougaard et al. [15] and Oakes [36] have pointed out that the margin-free requirement is sensible in applications where the focus of the analysis is on the dependence structure.

Given a sample of $n$ observations $\mathbf{X}_j = (X_{1j}, X_{2j}, \ldots, X_{Dj})$, $j = 1, 2, \ldots, n$, from the joint distribution $F(x_1, x_2, \ldots, x_D) = C_\theta(F_1(x_1), \ldots, F_D(x_D))$, the estimation procedure involves

selecting $\hat{\theta}_n$ to maximize the semi-parametric likelihood
$$l(\theta) = \sum_{j=1}^{n} \log\big[c_\theta\big(F_{1n}(X_{1j}), \ldots, F_{Dn}(X_{Dj})\big)\big], \qquad (1.5)$$
where $F_{kn}$ denotes $n/(n+1)$ times the empirical distribution function of the $k$-th component observations $X_{kj}$, $j = 1, 2, \ldots, n$, and $c_\theta$ is the density of $C_\theta$ with respect to Lebesgue measure on $[0,1]^D$. The use of $F_{kn}$ instead of the empirical distribution avoids difficulties arising from the potential unboundedness of $\log c_\theta(u_1, \ldots, u_D)$ as some of the $u_k$'s tend to 1 (a numerical sketch of this procedure is given at the end of this chapter).

A central assumption made in the earlier studies is that the marginals associated with the joint distribution $F$ are continuous. In reality, there are many situations where this assumption is not satisfied and the marginals can be mixed, that is, they contain both discrete and continuous components (see, for example, Kohn et al. [37]). Based on the continuity assumption and regularity conditions, Genest et al. [11] have shown that $n^{1/2}(\hat{\theta}_n - \theta)$ is asymptotically normal, where $\hat{\theta}_n$ is the semi-parametric estimator of $\theta$ in (1.5). Our work extends this methodology by accounting for mixed marginals in the case $D = 2$ and $\Theta \subseteq \mathbb{R}$.

Uniqueness is an important consideration when estimating the unknown parameters corresponding to a copula. Since our mixed marginals have discrete components, one way to achieve uniqueness is to restrict $C$ to belong to a particular parametric family. Under this assumption, we develop an estimation technique for finding the semi-parametric maximum likelihood estimator, $\hat{\theta}_n$, for the parametric family $\{C_\theta : \theta \in \Theta\}$ in the bivariate situation in Chapter 2. The estimation methodology involves integrals corresponding to the discrete components and is therefore non-standard. Consistency and asymptotic normality of the semi-parametric maximum likelihood estimator $\hat{\theta}_n$ are established in Chapter 3. An estimator of the asymptotic variance $\varrho$ of $\hat{\theta}_n$ is developed in Chapter 4. Note that for multivariate observations, copulas allow flexible modeling of the joint distribution via its marginal distributions as well as the correlation between pairwise components of the vector of observations. Therefore, the theory presented in Chapters 2 and 3 can be extended to higher dimensions; this is done in Chapter 5. In Chapter 6, $C$ is restricted to a particular parametric family of copulas, the t-copula, and findings related to the t-copula are provided.

Numerical simulations and an application of the methodology to real biometric data are presented in Chapter 7. We summarize our work and discuss future research directions in Chapter 8.
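To illustrate the semi-parametric likelihood (1.5) in the continuous-marginal setting of Genest et al. [11], the following minimal Python sketch estimates the dependence parameter of a bivariate Gaussian copula by maximizing (1.5) with the rescaled empirical distribution functions. The copula family, the simulated marginals, and the parameter value are illustrative assumptions rather than choices made in this dissertation.

```python
# Sketch: semi-parametric (pseudo-likelihood) estimation of a copula parameter
# with continuous marginals, following the form of (1.5).  The Gaussian copula
# and the data-generating model are illustrative assumptions.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

def rescaled_ecdf(x):
    """F_kn(x_j) = n/(n+1) * (rank of x_j)/n = rank/(n+1); keeps values in (0, 1)."""
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1          # ranks 1..n
    return ranks / (n + 1.0)

def gaussian_copula_logdensity(u, v, rho):
    """log c_rho(u, v) for the Gaussian copula."""
    x, y = norm.ppf(u), norm.ppf(v)
    return (-0.5 * np.log(1 - rho**2)
            - (rho**2 * (x**2 + y**2) - 2 * rho * x * y) / (2 * (1 - rho**2)))

# Simulate data with continuous marginals and Gaussian-copula dependence (rho = 0.6).
n, rho_true = 500, 0.6
z = rng.multivariate_normal([0, 0], [[1, rho_true], [rho_true, 1]], size=n)
x1, x2 = np.exp(z[:, 0]), z[:, 1] ** 3              # arbitrary monotone marginals

u, v = rescaled_ecdf(x1), rescaled_ecdf(x2)
neg_loglik = lambda rho: -np.sum(gaussian_copula_logdensity(u, v, rho))
rho_hat = minimize_scalar(neg_loglik, bounds=(-0.99, 0.99), method="bounded").x
print(rho_hat)   # should be close to 0.6
```

Because the marginals enter only through ranks, the estimate is unchanged under monotone transformations of the data, which is precisely the margin-free property discussed above.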

CHAPTER 2

A MODEL FOR BIVARIATE DISTRIBUTIONS

2.1 Joint Distributions with Mixed Marginals

Let $F$ and $G$ be two distributions on the real line that are mixed, that is, both $F$ and $G$ consist of a mixture of discrete as well as continuous components. The general form for $F$ is given by
$$F(x) = \sum_{h=1}^{d_F} p_{Fh}\, I_{\{D_{Fh} \le x\}} + \Big(1 - \sum_{h=1}^{d_F} p_{Fh}\Big) \int_{-\infty}^{x} f(w)\, dw, \qquad (2.1)$$
where $I_{\{A\}}$ is the indicator function of the set $A$ (that is, $I_{\{A\}} = 1$ if $A$ is true, and 0 otherwise). In (2.1), the distribution function $F$ consists of $d_F$ discrete components denoted by $D_{Fh}$ with $F(D_{Fh}) - F(D_{Fh}^-) = p_{Fh}$ for $h = 1, 2, \ldots, d_F$, where $x^-$ denotes the left limit at $x$, and $f$ is the density of the continuous component of $F$. The discrete components $D_{Fh}$, $h = 1, 2, \ldots, d_F$, correspond to jump points of the distribution function $F$, and are called the jump points of $F$. All other points on $\mathbb{R}$ correspond to points of continuity of $F$, that is, $F(x) - F(x^-) = 0$ if $x \notin \{D_{Fh}\}$. The sets of jump and continuity points of $F$ are denoted by $\mathcal{J}(F)$ and $\mathcal{C}(F)$, respectively. In the same way, writing
$$G(y) = \sum_{l=1}^{d_G} p_{Gl}\, I_{\{D_{Gl} \le y\}} + \Big(1 - \sum_{l=1}^{d_G} p_{Gl}\Big) \int_{-\infty}^{y} g(w)\, dw, \qquad (2.2)$$

similar definitions can be given for the quantities $d_G$, $p_{Gl}$, $D_{Gl}$ and $g$. We denote the sets of jump and continuity points of $G$ by $\mathcal{J}(G)$ and $\mathcal{C}(G)$, respectively.

Let $H_\theta$ be the bivariate function defined by
$$H_\theta(x, y) = C_\theta(F(x), G(y)) \qquad (2.3)$$
for $\theta \in \Theta$, with $F$ and $G$ as in (2.1) and (2.2), respectively. It follows from the properties of a copula function that

Theorem. The function $H_\theta$, as defined in (2.3), is a valid bivariate distribution function on $\mathbb{R}^2$ with marginals $F$ and $G$.

Proof: For $H_\theta(x, y)$ to be a valid distribution function on $\mathbb{R}^2$, the following four conditions should be satisfied: (1) $0 \le H_\theta(x, y) \le 1$ for all $(x, y)$; (2) $H_\theta(x, y) \to 0$ as $\max(x, y) \to -\infty$; (3) $H_\theta(x, y) \to 1$ as $\min(x, y) \to +\infty$; and (4) for every $x_1 \le x_2$ and $y_1 \le y_2$,
$$\Delta H \equiv H_\theta(x_2, y_2) - H_\theta(x_1, y_2) - H_\theta(x_2, y_1) + H_\theta(x_1, y_1) \ge 0.$$
Note that since $H_\theta(x, y) = C_\theta(F(x), G(y)) = P(U \le F(x), V \le G(y))$, (1) follows. To prove (2), note that $F(x) \to 0$ and $G(y) \to 0$ as $\max(x, y) \to -\infty$; hence $H_\theta(x, y) \to 0$ from the properties of a cumulative distribution function. (3) follows similarly. To prove (4), we use the facts that $F(x_1) \le F(x_2)$ for $x_1 \le x_2$ and $G(y_1) \le G(y_2)$ for $y_1 \le y_2$, and
$$\Delta H = \int_{[F(x_1), F(x_2)] \times [G(y_1), G(y_2)]} c_\theta(u, v)\, du\, dv. \qquad (2.4)$$
Since $c_\theta(u, v) \ge 0$, $\Delta H \ge 0$ follows. Proof completed.

We turn now to a characterization of the density, $h_\theta(x, y)$, of $H_\theta(x, y)$. For a fixed $(x, y) \in \mathbb{R}^2$, the two components of $(x, y)$ correspond to either a point of continuity or a jump point of the corresponding marginal distribution function.

For $(X, Y) \sim H_\theta$, $h_\theta(x, y)$ has four different expressions, namely,
$$h_\theta(x, y) = \begin{cases} \displaystyle\lim_{a, b \downarrow 0} \frac{P\{X \in (x, x+a],\; Y \in (y, y+b]\}}{ab} & \text{if } x \in \mathcal{C}(F),\ y \in \mathcal{C}(G), \\[1ex] \displaystyle\lim_{b \downarrow 0} \frac{P\{X = x,\; Y \in (y, y+b]\}}{b} & \text{if } x \in \mathcal{J}(F),\ y \in \mathcal{C}(G), \\[1ex] \displaystyle\lim_{a \downarrow 0} \frac{P\{X \in (x, x+a],\; Y = y\}}{a} & \text{if } x \in \mathcal{C}(F),\ y \in \mathcal{J}(G), \\[1ex] P\{X = x,\; Y = y\} & \text{if } x \in \mathcal{J}(F),\ y \in \mathcal{J}(G). \end{cases} \qquad (2.5)$$
The following theorem gives workable expressions for $h_\theta(x, y)$ in terms of the copula:

Theorem. For each $(x, y) \in \mathbb{R}^2$,
$$h_\theta(x, y) = \begin{cases} f(x)\, g(y)\, c_\theta(F(x), G(y)) & \text{if } x \in \mathcal{C}(F),\ y \in \mathcal{C}(G), \\ g(y) \int_{[F(x^-), F(x)]} c_\theta(u, G(y))\, du & \text{if } x \in \mathcal{J}(F),\ y \in \mathcal{C}(G), \\ f(x) \int_{[G(y^-), G(y)]} c_\theta(F(x), v)\, dv & \text{if } x \in \mathcal{C}(F),\ y \in \mathcal{J}(G), \\ \int_{[F(x^-), F(x)] \times [G(y^-), G(y)]} c_\theta(u, v)\, du\, dv & \text{if } x \in \mathcal{J}(F),\ y \in \mathcal{J}(G). \end{cases} \qquad (2.6)$$

Proof: When $x \in \mathcal{C}(F)$ and $y \in \mathcal{C}(G)$,
$$P\{X \in (x, x+a],\; Y \in (y, y+b]\} = \int_{[F(x), F(x+a)] \times [G(y), G(y+b)]} c_\theta(u, v)\, du\, dv$$
from (2.4). This is approximately $ab\, f(x)\, g(y)\, c_\theta(F(x), G(y))$ for small $a$ and $b$. When $x \in \mathcal{J}(F)$ and $y \in \mathcal{C}(G)$, note that
$$P\{X = x,\; Y \in (y, y+b]\} = \int_{[F(x^-), F(x)] \times [G(y), G(y+b)]} c_\theta(u, v)\, du\, dv,$$

again from (2.4), which is approximately $b\, g(y) \int_{[F(x^-), F(x)]} c_\theta(u, G(y))\, du$ for small $b$. The third case follows similarly and the fourth expression is straightforward. Proof completed.

The terms in $h_\theta(x, y)$ that involve the densities $f$ and $g$ do not depend on $\theta$ and can hence be ignored for the estimation of $\theta$. Subsequently, we focus on the function $c_\theta \equiv c_\theta(F(x^-), F(x), G(y^-), G(y))$ defined by
$$c_\theta = \begin{cases} c_\theta(F(x), G(y)) & \text{if } x \in \mathcal{C}(F),\ y \in \mathcal{C}(G), \\ \int_{[F(x^-), F(x)]} c_\theta(u, G(y))\, du & \text{if } x \in \mathcal{J}(F),\ y \in \mathcal{C}(G), \\ \int_{[G(y^-), G(y)]} c_\theta(F(x), v)\, dv & \text{if } x \in \mathcal{C}(F),\ y \in \mathcal{J}(G), \\ \int_{[F(x^-), F(x)] \times [G(y^-), G(y)]} c_\theta(u, v)\, du\, dv & \text{if } x \in \mathcal{J}(F),\ y \in \mathcal{J}(G). \end{cases} \qquad (2.7)$$

2.2 Semi-Parametric Estimation of θ

If not all marginals are continuous, $C$ in (1.4) is no longer unique. Uniqueness is an important consideration when estimating the unknown parameters corresponding to the copula function. Since our marginals have discrete components, one way to achieve uniqueness is to restrict $C$ to belong to a particular parametric family. From now on, let $\{C_\theta : \theta \in \Theta\}$ be such a restricted parametric family of copulas. Suppose $\{(X_j, Y_j)^T,\ j = 1, 2, \ldots, n\}$ is a set of $n$ independent and identically distributed bivariate random vectors arising from the joint distribution $H_\theta(x, y)$ in (2.3). The parameter of interest is the bivariate dependence parameter $\theta$, and the log-likelihood function corresponding to the $n$ observations is
$$\sum_{j=1}^{n} \log h_\theta(x_j, y_j).$$
The estimation of $\theta$ is complicated by the presence of nuisance parameters consisting of the unknown distributions $F$ and $G$ (only the number $d_F$ and jump points $D_{Fh}$, $h = 1, 2, \ldots, d_F$,

of $F$ (correspondingly, $d_G$ and the points $D_{Gl}$ of $G$) are known). An objective function that has $\theta$ as the only unknown parameter can be obtained by replacing $F$ and $G$ by $F_n$ and $G_n$ in the log-likelihood function, where $F_n$ (respectively, $G_n$) is $n/(n+1)$ times the empirical cdf of $F$ (respectively, $G$). The rescaling by $n/(n+1)$ avoids difficulties arising from the unboundedness of the log-likelihood function as the empirical cdfs tend to 0 or 1, and was employed by Genest et al. in [11]. Thus, the unique semi-parametric maximum likelihood estimator of $\theta$, $\hat{\theta}_n$, is the maximizer of
$$L(\theta) = \sum_{j=1}^{n} \log\big\{c_{\theta,j}\big(F_n(x_j^-), F_n(x_j), G_n(y_j^-), G_n(y_j)\big)\big\} \qquad (2.8)$$
(that is, $\hat{\theta}_n = \arg\max_{\theta \in \Theta} L(\theta)$), where $c_{\theta,j} \equiv c_\theta(F_n(x_j^-), F_n(x_j), G_n(y_j^-), G_n(y_j))$, ignoring terms that do not depend on $\theta$ in $\sum_{j=1}^{n} \log h_\theta(x_j, y_j)$.

In the special case when no discrete components are present, Genest et al. [11] developed a semi-parametric approach to estimating the unknown parameters based on the semi-parametric likelihood, and showed that the resulting estimators are consistent and asymptotically normally distributed; see [11] for details. Expressions (2.7) and (2.8), respectively, are generalizations of the methodology of Genest et al. [11] to the case where $F$ and $G$ contain discrete components. Note that the challenge in maximizing the semi-parametric log-likelihood in (2.8) is that it involves several integrals corresponding to the discrete components in $(x_j, y_j)$, $j = 1, 2, \ldots, n$; a numerical sketch of these computations is given below.
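The following Python sketch, which is illustrative only and not code from the dissertation, makes (2.7) and (2.8) concrete: it evaluates the generalized density $c_\theta(F(x^-), F(x), G(y^-), G(y))$ by numerical quadrature over the jump intervals and then maximizes the semi-parametric log-likelihood (2.8) with the marginals replaced by rescaled empirical cdfs and their left limits. A Gaussian copula is used purely as a stand-in for the parametric family $\{C_\theta\}$, and the left-censored data-generating model is a made-up example of a mixed marginal.

```python
# Sketch: the generalized density (2.7) and the semi-parametric estimator (2.8).
# A Gaussian copula stands in for the parametric family {C_theta}; the data,
# the single jump point, and all parameter values are illustrative assumptions.
import numpy as np
from scipy import integrate
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def copula_density(u, v, rho):
    """c_theta(u, v): Gaussian copula density (illustrative stand-in)."""
    u, v = np.clip(u, 1e-12, 1 - 1e-12), np.clip(v, 1e-12, 1 - 1e-12)
    x, y = norm.ppf(u), norm.ppf(v)
    return (np.exp(-(rho**2 * (x**2 + y**2) - 2 * rho * x * y)
                   / (2 * (1 - rho**2))) / np.sqrt(1 - rho**2))

def generalized_density(u1, u2, v1, v2, rho, x_jump, y_jump):
    """c_theta(F(x^-), F(x), G(y^-), G(y)) as in (2.7); (u1, u2) = (F(x^-), F(x)),
    (v1, v2) = (G(y^-), G(y)); x_jump / y_jump flag whether x, y are jump points."""
    if not x_jump and not y_jump:
        return copula_density(u2, v2, rho)
    if x_jump and not y_jump:
        return integrate.quad(lambda u: copula_density(u, v2, rho), u1, u2)[0]
    if not x_jump and y_jump:
        return integrate.quad(lambda v: copula_density(u2, v, rho), v1, v2)[0]
    return integrate.dblquad(lambda v, u: copula_density(u, v, rho), u1, u2, v1, v2)[0]

def L(rho, x, y, jumps_F=(), jumps_G=()):
    """Semi-parametric log-likelihood (2.8): F, G replaced by the rescaled
    empirical cdfs F_n, G_n and their left limits; jump points are known."""
    n = len(x)
    Fn = lambda t: np.sum(x <= t) / (n + 1.0)       # F_n(t)
    Fn_left = lambda t: np.sum(x < t) / (n + 1.0)   # F_n(t^-)
    Gn = lambda t: np.sum(y <= t) / (n + 1.0)
    Gn_left = lambda t: np.sum(y < t) / (n + 1.0)
    total = 0.0
    for xj, yj in zip(x, y):
        total += np.log(generalized_density(Fn_left(xj), Fn(xj),
                                             Gn_left(yj), Gn(yj), rho,
                                             xj in jumps_F, yj in jumps_G))
    return total

# Mixed marginals: left-censoring the first component at -1 creates a known jump.
rng = np.random.default_rng(1)
z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=300)
x = np.where(z[:, 0] < -1.0, -1.0, z[:, 0])
y = z[:, 1]
res = minimize_scalar(lambda r: -L(r, x, y, jumps_F=(-1.0,)),
                      bounds=(-0.95, 0.95), method="bounded")
print(res.x)   # semi-parametric estimate of the dependence parameter
```

In this dissertation the parametric family is ultimately taken to be the t-copula of Chapter 6; in a sketch like the one above, only the function playing the role of the copula density would change.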

CHAPTER 3

ASYMPTOTIC PROPERTIES OF THE SEMI-PARAMETRIC ESTIMATOR IN BIVARIATE DISTRIBUTIONS

3.1 Statement of the Main Theorem

In view of its similarity with the semi-parametric maximum likelihood estimator for continuous marginals, we expect $\hat{\theta}_n$ in the case of mixed marginals to be consistent and asymptotically normal. To prove this, we denote $l(\theta, u_1, u_2, v_1, v_2) = \log\{c_\theta(u_1, u_2, v_1, v_2)\}$ and use the notation $l_\theta$ and $l_{\theta,\theta}$ to denote the first and second derivatives of $l$ with respect to $\theta$. The estimator $\hat{\theta}_n$ satisfies
$$\frac{1}{n}\frac{\partial L(\theta)}{\partial \theta} = \frac{1}{n}\sum_{j=1}^{n} l_\theta\big(\theta, F_n(x_j^-), F_n(x_j), G_n(y_j^-), G_n(y_j)\big) = 0 \quad \text{at } \theta = \hat{\theta}_n. \qquad (3.1)$$
Here is a heuristic argument for the proof of asymptotic normality. Expanding in a Taylor series, one obtains
$$0 = \frac{1}{n}\frac{\partial L(\theta)}{\partial \theta}\bigg|_{\theta = \hat{\theta}_n} \approx A_n + (\hat{\theta}_n - \theta) B_n, \qquad (3.2)$$

where
$$A_n = \frac{1}{n}\sum_{j=1}^{n} l_\theta\big(\theta, F_n(x_j^-), F_n(x_j), G_n(y_j^-), G_n(y_j)\big)$$
and
$$B_n = \frac{1}{n}\sum_{j=1}^{n} l_{\theta,\theta}\big(\theta, F_n(x_j^-), F_n(x_j), G_n(y_j^-), G_n(y_j)\big).$$
It follows from equation (3.2) that $n^{1/2}(\hat{\theta}_n - \theta) \approx -n^{1/2} A_n / B_n$. From the theory of multivariate rank statistics, one obtains the following limiting behaviors:
$$B_n \to \beta = E\big(l_{\theta,\theta}\{\theta, F(X^-), F(X), G(Y^-), G(Y)\}\big) \qquad (3.3)$$
almost surely, and $n^{1/2} A_n$ is asymptotically normal with zero mean and variance $\sigma^2$, whose explicit form will be provided in Section 3.4. Thus, we have

Theorem 3.1.1. Under suitable regularity conditions, the semi-parametric estimator $\hat{\theta}_n$ is consistent and $n^{1/2}(\hat{\theta}_n - \theta)$ is asymptotically normal with mean 0 and variance $\varrho = \sigma^2/\beta^2$.

We start with the proof of consistency in Section 3.2 and the asymptotic properties of $B_n$ and $A_n$ in Sections 3.3 and 3.4, respectively, on which Theorem 3.1.1 is based, and complete this chapter by proving Theorem 3.1.1 in Section 3.5.

Here is some notation which will be used later in this chapter. Let $S_{Fh} = [F(D_{Fh}^-), F(D_{Fh})]$ for $h = 1, 2, \ldots, d_F$, and $S_{Gl} = [G(D_{Gl}^-), G(D_{Gl})]$ for $l = 1, 2, \ldots, d_G$. In the situation where both $F$ and $G$ have only one jump point, these notations simplify to $D_F$, $D_G$, $S_F = [F(D_F^-), F(D_F)]$, and $S_G = [G(D_G^-), G(D_G)]$, respectively.
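In practice, Theorem 3.1.1 is used to construct Wald-type confidence intervals for $\theta$. With $\hat{\varrho}$ denoting a consistent estimator of $\varrho = \sigma^2/\beta^2$ (the estimator proposed in Chapter 4), a standard construction of an approximate 95% interval is
$$\hat{\theta}_n \pm z_{0.975}\, \sqrt{\hat{\varrho}/n}, \qquad z_{0.975} \approx 1.96,$$
which is the type of interval whose empirical coverage is examined in the simulation studies of Chapter 7.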

3.2 Consistency of $\hat{\theta}_n$

Noting that $\hat{\theta}_n$ satisfies (3.1), and letting $J = l_\theta$, which is a function on $(0,1)^2$, we only need to show that
$$R_n = \frac{1}{n}\sum_{j=1}^{n} J(F_n(X_j), G_n(Y_j)) \to 0 \quad \text{a.s.} \qquad (3.4)$$
Since $H_\theta$ involves one or more jump points, $R_n$ can be written as
$$R_n = \frac{1}{n}\sum_{j=1}^{n} J\big(F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j)\big). \qquad (3.5)$$
We introduce some notation for the subsequent presentation. Let $\{cc\}$, $\{cd\}$, $\{dc\}$, and $\{dd\}$ be the events $\{(X, Y) : X \in \mathcal{C}(F), Y \in \mathcal{C}(G)\}$, $\{(X, Y) : X \in \mathcal{C}(F), Y \in \mathcal{J}(G)\}$, $\{(X, Y) : X \in \mathcal{J}(F), Y \in \mathcal{C}(G)\}$ and $\{(X, Y) : X \in \mathcal{J}(F), Y \in \mathcal{J}(G)\}$, respectively. Further, let $A_{cc} = \{j : (X_j, Y_j) \in \{cc\}\}$; similarly, one can define $A_{cd}$, $A_{dc}$ and $A_{dd}$. Note that the sets $A_S$ for $S = \{cc\}, \{cd\}, \{dc\}, \{dd\}$ form a partition of the set of integers $\{1, 2, \ldots, n\}$. We consider the decomposition of $R_n$ based on these four partition sets, namely, $R_n = R_n^{cc} + R_n^{cd} + R_n^{dc} + R_n^{dd}$, where
$$R_n^S = \frac{1}{n}\sum_{j \in A_S} J^S\big(F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j)\big)$$
for $S = \{cc\}, \{cd\}, \{dc\}, \{dd\}$, and $J^S$ is given by
$$J^S(u_1, u_2, v_1, v_2) = \begin{cases} J^{cc}(u_2, v_2) & \text{if } S = \{cc\}, \\ J^{cd}(u_2, v_1, v_2) & \text{if } S = \{cd\}, \\ J^{dc}(u_1, u_2, v_2) & \text{if } S = \{dc\}, \\ J^{dd}(u_1, u_2, v_1, v_2) & \text{if } S = \{dd\}. \end{cases} \qquad (3.6)$$
The functions $J^S$, $S = \{cc\}, \{cd\}, \{dc\}, \{dd\}$, are assumed to be continuous on their respective domains. For example, $J^{cc}(u_2, v_2)$ is assumed to be continuous on $[0,1]^2$; this is a reasonable assumption since the candidates for $J^{cc}$ in this chapter will be either $\frac{\partial}{\partial \theta}\log c_\theta(u_2, v_2)$ or

$\frac{\partial^2}{\partial \theta^2}\log c_\theta(u_2, v_2)$, both of which are continuous functions of $u_2$ and $v_2$. $J^{cd}(u_2, v_1, v_2)$ is assumed to be continuous on $[0,1]^3$; a particular candidate for $J^{cd}$ is
$$J^{cd}(u_2, v_1, v_2) = \frac{\partial}{\partial \theta} \log \int_{[v_1, v_2]} c_\theta(u_2, v)\, dv,$$
which is continuous in $u_2$, $v_1$ and $v_2$. Similarly for $J^{dc}$ and $J^{dd}$. Almost sure convergence of $R_n$ is established in the following theorem:

Theorem 3.2.1. Let $J = l_\theta$, $r(u) = u(1-u)$, $\delta > 0$, and let $p$ and $q$ be positive numbers satisfying $1/p + 1/q = 1$. Let $a$ and $b$ be given by $a = (-1+\delta)/p$ and $b = (-1+\delta)/q$. Consider the conditions

(C1) $|J^{cc}(u_2, v_2)| \le M_1\, r(u_2)^a\, r(v_2)^b$,

(C2) $|J^{cd}(u_2, v_1, v_2)| \le M_2\, r(u_2)^a$ (independent of $v_1$ and $v_2$),

(C3) $|J^{dc}(u_1, u_2, v_2)| \le M_3\, r(v_2)^b$ (independent of $u_1$ and $u_2$), and

(C4) in a small neighborhood of each $(F(D_{Fh}^-), F(D_{Fh}), G(D_{Gl}^-), G(D_{Gl})) \in [0,1]^4$, for $h = 1, 2, \ldots, d_F$ and $l = 1, 2, \ldots, d_G$, $J^{dd}(u_1, u_2, v_1, v_2)$ is finite.

Under conditions (C1)-(C4), $R_n \to \kappa\,(= 0)$ almost surely, where
$$\kappa = E\big[N^{cc}(X, Y) + N^{cd}(X, Y) + N^{dc}(X, Y) + N^{dd}(X, Y)\big], \qquad (3.7)$$
with $E[N^{cc}(X, Y)]$, $E[N^{cd}(X, Y)]$, $E[N^{dc}(X, Y)]$ and $E[N^{dd}(X, Y)]$ given, respectively, by
$$\int_{\{(0,1) \cap (\bigcup_{h=1}^{d_F} S_{Fh})^c\} \times \{(0,1) \cap (\bigcup_{l=1}^{d_G} S_{Gl})^c\}} J^{cc}(u, v)\, c_\theta(u, v)\, du\, dv,$$
$$\sum_{l=1}^{d_G} \int_{(0,1) \cap (\bigcup_{h=1}^{d_F} S_{Fh})^c} J^{cd}\big(u, G(D_{Gl}^-), G(D_{Gl})\big) \Big\{\int_{S_{Gl}} c_\theta(u, v)\, dv\Big\}\, du,$$
$$\sum_{h=1}^{d_F} \int_{(0,1) \cap (\bigcup_{l=1}^{d_G} S_{Gl})^c} J^{dc}\big(F(D_{Fh}^-), F(D_{Fh}), v\big) \Big\{\int_{S_{Fh}} c_\theta(u, v)\, du\Big\}\, dv,$$
$$\sum_{h,l} J^{dd}\big(F(D_{Fh}^-), F(D_{Fh}), G(D_{Gl}^-), G(D_{Gl})\big) \int_{S_{Fh} \times S_{Gl}} c_\theta(u, v)\, du\, dv.$$

23 Remark: In the case of t-copulas (as defined in (6.1)), a and b can be chosen such that a 2 ν, b 2 ν, a = 1 + δ, b = 1 + δ p q for some p, q and δ. Proof: Without loss of generality, we consider the case that there is only one point of discontinuity in both F and G, i. e., D F and D G. Let Cn(u, v) be the empirical copula defined by the sample Cn(u, v) = 1 n I n {F n(x j ) u,gn(y j ) v}. j=1 We consider the decomposition of the empirical copula measure into 4 sub-measures C cc n, C cd n, C dc n, and C dd n, where for S = {cc}, {cd}, {dc} and {dd}, and Cn S (u, v) = 1 n I n {F n(x j ) u,gn(y j ) v}, j A S Cn(u, v) = S {cc,cd,dc,dd} C S n (u, v). Then by the Glivenko-Cantelli Theorem, C cc n (u, v) converges uniformly to C 1 (u, v), C cd n (u, v) converges uniformly to C 2 (u, v), C dc n (u, v) converges uniformly to C 3 (u, v), and C dd n (u, v) converges uniformly to C 4 (u, v) respectively, and C i (u, v) for i = 1, 2, 3, 4, are given by C 1 (u, v) = P (F (X) u and G(Y ) v) if (u, v) (0, F (D F )) (0, G(D G )) P (F (X) (0, u) S F c and G(Y ) v) if (u, v) [F (D F ), 1) (0, G(D G )) P (F (X) u and G(Y ) (0, v) S G c ) if (u, v) (0, F (D F )) [G(D G ), 1) P (F (X) (0, u) S F c and G(Y ) (0, v) Sc G ) if (u, v) [F (D F ), 1) [G(D G ), 1) C 2 (u, v) = 0 if v (0, G(D G )) P (F (X) u and Y = D G ) if (u, v) (0, F (D F )) [G(D G ), 1) P (F (X) (0, u) S F c and Y = D G ) if (u, v) [F (D F ), 1) [G(D G ), 1) 15

24 C 3 (u, v) = 0 if u (0, F (D F )) P (X = D F and G(Y ) v) if (u, v) [F (D F ), 1) (0, G(D G )) P (X = D F and G(Y ) (0, v) S G c ) if (u, v) [F (D F ), 1) [G(D G ), 1) and C 4 (u, v) only put mass P (X = D F and Y = D G ) on a single point {F (D F )} {G(D G )}. Note that C i (u, v), i = 1,..., 4, are not probability measures and 4 C θ (u, v) = C i (u, v). i=1 To obtain the almost sure convergence, it takes four steps. Step 1. Let Rn cc = (0,1) 2 Jcc (u, v)dcn cc (u, v). Now we prove that Rn cc µ cc E[N cc (X, Y )] almost surely. We show that J cc is uniformly integrable with respect to the measures dcn cc (u, v) by showing (0,1) 2 Jcc (u, v) 1+ɛ dcn cc (u, v) is bounded for some ɛ > 0. Using the Hölder s inequality and the assumption (C1), one can derive the following chain of inequalities, noting that dcn cc (u, v) putting mass 1 on each continuous n point: (0,1) 2 Jcc (u, v) 1+ɛ dc cc n (u, v) M 1 (0,1) 2 r(u)a(1+ɛ) r(v) b(1+ɛ) dc cc n (u, v) M 1 { (0,1) 2 r (u)a(1+ɛ)p dc cc n (u, v) } 1/p { 1 ( ) 1/p j a(1+ɛ)p 1 M 1 r n n + 1 n j Acc j Acc = M ( ) 1 j ( 1+δ)(1+ɛ) r n n + 1 j Acc 1 1 M 1 0 {u(1 u) (1 δ)(1+ɛ) du <, } 16 } 1/q (0,1) 2 r (u)b(1+ɛ)q dcn cc (u, v) ( j r n + 1 ) 1/q b(1+ɛ)q

25 for ɛ < δ. C 1 (u, v), we have Combining the result above with the fact that C cc n (u, v) uniformly converges to Rn cc (0,1) 2 Jcc (u, v)dc 1 (u, v) = ((0,1) S F c ) ((0,1) Sc G ) Jcc (u, v)c θ (u, v) dudv almost surely, which completes step 1. Step 2. Let Rn cd = (0,1) 2 Jcd (u, v 1, v 2 )dcn cd (u, v). Now we prove that R cd n µ cd E[N cd (X, Y )] almost surely. We show that J cd (u, v 1, v 2 ) is uniformly integrable with respect to the measures dcn cd (u, v) by showing (0,1) 2 Jcd (u, v 1, v 2 ) 1+ɛ dcn cd (u, v) is bounded for some ɛ > 0. Using the Hölder s inequality and the assumption (C2), one can derive the following chain of inequalities, noting that dc cd n (u, v) putting mass 1 n on each point on ((0, 1) Sc F ) G n(d G ): (0,1) 2 Jcd (u, v 1, v 2 ) 1+ɛ dc cd n (u, v) M 2 (0,1) 2 r(u)a(1+ɛ) dc cd n (u, v) M 2 { (0,1) 2 r (u)a(1+ɛ)p dc cd n (u, v) 1 M 2 r n j A cd 1 = M 2 r n j A cd M 2 { 1 <, 0 ( j n + 1 ( j n + 1 1/p ) a(1+ɛ)p } 1/p { 1 1/p ) ( 1+δ)(1+ɛ) 1 {u(1 u) (1 δ)(1+ɛ) } du } 1/p } 1/q (0,1) 2 1(1+ɛ)q dcn cd (u, v) if ɛ < δ. Combining the result above with the fact that C cd n (u, v) uniformly converges to C 2 (u, v), 17

26 we have R cd n (0,1) 2 Jcd (u, G(D G ), G(D G ))dc2 (u, v) = ((0,1) S F c ) Jcd (u, G(D G ), G(D G )) c θ (u, v) dv du S G almost surely, which completes step 2. Step 3. Under the assumption (C3), the proof for R dc n µ dc E[N dc (X, Y )] is similar to that in step 2, so it is omitted. Step 4. The convergence of R dd n can be established using the SLLN. Let Rn dd = (0,1) 2 Jdd (u 1, u 2, v 1, v 2 )dcn dd (u, v). Now we prove that R dd n µ dd E[N dd (X, Y )] almost surely. We show that J dd (u 1, u 2, v 1, v 2 ) is uniformly integrable with respect to the measures dcn dd (u, v) by showing (0,1) 2 Jdd (u 1, u 2, v 1, v 2 ) 1+ɛ dcn dd (u, v) is bounded for some ɛ > 0. Noting that dcn dd (u, v) putting mass n dd n on the single point (F n(d F ), Gn(D G )), where n dd is the number of observations of (X j, Y j ) such that X j = D F and Y j = D G : (0,1) 2 Jdd (u 1, u 2, v 1, v 2 ) 1+ɛ dc dd n (u, v) = n dd n Jdd (u 1, u 2, v 1, v 2 ) 1+ɛ <, as long as J dd (u 1, u 2, v 1, v 2 ) is finite, which is the assumption (C4). Combining the result above with the fact that Cn dd (u, v) uniformly converges to C 4 (u, v), we have Rn dd (0,1) 2 Jdd (u 1, u 2, v 1, v 2 )dc 4 (u, v) = J dd (F (D F ), F (D F ), G(D G ), G(D G )) S F S G c θ (u, v) du dv almost surely, which completes step 4. 18
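To visualize the decomposition of the empirical copula measure into the four sub-measures $C_n^{cc}$, $C_n^{cd}$, $C_n^{dc}$ and $C_n^{dd}$ used in the proof above, the following Python sketch (illustrative only; the single jump point per marginal is assumed known, as in the simplified setting of the proof) partitions a simulated sample into the index sets $A_{cc}$, $A_{cd}$, $A_{dc}$, $A_{dd}$ and evaluates each sub-measure at a point $(u, v)$; their sum equals the empirical copula $C_n(u, v)$.

```python
# Sketch: partitioning observations into A_cc, A_cd, A_dc, A_dd and evaluating
# the four sub-measures of the empirical copula (one known jump point per
# marginal, as in the simplified setting of the proof).  Values are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n, D_F, D_G = 1000, -1.0, 0.0
z = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=n)
x = np.where(z[:, 0] < D_F, D_F, z[:, 0])      # jump of F at D_F (left-censoring)
y = np.where(z[:, 1] < D_G, D_G, z[:, 1])      # jump of G at D_G

Fn = lambda t: np.sum(x <= t) / (n + 1.0)      # rescaled empirical cdfs
Gn = lambda t: np.sum(y <= t) / (n + 1.0)

x_disc, y_disc = (x == D_F), (y == D_G)        # which component sits on a jump
A = {"cc": ~x_disc & ~y_disc,                  # X continuous, Y continuous
     "cd": ~x_disc & y_disc,                   # X continuous, Y on a jump
     "dc": x_disc & ~y_disc,                   # X on a jump, Y continuous
     "dd": x_disc & y_disc}                    # both on jumps

def C_sub(u, v, S):
    """C_n^S(u, v): empirical sub-measure restricted to the index set A_S."""
    idx = A[S]
    Fx = np.array([Fn(t) for t in x[idx]])
    Gy = np.array([Gn(t) for t in y[idx]])
    return np.sum((Fx <= u) & (Gy <= v)) / n

u, v = 0.7, 0.7
print({S: round(C_sub(u, v, S), 3) for S in A})      # the four pieces
print(round(sum(C_sub(u, v, S) for S in A), 3))      # equals C_n(u, v)
```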

In summary, noting that $R_n = R_n^{cc} + R_n^{cd} + R_n^{dc} + R_n^{dd}$, Theorem 3.2.1 is true.

3.3 Asymptotic Behavior of $B_n$

In the case when $H_\theta$ and $J$ are both continuous, by letting $J = l_{\theta,\theta}$ in [11], the quantity of interest is the asymptotic behavior of the statistic $R_n$ defined by
$$R_n = \frac{1}{n}\sum_{j=1}^{n} J(F_n(X_j), G_n(Y_j)), \qquad (3.8)$$
where $J(\cdot, \cdot)$ is a function on $(0,1)^2$. Genest et al. [11] have shown that $B_n \to E(l_{\theta,\theta}\{\theta, F(X), G(Y)\})$ as $n \to \infty$ almost surely. In the present case, the appropriate statistic is similar in form to $R_n$, but with a significant difference: $H_\theta$ involves one or more jump points. Therefore, $R_n$ is written in the form (3.5), with $J = l_{\theta,\theta}$. Almost sure convergence of $R_n$ is established in the following theorem:

Theorem 3.3.1. Let $J = l_{\theta,\theta}$, $r(u) = u(1-u)$, $\delta > 0$, and let $p$ and $q$ be positive numbers satisfying $1/p + 1/q = 1$. Let $a$ and $b$ be given by $a = (-1+\delta)/p$ and $b = (-1+\delta)/q$. Consider the conditions

(C1) $|J^{cc}(u_2, v_2)| \le M_1\, r(u_2)^a\, r(v_2)^b$,

(C2) $|J^{cd}(u_2, v_1, v_2)| \le M_2\, r(u_2)^a$ (independent of $v_1$ and $v_2$),

(C3) $|J^{dc}(u_1, u_2, v_2)| \le M_3\, r(v_2)^b$ (independent of $u_1$ and $u_2$), and

(C4) in a small neighborhood of each $(F(D_{Fh}^-), F(D_{Fh}), G(D_{Gl}^-), G(D_{Gl})) \in [0,1]^4$, for $h = 1, 2, \ldots, d_F$ and $l = 1, 2, \ldots, d_G$, $J^{dd}(u_1, u_2, v_1, v_2)$ is finite.

Under conditions (C1)-(C4), $R_n \to \beta$ almost surely, where
$$\beta = E\big[N^{cc}(X, Y) + N^{cd}(X, Y) + N^{dc}(X, Y) + N^{dd}(X, Y)\big], \qquad (3.9)$$

28 with E[N S (X, Y )] defined as in (3.7), and S = {cc}, {cd}, {dc}. Since the proof is similar to that in Theorem 3.2.1, it is omitted here. 3.4 Asymptotic Behavior of A n By taking J = l θ in (3.5), we have the following Theorem: Theorem Let r(u) = u(1 u), δ > 0, p and q be as in Theorem Let Ju S and Jv S be the partial derivatives of J S with respect to u and v, respectively, for S = {cc}, {cd}, {dc}, and {dd}. Also, let a and b be numbers given by a = ( δ)/p and b = ( δ)/q. Consider the conditions (D1) (D2) J cc (u 2, v 2 ) M 1 r(u 2 ) a r(v 2 ) b, with partial derivatives satisfying J cc u 2 (u 2, v 2 ) M 2 r(u 2 ) a 1 r(v 2 ) b and J cc v 2 (u 2, v 2 ) M 3 r(u 2 ) a r(v 2 ) b 1, J cd (u 2, v 1, v 2 ) M 4 r(u 2 ) a with Ju cd (u 2 2, v 1, v 2 ) M 5 r(u 2 ) a 1, v2 c θ (u 2, v)dv M 6 r(u 2 ) a, v 1 { } (0,1) { h S F h } c Jη cd (u 2, G(D Gl ), G(D Gl S )) c θ (u 2, v)dv du 2 < l Gl with η = v 1 or v 2, Ju cd (u 2 2, v 1, v 2 ) is continuous w.r.t. u 2 on C(F ) almost surely, J cd v 1 (u 2, v 1, v 2 ) and J cd v 2 (u 2, v 1, v 2 ) are continuous w.r.t. v 1 and v 2 respectively in the small neighborhoods of rectangles C(F ) (G(D Gl ), G(D Gl )] almost surely for l = 1,..., D G, 20

29 (D3) J dc (u 1, u 2, v 2 ) M 7 r(v 2 ) b with Jv dc (u 2 1, u 2, v 2 ) M 8 r(v 2 ) b 1, u2 c θ (u, v 2 )du M 9 r(v 2 ) b, u 1 { } (0,1) { l S Gl } c Jη dc (F (D F h ), F (D F h ), v 2 S ) c θ (u, v 2 )du dv 2 < h F h with η = u 1 or u 2, Jv dc (u 2 1, u 2, v 2 ) is continuous w.r.t. v 2 on C(G) almost surely, J dc u 1 (u 1, u 2, v 2 ) and J dc u 2 (u 1, u 2, v 2 ) are continuous w.r.t. u 1 and u 2 respectively in the small neighborhoods of rectangles (F (D F h ), F (D F h )] C(G) almost surely for h = 1,..., D F, (D4) J dd u 1 (u 1,,, ), J dd u 2 (, u 2,, ), J dd v 1 (,, v 1, ), and J dd v 2 (,,, v 2 ) are continuous w.r.t. u 1, u 2, v 1 and v 2, respectively, in a small neighborhood of (F (D F h ), F (D F h ), G(D Gl ), G(D Gl )) for any h = 1,..., D F and l = 1,..., D G. Under conditions (D1-D4), n 1/2 (Rn κ) N(0, σ 2 ) in distribution as n, where κ is defined in (3.7), and σ 2 [ = var M cc (X, Y ) + M cd (X, Y ) + M dc (X, Y ) + M dd ] (X, Y ), (3.10) with M cc (x, y), M cd (x, y), M dc (x, y) and M dd (x, y) are given as below, respectively, J cc (F (x), G(y))I {x C(F ),y C(G)} + {R\{ h D F h }} {R\{ l D Gl }} I {x x } Jcc u (F (x ), G(y ))dh 2 θ (x, y ) + {R\{ h D F h }} {R\{ l D Gl }} I {y y } Jcc v (F (x ), G(y ))dh 2 θ (x, y ), (3.11) 21

30 d G J cd (F (x), G(D Gl ), G(D Gl ))I {x C(F ), y=d Gl } l=1 d G + R\{ l=1 h D F h } I {x x } Jcd u (F (x ), G(D 2 Gl ), G(D Gl ))dh θ (x, D Gl ) d G + R\{ l=1 h D F h } I {y<d Gl } Jcd v (F (x ), G(D 1 Gl ), G(D Gl ))dh θ (x, D Gl ) d G + l=1 R\{ h D F h } I {y D Gl } Jcd v (F (x ), G(D 2 Gl ), G(D Gl ))dh θ (x, D Gl ), d F J dc (F (D F h ), F (D F h ), G(y))I {y C(G)} h=1 d F + h=1 R\{ l D Gl } I {y y } Jdc v (F (D 2 F h ), F (D F h ), G(y ))dh θ (D F h, y ) d F + R\{ h=1 l D Gl } I {x<d F h } Jdc u (F (D 1 F h ), F (D F h ), G(y ))dh θ (D F h, y ) d F + h=1 R\{ l D Gl } I {x D F h } Jdc u (F (D 2 F h ), F (D F h ), G(y ))dh θ (D F h, y ), J dd (F (D F h ), F (D F h ), G(D Gl ), G(D Gl ))I {x=d F h, y=d Gl } h,l + [ I {y<dgl } Jdd v (F (D 1 F h ), F (D F h ), G(D Gl ), G(D Gl )) h,l + I {y DGl } Jdd v 2 (F (D F h ), F (D F h ), G(D Gl ), G(D Gl )) + I {x<df h } Jdd u 1 (F (D F h ), F (D F h ), G(D Gl ), G(D Gl )) + I {x DF h } Jdd u 2 (F (D F h ), F (D F h ), G(D Gl ), G(D Gl )) ] P (X = D F h, Y = D Gl ). In the case when both H θ and J are continuous, Genest et al. [11] showed that the statistic Rn 22

was a special case of multivariate rank order statistics whose asymptotic behavior was thoroughly studied by Ruymgaart et al. [41], Ruymgaart [42] and Rüschendorf [40]. For continuous $H_\theta$ and $J$, Genest et al. [11] proposed regularity conditions ensuring almost sure convergence and asymptotic normality. In the present case, $H_\theta$ involves one or more jump points, which causes the $J$ function to be discontinuous on $(0,1)^2$ as well. In the rest of this chapter, we provide a sketch of the proof of the above theorem. Without loss of generality, we show the theorem is true in the case where there is only one discontinuity point in both $F$ and $G$; this is carried out in the subsequent sections, using arguments similar to those in [41].

We shall need the following empirical processes:
$$U_n(F(x)) = n^{1/2}(F_n(x) - F(x)), \qquad U_n(F(x^-)) = n^{1/2}(F_n(x^-) - F(x^-)),$$
$$V_n(G(y)) = n^{1/2}(G_n(y) - G(y)), \qquad V_n(G(y^-)) = n^{1/2}(G_n(y^-) - G(y^-)).$$
Note that
$$P(\Omega_0) = P\big(\{\omega : F_n(F^{-1}(F(x))) = F_n(x),\; G_n(G^{-1}(G(y))) = G_n(y), \text{ for all } x, y \text{ and } n\}\big) = 1. \qquad (3.12)$$
The above identities, $F_n(F^{-1}(F)) = F_n$ and $G_n(G^{-1}(G)) = G_n$, are true even in the case when there are jump points. At the jump point of $F$, one can see that $F^{-1}(F(D_F)) = D_F$; a similar result holds for $G$ as well.

Asymptotic normality of $R_n^{cc}$

For small positive $\gamma$ define the set
$$\Omega_{\gamma n} = \{\omega : \sup_x |F_n - F| < \gamma/2,\; \sup_y |G_n - G| < \gamma/2\}. \qquad (3.13)$$

32 For ω Ω 0 Ωγn, the Mean Value Theorem gives n 1/2 J cc (Fn, ) = n 1/2 J cc (F, ) + Un(F )J cc u 2 (Φn, ), for all x C(F ). In the formula above Φn is defined by Φn = F + η(fn F ), where η = η(ω, x, n) is a number between 0 and 1. Let F = [X 1n, D F ) (D F, Xnn], G = [Y 1n, D G ) (D G, Ynn], where X 1n ( or Y 1n ) = min 1 j n X j ( or Y j ), Xnn( or Ynn) = max 1 j n X j ( or Y j ). Note that where n 1/2 (R cc n µ cc ) = 3 4 A in + Bγ in + B 5n + Cn, i=1 i=1 µ cc = E[N cc (X, Y )] with l θ,θ replaced by l θ in (3.9), A 1n = A 2n = A 3n = B γ 1n = n 1/2 {R\D F } {R\D G } Jcc (F (x), G(y))d[Hn(x, y) H θ (x, y)], {R\D F } {R\D G } U n(f (x))j cc u 2 (F (x), G(y))dH θ (x, y), {R\D F } {R\D G } V n(g(y))j cc v 2 (F (x), G(y))dH θ (x, y), χ(ω c γn ){n1/2 {R\D F } {R\D G } [Jcc (Fn(x), G(y)) J cc (F (x), G(y))]dHn(x, y) A 2n }, 24

33 B γ 2n = χ(ωγn ) {R\D F } {R\D G } U n(f (x))[ju cc (Φn(x), G(y)) 2 Ju cc ] (F (x), G(y)) dhn(x, y), 2 B γ 3n = χ(ωγn ) { F G } {{R\D F } {R\D G }} U n(f (x)) J cc u 2 (F (x), G(y))d[Hn(x, y) H θ (x, y)], B γ 4n = χ(ωγn ) { { F G } {{R\D F } {R\D G }} U n(f (x)) J cc u 2 (F (x), G(y))dH θ (x, y) A 2n }, B 5n = n 1/2 {R\D F } {R\D G } [Jcc (F (x), Gn(y)) J cc (F (x), G(y))] dhn(x, y) A 3n, Cn = n 1/2 {R\D F } {R\D G } [Jcc (Fn(x), Gn(y)) J cc (Fn(x), G(y)) J cc (F (x), Gn(y)) + J cc (F (x), G(y))] dhn(x, y), where χ(ωγn ) denotes the indicator function of Ω γn ; H n(x, y) is the empirical cumulative distribution of H θ (x, y). Note that with a further decomposition of Cn, one has C γ 1n = χ(ω c γn )n1/2 {R\D F } {R\D G } [Jcc (Fn(x), Gn(y)) J cc (Fn(x), G(y)) C γ 2n = J cc (F (x), Gn(y)) + J cc (F (x), G(y))] dhn(x, y), χ(ωγn ) {R\D F } {R\D G } U n(f (x))[ju cc (Φn(x), Gn(y)) 2 Ju cc ] (Φn(x), G(y)) dhn(x, y). 2 We start with a few lemmas to be used in the proof. 25

34 Lemma For any ξ 0 and function r(u) = u(1 u), the function r(u) ξ is symmetric about 1 ( 2, decreasing on 0, 1 ] and has the property that for each β in (0, 1) there exits a constant 2 M = M β such that and Proof of Lemma can be found in [41]. r(βs) ξ Mr(s) ξ for 0 < s 1 2, r(1 β(1 s)) ξ Mr(s) ξ for 1 2 < s < 1. Lemma Let Φn and Ψn be functions on n1 and n2, where [X 1n, Xnn], if Xnn D df n1 = [X 1n, Xnn), if Xnn = D df and respectively, satisfying n2 = [Y 1n, Ynn], if Ynn D dg [Y 1n, Ynn), if Ynn = D dg min(f, Fn) Φn max(f, Fn) and where defined. Then uniformly for n = 1, 2,, min(g, Gn) Ψn max(g, Gn) Proof: (i) (ii) sup r(φn) ξ r(f ) ξ = Op(1), for each ξ 0, n1 sup r(ψn) η r(g) η = Op(1), for each η 0. n2 It suffices to prove (i). From Lemma A.3 in [45] and (3.12), it follows that one needs to show that for each ɛ > 0, there exists a constant β = βɛ in (0, 1) such that P (Ωn) = P ({ω : βf Fn 1 β(1 F ) any x(ω) on n1 }) > 1 ɛ, (3.14) 26

35 for all n and F. Let A = {ω : βf Fn 1 β(1 F ) any x(ω) n1 C(F )}, and B = {ω : βf Fn 1 β(1 F ) any x(ω) n1 J (F )}. Note that in our case, P ({ω : βf Fn 1 β(1 F ) any x(ω) on n1 }) = P (A B) = P (A) + P (B) = P ({ω : x(ω) n1 C(F )}) P ({ω : βf Fn 1 β(1 F ) x(ω) n1 C(F )}) +P ({ω : x(ω) n1 J (F )}) P ({ω : βf Fn 1 β(1 F ) x(ω) n1 J (F )}). In Lemma 6.1 in [41] (3.14) has been verified for all continuous F for a constant β = β ɛ, which gives us P ({ω : βf Fn 1 β(1 F ) x(ω) C(F ) n1 }) > 1 ɛ. Noting that P ({ω : x(ω) n1 C(F )}) + P ({ω : x(ω) n1 J (F )}) = 1, to prove (3.14) we only need to show when x(ω) n1 J (F ), P ({ω : βf Fn 1 β(1 F ) any x(ω) n1 J (F )}) > 1 ɛ is true for a constant β = β ɛ. This is a direct result of the Glivenko-Cantelli Theorem, which completes the proof. Lemma Uniformly in all F, we have (i) (ii) sup Un(F ) Un(F ) r(f ) ρ 1/2 p 0, as n, for each ρ 0, n1 sup (, )\{ h D F h } Un(F ) r(f ) ρ 1/2 = Op(1), as n, for each ρ 0, where Un(F (x)) = n 1/2 ( ˆFn(x) F (x)), and ˆFn is the empirical distribution of F. Note that Fn n was defined to be n + 1 ˆFn. Proof: (i) Note that Un(F ) Un(F ) r(f ) ρ 1/2 = n 1/2 ˆFn n + 1 r(f )ρ 1/2, 27

36 and that for any fixed β (0, 1), we have r ( ) β ρ 1/2 ( = r 1 β ρ 1/2 = O(n n n) ρ+1/2 ). Since F (X i ) are i.i.d. uniform random variables, given any arbitrary ɛ > 0, we can choose a β = βɛ in (0, 1) such that P ( β n F (X 1n ) F (X nn) 1 β ) > 1 ɛ n for all n and all F with F (D F ) 1, which is obvious since = = ( β P n F (X 1n ) F (X nn) 1 β ) n ( 1 β n β ) n n ( 1 2β ) n e 2β > 1 ɛ, n for sufficiently small β. Combining with all above results, we proved (i). (ii) follows from (i) and (iii) of Lemma 4.2 in [41]. Proof completed. Proof of Asymptotic normality of R cc n. (1) Note that A 1n can be written as A 1n = n 1/2 n A 1jn, j=1 where A 1jn = J cc (F (X j ), G(Y j )) I {j A cc} µcc. The A 1jn are i.i.d. with mean zero. By 28

37 the assumption (D1) and the Hölder s inequity, we have {R\D F } {R\D G } Jcc (F (x), G(y)) 2+δ 0dH θ (x, y) {R\D F } {R\D G } M2+δ 0 1 r(f (x)) a(2+δ 0 ) r(g(y)) b(2+δ 0 ) dh θ (x, y) [ ] 1/p0 [ ] 1/q0 M 2+δ 0 1 (0,1) r(u)a(2+δ 0 )p 0du (0,1) r(v)b(2+δ 0 )q 0dv [ ] 1/p0 [ ] 1/q0 = M 2+δ 0 1 (0,1) r(u)( 1 2 +δ)(2+δ 0 ) du (0,1) r(v)( 1 2 +δ)(2+δ 0 ) dv <, for some selected p 0, q 0 and δ satisfying (1 δ)(2 + δ 0 ) < 1, which means that A 1jn has a finite absolute moment of order 2+δ 0 for some δ 0 > 0. Moreover, the term on the left hand is uniformly bounded above of order 2 + δ 0. Using the CLT, we get the asymptotic normality of A 1n. (2) Note that A 2n can be written as A 2n = n 1/2 {R\D F } {R\D G } ( ˆFn(x) F (x))j cc u 2 (F (x), G(y))dH θ (x, y) +n 1/2 {R\D F } {R\D G } (F n(x) ˆFn(x))J cc u 2 (F (x), G(y))dH θ (x, y) = A 2n1 + A 2n2, where ˆFn(x) is the empirical distribution function of X. Let 0 if x < X j φ Xj (x) = 1 if x X j Then A 2n1 can be written as n 1/2 n A 2jn, where A 2jn = j=1 {R\D F } {R\D G } (φ X (x) F (x))j cc j u (F (x), G(y))dH 2 θ (x, y) are i.i.d. with mean zero. Note that φ Xj (x) F (x) 1. Under the assumption (D1), we have A 2jn M 2 {R\D F } {R\D G } ra 1 (F (x))r b (G(y))dH θ (x, y). 29

38 Note that {R\D F } {R\D G } ra 1 (F (x))r b (G(y))dH θ (x, y) < uniformly as long as we can find some p 1 and q 1 satisfying 1/p 1 + 1/q 1 = 1 and (a 1)p 1 > 1 and bq 1 > 1. Thus, we have shown that A 2n1 has an absolute moment of order 2 + δ 1, which leads to the asymptotic normality of A 2n1. Note that with the same p 1 and q 1 A 2n2 = n1/2 n + 1 {R\D F } {R\D G } n1/2 n + 1 sup x ˆFn(x) = op(1). Therefore, we have the asymptotic normality of A 2n. (3) Similarly, note that A 3n can be written as ˆFn(x)J cc u 2 (F (x), G(y))dH θ (x, y) {R\D F } {R\D G } Jcc u 2 (F (x), G(y))dH θ (x, y) A 3n = n 1/2 {R\D F } {R\D G } (Ĝn(y) G(y))J cc v 2 (F (x), G(y))dH θ (x, y) +n 1/2 {R\D F } {R\D G } (G n(y) = A 3n1 + A 3n2, Ĝn(y))J cc v 2 (F (x), G(y))dH θ (x, y) where Ĝn(y) is the empirical distribution function of Y. Similarly, the A 3n1 can be written as n 1/2 n A 3jn, where A 3jn = j=1 {R\D F } {R\D G } (φ Y (y) j G(y))Jv cc (F (x), G(y))dH 2 θ (x, y) are i.i.d. with mean zero. The same asymptotic conclusion can be drawn for A 3n. n (4) Note that the A 1n + A 2n + A 3n can be written as n 1/2 (A 1jn + A 2jn + A 3jn ) + j=1 A n 2n2 + A 3n2 n 1/2 A 1jn + A 2n2 + A 3n2, where the A 1jn = A 1jn + A 2jn + A 3jn j=1 depends on (X j, Y j ) only, hence are i.i.d. with mean zero, and the terms A 2n2 p 0 and A 3n2 p 0 as n. Using the CLT, we have that the A 1n + A 2n + A 3n is asymptotically normally distributed with mean 0. The asymptotic negligibility of the B and C terms will be stated as corollaries. 30

39 Corollary 1. For fixed γ, B γ1n p 0 and C γ1n p 0 as n. Proof: Note that P (Ω c γn ) 0 in (3.13) for any H θ by the Glivenko-Cantelli Theorem, and because the distribution of sup x Fn(x) F (x) does not depend on H θ. This completes the proof. Corollary 2. For fixed γ, B γ2n p 0 and C γ2n p 0 as n. Proof: According to Lemma (ii) with ρ = 1, for given ɛ > 0, there exists a constant 2 M such that P (Ω 1n ) = P (ω : {sup x Un(F ) M and x C(F )}) > 1 ɛ (3.15) for all n and H θ, and M such that P (Ω 2n ) = P (ω : { sup n1 r ξ (Φn)r ξ (F ) M }) > 1 ɛ, which gives P (Ω 1n Ω 2n ) > 1 2ɛ. Also, χ(ω 1 n Ω 2n ) B γ2n M J sup cc F G u (Φn(x), G(y)) J cc 2 u (F (x), G(y)) 2. +χ(ω 1 n Ω 2n ) {(,X 1n ) (Xnn, )} {(,Y 1n ) (Ynn, )} U n(f (x)) [J cc u 2 (Φn(x), G(y)) J cc u 2 (F (x), G(y))]dHn(x, y) By the Glivenko-Cantelli Theorem, P ({(, X 1n ) (Xnn, )} {(, Y 1n ) (Ynn, )}) 0 for any H θ, so the second term on the right side of the inequality above converges to 0, as n. The function J cc u 2 (u 2, v 2 ) is uniformly continuous on (0, 1) 2. Since Φn F Fn F where Φn is defined, the Glivenko-Cantelli Theorem yields J sup cc F G u (Φn(x), G(y)) J cc 2 u (F (x), G(y)) 2 p 0 uniformly for H θ. Therefore, we have shown that B γ2n p 0 as n. A similar argument may be used for C γ2n. Proof completed. 31

40 Corollary 3. For fixed γ, B γ3n p 0 as n. Proof: Noting that { F G } {{R\D F } {R\D G }} = F G, B γ3n = χ(ωγn ) = χ(ωγn Ω 1n ) +χ(ωγn Ωc 1n ) F G Un(F (x))j cc u 2 (F (x), G(y))d[Hn(x, y) H θ (x, y)] F G Un(F (x))j cc u 2 (F (x), G(y))d[Hn(x, y) H θ (x, y)] F G Un(F (x))j cc u 2 (F (x), G(y))d[Hn(x, y) H θ (x, y)] (3.16) where Ω 1n is as defined in (3.15). Note that the second term in (3.16) converges to 0 in probability, and χ(ωγn Ω 1n ) F G Un(F (x))j cc u 2 (F (x), G(y))d[Hn(x, y) H θ (x, y)] χ(ωγn Ω 1n )M F G J cc u 2 (F (x), G(y))d[Hn(x, y) H θ (x, y)]. χ(ωγn Ω 1n )M M F G d[hn(x, y) H θ (x, y)] where M J = sup cc F G u (F (x), G(y)) 2. From the Theorem 1.9, (i), from [43], the above integral converges to 0 in probability as n. Proof completed. Corollary 4. For fixed γ, B γ4n p 0 as n. Proof: Note that B γ4n = χ(ωγn Ω 1n ) { F G Un(F (x))j cc u 2 (F (x), G(y))dH θ (x, y) A 2n } +χ(ωγn Ωc 1n ) { F G Un(F (x))j cc u 2 (F (x), G(y))dH θ (x, y) A 2n } 32

41 = χ(ωγn Ω 1n ) {(,X 1n ) (Xnn, )} {(,Y 1n ) (Ynn, )} U n(f (x)) J cc u 2 (F (x), G(y)) dh θ (x, y) +χ(ωγn Ωc 1n ) { F G Un(F (x))j cc u 2 (F (x), G(y))dH θ (x, y) A 2n } where Ω 1n is as defined in (3.15). By the Glivenko-Cantelli Theorem, P ({(, X 1n ) (Xnn, )} {(, Y 1n ) (Ynn, )}) 0 for any H θ. Combining with the fact that J cc u 2 is uniformly continuous on (0, 1) 2, the righthand side converges to 0, as n. To see B 5n converging to 0 in probability as n, let us notice that 4 B γin = n 1/2 i=1 {R\D F } {R\D G } [Jcc (F (x), Gn(y)) J cc (F (x), G(y))]dHn(x, y) A 2n. In summary, we have shown that all these B-terms and C-term are negligible. Combining with the results of these A-terms, we have established the asymptotic normality of R cc n Asymptotic normality of R cd n and R dc n It suffices to show the asymptotic normality of R cd n. Note that n 1/2 (R cd n µ cd ) = 4 6 A in + B in, i=1 i=1 where 33

42 µ cd = E[N cd (X, Y )], A 1n = n 1/2 {R\D F } Jcd (F (x), G(D G ), G(D G ))d[h n(x, D G ) H θ (x, D G )]; A 2n = A 3n = A 4n = B 1n = B 2n = B 3n = B 4n = B 5n = B 6n = {R\D F } U n(f (x))ju cd (F (x), G(D 2 G ), G(D G ))dh θ (x, D G ); {R\D F } V n(g(d G ))Jcd v (F (x), G(D 1 G ), G(D G ))dh θ (x, D G ); {R\D F } V n(g(d G ))J cd v 2 (F (x), G(D G ), G(D G ))dh θ (x, D G ); {R\D F } U n(f (x))ju cd (Φn(x), Gn(D 2 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] {R\D F } V n(g(d G ))Jcd v (F (x), Θn(D 1 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] {R\D F } V n(g(d G ))J cd v 2 (F (x), G(D G ), Ψ n(d G ))d[hn(x, D G ) H θ (x, D G )] {R\D F } U n(f (x))[j cd u 2 (Φn(x), Gn(D G ), G n(d G )) J cd u 2 (F (x), G(D G ), G(D G )) ] dh θ (x, D G ) {R\D F } V n(g(d G ))[Jcd v 1 (F (x), Θn(D G ), G n(d G )) J cd v 1 (F (x), G(D G ), G(D G )) ] dh θ (x, D G ) {R\D F } V n(g(d G ))[J cd v 2 (F (x), G(D G ), Ψ n(d G )) J cd v 2 (F (x), G(D G ), G(D G )) ] dh θ (x, D G ) where Φn( ), Θn( ), and Ψn( ) are defined by the Mean Value Theorem. Next, we will show that the A terms are asymptotic normal and the B terms converge to 0 in probability. 34

43 (1) Note that A 1n can be written as A 1n = n 1/2 n A 1jn, j=1 where A 1jn = J cd (F (X j ), G(D G ), G(DG )) I {j Acd } µcd. Note that A 1jn are i.i.d. with mean zero. Since J cd (u, v 1, v 2 ) = v2 θ log c θ (u, v)dv, v 1 using assumption (D2), we have {R\D F } Jcd (F (x), G(D G ), G(DG )) 2+δ 0dH θ (x, D G ) { } 1/p0 {R\D F } Jcd (F (x), G(D G ), G(DG )) (2+δ 0 )p 0dH θ (x, D G ) { } 1/q0 {R\D F } 1(2+δ 0 )q 0dH θ (x, D G ) { (0,1) S F c J cd (u, G(D G ), G(DG )) (2+δ 0 )p } 1/p0 0 c θ (u, v)dv du. S G Under the assumption (D2), (0,1) S F c J cd (u, v 1, v 2 ) (2+δ 0 )p 0 c θ (u, v)dv du S G M 4 (0,1) S F c r(u) a(2+δ 0 )p 0 c θ (u, v)dv du S G M 4 M 6 (0,1) S F c r(u) a(2+δ 0 )p 0r(u) a du <, if a(2 + δ 0 )p 0 + a > 1. By the CLT, we have the asymptotic normality of A 1n. 35

44 (2) Note that A 2n can be written as A 2n = n 1/2 {R\D F } ( ˆFn(x) F (x))j cd u 2 (F (x), G(D G ), G(DG ))dh θ (x, D G ) +n 1/2 {R\D F } (F n(x) ˆFn(x))J cd u 2 (F (x), G(D G ), G(DG ))dh θ (x, D G ) = A 2n1 + A 2n2. Furthermore, A 2n1 can be written as n 1/2 n j=1 A 2jn, where A 2jn = {R\D F } (φ X (x) j F (x))j cd u 2 (F (x), G(D G ), G(DG ))dh θ (x, D G ) are i.i.d. with mean 0. Using the assumption (D2), we have A 2jn M 5 (0,1) S F c M 5 M 6 (0,1) S c F r a 1 (u) c θ (u, v)dv du S G r 2a 1 (u)du < as long as (2a 1) > 1. Thus we have the asymptotic normality of A 2n1. Using the assumption (D2), we have A n 2n2 = ˆFn(x)J n + 1 u cd (F (x), G(D {R\D F } 2 G ), G(DG ))dh θ (x, D G ) n n + 1 sup x ˆFn(x) {R\D F } Jcd u (F (x), G(D 2 G ), G(DG ))dh θ (x, D G ) = op(1). Therefore, the asymptotic normality of A 2n has been established. (3) Note that A 3n can be written as A 3n = n 1/2 {R\D F } [Ĝn(D G ) G(DG )]J cd v 1 (F (x), G(D G ), G(DG ))dh θ (x, D G ) +n 1/2 {R\D F } [G n(d G ) Ĝn(D G )]J cd v 1 (F (x), G(D G ), G(DG ))dh θ (x, D G ) = A 3n1 + A 3n2. Similarly, A 3n1 can be written as n 1/2 n j=1 A 3jn, where A 3jn = {R\D F } (ψ Y (D j G ) 36

45 Gn(D G ))J cd v (F (x), G(D 1 G ), G(DG ))dh θ (x, D G ) are i.i.d. with mean zero. and 1 if Y j < y, ψ Yj (y) = 0 if Y j y. Noting that the absolute value of the random part ψ Yj (D G ) Gn(D G ) in A3jn is bounded above by 1, we have A 3jn (0,1) S F c Jv cd (u, G(D 1 G ), G(DG )) c θ (u, v)dv du. S G By the assumption (D2), the integral above is finite. Therefore, A 3jn has an absolute moment of order 2 + δ 1 for some δ 1 > 0, which leads to the asymptotic normality of A 3n1. Using argument similar to that used for A 2n2 as in (2), one can show that A 3n2 = o p(1). In summary, we have the asymptotic normality of A 3n. (4) Result of A 4n can be obtained in a similar way by noting that A 4n can be written as A 4n = n 1/2 {R\D F } [Ĝn(D G ) G(D G )]J cd v 2 (F (x), G(D G ), G(DG ))dh θ (x, D G ) +n 1/2 {R\D F } [G n(d G ) Ĝn(D G )]J cd v 2 (F (x), G(D G ), G(DG ))dh θ (x, D G ) = A 4n1 + A 4n2, and A n 4n1 can be written as n 1/2 A 4jn, where A 4jn = j=1 {R\D F } (φ Y (D j G ) Gn(D G ))Jv cd (F (x), G(D 2 G ), G(DG ))dh θ (x, D G ) are i.i.d. with mean zero. Using arguments similar to those in (3), we can get that A 4n is asymptotically normally distributed with mean 0. (5) Finally, we show that the sum of A in, i = 1,..., 4, converges to a normal random variable n with mean 0. Note that the sum can be written as n 1/2 (A 1jn + A 2jn + A 3jn + A 4jn ) + j=1 A n 2n2 + A 3n2 + A 4n2 n 1/2 A 2jn + A 2n2 + A 3n2 + A 4n2, where A 2jn = A 1jn + j=1 A 2jn + A 3jn + A 4jn depends on (X j, Y j ) only, hence are i.i.d. with mean 0 as shown before 37

46 (Similarly, we can define A 3jn for Rdc n ), and A 2n2, A 3n2 and A 4n2 are negligible. Using the CLT, we have that the sum of A in, i = 1,..., 4, is asymptotically normally distributed with mean 0. We will finish the proof of R cd n by a few corollaries. Corollary 5. B 1n p 0 as n. Proof: Note that B 1n = χ(ω 1n ) {R\D F } U n(f (x))ju cd (Φn(x), Gn(D 2 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] +χ(ω c 1n ) H θ (x, D G )] {R\D F } U n(f (x))j cd u 2 (Φn(x), Gn(D G ), G n(d G ))d[hn(x, D G ) where Ω 1n is defined in (3.15), which tells us the second term on the righthand side of the equality above goes to 0 in probability as n. Also, χ(ω 1n ) {R\D F } U n(f (x))ju cd (Φn(x), Gn(D 2 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] χ(ω 1n ) {R\D F } M Ju cd (Φn(x), Gn(D 2 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] = χ(ω 1n ) M Ju cd (Φn(x), Gn(D 2 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] + F χ(ω 1n ) {(,X 1n ) (Xnn, )} M Ju cd (Φn(x), Gn(D 2 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )], the second term on the righthand side of the inequality goes to 0 in probability as n by the Glivenko-Cantelli theorem and the assumption (D2). Since J cd u 2 (Φn(x), Gn(D G ), G n(d G )) is bounded above almost surely when x F and Hn(x, y) converges to H θ (x, y) in distribution as n, using Theorem 1.9 (i), [43], once again, we have the first term on the righthand side of the inequality above converges to 0. In summary, we have shown that B 1n p 0 as n. 38

47 Corollary 6. B 2n p 0 and B 3n p 0 as n. Proof: It suffices to show the result for B 2n. By the CLT, Vn(G(D G )) converges to a normal random variable as n, so we can define a set Ω 2n such that P (Ω 2n ) = P ({ω : sup ω Vn(G(D G )) M }) > 1 ɛ. Therefore, B 2n = χ(ω 2n ) {R\D F } V n(g(d G ))J cd v (F (x), Θn(D 1 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] +χ(ω c 2n ) H θ (x, D G )], {R\D F } V n(g(d G ))J cd v 1 (F (x), Θn(D G ), G n(d G ))d[hn(x, D G ) and the second term on the righthand side of the equality above goes to 0 in probability as n. Also, χ(ω 2n ) {R\D F } V n(g(d G ))J cd v (F (x), Θn(D 1 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] χ(ω 2n ) {R\D F } M Jv cd (F (x), Θn(D 1 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] = χ(ω 2n ) χ(ω 2n ) H θ (x, D G )], F M J cd v 1 (F (x), Θn(D G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] + {(,X 1n ) (Xnn, )} M J cd v 1 (F (x), Θn(D G ), G n(d G ))d[hn(x, D G ) the second term on the righthand side of the inequality goes to 0 in probability as n by the Glivenko-Cantelli theorem and the assumption (D2). From the assumption (D2), combining with the definition of Θn( ) and the fact that J cd u 2 (u 2, v 1, v 2 ) is continuous with respect to u 2, v 1, and v 2 on (0, 1) 3, we know that 39

48 J cd u 2 (F (x), Θn(D G ), G n(d G )) is bounded above almost surely for x F. Noting that Hn(x, y) converges to H θ (x, y) in distribution as n, using Theorem 1.9 (i), [43], once again, we have the first term on the righthand side of the inequality above converges to 0. In summary, we have shown that B 2n p 0 as n. Corollary 7. B 4n p 0 as n. Proof: Note that B 4n = χ(ω 1n ) {R\D F } U n(f (x))[ju cd (Φn(x), Gn(D 2 G ), G n(d G )) Ju cd (F (x), Gn(D ] 2 G ), G n(d G )) dh θ (x, D G ) +χ(ω c 1n ) {R\D F } U n(f (x))[j cd u 2 (Φn(x), Gn(D G ), G n(d G )) J cd u 2 (F (x), Gn(D G ), G n(d G )) ] dh θ (x, D G ) where Ω 1n is defined in (3.15). Therefore, the second term on the righthand side of the equality above goes to 0 in probability as n. Also, χ(ω 1n ) {R\D F } U n(f (x))[ju cd (Φn(x), Gn(D 2 G ), G n(d G )) Ju cd (F (x), Gn(D ] 2 G ), G n(d G )) dh θ (x, D G ) χ(ω 1n ) {R\D F } M Ju cd (Φn(x), Gn(D 2 G ), G n(d G )) Ju cd (F (x), Gn(D 2 G ), G n(d G )) dh θ (x, D G ) = χ(ω 1n ) M Ju cd (Φn(x), Gn(D 2 G ), G n(d G )) F Ju cd (F (x), Gn(D 2 G ), G n(d G )) dh θ (x, D G ) + χ(ω 1n ) {(,X 1n ) (Xnn, )} M Ju cd (Φn(x), Gn(D 2 G ), G n(d G )) Ju cd (F (x), Gn(D 2 G ), G n(d G )) dh θ (x, D G ), 40

49 the second term on the righthand side of the inequality goes to 0 in probability as n by the Glivenko-Cantelli theorem and the assumption (D2). Since J cd u 2 (Φn(x), Gn(D G ), G n(d G )) J cd u 2 (F (x), Gn(D G ), G n(d G )) is bounded above almost surely when x F and Hn(x, y) converges to H θ (x, y) in distribution as n, using Theorem 1.9 (i), [43], once again, we have the first term on the righthand side of the inequality above converges to 0. In summary, we have shown that B 4n p 0 as n. Corollary 8. B 5n p 0 and B 6n p 0 as n. Proof: Using similar arguments to those used in the proof of Corollary 7 and Corollary 8, one can see this is true, so proof omitted. In summary, we have the asymptotic normality of R cd n Asymptotic normality of R dd n Note that where n 1/2 (R dd n µ dd ) = µ dd = E[N dd (X, Y )], 5 4 A kn + B kn, k=1 k=1 A 1n = A 2n = A 3n = n 1/2 J dd (F (D F ), F (D F ), G(D G ), G(D G ))[dhdd n dh dd ], Vn(G(D G ))J dd v 2 (F (D F ), F (D F ), G(D G ), G(D G ))dhdd, Un(F (D F ))J dd u 2 (F (D F ), F (D F ), G(D G ), G(D G ))dhdd, A 4n = Vn(G(D G ))Jdd v 1 (F (D F ), F (D F ), G(D G ), G(D G ))dhdd, A 5n = Un(F (D F ))Jdd u 1 (F (D F ), F (D F ), G(D G ), G(D G ))dhdd, 41

50 B 1 = Vn(G(D G ))[Jv dd (F (D 2 F ), F (D F ), G(D G ), Ψ n)dhn dd J dd v 2 (F (D F ), F (D F ), G(D G ), G(D G ))dhdd ], B 2 = Un(F (D F ))[Ju dd (F (D 2 F ), Φ n, G(D G ), G(D G ))dhdd n J dd u 2 (F (D F ), F (D F ), G(D G ), G(D G ))dhdd ], B 3 = Vn(G(D G ))[Jdd v (F (D 1 F ), F (D F ), Θ n, G(D G ))dhn dd J dd u 1 (F (D F ), F (D F ), G(D G ), G(D G ))dhdd ], B 4 = Un(F (D F ))[Jdd u (Γn, F (D 1 F ), G(D G ), G(D G ))dhdd n J dd u 1 (F (D F ), F (D F ), G(D G ), G(D G ))dhdd ], with n dd = {the number of observations such that X j = D F and Y j = D G } for k = 1, 2,..., n, and dh dd n = n dd n dh dd = P (X = D F and Y = D G ). We need to show that the A-terms are asymptotically normal, and the B-terms converge to 0 in probability as n 0. By the SLLN, it is obvious that dh dd n dh dd almost surely as n, and by the CLT, in distribution Un(F (D F )) N(0, [F (D F )(1 F (D F ))]) Un(F (D F )) N(0, [F (D F )(1 F (D F ))]) Vn(G(D G )) N(0, [G(D G )(1 G(D G ))]) Vn(G(D G )) N(0, [G(D G )(1 G(D G ))]) n 1/2 [dh dd n dh dd ] N(0, [dh dd (1 dh dd )]) 42

51 as n. Note that J dd (F (D F ), F (D F ), G(D G ), G(D G )), J dd v 2 (F (D F ), F (D F ), G(D G ), G(D G )), J dd u 2 (F (D F ), F (D F ), G(D G ), G(D G )), Jv dd (F (D 1 F ), F (D F ), G(D G ), G(D G )), and Ju dd (F (D 1 F ), F (D F ), G(D G ), G(D G )), in A 1n, A 2n, A 3n, A 4n and A 5n, respectively, are fixed numbers. Therefore, the A-terms are asymptotically normally distributed with mean 0. To show the sum of the A-terms is still normal, one only need to notice that Un(F (D F )) = n1/2 ( ˆFn(D F ) F (D F )) + n1/2 (Fn(D F ) ˆFn(D F )) = n 1/2 n (ψ Xj (D F ) F (D F )) + o p(1) j=1 = n 1/2 n H 1kn + o p(1) j=1 Un(F (D F )) = n 1/2 ( ˆFn(D F ) F (D F )) + n 1/2 (Fn(D F ) ˆFn(D F )) = n 1/2 n (φ Xj (D F ) F (D F )) + op(1) j=1 = n 1/2 n H 2kn + o p(1) j=1 Vn(G(D G )) = n1/2 (Ĝn(D G ) G(D G )) + n1/2 (Gn(D G ) Ĝn(D G )) = n 1/2 n (ψ Yj (D G ) G(D G )) + o p(1) j=1 = n 1/2 n H 3kn + o p(1) j=1 Vn(G(D G )) = n 1/2 (Ĝn(D G ) G(D G )) + n 1/2 (Gn(D G ) Ĝn(D G )) = n 1/2 n (φ Yj (D G ) G(D G )) + op(1) j=1 = n 1/2 n H 4kn + o p(1) j=1 43

52 and n 1/2 [dh dd n dh dd ] = n 1/2 n j=1 ( I {j Add } dhdd) = n 1/2 n H 5kn, j=1 where H ikn are i.i.d. and depend on (X j, Y j ) only, for k = 1,..., 5. The A 1n + A 2n + A 3n + n A 4n + A 5n can be written as n 1/2 A 4jn, where A 4jn are i.i.d. sum of products of H ikn j=1 and some certain fixed number, and four negligible terms. Using the CLT one more time, we have that the sum of A-terms is asymptotically normally distributed with mean 0. Under the assumption (D4), B i converges to 0 as n, for i = 1,..., 4, in probability. We have shown that the R dd n is asymptotically normally distributed with mean Asymptotic normality of R n To show that n 1/2 Rn = n 1/2 (R cc n + R cd n + R dc n + R dd n ) is asymptotic normal, we need the following notations: n 1/2 (R cc n µ cc ) R cc N(0, var(m cc (X, Y ))), n 1/2 (R cd n µ cd ) R cd N(0, var(m cd (X, Y ))), Noting that n 1/2 (R dc n µ dc ) R dc N(0, var(m dc (X, Y ))), n 1/2 (R dd n µ dd ) R dd N(0, var(m dd (X, Y ))). n 1/2 (R cc n µ cc ) = n 1/2 (R cd n µ cd ) = n 1/2 (R dc n µ dc ) = n 1/2 (R dd n µ dd ) = n j=1 n j=1 n j=1 n j=1 n 1/2 A 1jn + a few negligible terms, n 1/2 A 2jn + a few negligible terms, n 1/2 A 3jn + a few negligible terms, n 1/2 A 4jn + a few negligible terms, 44

by the CLT, we have
$$n^{1/2}(R_n - \mu) = n^{-1/2}\sum_{j=1}^{n}\bigl(A_{1jn} + A_{2jn} + A_{3jn} + A_{4jn}\bigr) + \text{a few negligible terms} \;\xrightarrow{d}\; N(0, \sigma^2),$$
where $\mu = \mu^{cc} + \mu^{cd} + \mu^{dc} + \mu^{dd}$ and $\sigma^2 = \mathrm{var}\bigl[M^{cc}(X,Y) + M^{cd}(X,Y) + M^{dc}(X,Y) + M^{dd}(X,Y)\bigr]$.

3.5 Proof of the Main Result

We finish this chapter by providing the proof of the main theorem stated in Section 3.1.

Proof: As shown in Section 3.3, $B_n \to \beta$ almost surely, and, as shown in Section 3.4, $A_n$ is asymptotically normal with mean 0 and variance $\sigma^2$ as $n\to\infty$. Using Slutsky's Theorem, $n^{1/2}(\hat\theta_n - \theta) = n^{1/2}A_n/B_n$ is asymptotically normal with mean 0 and variance $\varrho = \sigma^2/\beta^2$. In summary, we have shown the consistency and asymptotic normality of the semi-parametric estimator $\hat\theta_n$.

CHAPTER 4

A VARIANCE ESTIMATOR OF BIVARIATE DISTRIBUTIONS

In Chapter 3, we provided an explicit formula for $\varrho = \sigma^2/\beta^2$. Suppose estimators $\hat\sigma^2$ and $\hat\beta^2$ could be found for $\sigma^2$ and $\beta^2$, respectively. A rough-and-ready estimator of the asymptotic variance $\varrho$ would then be given by $\hat\varrho = \hat\sigma^2/\hat\beta^2$. If the variables given in (3.9) and (3.10) could be observed, one could simply estimate $\sigma^2$ and $\beta$ by the respective sample variance and sample mean. As this is not possible, the corresponding pseudo-observations can be used instead; they are defined in terms of $C_n$, the re-scaled empirical copula function of the bivariate sample, namely
$$C_n(u,v) = \frac{1}{n}\sum_{j=1}^{n} I_{\{F_n(X_j)\le u,\; G_n(Y_j)\le v\}}.$$
Let $\ell^{cc}_{\theta,\theta}$, $\ell^{cd}_{\theta,\theta}$, $\ell^{dc}_{\theta,\theta}$, and $\ell^{dd}_{\theta,\theta}$ be the decomposed components of $\ell_{\theta,\theta}$ with respect to $c_\theta$ in (2.7) under the different situations. Similarly, one can define $\ell^{cc}_{\theta}$, $\ell^{cd}_{\theta}$, $\ell^{dc}_{\theta}$, and $\ell^{dd}_{\theta}$ for $\ell_{\theta}$. From (3.9), we have the following:
$$\hat\beta = \frac{1}{n}\sum_{j=1}^{n}\Bigl[\hat N^{cc}_j + \hat N^{cd}_j + \hat N^{dc}_j + \hat N^{dd}_j\Bigr], \qquad (4.1)$$

where
$$\hat N^{cc}_j = I_{\{j\in A_{cc}\}}\,\ell^{cc}_{\theta,\theta}\bigl(\hat\theta_n, F_n(X_j), G_n(Y_j)\bigr), \qquad \hat N^{cd}_j = I_{\{j\in A_{cd}\}}\,\ell^{cd}_{\theta,\theta}\bigl(\hat\theta_n, F_n(X_j), G_n(Y_j^-), G_n(Y_j)\bigr),$$
$$\hat N^{dc}_j = I_{\{j\in A_{dc}\}}\,\ell^{dc}_{\theta,\theta}\bigl(\hat\theta_n, F_n(X_j^-), F_n(X_j), G_n(Y_j)\bigr), \qquad \hat N^{dd}_j = I_{\{j\in A_{dd}\}}\,\ell^{dd}_{\theta,\theta}\bigl(\hat\theta_n, F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j)\bigr).$$

To get $\hat\sigma^2$ for $\sigma^2$ in (3.11), we need more notation. Note that rearranging $\{(X_j, Y_j)\}$, $j=1,\dots,n$, does not change the value of $\hat\sigma^2$; we may therefore assume that the sample is ordered so that the $X_j$'s are non-decreasing. Let $R_{X(j)} = \sum_i I_{\{X_i \le X_j\}}$ be the rank of $X_j$ within the $X$ sequence (similarly, define $R_{Y(j)}$ for the $Y$'s), and let $R^-_{X(j)} = \sum_i I_{\{X_i < X_j\}}$ be the next lower rank to $R_{X(j)}$ within the $X$ sequence; $R^-_{Y(j)}$ is defined analogously within the $Y$ sequence. Under the current ordering, for any $i < j$ the following always holds: $R_{X(i)} \le R_{X(j)}$.

j            1          2          3          4          5          6          7
(X_j, Y_j)   (2.1,1.0)  (2.5,0.5)  (3.2,1.0)  (3.2,1.5)  (3.2,2.0)  (3.5,2.5)  (3.7,2.5)
R_X(j)       1          2          5          5          5          6          7
R_Y(j)       3          1          3          4          5          7          7

Table 4.1. Relationship among $(X_j, Y_j)$, $R_{X(j)}$, and $R_{Y(j)}$.
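Since the rescaled pseudo-observations used below are just ranks divided by $n+1$, they are simple to compute. The following is a minimal sketch (the function name is ours, not part of the dissertation) of the rank definitions just given, assuming NumPy is available; the printed values agree with the worked example around Table 4.1.

```python
import numpy as np

def ranks_and_pseudo_obs(x):
    """R_X(j) = #{i : X_i <= X_j}, the next lower rank R^-_X(j) = #{i : X_i < X_j},
    and the rescaled pseudo-observations R_X(j)/(n+1) and R^-_X(j)/(n+1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r_upper = np.array([np.sum(x <= xj) for xj in x])   # maximal rank under ties
    r_lower = np.array([np.sum(x < xj) for xj in x])    # next lower rank
    return r_upper, r_lower, r_upper / (n + 1), r_lower / (n + 1)

# X-coordinates of the example in Table 4.1:
x = [2.1, 2.5, 3.2, 3.2, 3.2, 3.5, 3.7]
print(ranks_and_pseudo_obs(x)[:2])   # upper ranks (1,2,5,5,5,6,7), lower ranks (0,1,2,2,2,5,6)
```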

56 Table 4.1 gives a simple example showing the relationship among (X j, Y j ), R X(j), and R Y (j). As shown in the table, R X(3) = R X(4) = R X(5) = 5, and R X(5) = R X(4) = R X(3) = R X(2) = 2 by definition. Now we can work on the details. Note that in (3.10), σ 2 is the variance of the sum of M S (X, Y ), where S = {cc}, {cd}, {dc}, or{dd}. For a given sample (X j, Y j ), σ 2 can be estimated by the sample variance of the following items: ˆM cc j ˆM cd j ˆM dc j = l θ cc(ˆθn, Fn(X j ), Gn(Y j ))I {j A cc} + 1 n l n θ,u cc 2 k=j + ( ) R Y (k) R l cc R X(k) R Y (k) Y (j) θ,v ˆθn, 2 n + 1, n + 1 d G = l=1 + 1 d G n n l=1 k=j d G + l=1 + d G l=1 d F = h=1 + 1 d F n h=1 ( ) R X(k) R Y (k) ˆθn, n + 1, n + 1 I {k A cc} l cd θ (ˆθn, Fn(X j ), Gn(Y j ), G n(y j ))I {j Acd, Y j =D Gl } l cd θ,u 2 R Y (k) >R Y (j) l cd θ,v 1 R Y (k) R Y (j) l cd θ,v 2 R R X(k) ˆθn, n + 1, Y (k) R Y (k) n + 1, I n + 1 {k Acd, Y k =D Gl } R R X(k) ˆθn, n + 1, Y (k) R Y (k) n + 1, I n + 1 {k Acd, Y k =D Gl } R R X(k) ˆθn, n + 1, Y (k) R Y (k) n + 1, I n + 1 {k Acd, Y k =D Gl } l dc θ (ˆθn, Fn(X j ), F n(x j ), Gn(Y j ))I {j Adc, X j =D F h } l dc R X(k) R X(k) θ,v ˆθn, 2 n + 1, R Y (k) R Y (j) R Y (k) n + 1, n + 1 I {k Adc, X k =D F h } 48

57 ˆM dd j = d F + l dc R X(k) R X(k) R Y (k) θ,u ˆθn, 1 n + 1, n + 1, n + 1 h=1 R X(k) >R X(j) I {k Adc, X k =D F h } d F n + l θ,u dc ˆθn, 2 h=1 k=j d F d G h=1 d F d G + h=1 l=1 R X(k) n + 1, R X(k) R Y (k) n + 1, I n + 1 {k Adc, X k =D F h } l θ dd (ˆθn, Fn(X j ), F n(x j ), Gn(Y j ), G n(y j ))I {Xj =D F h, Y j =D Gl } l=1 I {Xk =D F h, Y k =D Gl } d F d G + h=1 l=1 n l θ,v dd 1 R Y (k) >R Y (j) n l θ,v dd 2 R Y (k) R Y (j) I {Xk =D F h, Y k =D Gl } d F d G + h=1 l=1 n l θ,u dd 1 R X(k) >R X(j) I {Xk =D F h, Y k =D Gl } ˆθn, R n + 1, R X(k) n + 1, R X(k) Y (k) R Y (k) n + 1, n + 1 R X(k) R R X(k) ˆθn, n + 1, n + 1, Y (k) R Y (k) n + 1, n + 1 R X(k) R R X(k) ˆθn, n + 1, n + 1, Y (k) R Y (k) n + 1, n + 1 d F d G n + l θ,u2 ˆθn, h=1 l=1 k=j I {Xk =D F h, Y k =D Gl } n hl n, R n + 1, R X(k) n + 1, R X(k) Y (k) R Y (k) n + 1, n + 1 where n hl = {the number of observations (X j, Y j ) : X j = D F h, Y j = D Gl }. Next, to establish the consistency of ˆϱn, it suffices to show that ˆσ 2 n and ˆβ 2 n are themselves consistent. To prove that ˆβn β converges almost surely, for example, express the difference as 1 n n Jˆθn (F n(x j ), F n(x j ), Gn(Y j ), G n(y j )) E[J θ (F (X ), F (X), G(Y ), G(Y ))] j=1 49

in terms of $J_\theta \equiv \ell_{\theta,\theta}$. By the triangle inequality, this quantity is no greater than
$$\frac{1}{n}\sum_{j=1}^{n}\Bigl| J_{\hat\theta_n}\bigl(F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j)\bigr) - J_{\theta}\bigl(F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j)\bigr)\Bigr|$$
$$\;+\;\Bigl|\frac{1}{n}\sum_{j=1}^{n} J_{\theta}\bigl(F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j)\bigr) - E\bigl[J_{\theta}\bigl(F(X^-), F(X), G(Y^-), G(Y)\bigr)\bigr]\Bigr|.$$
Assuming that $J_\theta$ is bounded by an integrable function in a neighborhood of the true value of $\theta$, the convergence of the maximum likelihood estimator and the Dominated Convergence Theorem imply that the first summand converges to zero. Since the second term vanishes asymptotically by an argument similar to that used in Chapter 3, it follows that $\hat\beta_n \to \beta$ almost surely. The argument for $\hat\sigma_n$ is similar, though somewhat more involved, and is not presented here. In summary, we have provided a variance estimator $\hat\varrho$ of $\varrho$ and illustrated its consistency.
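To make the plug-in construction of this chapter concrete, here is a minimal sketch of the final ratio $\hat\varrho = \hat\sigma^2/\hat\beta^2$. It assumes, as inputs, the per-observation quantities already defined above: the sums of the four $\hat N$-components in (4.1) and the sums of the four $\hat M$-components; the copula-specific derivative evaluations that produce them are not shown here.

```python
import numpy as np

def plug_in_asymptotic_variance(N_hat, M_hat):
    """rho_hat = sigma_hat^2 / beta_hat^2, with beta_hat the sample mean of the
    per-observation N-terms of (4.1) and sigma_hat^2 the sample variance of the
    per-observation sums of the M-terms."""
    beta_hat = np.mean(N_hat)            # N_hat[j]: sum of the four N-hat components for observation j
    sigma2_hat = np.var(M_hat, ddof=1)   # M_hat[j]: sum of the four M-hat components for observation j
    return sigma2_hat / beta_hat ** 2
```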

CHAPTER 5

EXTENSIONS TO HIGHER DIMENSIONS

The previous developments extend more or less automatically to situations where it is desired to estimate a multidimensional dependence structure semi-parametrically. Let a boldfaced letter such as $x$ denote a $D$-tuple ($D\ge 2$) of real numbers, that is, $x \equiv (x_1,\dots,x_D)^T$ where $x_k\in\mathbb{R}$ for $k=1,\dots,D$, and let $F_1,\dots,F_D$ denote $D$ univariate mixed marginals on the real line given by
$$F_k(x_k) = \sum_{h=1}^{d_k} p_{kh}\, I_{\{D_{kh}\le x_k\}} + \Bigl(1 - \sum_{h=1}^{d_k} p_{kh}\Bigr)\int_{w\le x_k} f_k(w)\,dw,$$
where $D_{kh}$ is the $h$-th jump point of $F_k$ with $P(X_k = D_{kh}) = p_{kh}$, for $h=1,\dots,d_k$, and $f_k(\cdot)$ is a continuous density function with support on the real line. When restricted to a particular copula family, say $\{C_\theta : \theta = (\theta_1,\dots,\theta_q)^T\in\mathcal{A}\subseteq\mathbb{R}^q\}$, a multivariate joint distribution function for $(x_1,\dots,x_D)^T$ can be defined as follows:
$$F^D(x_1,\dots,x_D) = C_\theta\bigl(F_1(x_1),\dots,F_D(x_D)\bigr). \qquad (5.1)$$

Result 1: $F^D(x_1,\dots,x_D)$ defined in (5.1) is a valid joint distribution function on $\mathbb{R}^D$.

Proof: For $F^D(x_1,\dots,x_D)$ to be a valid distribution function on $\mathbb{R}^D$, the following four conditions should be satisfied:

(1) $0 \le F^D(x_1,\dots,x_D) \le 1$ for all $(x_1,\dots,x_D)$;

(2) $F^D(x_1,\dots,x_D) \to 0$ as $\max(x_1,\dots,x_D)\to-\infty$;

(3) $F^D(x_1,\dots,x_D) \to 1$ as $\min(x_1,\dots,x_D)\to+\infty$; and

(4) for every pair $(x_1,\dots,x_D)$ and $(y_1,\dots,y_D)$ in $\mathbb{R}^D$ with $x_k\le y_k$ for $k=1,\dots,D$, defining the $D$-box $V = [x_1,y_1]\times[x_2,y_2]\times\cdots\times[x_D,y_D]$, we have $\sum_{c}\mathrm{sgn}(c)\,F^D(c)\ge 0$, where the sum is taken over all vertices $c$ of $V$ and $\mathrm{sgn}(\cdot)$ is defined as in (1.3).

Note that since
$$F^D(x_1,\dots,x_D) = C_\theta\bigl(F_1(x_1),\dots,F_D(x_D)\bigr) = P\bigl(U_1\le F_1(x_1),\dots,U_D\le F_D(x_D)\bigr),$$
(1) follows. To prove (2), note that $F_k(x_k)\to 0$ for all $k=1,\dots,D$ when $\max(x_1,\dots,x_D)\to-\infty$; hence $F^D(x_1,\dots,x_D)\to 0$ from the properties of a cumulative distribution function. (3) follows similarly. (4) is a direct consequence of (1.3) and (5.1). This completes the proof.

Define $C(F_k)$ to be the collection of all points of continuity of $F_k$ and $\mathcal{J}(F_k) = \{D_{k1},\dots,D_{k d_k}\}$ to be the collection of all jump points of $F_k$. For a fixed $x\in\mathbb{R}^D$, every component of $x$ corresponds to either a point of continuity or a point of discontinuity of the corresponding marginal distribution. Suppose the indexes $c_1, c_2,\dots,c_H\in\{1,2,\dots,D\}$ correspond to $x_{c_h}\in C(F_{c_h})$ (i.e., $x_{c_h}$ belongs to the set of continuity points of $F_{c_h}$), and the remaining indexes, say $d_1, d_2,\dots,d_L$, with $H+L=D$, are the indexes of marginals such that $x_{d_l} = D_{d_l, i_{d_l}}$, $l=1,2,\dots,L$. When $\{c_1,c_2,\dots,c_H\} = \{1,2,\dots,D\}$, the density of $F^D$ is given by
$$dF^D = c_\theta\bigl(F_1(x_1),\dots,F_D(x_D)\bigr)\prod_{k=1}^{D} f_k(x_k),$$

and if $\{c_1,c_2,\dots,c_H\}\ne\{1,2,\dots,D\}$, the density of $F^D$ is given by
$$dF^D = \int\!\cdots\!\int_{\prod_{l=1}^{L}\bigl[F_{d_l}(D_{d_l,i_{d_l}}^-),\,F_{d_l}(D_{d_l,i_{d_l}})\bigr]} c_\theta(u_1,\dots,u_D)\, du_{d_1}\cdots du_{d_L}\;\prod_{h=1}^{H} f_{c_h}(x_{c_h}),$$
where $u_{c_h} = F_{c_h}(x_{c_h})$. Note that the above can be written as
$$c_\theta(F_1,\dots,F_D)(x)\,\prod_{h=1}^{H} f_{c_h}(x_{c_h}).$$
Letting $X_j = (X_{1j}, X_{2j},\dots,X_{Dj})^T$, $j=1,\dots,n$, represent a random sample from $F^D$, the semi-parametric estimator $\hat\theta_n$ of $\theta$ would then be obtained as a solution of the system
$$\frac{1}{n}\sum_{j=1}^{n}\frac{\partial}{\partial\theta_i}\log\bigl[dF^D(X_{1j},\dots,X_{Dj})\bigr] = 0, \qquad 1\le i\le q. \qquad (5.2)$$
Since $\theta$ is the only vector of parameters of interest, the component of the likelihood that matters is
$$\frac{1}{n}\sum_{j=1}^{n}\frac{\partial}{\partial\theta_i}\log\bigl[c_\theta(F_1,\dots,F_D)(X_{1j},\dots,X_{Dj})\bigr] = 0, \qquad 1\le i\le q. \qquad (5.3)$$
Using the same techniques as in Chapter 3, the main theorem of Chapter 3 can be extended to the $D$-dimensional case. Furthermore, the limiting variance-covariance matrix of $n^{1/2}(\hat\theta_n-\theta)$ is then $B^{-1}\Sigma B^{-1}$, where $B$ is the information matrix associated with $C_\theta$. To give the explicit form of $\Sigma$, we need some notation. Let
$$F^p_k(x_k) = \begin{cases} F_k(x_k) \equiv u_{k2}, & \text{if } x_k\in C(F_k),\\ \bigl(F_k(x_k^-), F_k(x_k)\bigr) \equiv (u_{k1}, u_{k2}), & \text{if } x_k\in\mathcal{J}(F_k). \end{cases}$$
Note that in the present case, the relevant statistic in (5.3) can be written as, for $1\le i\le q$,
$$R_{i,n} = \frac{1}{n}\sum_{j=1}^{n} J_i\bigl(F_1(X_{1j}^-), F_1(X_{1j}),\dots,F_D(X_{Dj}^-), F_D(X_{Dj})\bigr) = \frac{1}{n}\sum_{j=1}^{n}\sum_{Q} J^Q_i\bigl(F^p_1(X_{1j}),\dots,F^p_D(X_{Dj})\bigr),$$

62 where Q = {Q 1,..., Q D } is a D-sequence consisting of letters of {c} or {d}, with the following properties: (i) when Q is combined with a point x and the marginals F 1,..., F D, say x Q(F 1,..., F D ), a c at the k-th component of Q refers to x k belonging to C(F k ), and a d at the k-th component refers to x k belonging to J (F k ), for k = 1,..., D; (ii) when Q is combined with J i or J i s derivative with respect to u k1 or u k2, a c at the k-th component refers to F p k (X kj ) = F k (X kj ), and a d at the k-th component refers to F p k (X kj ) = (F k (X kj ), F k (X kj )), for k = 1,..., D. Using the same techniques as those used in Chapter 3, one can see that Σ is the variancecovariance matrix of the q-dimensional random vector whose i-th component is given by var( Q M Q i (X 1,..., X D )); the following is a general form of M Q i (X 1,..., X D )). Assuming Q = {c 1, c 2,..., c H } {d 1, d 2,..., d L }, and L and H can be any value among {0,..., D}, such that H + L = D. Then M Q i (x 1,..., x D ) =... J Q i (F 1 p (x 1 ),,..., F p D (x D )) H I {x c i d1 i h C(Fc )} h dl h=1 L D l=1 I {x dl =D dl i } + L ik. dl k=1 The set of jump points for F dl is {D dl 1,..., D d l d dl }, and L ik is given as below: (i) if the k-th component in Q = c; L ik = i d1... i dl {R\{ d c 1 n=1 D c 1 n}} J Q iu k2 (F p 1 (x 1 ),..., F p D (x D ))df (x 1,..., x D )... {R\{ d I(x c H k x n=1 D k ) c H n}} with x d l = D dl i dl ; 54

63 (ii) if the k-th component in Q = d; L ik =... i d1 i dl {R\{ d c 1 n=1 D c 1 n}}... {R\{ d I(x c H k < x n=1 D k ) c H n}} J Q iu (F p k1 1 (x 1 ),..., F p D (x D ))df (x 1,..., x D ) +... i d1 i {R\{ d c... 1 dl n=1 D c 1 n}} {R\{ d c I(x H k x n=1 D k ) c H n}} J Q iu k2 (F p 1 (x 1 ),..., F p D (x D ))df (x 1,..., x D ), with x d = D l dl i. dl As the parallel with the case q = 1 and D = 2 described earlier in Chapter 2 and Chapter 3, an estimator of the variance-covariance matrix of ˆθn could be found by repeating the procedure described in Chapter 4. 55
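As a small illustration of the mixed marginals $F_k$ defined at the beginning of this chapter, the following sketch evaluates a univariate mixed distribution function with a finite set of jump points and a continuous part. The function name and the standard normal default are ours, used only for illustration.

```python
import numpy as np
from scipy.stats import norm

def mixed_marginal_cdf(x, jump_points, jump_probs, cont_cdf=norm.cdf):
    """F(x) = sum_h p_h * I{D_h <= x} + (1 - sum_h p_h) * integral_{w <= x} f(w) dw."""
    jump_points = np.asarray(jump_points, dtype=float)
    jump_probs = np.asarray(jump_probs, dtype=float)
    discrete = np.sum(jump_probs * (jump_points <= x))
    return discrete + (1.0 - jump_probs.sum()) * cont_cdf(x)

# A marginal with a single atom of mass 0.3 at 0.2 and a standard normal continuous part:
print(mixed_marginal_cdf(0.5, [0.2], [0.3]))
```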

CHAPTER 6

JOINT DISTRIBUTIONS USING t-copula

6.1 Joint Distributions Using t-copula

In this chapter, we develop joint distributions based on the family of t-copulas; see [34] and [8]. We choose t-copulas not only because they generalize the Gaussian copulas, but also because they possess the tail dependence property. This property allows us to study the limiting association between random variables $X$ and $Y$ as both $x$ and $y$ go to their boundaries. Our finding is that this limiting association is governed by the correlation coefficient $\rho$ together with the degrees of freedom $\nu$, as made precise in the tail-dependence lemma of this section. In practice, there are situations where random variables remain associated to some degree even in the tails. For instance, in biometric recognition, the genuine and imposter distributions generated from the same biometric trait, or from different biometric traits in a multimodal biometric system, tend to have similar, though not identical, tail behavior, as shown in Section 7.2 of Chapter 7. t-copula models can accommodate such situations more appropriately than copula families that lack tail dependence. Before defining the t-copula, we introduce some notation. Let $\Sigma$ denote a positive definite matrix of dimension $D\times D$. For such a $\Sigma$, the $D$-dimensional t density with $\nu$

degrees of freedom will be denoted by
$$f^D_{\nu,\Sigma}(x) \equiv \frac{\Gamma\bigl(\frac{\nu+D}{2}\bigr)}{(\pi\nu)^{D/2}\,\Gamma\bigl(\frac{\nu}{2}\bigr)\,|\Sigma|^{1/2}}\Bigl(1+\frac{x^T\Sigma^{-1}x}{\nu}\Bigr)^{-\frac{\nu+D}{2}},$$
with corresponding cumulative distribution function
$$t^D_{\nu,\Sigma}(x) \equiv \int_{w\le x} f^D_{\nu,\Sigma}(w)\,dw.$$
A matrix $\Sigma$ with unit diagonal entries corresponds to a correlation matrix and will be denoted by $R$. The $D$-dimensional t-copula function is given by
$$C_{\nu,R}(u) \equiv \int_{w\le t_\nu^{-1}(u)} f^D_{\nu,R}(w)\,dw, \qquad (6.1)$$
where $t_\nu^{-1}(u) \equiv (t_\nu^{-1}(u_1),\dots,t_\nu^{-1}(u_D))^T$, $t_\nu^{-1}$ is the inverse of the cumulative distribution function of the univariate t with $\nu$ degrees of freedom, and $R$ is a $D\times D$ correlation matrix. Note that $C_{\nu,R}(u) = P\{X\le t_\nu^{-1}(u)\}$ for $X = (X_1,\dots,X_D)^T$ distributed as $t^D_{\nu,R}$, demonstrating that $C_{\nu,R}(u)$ is a distribution function on $(0,1)^D$ with uniform marginals. The density corresponding to $C_{\nu,R}(u)$ is given by
$$c_{\nu,R}(u) = \frac{\partial^D C_{\nu,R}(u)}{\partial u_1\,\partial u_2\cdots\partial u_D} = \frac{f^D_{\nu,R}\{t_\nu^{-1}(u)\}}{\prod_{k=1}^{D} f_\nu\{t_\nu^{-1}(u_k)\}}, \qquad (6.2)$$
where $f_\nu$ in (6.2) is the density of the univariate t distribution with $\nu$ degrees of freedom. We consider joint distributions of the form
$$F^D_{\nu,R}(x) = C_{\nu,R}\{F_1(x_1),\dots,F_D(x_D)\}, \qquad (6.3)$$
with $C_{\nu,R}$ defined as in (6.1). It follows from the properties of a copula function that $F^D_{\nu,R}$ is a valid multivariate distribution function on $\mathbb{R}^D$. The identifiability of the marginal distributions $F_k$, $k=1,\dots,D$, the correlation matrix $R$, and the degrees of freedom parameter $\nu$ is established in the following theorem:

Theorem 6.1.1. Let $F^D_{\nu_1,R_1}$ and $G^D_{\nu_2,R_2}$ denote two distribution functions on $\mathbb{R}^D$ obtained from equation (6.3) with marginal distributions $F_k$, $k=1,\dots,D$, and $G_k$, $k=1,\dots,D$, respectively. Suppose we have $F^D_{\nu_1,R_1}(x) = G^D_{\nu_2,R_2}(x)$ for all $x$. Then $F_k(x) = G_k(x)$ for all $k$, $\nu_1 = \nu_2$, and $R_1 = R_2$.

In order to prove the identifiability of $(\nu, R)$ in Theorem 6.1.1, we first state and prove several lemmas.

Lemma 6.1.1. Fix $\nu$. Let $F^D_{\nu,R_1}$ and $G^D_{\nu,R_2}$ denote two cumulative distribution functions on $\mathbb{R}^D$ obtained from equation (6.3) with marginal distributions $F_k$, $k=1,\dots,D$, and $G_k$, $k=1,\dots,D$, respectively. Suppose we have
$$F^D_{\nu,R_1}(x) = G^D_{\nu,R_2}(x) \qquad (6.4)$$
for all $x$. Then $F_k(x) = G_k(x)$ for all $k$ and $R_1 = R_2$.

Proof: By taking $x_i$, $i\ne k$, tending to $\infty$, we get that $F^D_{\nu,R_1}(x)\to F_k(x_k)$ and $G^D_{\nu,R_2}(x)\to G_k(x_k)$. It follows that $F_k(x) = G_k(x)$ for all $k$. Next we show that the correlation matrices $R_1$ and $R_2$ are equal. We first prove this result for $D=2$. Note that when $D=2$, $R_1$ and $R_2$ are each determined by a single correlation parameter, namely $\rho_1$ and $\rho_2$, respectively, so we have
$$F^2_{\nu,\rho_1}(x) = C_{\nu,\rho_1}\{F_1(x_1), F_2(x_2)\} \equiv \int_{w\le t_\nu^{-1}(v)} f^2_{\nu,\rho_1}(w)\,dw \qquad (6.5)$$
and
$$G^2_{\nu,\rho_2}(x) = C_{\nu,\rho_2}\{F_1(x_1), F_2(x_2)\} \equiv \int_{w\le t_\nu^{-1}(v)} f^2_{\nu,\rho_2}(w)\,dw, \qquad (6.6)$$
where $v = (F_1(x_1), F_2(x_2))^T$; thus one only needs to prove that $\rho_1 = \rho_2$. Now, for any real numbers $a$ and $b$, the bivariate t cumulative distribution function $t^2_{\nu,\rho}(a,b)$, with $\nu$ degrees of freedom and correlation parameter $\rho$, is a strictly increasing function of $\rho$. Note that
$$t^2_{\nu,\rho}(a,b) = \int_{-\infty}^{a}\int_{-\infty}^{b} f^2_{\nu,\rho}(m,n)\,dm\,dn = \int_{-\infty}^{a}\int_{-\infty}^{b}\int_{0}^{\infty}\phi(m,n,\rho,w)\,g_\nu(w)\,dw\,dm\,dn,$$

where
$$\phi(m,n,\rho,w) = \frac{1}{2\pi w (1-\rho^2)^{1/2}}\exp\Bigl\{-\frac{m^2 - 2\rho m n + n^2}{2w(1-\rho^2)}\Bigr\}$$
and $g_\nu(w)$ is the probability density function of the inverse Gamma distribution defined as
$$g_\nu(\cdot) \equiv IG\Bigl(\frac{\nu}{2}, \frac{2}{\nu}\Bigr). \qquad (6.7)$$
Differentiating $t^2_{\nu,\rho}(a,b)$ with respect to $\rho$, we get
$$\frac{\partial t^2_{\nu,\rho}(a,b)}{\partial\rho} = \int_{-\infty}^{a}\int_{-\infty}^{b}\int_{0}^{\infty}\frac{\partial\phi(m,n,\rho,w)}{\partial\rho}\,g_\nu(w)\,dw\,dm\,dn = \int_{-\infty}^{a}\int_{-\infty}^{b}\int_{0}^{\infty} w\,\frac{\partial^2\phi(m,n,\rho,w)}{\partial m\,\partial n}\,g_\nu(w)\,dw\,dm\,dn = \int_{0}^{\infty} w\,\phi(a,b,\rho,w)\,g_\nu(w)\,dw > 0,$$
using the fact that $\partial\phi(m,n,\rho,w)/\partial\rho = w\,\partial^2\phi(m,n,\rho,w)/\partial m\,\partial n$. Note that (6.5) and (6.6) can be re-written as $F^2_{\nu,\rho_1}(x) = t^2_{\nu,\rho_1}(a,b)$ and $G^2_{\nu,\rho_2}(x) = t^2_{\nu,\rho_2}(a,b)$ with $a = t_\nu^{-1}(F_1(x_1))$ and $b = t_\nu^{-1}(F_2(x_2))$. So $F^2_{\nu,\rho_1}(x) = G^2_{\nu,\rho_2}(x)$ gives $t^2_{\nu,\rho_1}(a,b) = t^2_{\nu,\rho_2}(a,b)$. Since, for fixed $\nu$ and any $a$ and $b$, $t^2_{\nu,\rho}(a,b)$ is a strictly increasing function of $\rho$, the equality above implies that $\rho_1 = \rho_2$ must hold. This proves the case $D=2$.

Now, we prove Lemma 6.1.1 for general $D$. Let $R_1 = ((\rho_{kk',1}))$ and $R_2 = ((\rho_{kk',2}))$ be the representations of the correlation matrices $R_1$ and $R_2$ in terms of their entries. Note that the condition (6.4) can be reduced to the case $D=2$ by taking $x_i$, $i\ne k, k'$, tending to infinity. Thus we have $F^2_{\nu,\rho_{kk',1}}(x_k, x_{k'}) = G^2_{\nu,\rho_{kk',2}}(x_k, x_{k'})$, and this implies $\rho_{kk',1} = \rho_{kk',2}$ from the case $D=2$.

Next, we state and prove two more relevant lemmas.

Lemma 6.1.2. For $a>0$, $t_\nu\{-(a\nu)^{1/2}\}$ is a decreasing function of $\nu$.

Proof: Let $Y\sim t_\nu(\cdot)$. Then $Z = Y/\nu^{1/2}$ has pdf $f_\nu(z) = \bigl[\Gamma\{(\nu+1)/2\}/\{\Gamma(\nu/2)\,\pi^{1/2}\}\bigr](1+z^2)^{-(\nu+1)/2}$. One can see that if $\nu_1\le\nu_2$, then $\{f_{\nu_2}(|z|)/f_{\nu_1}(|z|)\}$ is a strictly decreasing function of $|z|$. Since $I_{\{|Z|\ge a^{1/2}\}}$ is a nondecreasing function of $|Z|$, mimicking the

proof of Lemma 2 in [28], one can prove that $E_\nu\bigl(I_{\{|Z|\ge a^{1/2}\}}\bigr)$ is a decreasing function of $\nu$. It follows that
$$t_\nu\{-(a\nu)^{1/2}\} = P\{Y\le -(a\nu)^{1/2}\} = P\{Z\le -a^{1/2}\} = \tfrac{1}{2}\,E_\nu\bigl(I_{\{|Z|\ge a^{1/2}\}}\bigr)$$
is decreasing in $\nu$.

Lemma (tail dependence). For the bivariate t-copula $C_{\nu,\rho}(u,v)$, let
$$\lambda = \lim_{v\to 0^+}\frac{\partial C_{\nu,\rho}(u,v)}{\partial v}\Big|_{u=v} \qquad\text{and}\qquad \mu = \lim_{v\to 0^+}\frac{\partial C_{\nu,\rho}(u,v)}{\partial v}\Big|_{u=1-v}.$$
Then, it follows that
$$\lambda = t_{\nu+1}\Bigl[-\Bigl\{(\nu+1)\,\frac{1-\rho}{1+\rho}\Bigr\}^{1/2}\Bigr] \qquad\text{and}\qquad \mu = t_{\nu+1}\Bigl[\Bigl\{(\nu+1)\,\frac{1+\rho}{1-\rho}\Bigr\}^{1/2}\Bigr].$$

Proof: It is not hard to see that if $(X,Y)^T\sim t^2_{\nu,\rho}$, then, given $Y=y$,
$$\Bigl(\frac{\nu+1}{\nu+y^2}\Bigr)^{1/2}\,\frac{X-\rho y}{(1-\rho^2)^{1/2}} \sim t_{\nu+1}. \qquad (6.8)$$
Therefore,
$$C_{\nu,\rho}(u,v) = P\bigl(X\le t_\nu^{-1}(u),\, Y\le t_\nu^{-1}(v)\bigr) = \int_{-\infty}^{t_\nu^{-1}(v)} t_{\nu+1}\Bigl[\Bigl(\frac{\nu+1}{\nu+y^2}\Bigr)^{1/2}\,\frac{t_\nu^{-1}(u)-\rho y}{(1-\rho^2)^{1/2}}\Bigr]\,f_\nu(y)\,dy.$$
Taking the derivative with respect to $v$, we get
$$\frac{\partial C_{\nu,\rho}(u,v)}{\partial v} = t_{\nu+1}\Bigl[\Bigl\{\frac{\nu+1}{\nu+t_\nu^{-1}(v)^2}\Bigr\}^{1/2}\,\frac{t_\nu^{-1}(u)-\rho\,t_\nu^{-1}(v)}{(1-\rho^2)^{1/2}}\Bigr] = t_{\nu+1}\Bigl[\Bigl\{\frac{\nu+1}{\nu+t_\nu^{-1}(v)^2}\Bigr\}^{1/2}\,\frac{t_\nu^{-1}(v)\,(1-\rho)}{(1-\rho^2)^{1/2}}\Bigr], \quad\text{putting } u=v,$$
$$\;\longrightarrow\; t_{\nu+1}\Bigl[-\Bigl\{(\nu+1)\,\frac{1-\rho}{1+\rho}\Bigr\}^{1/2}\Bigr]$$

as $v\to 0^+$. The other expression can be derived similarly, which completes the proof.

Remark: The lemma above describes the tail dependence property of t-copulas.

Proof of Theorem 6.1.1: Using similar arguments as before, we only need to prove the theorem for $D=2$. It easily follows, by taking $x_i\to\infty$ for $i\ne k$, that $F_k(x_k) = G_k(x_k)$ for $k=1,2$. It then follows that
$$C_{\nu_1,\rho_1}(u,v) = C_{\nu_2,\rho_2}(u,v) \qquad (6.9)$$
for all pairs $(u,v)$ of the form $(F_1(x_1), F_2(x_2))$. Thus, (6.9) need not hold for all values of $(u,v)$ in $(0,1)^2$, as it would when both marginals are continuous. Nevertheless, since both marginals have only a finite number of discontinuities, there are infinitely many values of $(u,v)$ of this form for which the limits and derivatives in the tail-dependence lemma can be applied to obtain $\lambda$ and $\mu$. Since (6.9) holds, we must have $\lambda_1 = \lambda_2$ and $\mu_1 = \mu_2$. Now, without loss of generality, assume $\rho_1\ge\rho_2$. From the equality $\lambda_1 = \lambda_2$ and Lemma 6.1.2, we must have $\nu_1\ge\nu_2$. On the other hand, the equality $\mu_1 = \mu_2$ gives $\nu_1\le\nu_2$. Hence $\rho_1\ge\rho_2$ implies that $\nu_1 = \nu_2 = \nu$, say. Now, using Lemma 6.1.1, we get $\rho_1 = \rho_2$. The proof of the theorem is completed.

Remark: Note that as the degrees of freedom $\nu\to\infty$, the t-distribution converges to a Gaussian distribution, and so does the t-copula. For the Gaussian copulas, the counterpart of Theorem 6.1.1 also holds. Therefore, we have the following lemma:

Lemma. Let $F^D_{R_1}$ and $G^D_{R_2}$ denote two distribution functions on $\mathbb{R}^D$ in the Gaussian

70 Note that which gives us Φ 2,ρ (a, b) = ρ = = a b Φ 2,ρ (a, b) = φ ρ(z 1, z 2 )dz 1 dz 2, a { b φ ρ(z 1, z 2 )dz 1 dz 2 a b [ 1 2π (1 ρ 2 ) exp a z2 1 2ρz 1 z 2 +z2 2 2(1 ρ 2 ) b } ρ 1 ρ 2 + (z 1 ρ z 2 )(z 2 ρ z 1 ) (1 ρ 2 ) 2 dz 1 dz 2 2 φρ(z 1, z 2 ) z 1 z 2 dz 1 dz 2 ] = φρ(a, b) which is always positive. Therefore, we have proved the uniqueness of ρ. The proof of Lemma is completed. Remark: It is not hard to see that for the Gaussian copulas, the tail dependence between random variables no longer exists. Using notations defined in Chapter 5, the density of F D ν,r at a fixed point x RD, df D ν,r (x), is a function on R D that satisfies F ν,r D w x (x) = df ν,r D (w), (6.10) where df ν,r D (x) is given by... L l=1 [F d (D ),F l d l,i dl (D dl,i )] c ν,r (u 1,..., u D )du d,..., du 1 dl dl dl H fc h (xc ), h h=1 where uc h = Fc h (xc h ), and H and L could be any value among {0, 1,..., D} such that H + L = D. Note that the above density can be generally written as c H (ν, R, F 1,..., F D )(x) fc h (xc ). (6.11) h h=1 62

71 6.2 Estimation of R and ν Estimation of R for fixed ν Let X 1,..., Xn be n independent and identically distributed D-dimensional random vectors arising from the joint distribution F R,ν D in (6.3), ˆF kn denote the empirical distribution function of F k and F kn = n ˆF kn /(n + 1). From (6.11), the log-likelihood function corresponding to the n i.i.d. observations X j, j = 1,..., n, is given by τ(ν, R) = = n log df ν,r D (X j ) j=1 n log c (ν, R, F 1,..., F D )(X j ) j=1 (6.12) For fixed ν, our estimator of R is taken to be the maximizer of ˆτ(ν, R), that is, ˆR(ν) = arg max R ˆτ(ν, R). (6.13) Note that there are two main challenges in maximizing the likelihood ˆτ(ν, R): Firstly, ˆτ(ν, R) involves several integrals corresponding to discrete components in X j, j = 1,..., n; c is not available in a closed form. Secondly, ˆτ(ν, R) needs to be maximized over all D D correlation matrices. The space of all correlation matrices is a compact space of [ 1, 1] D2 since all entries of a correlation matrix lie between 1 and 1. However, this space is not easy to maximize over due to the constraint of positive definiteness placed on correlation matrices. The EM Algorithm The difficulties mentioned can be overcome with the use of the EM algorithm; see, for example, [31]. The EM algorithm is a well-known algorithm used to find the maximum likelihood estimator (MLE) of a parameter θ based on data y distributed according to the likelihood l obs ( y; θ). In 63

72 many situations, obtaining the MLE, ˆθ MLE = arg max θ l obs ( y; θ), via maximization of l obs ( y; θ) over θ turns out to be difficult. In such cases, the observed likelihood can usually be expressed in terms of an integral over missing components of a complete likelihood; in other words, if z and lcom( y, z; θ), respectively, denote the missing observations and the complete likelihood corresponding to (y, z), it follows that l obs ( y; θ) = Z(y) l com( y, z; θ) dz where Z(y) is the range of integration of z subject to the observed data being y. The EM algorithm is an iterative procedure that can be formulated in two steps: First, (i) the E-step, where the quantity Q(θ (N), θ) = E π( z θ (N),y) {log lcom( y, z; θ)} (6.14) is formed; the expectation in (6.14) is taken with respect to the conditional distribution of Z given y when evaluated at a particular value, θ (N), and the conditional distribution of Z given y is given by π( z θ, y) = l com( y, z; θ). (6.15) l obs (y; θ) Second, (ii) the M-step, where Q(θ (N), θ) is maximized with respect to θ to obtain θ (N+1), that is, θ (N+1) = arg max θ Q(θ (N), θ). Starting from an initial value θ (0), the sequence θ (N), N 0 converges to ˆθ MLE under suitable regularity conditions. The integrals in ˆτ(ν, R) corresponding to the discrete components can be formulated as missing components of a complete likelihood. Recall that the notations c h and d l were used to denote the discrete and continuous components in the vector x. We now extend this notation to represent discrete and continuous components in the j-th observation vector X j, j = 1,..., n. For j = 64

73 1,..., n, let d lj, l = 1,..., L j and c hj, h = 1,..., H j, respectively, denote the discrete and continuous components of X j with respect to the corresponding marginal distributions. Next, define the vector u j in the following way: The d lj -th component of u j, u dlj, is a number that is allowed to vary between [F dlj (x d lj ), F dlj (x dlj )], that is, u lj [F dlj (x d lj ), F dlj (x dlj )] S lj, say, for l = 1,..., L j. The c hj -th component of u j, u c hj, is taken to be uc hj = Fc hj (xc hj ) for h = 1,..., H j. In that case, we have c (ν, R, F 1,..., F D )(X j ) =... S 1j S L j c D ν,r (u j ) du d 1j... du dl j Making the transformation z j = t 1 ν (u j ), the likelihood corresponding to the j-th observation in (6.12) can be written as c (ν, R, F 1,..., F D )(X j ) = NUM j /DENOM j, where NUM j = t 1 ν (S 1j )... t 1 ν (S L ) fd ν,r (z j ) dz d... dz 1j dl j j and H j DENOM j = fν(zc hj ), h=1 where zc = t 1 hj ν {Fc hj (xc )} and t 1 hj ν (S lj ) = [t 1 ν {F dlj (x )}, t d 1 ν {F dlj (x dlj )}]. We lj make several important observations. First, where the maximization of R is concerned for fixed ν, it is enough to consider the NUM j terms. Thus, in the EM framework, we define NUM j to be the observed likelihood corresponding to the x j : l obs (x j ; ν, R) = t 1 ν (S 1j )... t 1 ν (S L ) fd ν,r (z j ) dz d... dz 1j dl. (6.16) j j If the variables z dlj j = 1,..., L j in (6.16) are treated as missing, the complete likelihood corresponding to the j-th observation becomes L lcom(x j, Z j ; ν, R) = f ν,r D (z j ) j I {zdlj t 1 l=1 ν (S lj )}, 65

74 where Z j = { z dlj, j = 1,..., L j }. Next, we note that the t density, fd ν,r, is an infinite scale mixture of Gaussian densities, namely, where f D ν,r (z j ) = φ D R,σ j (z) = σ 2 j =0 φd R,σ j (z j ) gν(σ 2 j ) dσ2 j 1 (2π) D 2 σ D j R 1 2 exp zt R 1 z 2σ 2 j and the mixing distribution on σ j is the inverse Gamma distribution as defined in (6.7). In other words, an extra missing component can be added into the E-step, namely, the mixing parameter, σ j. The complete likelihood specification for the j-th observation now becomes L lcom(x j, Z j, σ j ; ν, R) = φ D R,σ (z j j ) gν(σ j 2 ) j I {zdlj t 1 l=1 ν (S lj )}. (6.17) The conditional distribution π required to obtain Q in (6.14) is defined as where n π(z j, σ j, j = 1,..., n; ν, R ) = π j (Z j, σ j ; ν, R ) j=1 π j (Z j, σ j ; ν, R ) = l com(x j, Z j, σ j ; ν, R ) ; (6.18) l obs (x j ; ν, R ) see (6.15). Two distributions derived from (6.18) will be used subsequently: (i) the conditional distribution of σ j given Z j, given by π j (σ j 2 Z ν + D j, ν, R) = IG 2, ν + zt j R 1 z j 2 and (ii) the marginal of Z j s (after integrating out σ j ), given by 1, f ν,r D (z j ) L j l=1 I {z dlj t 1 ν (t 1 ν (S lj )} π j (Z j ; ν, R) =. t 1 ν (S 1j ) t 1 ν (S L ) fd ν,r (z j ) dz d dz 1j dl j j 66
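The scale-mixture representation above also gives a convenient way to simulate from $t^D_{\nu,R}$, which can be handy for checking the E-step computations by Monte Carlo. A minimal sketch (not part of the dissertation's procedure) follows; it uses the fact that $X = \sigma Z$ with $Z\sim N_D(0,R)$ and $\nu/\sigma^2\sim\chi^2_\nu$ reproduces the inverse-Gamma mixing distribution in (6.7).

```python
import numpy as np

def rmvt_scale_mixture(n, R, nu, seed=None):
    """Draw n vectors from t^D_{nu,R} as a Gaussian scale mixture:
    X = sigma * Z, Z ~ N_D(0, R), with sigma^2 following the mixing law in (6.7)."""
    rng = np.random.default_rng(seed)
    D = R.shape[0]
    z = rng.multivariate_normal(np.zeros(D), R, size=n)
    sigma2 = nu / rng.chisquare(df=nu, size=n)     # nu / sigma^2 ~ chi^2_nu
    return z * np.sqrt(sigma2)[:, None]

sample = rmvt_scale_mixture(5, np.array([[1.0, 0.5], [0.5, 1.0]]), nu=4, seed=0)
print(sample)
```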

75 The E-step entails that the expected value of the logarithm of the complete likelihood (6.17) is taken with respect to the missing components, Z j, and the N-th iterate of R, R (N) : E π(zj,σ j ; ν,r (N) ) {logl com(x j, Z j, σ j ; ν, R ) } E j {loglcom(x j, Z j, σ j ; ν, R )}, where π j (Z j, σ j ; ν, R ) is as defined in (6.18). The expected value can be simplified to z T j E j R 1 z j 2σ j 2 1 log R (6.19) 2 plus other terms that do not involve the correlation matrix R, and hence are irrelevant for the subsequent M-step. The expectation in (6.19) is taken in two steps, namely, E j = E j1 E j2 where E j2 is the conditional expectation of σ j given Z j, and E j1 is the expectation with respect to the marginal of Z j. On taking E j2, the expression in (6.19) simplifies to ν + D 2 where W (N) = ((w (N) kk z j T R 1 z j E j1 ν + z j T (R(N) ) 1 z j = ν + D 2 ( tr R 1 W (N)), )) is a D D matrix with entries z kj z k j ν + z j T (R(N) ) 1 z π j (Z j ; ν, R(N) ) dz j j w (N) kk = for z j = (z 1j,..., z Dj ) T. The last integral is approximated by numerical integration on a grid. We partition each interval t 1 ν (S lj ), l = 1,..., L j into a large number of subintervals and evaluate the integrand on the partition points. Finally, a Reimann sum is obtained as an approximation to the integral. The M-step consists of maximizing the complete likelihood function (6.19), z T j E j R 1 z j 2σ j D log R = ν tr(r 1 W (N) ) 1 log R, (6.20) with respect to the correlation matrix R. Since we have to maximize the objective function in the space of all correlation matrices, this is somewhat a difficult task. We adopt the methodology 67

76 presented by Barnard et al. [1] where an iterative procedure to maximize (6.20) is developed by considering the maximization of one element of R, say ρ, each time. In order to preserve the positive definiteness of R, one can show (see Barnard et al. [1] for details) that ρ should lie in an interval [ρ l, ρu]. The lower and upper limits of this interval are derived from the fact that in order for R to be positive definite, it is both necessary and sufficient that all the principal submatrices of R are positive definite. This is equivalent to a non-negativity condition on the determinant of each principal submatrix, which translates to an interval for the range of values of ρ. This procedure is repeated for the other elements of R and cycled until convergence before going to the (N + 1)-st iteration of the EM algorithm Selection of the Degrees of Freedom, ν The above procedure is carried out for a collection of degrees of freedom ν A, where A is finite. For each fixed ν, we obtain the estimator of the correlation matrix ˆR(ν) based on the EM algorithm above. We select the degrees of freedom in the following way: Select ˆν such that ˆν = arg max ν A 1 n ˆτ{ν, ˆR(ν)}. (6.21) 6.3 Summary In this Chapter, we introduced the t-copula family in R D, and showed that for any joint distribution F D in R D, there exists a unique pair (ν, R) such that F ν,r D (x) = C ν,r (F 1 (x 1 ),..., F D (x D )) within this family. Furthermore, we developed a semi-parametric technique to estimate the unknown pair (ν, R) using EM algorithm. The application of this approach will be presented in the next chapter. 68
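As a small illustration of the positive-definiteness constraint handled in Section 6.2.1, the sketch below finds, by brute force, the admissible interval for a single off-diagonal entry of a correlation matrix with the remaining entries held fixed. Barnard et al. [1] derive this interval analytically from the principal-submatrix conditions; the grid scan here (function name ours) is only a numerical stand-in for checking such bounds.

```python
import numpy as np

def pd_interval(R, k, l, grid=2001):
    """Approximate the interval [rho_l, rho_u] of values for the (k, l) entry of R
    that keep R positive definite, by scanning a fine grid and checking eigenvalues."""
    rhos = np.linspace(-0.999, 0.999, grid)
    keep = []
    for rho in rhos:
        Rt = R.copy()
        Rt[k, l] = Rt[l, k] = rho
        if np.linalg.eigvalsh(Rt).min() > 0.0:
            keep.append(rho)
    return (min(keep), max(keep)) if keep else None

R = np.array([[1.0, 0.2, 0.7],
              [0.2, 1.0, 0.4],
              [0.7, 0.4, 1.0]])
print(pd_interval(R, 0, 1))   # admissible range for the (1,2) entry given the other entries
```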

CHAPTER 7

SIMULATION AND REAL DATA RESULTS

7.1 Simulation Results

The results in this section are based on simulated data from four experiments. The first three concern the bivariate case, whereas the fourth concerns a trivariate distribution. The true distributions for the observations $X_j$, $j=1,\dots,n$, in all experiments are of the type defined in (6.3). In Experiment 1, $\nu_0 = \infty$ in (6.3), corresponding to the Gaussian copula function, and we fix a true value of $\rho$. The two mixed marginals, $F_1$ and $F_2$, have the following cumulative distribution

functions:
$$F_1(x_1) = 0.3\,I_{\{x_1\ge 0.2\}} + 0.7\,\Phi(x_1) \qquad\text{and}\qquad F_2(x_2) = 0.2\,I_{\{x_2\ge 0.1\}} + 0.8\,\Phi(x_2),$$
where $\Phi(\cdot)$ is the standard normal distribution function. Thus, we have $d_1 = d_2 = 1$ with $D_{11} = 0.2$ and $D_{21} = 0.1$, with probabilities $p_{11} = 0.3$ and $p_{21} = 0.2$, respectively. The sets of jump points are $\mathcal{J}(F_1) = \{0.2\}$ and $\mathcal{J}(F_2) = \{0.1\}$, with continuous components of $F_1$ and $F_2$ corresponding to $f_1(x) = f_2(x) = \phi(x)$, the standard normal density. The sample size $n$ is taken from $n=500$ to $n=2500$ in increments of 500. In each trial, we simulate $n$ observations $(x_1, x_2)$ from $F$ and estimate $\rho$ within the Gaussian copula family using the methodology presented in Section 6.2.

Table 7.1 contains the simulation results for Experiment 1. For each sample size, the experiment was repeated 1,000 times. The average absolute bias and relative average absolute bias of the estimator $\hat\rho_n$ are reported, together with the empirical coverage of the approximate 95% confidence interval for $\rho$ based on the asymptotic normality of $\hat\rho_n$.

Sample size n    Average absolute bias    Relative average absolute bias    Coverage
500              -                        -                                 92.4%
1000             -                        -                                 91.6%
1500             -                        -                                 92.3%
2000             -                        -                                 93.2%
2500             -                        -                                 94.7%

Table 7.1. Simulation results for Experiment 1 with $\nu_0=\infty$. The absolute bias and relative absolute bias of the estimator $\hat\rho_n$ are provided, together with the empirical coverage of the approximate 95% confidence interval for $\rho$ based on the asymptotic normality of $\hat\rho_n$.

Experiment 2 consists of the following choices: $\nu_0 = \infty$ in (6.3), corresponding to the Gaussian copula function, and a fixed true value $\rho_0$. The two mixed marginals, $F_1$ and $F_2$, have the following cumulative distribution functions:
$$F_1(x_1) = 0.25\,I_{\{x_1\ge 0.2\}} + 0.6\,t_{10}(x_1) + 0.15\,I_{\{x_1\ge 0.7\}} \qquad\text{and}\qquad F_2(x_2) = 0.2\,I_{\{x_2\ge 0.3\}} + 0.5\,t_{10}(x_2) + 0.3\,I_{\{x_2\ge 0.6\}},$$
where $t_{10}(\cdot)$ denotes the cumulative distribution function of the t distribution with 10 degrees of freedom. Thus, we have $d_1 = d_2 = 2$ with $D_{11} = 0.2$, $D_{12} = 0.7$, $D_{21} = 0.3$, and $D_{22} = 0.6$, with probabilities $p_{11} = 0.25$, $p_{12} = 0.15$, $p_{21} = 0.2$, and $p_{22} = 0.3$, respectively. The sets of jump points are $\mathcal{J}(F_1) = \{0.2, 0.7\}$ and $\mathcal{J}(F_2) = \{0.3, 0.6\}$, with continuous components of $F_1$ and $F_2$ given by the density of the t distribution with 10 degrees of freedom.
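For concreteness, here is one way data of the Experiment 1 type could be generated and the empirical coverage of Table 7.1 checked. This is only a sketch: it assumes a Gaussian copula with the two one-jump marginals just described, leaves rho as a free parameter (the true value used in the dissertation is not given here, so 0.5 below is purely illustrative), and the coverage helper assumes a Wald-type interval $\hat\rho\pm z_{0.975}\sqrt{\hat\varrho/n}$, which is one natural reading of "interval based on asymptotic normality". All function names are ours.

```python
import numpy as np
from scipy.stats import norm

def mixed_quantile(u, jump, p):
    """Generalised inverse of F(x) = p * I{x >= jump} + (1 - p) * Phi(x) (one jump point)."""
    u = np.asarray(u, dtype=float)
    lo = (1.0 - p) * norm.cdf(jump)                 # F(jump-)
    x = np.full_like(u, float(jump))                # u falling inside the jump maps to the atom
    below, above = u < lo, u >= lo + p
    x[below] = norm.ppf(u[below] / (1.0 - p))
    x[above] = norm.ppf((u[above] - p) / (1.0 - p))
    return x

def simulate_experiment1_like(n, rho, seed=None):
    """Gaussian copula with the Experiment 1 marginals: atoms of mass 0.3 at 0.2 and 0.2 at 0.1."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    u = norm.cdf(z)
    return mixed_quantile(u[:, 0], 0.2, 0.3), mixed_quantile(u[:, 1], 0.1, 0.2)

def wald_ci_covers(rho_hat, var_hat, n, rho_true, level=0.95):
    """Does rho_hat +/- z * sqrt(var_hat / n) cover rho_true?"""
    half = norm.ppf(0.5 + level / 2.0) * np.sqrt(var_hat / n)
    return (rho_hat - half) <= rho_true <= (rho_hat + half)

x1, x2 = simulate_experiment1_like(1000, rho=0.5, seed=1)
```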

Figure 7.1. Density curves for the t distribution with degrees of freedom $\nu = 3, 5, 10, 15, 20, 25$ and the normal distribution. (For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.)

For now, we select the set of candidate degrees of freedom for the t-copula to be $A = \{3, 5, 10, \infty\}$ for illustrative purposes. Note that we choose the $\nu_0$ values in $A$ so that the corresponding t-distributions are significantly different from one another. Figure 7.1 gives the t-densities for several $\nu_0$ values, including the values in $A$. There exist significant gaps between the t-density curves corresponding to $\nu_0 \in \{3, 5, 10\}$, but relatively smaller gaps for $\nu_0 \in \{15, 20, 25, \infty\}$. Table 7.2 provides the $L_1$-distances between some of the t distributions in Figure 7.1.

The sample size $n$ is taken from $n=500$ to $n=2500$ in increments of 500. In each trial, we

simulate $n$ observations $(x_1, x_2)$ from $F$ and estimate $\rho_0$ and $\nu_0$ based on the methodology presented in Section 6.2. The trial is repeated 50 times. The simulation results, including the percentage of times (out of 50) that the true value of $\nu_0$ is chosen, the mean of $\hat\rho(\hat\nu)$, and the MSE of $\hat\rho(\hat\nu)$, are presented in Table 7.3.

In Experiment 3, we took $\nu_0 = 10$. The two generalized marginal distributions are the same as in Experiment 2. The correlation parameter $\rho_0$ was selected to be 0.20 and 0.75, respectively. The results are presented in Table 7.4. From the entries of Table 7.3 and Table 7.4, we see that the estimation procedure is more effective in selecting the true degrees of freedom when $\nu_0 = \infty$ than when $\nu_0 = 10$. The reason is that the distribution corresponding to $\nu_0 = \infty$ is further away from all the other candidate distributions in $A$. Also, the estimation procedure is less affected by the value of $\rho_0$, as illustrated by the percentage of times $\hat\nu = \nu_0$ in Table 7.4.

Table 7.2. $L_1$ distances for paired values of $\nu_0$, computed for two values of $\rho_0$, one of which is 0.20.

Table 7.3. Simulation results for Experiment 2 with $\nu_0 = \infty$ and the chosen $\rho_0$ (columns: sample size $n$; percentage of times $\hat\nu = \nu_0$; mean $\hat\rho(\hat\nu)$; MSE of $\hat\rho(\hat\nu)$).

Table 7.4. Simulation results for Experiment 3 with $\nu_0 = 10$; the two correlation values considered are $\rho_0 = 0.2$ and $\rho_0 = 0.75$ (columns: sample size $n$; percentage of times $\hat\nu = \nu_0$; mean $\hat\rho(\hat\nu)$; MSE of $\hat\rho(\hat\nu)$).

Table 7.5. Simulation results for Experiment 4 (columns: sample size; percentage of times the true $\nu$ is obtained; means of $\hat\rho_1(\hat\nu)$, $\hat\rho_2(\hat\nu)$, $\hat\rho_3(\hat\nu)$; total MSE).

In Experiment 4, we took $D = 3$ and $\nu_0 = 10$. The first two marginal distributions are the same as before. The third marginal distribution is taken to be the t distribution with 10 degrees of freedom (thus having no points of discontinuity). We took the correlation matrix $R_0$ to be of the form
$$R_0 = \begin{pmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_3 \\ \rho_2 & \rho_3 & 1 \end{pmatrix}, \text{ say.} \qquad (7.1)$$
For the different sample sizes, the experiment was repeated 20 times (instead of 50) to reduce computational time. The estimators $\hat\rho_i$ of $\rho_i$, $i=1,2,3$, were obtained based on the iterative procedure outlined in Section 6.2. Since the maximization step involves another loop within the M-step, the objective function was maximized over the $\rho$-intervals in steps of 0.01 to reduce computational time. The iterative procedure within the M-step is not required when $D=2$, which enabled us to maximize the objective function over a finer grid in that case. The results are given in Table 7.5; note that (i) the estimators converge and (ii) the MSE decreases as $n$ tends to infinity.

7.2 Application to Multimodal Fusion in Biometric Recognition

7.2.1 Introduction

Biometric recognition refers to the automatic identification of individuals based on their biological or behavioral characteristics [20]. In recent years, recognition of an individual based on his or her biometric traits has become an increasingly important method of verifying that you are who you claim you are; see, for example, [18] and [29]. Biometric recognition is more reliable than traditional approaches, such as password-based or token-based approaches, since biometric traits cannot be easily stolen or forgotten. Some examples of biometric traits include fingerprint, face, signature, voice and hand geometry (see Figure 7.2). A number of commercial recognition systems based on

these traits have been deployed and are currently in use. Biometric technology has now become a viable alternative in traditional government applications (e.g., the US-VISIT program [48] and the proposed biometric passport, which is capable of storing the owner's biometric information in a chip inside the passport). With increasing applications involving human-computer interaction, there is a growing need for recognition techniques that are reliable and secure.

Figure 7.2. Some examples of biometric traits: (a) fingerprint, (b) iris scan, (c) face scan, (d) signature, (e) voice, (f) hand geometry, (g) retina, and (h) ear.

Recognition of an individual can be viewed as a test of statistical hypotheses. Based on the biometric input $Q$ and a claimed identity $I_c$, we would like to test
$$H_0: I_t = I_c \quad\text{vs.}\quad H_1: I_t \ne I_c, \qquad (7.2)$$
where $I_t$ is the true identity of the user. The testing in (7.2) is performed by a matcher which computes a similarity measure, $S(Q,T)$, based on $Q$ and $T$; large (respectively, small) values of $S$ indicate that $T$ and $Q$ are close to (far


More information

Semiparametric Gaussian Copula Models: Progress and Problems

Semiparametric Gaussian Copula Models: Progress and Problems Semiparametric Gaussian Copula Models: Progress and Problems Jon A. Wellner University of Washington, Seattle European Meeting of Statisticians, Amsterdam July 6-10, 2015 EMS Meeting, Amsterdam Based on

More information

A PRACTICAL WAY FOR ESTIMATING TAIL DEPENDENCE FUNCTIONS

A PRACTICAL WAY FOR ESTIMATING TAIL DEPENDENCE FUNCTIONS Statistica Sinica 20 2010, 365-378 A PRACTICAL WAY FOR ESTIMATING TAIL DEPENDENCE FUNCTIONS Liang Peng Georgia Institute of Technology Abstract: Estimating tail dependence functions is important for applications

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

1. Introduction In many biomedical studies, the random survival time of interest is never observed and is only known to lie before an inspection time

1. Introduction In many biomedical studies, the random survival time of interest is never observed and is only known to lie before an inspection time ASYMPTOTIC PROPERTIES OF THE GMLE WITH CASE 2 INTERVAL-CENSORED DATA By Qiqing Yu a;1 Anton Schick a, Linxiong Li b;2 and George Y. C. Wong c;3 a Dept. of Mathematical Sciences, Binghamton University,

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random

More information

Section 8: Asymptotic Properties of the MLE

Section 8: Asymptotic Properties of the MLE 2 Section 8: Asymptotic Properties of the MLE In this part of the course, we will consider the asymptotic properties of the maximum likelihood estimator. In particular, we will study issues of consistency,

More information

Score Normalization in Multimodal Biometric Systems

Score Normalization in Multimodal Biometric Systems Score Normalization in Multimodal Biometric Systems Karthik Nandakumar and Anil K. Jain Michigan State University, East Lansing, MI Arun A. Ross West Virginia University, Morgantown, WV http://biometrics.cse.mse.edu

More information

Smooth simultaneous confidence bands for cumulative distribution functions

Smooth simultaneous confidence bands for cumulative distribution functions Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 395 407, http://dx.doi.org/10.1080/10485252.2012.759219 Smooth simultaneous confidence bands for cumulative distribution functions Jiangyan Wang

More information

Citation for published version (APA): Susyanto, N. (2016). Semiparametric copula models for biometric score level fusion

Citation for published version (APA): Susyanto, N. (2016). Semiparametric copula models for biometric score level fusion UvA-DARE (Digital Academic Repository) Semiparametric copula models for biometric score level fusion Susyanto, N. Link to publication Citation for published version (APA): Susyanto, N. (2016). Semiparametric

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and

More information

Behaviour of multivariate tail dependence coefficients

Behaviour of multivariate tail dependence coefficients ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA Volume 22, Number 2, December 2018 Available online at http://acutm.math.ut.ee Behaviour of multivariate tail dependence coefficients Gaida

More information

Estimation of Dynamic Regression Models

Estimation of Dynamic Regression Models University of Pavia 2007 Estimation of Dynamic Regression Models Eduardo Rossi University of Pavia Factorization of the density DGP: D t (x t χ t 1, d t ; Ψ) x t represent all the variables in the economy.

More information

Semi-parametric predictive inference for bivariate data using copulas

Semi-parametric predictive inference for bivariate data using copulas Semi-parametric predictive inference for bivariate data using copulas Tahani Coolen-Maturi a, Frank P.A. Coolen b,, Noryanti Muhammad b a Durham University Business School, Durham University, Durham, DH1

More information

Chapter 3. Point Estimation. 3.1 Introduction

Chapter 3. Point Estimation. 3.1 Introduction Chapter 3 Point Estimation Let (Ω, A, P θ ), P θ P = {P θ θ Θ}be probability space, X 1, X 2,..., X n : (Ω, A) (IR k, B k ) random variables (X, B X ) sample space γ : Θ IR k measurable function, i.e.

More information

Random Matrix Eigenvalue Problems in Probabilistic Structural Mechanics

Random Matrix Eigenvalue Problems in Probabilistic Structural Mechanics Random Matrix Eigenvalue Problems in Probabilistic Structural Mechanics S Adhikari Department of Aerospace Engineering, University of Bristol, Bristol, U.K. URL: http://www.aer.bris.ac.uk/contact/academic/adhikari/home.html

More information

large number of i.i.d. observations from P. For concreteness, suppose

large number of i.i.d. observations from P. For concreteness, suppose 1 Subsampling Suppose X i, i = 1,..., n is an i.i.d. sequence of random variables with distribution P. Let θ(p ) be some real-valued parameter of interest, and let ˆθ n = ˆθ n (X 1,..., X n ) be some estimate

More information

Graduate Econometrics I: Asymptotic Theory

Graduate Econometrics I: Asymptotic Theory Graduate Econometrics I: Asymptotic Theory Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Asymptotic Theory

More information

Statistical Inference with Monotone Incomplete Multivariate Normal Data

Statistical Inference with Monotone Incomplete Multivariate Normal Data Statistical Inference with Monotone Incomplete Multivariate Normal Data p. 1/4 Statistical Inference with Monotone Incomplete Multivariate Normal Data This talk is based on joint work with my wonderful

More information

Robust Backtesting Tests for Value-at-Risk Models

Robust Backtesting Tests for Value-at-Risk Models Robust Backtesting Tests for Value-at-Risk Models Jose Olmo City University London (joint work with Juan Carlos Escanciano, Indiana University) Far East and South Asia Meeting of the Econometric Society

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation University of Pavia Maximum Likelihood Estimation Eduardo Rossi Likelihood function Choosing parameter values that make what one has observed more likely to occur than any other parameter values do. Assumption(Distribution)

More information

Overview of Extreme Value Theory. Dr. Sawsan Hilal space

Overview of Extreme Value Theory. Dr. Sawsan Hilal space Overview of Extreme Value Theory Dr. Sawsan Hilal space Maths Department - University of Bahrain space November 2010 Outline Part-1: Univariate Extremes Motivation Threshold Exceedances Part-2: Bivariate

More information

Estimation of multivariate critical layers: Applications to rainfall data

Estimation of multivariate critical layers: Applications to rainfall data Elena Di Bernardino, ICRA 6 / RISK 2015 () Estimation of Multivariate critical layers Barcelona, May 26-29, 2015 Estimation of multivariate critical layers: Applications to rainfall data Elena Di Bernardino,

More information

Additive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535

Additive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Additive functionals of infinite-variance moving averages Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Departments of Statistics The University of Chicago Chicago, Illinois 60637 June

More information

Modelling and Estimation of Stochastic Dependence

Modelling and Estimation of Stochastic Dependence Modelling and Estimation of Stochastic Dependence Uwe Schmock Based on joint work with Dr. Barbara Dengler Financial and Actuarial Mathematics and Christian Doppler Laboratory for Portfolio Risk Management

More information

Chapter 6. Convergence. Probability Theory. Four different convergence concepts. Four different convergence concepts. Convergence in probability

Chapter 6. Convergence. Probability Theory. Four different convergence concepts. Four different convergence concepts. Convergence in probability Probability Theory Chapter 6 Convergence Four different convergence concepts Let X 1, X 2, be a sequence of (usually dependent) random variables Definition 1.1. X n converges almost surely (a.s.), or with

More information

A Goodness-of-fit Test for Copulas

A Goodness-of-fit Test for Copulas A Goodness-of-fit Test for Copulas Artem Prokhorov August 2008 Abstract A new goodness-of-fit test for copulas is proposed. It is based on restrictions on certain elements of the information matrix and

More information

P (A G) dp G P (A G)

P (A G) dp G P (A G) First homework assignment. Due at 12:15 on 22 September 2016. Homework 1. We roll two dices. X is the result of one of them and Z the sum of the results. Find E [X Z. Homework 2. Let X be a r.v.. Assume

More information

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap Patrick Breheny December 6 Patrick Breheny BST 764: Applied Statistical Modeling 1/21 The empirical distribution function Suppose X F, where F (x) = Pr(X x) is a distribution function, and we wish to estimate

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Lecture 2: CDF and EDF

Lecture 2: CDF and EDF STAT 425: Introduction to Nonparametric Statistics Winter 2018 Instructor: Yen-Chi Chen Lecture 2: CDF and EDF 2.1 CDF: Cumulative Distribution Function For a random variable X, its CDF F () contains all

More information

1 Probability theory. 2 Random variables and probability theory.

1 Probability theory. 2 Random variables and probability theory. Probability theory Here we summarize some of the probability theory we need. If this is totally unfamiliar to you, you should look at one of the sources given in the readings. In essence, for the major

More information

1 The Glivenko-Cantelli Theorem

1 The Glivenko-Cantelli Theorem 1 The Glivenko-Cantelli Theorem Let X i, i = 1,..., n be an i.i.d. sequence of random variables with distribution function F on R. The empirical distribution function is the function of x defined by ˆF

More information

Bandits : optimality in exponential families

Bandits : optimality in exponential families Bandits : optimality in exponential families Odalric-Ambrym Maillard IHES, January 2016 Odalric-Ambrym Maillard Bandits 1 / 40 Introduction 1 Stochastic multi-armed bandits 2 Boundary crossing probabilities

More information

DISTRIBUTION OF EIGENVALUES OF REAL SYMMETRIC PALINDROMIC TOEPLITZ MATRICES AND CIRCULANT MATRICES

DISTRIBUTION OF EIGENVALUES OF REAL SYMMETRIC PALINDROMIC TOEPLITZ MATRICES AND CIRCULANT MATRICES DISTRIBUTION OF EIGENVALUES OF REAL SYMMETRIC PALINDROMIC TOEPLITZ MATRICES AND CIRCULANT MATRICES ADAM MASSEY, STEVEN J. MILLER, AND JOHN SINSHEIMER Abstract. Consider the ensemble of real symmetric Toeplitz

More information

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2 Order statistics Ex. 4. (*. Let independent variables X,..., X n have U(0, distribution. Show that for every x (0,, we have P ( X ( < x and P ( X (n > x as n. Ex. 4.2 (**. By using induction or otherwise,

More information

Bootstrap, Jackknife and other resampling methods

Bootstrap, Jackknife and other resampling methods Bootstrap, Jackknife and other resampling methods Part III: Parametric Bootstrap Rozenn Dahyot Room 128, Department of Statistics Trinity College Dublin, Ireland dahyot@mee.tcd.ie 2005 R. Dahyot (TCD)

More information

11 Survival Analysis and Empirical Likelihood

11 Survival Analysis and Empirical Likelihood 11 Survival Analysis and Empirical Likelihood The first paper of empirical likelihood is actually about confidence intervals with the Kaplan-Meier estimator (Thomas and Grunkmeier 1979), i.e. deals with

More information

Modelling Dependence with Copulas and Applications to Risk Management. Filip Lindskog, RiskLab, ETH Zürich

Modelling Dependence with Copulas and Applications to Risk Management. Filip Lindskog, RiskLab, ETH Zürich Modelling Dependence with Copulas and Applications to Risk Management Filip Lindskog, RiskLab, ETH Zürich 02-07-2000 Home page: http://www.math.ethz.ch/ lindskog E-mail: lindskog@math.ethz.ch RiskLab:

More information

Probability Distribution And Density For Functional Random Variables

Probability Distribution And Density For Functional Random Variables Probability Distribution And Density For Functional Random Variables E. Cuvelier 1 M. Noirhomme-Fraiture 1 1 Institut d Informatique Facultés Universitaires Notre-Dame de la paix Namur CIL Research Contact

More information

Laplace s Equation. Chapter Mean Value Formulas

Laplace s Equation. Chapter Mean Value Formulas Chapter 1 Laplace s Equation Let be an open set in R n. A function u C 2 () is called harmonic in if it satisfies Laplace s equation n (1.1) u := D ii u = 0 in. i=1 A function u C 2 () is called subharmonic

More information

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing

More information

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS SOME CONVERSE LIMIT THEOREMS OR EXCHANGEABLE BOOTSTRAPS Jon A. Wellner University of Washington The bootstrap Glivenko-Cantelli and bootstrap Donsker theorems of Giné and Zinn (990) contain both necessary

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing

Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu October

More information

GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS

GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS Statistica Sinica 20 (2010), 441-453 GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS Antai Wang Georgetown University Medical Center Abstract: In this paper, we propose two tests for parametric models

More information

Product measure and Fubini s theorem

Product measure and Fubini s theorem Chapter 7 Product measure and Fubini s theorem This is based on [Billingsley, Section 18]. 1. Product spaces Suppose (Ω 1, F 1 ) and (Ω 2, F 2 ) are two probability spaces. In a product space Ω = Ω 1 Ω

More information

Stein s Method and Characteristic Functions

Stein s Method and Characteristic Functions Stein s Method and Characteristic Functions Alexander Tikhomirov Komi Science Center of Ural Division of RAS, Syktyvkar, Russia; Singapore, NUS, 18-29 May 2015 Workshop New Directions in Stein s method

More information

STAT 7032 Probability Spring Wlodek Bryc

STAT 7032 Probability Spring Wlodek Bryc STAT 7032 Probability Spring 2018 Wlodek Bryc Created: Friday, Jan 2, 2014 Revised for Spring 2018 Printed: January 9, 2018 File: Grad-Prob-2018.TEX Department of Mathematical Sciences, University of Cincinnati,

More information

On the Power of Tests for Regime Switching

On the Power of Tests for Regime Switching On the Power of Tests for Regime Switching joint work with Drew Carter and Ben Hansen Douglas G. Steigerwald UC Santa Barbara May 2015 D. Steigerwald (UCSB) Regime Switching May 2015 1 / 42 Motivating

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

Lecture 21: Convergence of transformations and generating a random variable

Lecture 21: Convergence of transformations and generating a random variable Lecture 21: Convergence of transformations and generating a random variable If Z n converges to Z in some sense, we often need to check whether h(z n ) converges to h(z ) in the same sense. Continuous

More information

Elements of Financial Engineering Course

Elements of Financial Engineering Course Elements of Financial Engineering Course Baruch-NSD Summer Camp 0 Lecture Tai-Ho Wang Agenda Methods of simulation: inverse transformation method, acceptance-rejection method Variance reduction techniques

More information