SEMI-PARAMETRIC ESTIMATION OF BIVARIATE DEPENDENCE UNDER MIXED MARGINALS


SEMI-PARAMETRIC ESTIMATION OF BIVARIATE DEPENDENCE UNDER MIXED MARGINALS

By

Wenmei Huang

A DISSERTATION

Submitted to Michigan State University in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

Department of Statistics and Probability

2011

ABSTRACT

SEMI-PARAMETRIC ESTIMATION OF BIVARIATE DEPENDENCE UNDER MIXED MARGINALS

By

Wenmei Huang

Copulas are a flexible way of modeling dependence in a set of variables in which the association between variables can be elicited separately from the marginal distributions. A semi-parametric approach for estimating the dependence structure of bivariate distributions derived from copulas is investigated when the associated marginals are mixed, that is, consisting of both discrete and continuous components. A semi-parametric likelihood approach is proposed for obtaining the estimator of the dependence parameter when the marginals are unknown. The consistency and asymptotic normality of the estimator are established as the sample size tends to infinity. For constructing confidence intervals in practice, an estimator of the asymptotic variance is proposed and its properties are investigated via simulation. Extensions to higher dimensions are discussed. Several simulation studies and real data examples are provided to investigate the application of the developed methodology of inference. This work generalizes prior results on the estimation of dependence when the marginals are continuous, obtained by Genest et al. [11].

To my professors and my loving family.

ACKNOWLEDGMENTS

I would like to express my sincere gratitude to my dissertation advisor, Prof. Sarat C. Dass, for his patient guidance, encouragement and support during my graduate study at Michigan State University, and for his precious advice for my professional career. He helped me learn not only how to do research in statistics, but also how to write statistical papers in proper English. The education I received under his guidance will be the most valuable in my future career. I am grateful to Prof. Tapabrata Maiti and Prof. Chae Young Lim of the Department of Statistics and Probability, and Prof. Zhengfang Zhou of the Department of Mathematics, for serving on my thesis committee. I accumulated practical experience from the consulting work at the Center for Statistical Training and Consulting. Prof. Connie Page, Prof. Dennis Gilliland, and Prof. Sandra Herman shared with me their experience and knowledge in dealing with practical problems, and guided me through my projects. I would like to thank Prof. James Stapleton for his help and encouragement with respect to my teaching and study. He has been very supportive of me, as well as of all his students. I thank him for offering to review my thesis. I would like to thank Prof. Anil K. Jain of the Department of Computer Science and Engineering, Michigan State University, for letting me use the biometrics data in the PRIP lab in my research. I would like to thank Dr. Yongfang Zhu for discussions on course work and research. I also would like to thank all the professors of the Department of Statistics and Probability and all my great friends at Michigan State University for making my life there so unique and wonderful. Finally, I especially thank my loving family for their encouragement and support.

TABLE OF CONTENTS

List of Tables
List of Figures
1 INTRODUCTION
2 A MODEL FOR BIVARIATE DISTRIBUTIONS
   2.1 Joint Distributions with Mixed Marginals
   2.2 Semi-Parametric Estimation of θ
3 ASYMPTOTIC PROPERTIES OF SEMI-PARAMETRIC ESTIMATOR IN BIVARIATE DISTRIBUTIONS
   3.1 Statement of the Main Theorem
   3.2 Consistency of θ̂_n
   3.3 Asymptotic Behavior of B_n
   3.4 Asymptotic Behavior of A_n
       Asymptotic normality of R_n^cc
       Asymptotic normality of R_n^cd and R_n^dc
       Asymptotic normality of R_n^dd
       Asymptotic normality of R_n
   3.5 Proof of the Main Result
4 A VARIANCE ESTIMATOR OF BIVARIATE DISTRIBUTIONS
5 EXTENSIONS TO HIGHER DIMENSIONS
6 JOINT DISTRIBUTIONS USING t-COPULA
   Joint Distributions Using t-Copula
   Estimation of R and ν
       Estimation of R for fixed ν
       Selection of the Degrees of Freedom, ν
   Summary
7 SIMULATION AND REAL DATA RESULTS
   Simulation Results
   Application to Multimodal Fusion in Biometric Recognition
       Introduction
       Application in Biometric Fusion
8 SUMMARY AND CONCLUSION
Bibliography

LIST OF TABLES

4.1 Relationship among (X_j, Y_j), R_X(j), and R_Y(j)
Simulation results for Experiment 1 with ν_0 = … and ρ = …. The absolute bias and relative absolute bias of the estimator ρ̂_n are provided, together with the empirical coverage of the approximate 95% confidence interval for ρ based on the asymptotic normality of ρ̂_n
L_1 distances for paired values of ν_0 corresponding to two values of ρ_0, 0.20 and …
Simulation results for Experiment 2 with ν_0 = … and ρ_0 = …
Simulation results for Experiment 3 with ν_0 = 10. The two correlation values considered are ρ_0 = 0.2 and ρ_0 = …
Simulation results for Experiment 4
Results of the estimation procedure for R and ν based on the NIST database

LIST OF FIGURES

7.1 Density curves for the t distribution with degrees of freedom ν = 3, 5, 10, 15, 20, 25 and the normal distribution. For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation
Some examples of biometric traits: (a) fingerprint, (b) iris scan, (c) face scan, (d) signature, (e) voice, (f) hand geometry, (g) retina, and (h) ear
Histograms of matching scores corresponding to genuine scores for Matcher 1. Continuous (respectively, generalized) density estimates are given by the dashed lines (respectively, solid lines)
Histograms of matching scores corresponding to genuine scores for Matcher 2. Continuous (respectively, generalized) density estimates are given by the dashed lines (respectively, solid lines)
Histograms of matching scores corresponding to impostor scores for Matcher 1. Continuous (respectively, generalized) density estimates are given by the dashed lines (respectively, solid lines). The spike corresponds to discrete components. Note how the generalized density estimator performs better than the continuous estimator (which assumes no discrete components)
Histograms of matching scores corresponding to impostor scores for Matcher 2. Continuous (respectively, generalized) density estimates are given by the dashed lines (respectively, solid lines)
Performance of copula fusion on the MSU-Multimodal database
Performance of copula fusion on the NIST database

CHAPTER 1

INTRODUCTION

Copulas are functions that join, or couple, multivariate distribution functions to their one-dimensional marginal distribution functions. Essentially, a copula is a function $C$ from $[0,1]^D$ ($D \ge 2$) to $[0,1]$ with the following properties:

1. For every $(u_1, \ldots, u_D) \in [0,1]^D$,
$$C(u_1, \ldots, u_D) = 0 \quad \text{if at least one } u_k = 0, \; k = 1, \ldots, D, \qquad (1.1)$$
and
$$C(u_1, \ldots, u_D) = u_k \quad \text{if } u_j = 1 \text{ for all } j \neq k, \; k = 1, \ldots, D. \qquad (1.2)$$

2. For every $(u_1, \ldots, u_D)$ and $(v_1, \ldots, v_D)$ in $[0,1]^D$ such that $u_k \le v_k$ for $k = 1, \ldots, D$, which define the $D$-box $V = [u_1, v_1] \times [u_2, v_2] \times \cdots \times [u_D, v_D] \subseteq [0,1]^D$,
$$\sum_{c} \mathrm{sgn}(c)\, C(c) \ge 0, \qquad (1.3)$$
where the sum is taken over all vertices $c$ of $V$ and $\mathrm{sgn}(c)$ is given by
$$\mathrm{sgn}(c) = \begin{cases} 1, & \text{if } c_k = u_k \text{ for an even number of } k\text{'s}, \\ -1, & \text{if } c_k = u_k \text{ for an odd number of } k\text{'s}. \end{cases}$$

Alternatively, copulas are multivariate distribution functions whose one-dimensional marginals are uniform on the interval $[0,1]$.
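To make these defining properties concrete, the following minimal Python sketch (not taken from the dissertation; the Gaussian copula, the parameter value ρ = 0.5, and the chosen box are illustrative assumptions) numerically checks (1.1), (1.2) and the rectangle inequality (1.3) for D = 2, building the copula from scipy's bivariate normal CDF.

```python
# Minimal numerical check of copula properties (1.1)-(1.3) for D = 2,
# using a Gaussian copula as an illustrative example (not the only choice).
import numpy as np
from scipy.stats import norm, multivariate_normal

def gaussian_copula_cdf(u, v, rho=0.5):
    """C(u, v) = Phi_2(Phi^{-1}(u), Phi^{-1}(v); rho) for the Gaussian copula."""
    u = np.clip(u, 1e-12, 1 - 1e-12)   # avoid +/- infinity at the boundary
    v = np.clip(v, 1e-12, 1 - 1e-12)
    cov = [[1.0, rho], [rho, 1.0]]
    return multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf([norm.ppf(u), norm.ppf(v)])

# (1.1): C(u, 0) = C(0, v) = 0 (up to numerical tolerance)
print(gaussian_copula_cdf(0.0, 0.7), gaussian_copula_cdf(0.3, 0.0))

# (1.2): C(u, 1) = u and C(1, v) = v
print(gaussian_copula_cdf(0.3, 1.0), gaussian_copula_cdf(1.0, 0.7))

# (1.3): rectangle (2-increasing) inequality over the box [u1, v1] x [u2, v2]
u1, v1, u2, v2 = 0.2, 0.6, 0.3, 0.9
volume = (gaussian_copula_cdf(v1, v2) - gaussian_copula_cdf(u1, v2)
          - gaussian_copula_cdf(v1, u2) + gaussian_copula_cdf(u1, u2))
print(volume >= 0)   # True: the C-volume of every box is non-negative
```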

Since their introduction by Sklar [46], copulas have proved to be a useful tool for analyzing multivariate dependence structures, owing to many unique and interesting features. Let $F_k(x_k) = P(X_k \le x_k)$ for $k = 1, 2, \ldots, D$ denote $D$ continuous distribution functions on the real line, and let $F$ denote a joint distribution function on $\mathbb{R}^D$ whose $k$-th marginal distribution corresponds to $F_k$. According to Sklar's Theorem [46], there exists a unique function $C(u_1, \ldots, u_D)$ from $[0,1]^D$ to $[0,1]$ satisfying
$$F(x_1, \ldots, x_D) = C(F_1(x_1), \ldots, F_D(x_D)), \qquad (1.4)$$
where $x_1, \ldots, x_D$ are $D$ real numbers. The function $C$ is known as a $D$-copula function that couples the one-dimensional distribution functions $F_k$, $k = 1, 2, \ldots, D$, to obtain $F$. If not all marginals are continuous, $C$ is uniquely determined only on $\mathrm{Ran}\,F_1 \times \cdots \times \mathrm{Ran}\,F_D$, where $\mathrm{Ran}\,F_k$ is the range of $F_k$. Equation (1.4) can also be used to construct $D$-dimensional distribution functions $F$ whose marginals are the pre-specified distributions $F_k$, $k = 1, \ldots, D$: choose a copula function $C$ and define the distribution function $F$ as in (1.4). It follows that $F$ is a $D$-dimensional distribution function with marginals $F_k$, $k = 1, \ldots, D$.

Investigating the dependence structure of multivariate distributions has long been an important area of research; see, for example, [26], [27], and [35]. One of the central problems in statistics concerns testing the hypothesis that random variables are independent. Prior to the recent explosion of copula theory and applications, the only models available in many fields to represent dependence were classical multivariate models, such as the multivariate Gaussian model. These models entail rigid assumptions on the marginal and joint behavior of the variables, and their usefulness is therefore limited.

Although the term copula was first used by Sklar [46] in 1959 and traces of copula theory can be found in Hoeffding's work during the 1940s, the study of copulas and their application in statistics is a rather modern phenomenon. Earlier efforts have addressed different statistical aspects of specialized as well as general copula-based joint distributions. Inference procedures for bivariate Archimedean copulas have been developed and discussed by Genest et al. [12].

Demarta et al. [8] studied properties related to t-copulas, whereas Genest et al. [11], [13] gave a goodness-of-fit procedure for general copulas. Estimation techniques and properties of the estimators have been well studied, with applications ranging from statistics to mathematical finance and financial risk management; see, for example, Shih et al. [44], Embrechts et al. [9], Cherubini et al. [5], Frey et al. [10] and Chen et al. [4].

Many multivariate models for dependence can be generated by parametric families of copulas, $\{C_\theta : \theta \in \Theta\}$, typically indexed by a real or vector-valued parameter $\theta$. Examples of such systems are given in [23], [30], [22], and others. The recent monograph by Hutchinson and Lai [17], which includes an extensive bibliography, constitutes a handy reference to this expanding literature. Copula-based models are natural in situations where learning about the association between the variables is important, since the effect of the dependence structure is easily separated from that of the marginals. In such situations, there is typically enough data to obtain nonparametric estimators of the marginal distributions, but insufficient information to afford nonparametric estimation of the structure of the association. In such cases, it is convenient to adopt a parametric form for the dependence function while keeping the marginals unspecified.

To estimate the dependence parameter $\theta$, two strategies can be adopted, depending on the circumstances: (i) if valid parametric models are already available for the marginals, then it is straightforward in principle to write down a likelihood function for the data; this makes the estimation of $\theta$ margin-dependent, because the estimators of the parameters involved in the marginals are indirectly affected by the copula. This is the parametric approach. (ii) When nonparametric estimators are contemplated for the marginals, however, inference about the dependence parameter $\theta$ is margin-free. This is the semi-parametric approach. Clayton [6], Hougaard et al. [15] and Oakes [36] have pointed out that the margin-free requirement is sensible in applications where the focus of the analysis is on the dependence structure.

Given a sample of $n$ observations $\mathbf{X}_j = (X_{1j}, X_{2j}, \ldots, X_{Dj})$, $j = 1, 2, \ldots, n$, from the joint distribution $F(x_1, x_2, \ldots, x_D) = C_\theta(F_1(x_1), \ldots, F_D(x_D))$, the estimation procedure involves

selecting $\hat{\theta}_n$ to maximize the semi-parametric likelihood
$$l(\theta) = \sum_{j=1}^{n} \log\big[c_\theta\big(F_{1n}(X_{1j}), \ldots, F_{Dn}(X_{Dj})\big)\big], \qquad (1.5)$$
where $F_{kn}$ denotes $n/(n+1)$ times the empirical distribution function of the $k$-th component observations $X_{kj}$, $j = 1, 2, \ldots, n$, and $c_\theta$ is the density of $C_\theta$ with respect to Lebesgue measure on $[0,1]^D$. The use of $F_{kn}$ instead of the empirical distribution avoids difficulties arising from the potential unboundedness of $\log c_\theta(u_1, \ldots, u_D)$ as some of the $u_k$'s tend to 1 (a numerical sketch of this procedure is given at the end of this chapter).

A central assumption made in the earlier studies is that the marginals associated with the joint distribution $F$ are continuous. In reality, there are many situations where this assumption is not satisfied and the marginals can be mixed, that is, they contain both discrete and continuous components (see, for example, Kohn et al. [37]). Based on the continuity assumption and regularity conditions, Genest et al. [11] have shown that $n^{1/2}(\hat{\theta}_n - \theta)$ is asymptotically normal, where $\hat{\theta}_n$ is the semi-parametric estimator of $\theta$ in (1.5). Our work extends this methodology by accounting for mixed marginals in the case $D = 2$ and $\Theta \subseteq \mathbb{R}$.

Uniqueness is an important consideration when estimating the unknown parameters corresponding to a copula. Since our mixed marginals have discrete components, one way to achieve uniqueness is to restrict $C$ to belong to a particular parametric family. Under this assumption, we develop an estimation technique for finding the semi-parametric maximum likelihood estimator, $\hat{\theta}_n$, for the parametric family $\{C_\theta : \theta \in \Theta\}$ in the bivariate situation in Chapter 2. The estimation methodology involves integrals corresponding to the discrete components and is therefore non-standard. Consistency and asymptotic normality of the semi-parametric maximum likelihood estimator $\hat{\theta}_n$ are established in Chapter 3. An estimator of the asymptotic variance $\varrho$ of $\hat{\theta}_n$ is developed in Chapter 4. Note that for multivariate observations, copulas allow flexible modeling of the joint distribution via its marginal distributions as well as the correlation between pairwise components of the vector of observations. Therefore, the theory presented in Chapters 2 and 3 can be extended to higher dimensions; this is done in Chapter 5. In Chapter 6, $C$ is restricted to a particular parametric family of copulas, the t-copula, and findings related to the t-copula are provided.

Numerical simulations and an application of the methodology to real biometric data are presented in Chapter 7. We summarize our work and discuss future research directions in Chapter 8.
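To illustrate the semi-parametric likelihood (1.5) in the continuous-marginal setting of Genest et al. [11], the following minimal Python sketch estimates the dependence parameter of a bivariate Gaussian copula by maximizing (1.5) with the rescaled empirical distribution functions. The copula family, the simulated marginals, and the parameter value are illustrative assumptions rather than choices made in this dissertation.

```python
# Sketch: semi-parametric (pseudo-likelihood) estimation of a copula parameter
# with continuous marginals, following the form of (1.5).  The Gaussian copula
# and the data-generating model are illustrative assumptions.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)

def rescaled_ecdf(x):
    """F_kn(x_j) = n/(n+1) * (rank of x_j)/n = rank/(n+1); keeps values in (0, 1)."""
    n = len(x)
    ranks = np.argsort(np.argsort(x)) + 1          # ranks 1..n
    return ranks / (n + 1.0)

def gaussian_copula_logdensity(u, v, rho):
    """log c_rho(u, v) for the Gaussian copula."""
    x, y = norm.ppf(u), norm.ppf(v)
    return (-0.5 * np.log(1 - rho**2)
            - (rho**2 * (x**2 + y**2) - 2 * rho * x * y) / (2 * (1 - rho**2)))

# Simulate data with continuous marginals and Gaussian-copula dependence (rho = 0.6).
n, rho_true = 500, 0.6
z = rng.multivariate_normal([0, 0], [[1, rho_true], [rho_true, 1]], size=n)
x1, x2 = np.exp(z[:, 0]), z[:, 1] ** 3              # arbitrary monotone marginals

u, v = rescaled_ecdf(x1), rescaled_ecdf(x2)
neg_loglik = lambda rho: -np.sum(gaussian_copula_logdensity(u, v, rho))
rho_hat = minimize_scalar(neg_loglik, bounds=(-0.99, 0.99), method="bounded").x
print(rho_hat)   # should be close to 0.6
```

Because the marginals enter only through ranks, the estimate is unchanged under monotone transformations of the data, which is precisely the margin-free property discussed above.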

CHAPTER 2

A MODEL FOR BIVARIATE DISTRIBUTIONS

2.1 Joint Distributions with Mixed Marginals

Let $F$ and $G$ be two distributions on the real line that are mixed, that is, both $F$ and $G$ consist of a mixture of discrete as well as continuous components. The general form for $F$ is given by
$$F(x) = \sum_{h=1}^{d_F} p_{Fh}\, I_{\{D_{Fh} \le x\}} + \Big(1 - \sum_{h=1}^{d_F} p_{Fh}\Big) \int_{-\infty}^{x} f(w)\, dw, \qquad (2.1)$$
where $I_{\{A\}}$ is the indicator function of the set $A$ (that is, $I_{\{A\}} = 1$ if $A$ is true, and 0 otherwise). In (2.1), the distribution function $F$ consists of $d_F$ discrete components denoted by $D_{Fh}$ with $F(D_{Fh}) - F(D_{Fh}^-) = p_{Fh}$ for $h = 1, 2, \ldots, d_F$, where $x^-$ denotes the left limit at $x$, and $f$ is the density of the continuous component of $F$. The discrete components $D_{Fh}$, $h = 1, 2, \ldots, d_F$, correspond to jump points of the distribution function $F$, and are called the jump points of $F$. All other points on $\mathbb{R}$ correspond to points of continuity of $F$, that is, $F(x) - F(x^-) = 0$ if $x \notin \{D_{Fh}\}$. The sets of jump and continuity points of $F$ are denoted by $\mathcal{J}(F)$ and $\mathcal{C}(F)$, respectively. In the same way, writing
$$G(y) = \sum_{l=1}^{d_G} p_{Gl}\, I_{\{D_{Gl} \le y\}} + \Big(1 - \sum_{l=1}^{d_G} p_{Gl}\Big) \int_{-\infty}^{y} g(w)\, dw, \qquad (2.2)$$

similar definitions can be given for the quantities $d_G$, $p_{Gl}$, $D_{Gl}$ and $g$. We denote the sets of jump and continuity points of $G$ by $\mathcal{J}(G)$ and $\mathcal{C}(G)$, respectively.

Let $H_\theta$ be the bivariate function defined by
$$H_\theta(x, y) = C_\theta(F(x), G(y)) \qquad (2.3)$$
for $\theta \in \Theta$, with $F$ and $G$ as in (2.1) and (2.2), respectively. It follows from the properties of a copula function that

Theorem. The function $H_\theta$, as defined in (2.3), is a valid bivariate distribution function on $\mathbb{R}^2$ with marginals $F$ and $G$.

Proof: For $H_\theta(x, y)$ to be a valid distribution function on $\mathbb{R}^2$, the following four conditions should be satisfied: (1) $0 \le H_\theta(x, y) \le 1$ for all $(x, y)$; (2) $H_\theta(x, y) \to 0$ as $\max(x, y) \to -\infty$; (3) $H_\theta(x, y) \to 1$ as $\min(x, y) \to +\infty$; and (4) for every $x_1 \le x_2$ and $y_1 \le y_2$,
$$\Delta H \equiv H_\theta(x_2, y_2) - H_\theta(x_1, y_2) - H_\theta(x_2, y_1) + H_\theta(x_1, y_1) \ge 0.$$
Note that since $H_\theta(x, y) = C_\theta(F(x), G(y)) = P(U \le F(x), V \le G(y))$, (1) follows. To prove (2), note that $F(x) \to 0$ and $G(y) \to 0$ as $\max(x, y) \to -\infty$; hence $H_\theta(x, y) \to 0$ from the properties of a cumulative distribution function. (3) follows similarly. To prove (4), we use the facts that $F(x_1) \le F(x_2)$ for $x_1 \le x_2$ and $G(y_1) \le G(y_2)$ for $y_1 \le y_2$, and
$$\Delta H = \int_{[F(x_1), F(x_2)] \times [G(y_1), G(y_2)]} c_\theta(u, v)\, du\, dv. \qquad (2.4)$$
Since $c_\theta(u, v) \ge 0$, $\Delta H \ge 0$ follows. Proof completed.

We turn now to a characterization of the density, $h_\theta(x, y)$, of $H_\theta(x, y)$. For a fixed $(x, y) \in \mathbb{R}^2$, the two components of $(x, y)$ correspond to either a point of continuity or a jump point of the corresponding marginal distribution function.

For $(X, Y) \sim H_\theta$, $h_\theta(x, y)$ has four different expressions, namely,
$$h_\theta(x, y) = \begin{cases} \displaystyle\lim_{a, b \downarrow 0} \frac{P\{X \in (x, x+a],\; Y \in (y, y+b]\}}{ab} & \text{if } x \in \mathcal{C}(F),\ y \in \mathcal{C}(G), \\[1ex] \displaystyle\lim_{b \downarrow 0} \frac{P\{X = x,\; Y \in (y, y+b]\}}{b} & \text{if } x \in \mathcal{J}(F),\ y \in \mathcal{C}(G), \\[1ex] \displaystyle\lim_{a \downarrow 0} \frac{P\{X \in (x, x+a],\; Y = y\}}{a} & \text{if } x \in \mathcal{C}(F),\ y \in \mathcal{J}(G), \\[1ex] P\{X = x,\; Y = y\} & \text{if } x \in \mathcal{J}(F),\ y \in \mathcal{J}(G). \end{cases} \qquad (2.5)$$
The following theorem gives workable expressions for $h_\theta(x, y)$ in terms of the copula:

Theorem. For each $(x, y) \in \mathbb{R}^2$,
$$h_\theta(x, y) = \begin{cases} f(x)\, g(y)\, c_\theta(F(x), G(y)) & \text{if } x \in \mathcal{C}(F),\ y \in \mathcal{C}(G), \\ g(y) \int_{[F(x^-), F(x)]} c_\theta(u, G(y))\, du & \text{if } x \in \mathcal{J}(F),\ y \in \mathcal{C}(G), \\ f(x) \int_{[G(y^-), G(y)]} c_\theta(F(x), v)\, dv & \text{if } x \in \mathcal{C}(F),\ y \in \mathcal{J}(G), \\ \int_{[F(x^-), F(x)] \times [G(y^-), G(y)]} c_\theta(u, v)\, du\, dv & \text{if } x \in \mathcal{J}(F),\ y \in \mathcal{J}(G). \end{cases} \qquad (2.6)$$

Proof: When $x \in \mathcal{C}(F)$ and $y \in \mathcal{C}(G)$,
$$P\{X \in (x, x+a],\; Y \in (y, y+b]\} = \int_{[F(x), F(x+a)] \times [G(y), G(y+b)]} c_\theta(u, v)\, du\, dv$$
from (2.4). This is approximately $ab\, f(x)\, g(y)\, c_\theta(F(x), G(y))$ for small $a$ and $b$. When $x \in \mathcal{J}(F)$ and $y \in \mathcal{C}(G)$, note that
$$P\{X = x,\; Y \in (y, y+b]\} = \int_{[F(x^-), F(x)] \times [G(y), G(y+b)]} c_\theta(u, v)\, du\, dv,$$

again from (2.4), which is approximately $b\, g(y) \int_{[F(x^-), F(x)]} c_\theta(u, G(y))\, du$ for small $b$. The third case follows similarly and the fourth expression is straightforward. Proof completed.

The terms in $h_\theta(x, y)$ that involve the densities $f$ and $g$ do not depend on $\theta$ and can hence be ignored for the estimation of $\theta$. Subsequently, we focus on the function $c_\theta \equiv c_\theta(F(x^-), F(x), G(y^-), G(y))$ defined by
$$c_\theta = \begin{cases} c_\theta(F(x), G(y)) & \text{if } x \in \mathcal{C}(F),\ y \in \mathcal{C}(G), \\ \int_{[F(x^-), F(x)]} c_\theta(u, G(y))\, du & \text{if } x \in \mathcal{J}(F),\ y \in \mathcal{C}(G), \\ \int_{[G(y^-), G(y)]} c_\theta(F(x), v)\, dv & \text{if } x \in \mathcal{C}(F),\ y \in \mathcal{J}(G), \\ \int_{[F(x^-), F(x)] \times [G(y^-), G(y)]} c_\theta(u, v)\, du\, dv & \text{if } x \in \mathcal{J}(F),\ y \in \mathcal{J}(G). \end{cases} \qquad (2.7)$$

2.2 Semi-Parametric Estimation of θ

If not all marginals are continuous, $C$ in (1.4) is no longer unique. Uniqueness is an important consideration when estimating the unknown parameters corresponding to the copula function. Since our marginals have discrete components, one way to achieve uniqueness is to restrict $C$ to belong to a particular parametric family. From now on, let $\{C_\theta : \theta \in \Theta\}$ be such a restricted parametric family of copulas. Suppose $\{(X_j, Y_j)^T,\ j = 1, 2, \ldots, n\}$ is a set of $n$ independent and identically distributed bivariate random vectors arising from the joint distribution $H_\theta(x, y)$ in (2.3). The parameter of interest is the bivariate dependence parameter $\theta$, and the log-likelihood function corresponding to the $n$ observations is
$$\sum_{j=1}^{n} \log h_\theta(x_j, y_j).$$
The estimation of $\theta$ is complicated by the presence of nuisance parameters consisting of the unknown distributions $F$ and $G$ (only the number $d_F$ and jump points $D_{Fh}$, $h = 1, 2, \ldots, d_F$,

of $F$ (correspondingly, $d_G$ and the points $D_{Gl}$ of $G$) are known). An objective function that has $\theta$ as the only unknown parameter can be obtained by replacing $F$ and $G$ by $F_n$ and $G_n$ in the log-likelihood function, where $F_n$ (respectively, $G_n$) is $n/(n+1)$ times the empirical cdf of $F$ (respectively, $G$). The rescaling by $n/(n+1)$ avoids difficulties arising from the unboundedness of the log-likelihood function as the empirical cdfs tend to 0 or 1, and was employed by Genest et al. in [11]. Thus, the unique semi-parametric maximum likelihood estimator of $\theta$, $\hat{\theta}_n$, is the maximizer of
$$L(\theta) = \sum_{j=1}^{n} \log\big\{c_{\theta,j}\big(F_n(x_j^-), F_n(x_j), G_n(y_j^-), G_n(y_j)\big)\big\} \qquad (2.8)$$
(that is, $\hat{\theta}_n = \arg\max_{\theta \in \Theta} L(\theta)$), where $c_{\theta,j} \equiv c_\theta(F_n(x_j^-), F_n(x_j), G_n(y_j^-), G_n(y_j))$, ignoring terms that do not depend on $\theta$ in $\sum_{j=1}^{n} \log h_\theta(x_j, y_j)$.

In the special case when no discrete components are present, Genest et al. [11] developed a semi-parametric approach to estimating the unknown parameters based on the semi-parametric likelihood, and showed that the resulting estimators are consistent and asymptotically normally distributed; see [11] for details. Expressions (2.7) and (2.8), respectively, are generalizations of the methodology of Genest et al. [11] to the case where $F$ and $G$ contain discrete components. Note that the challenge in maximizing the semi-parametric log-likelihood in (2.8) is that it involves several integrals corresponding to the discrete components in $(x_j, y_j)$, $j = 1, 2, \ldots, n$; a numerical sketch of these computations is given below.
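The following Python sketch, which is illustrative only and not code from the dissertation, makes (2.7) and (2.8) concrete: it evaluates the generalized density $c_\theta(F(x^-), F(x), G(y^-), G(y))$ by numerical quadrature over the jump intervals and then maximizes the semi-parametric log-likelihood (2.8) with the marginals replaced by rescaled empirical cdfs and their left limits. A Gaussian copula is used purely as a stand-in for the parametric family $\{C_\theta\}$, and the left-censored data-generating model is a made-up example of a mixed marginal.

```python
# Sketch: the generalized density (2.7) and the semi-parametric estimator (2.8).
# A Gaussian copula stands in for the parametric family {C_theta}; the data,
# the single jump point, and all parameter values are illustrative assumptions.
import numpy as np
from scipy import integrate
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def copula_density(u, v, rho):
    """c_theta(u, v): Gaussian copula density (illustrative stand-in)."""
    u, v = np.clip(u, 1e-12, 1 - 1e-12), np.clip(v, 1e-12, 1 - 1e-12)
    x, y = norm.ppf(u), norm.ppf(v)
    return (np.exp(-(rho**2 * (x**2 + y**2) - 2 * rho * x * y)
                   / (2 * (1 - rho**2))) / np.sqrt(1 - rho**2))

def generalized_density(u1, u2, v1, v2, rho, x_jump, y_jump):
    """c_theta(F(x^-), F(x), G(y^-), G(y)) as in (2.7); (u1, u2) = (F(x^-), F(x)),
    (v1, v2) = (G(y^-), G(y)); x_jump / y_jump flag whether x, y are jump points."""
    if not x_jump and not y_jump:
        return copula_density(u2, v2, rho)
    if x_jump and not y_jump:
        return integrate.quad(lambda u: copula_density(u, v2, rho), u1, u2)[0]
    if not x_jump and y_jump:
        return integrate.quad(lambda v: copula_density(u2, v, rho), v1, v2)[0]
    return integrate.dblquad(lambda v, u: copula_density(u, v, rho), u1, u2, v1, v2)[0]

def L(rho, x, y, jumps_F=(), jumps_G=()):
    """Semi-parametric log-likelihood (2.8): F, G replaced by the rescaled
    empirical cdfs F_n, G_n and their left limits; jump points are known."""
    n = len(x)
    Fn = lambda t: np.sum(x <= t) / (n + 1.0)       # F_n(t)
    Fn_left = lambda t: np.sum(x < t) / (n + 1.0)   # F_n(t^-)
    Gn = lambda t: np.sum(y <= t) / (n + 1.0)
    Gn_left = lambda t: np.sum(y < t) / (n + 1.0)
    total = 0.0
    for xj, yj in zip(x, y):
        total += np.log(generalized_density(Fn_left(xj), Fn(xj),
                                             Gn_left(yj), Gn(yj), rho,
                                             xj in jumps_F, yj in jumps_G))
    return total

# Mixed marginals: left-censoring the first component at -1 creates a known jump.
rng = np.random.default_rng(1)
z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=300)
x = np.where(z[:, 0] < -1.0, -1.0, z[:, 0])
y = z[:, 1]
res = minimize_scalar(lambda r: -L(r, x, y, jumps_F=(-1.0,)),
                      bounds=(-0.95, 0.95), method="bounded")
print(res.x)   # semi-parametric estimate of the dependence parameter
```

In this dissertation the parametric family is ultimately taken to be the t-copula of Chapter 6; in a sketch like the one above, only the function playing the role of the copula density would change.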

CHAPTER 3

ASYMPTOTIC PROPERTIES OF THE SEMI-PARAMETRIC ESTIMATOR IN BIVARIATE DISTRIBUTIONS

3.1 Statement of the Main Theorem

In view of its similarity with the semi-parametric maximum likelihood estimator for continuous marginals, we expect $\hat{\theta}_n$ in the case of mixed marginals to be consistent and asymptotically normal. To prove this, we denote $l(\theta, u_1, u_2, v_1, v_2) = \log\{c_\theta(u_1, u_2, v_1, v_2)\}$ and use the notation $l_\theta$ and $l_{\theta,\theta}$ to denote the first and second derivatives of $l$ with respect to $\theta$. The estimator $\hat{\theta}_n$ satisfies
$$\frac{1}{n}\frac{\partial L(\theta)}{\partial \theta} = \frac{1}{n}\sum_{j=1}^{n} l_\theta\big(\theta, F_n(x_j^-), F_n(x_j), G_n(y_j^-), G_n(y_j)\big) = 0 \quad \text{at } \theta = \hat{\theta}_n. \qquad (3.1)$$
Here is a heuristic argument for the proof of asymptotic normality. Expanding in a Taylor series, one obtains
$$0 = \frac{1}{n}\frac{\partial L(\theta)}{\partial \theta}\bigg|_{\theta = \hat{\theta}_n} \approx A_n + (\hat{\theta}_n - \theta) B_n, \qquad (3.2)$$

where
$$A_n = \frac{1}{n}\sum_{j=1}^{n} l_\theta\big(\theta, F_n(x_j^-), F_n(x_j), G_n(y_j^-), G_n(y_j)\big)$$
and
$$B_n = \frac{1}{n}\sum_{j=1}^{n} l_{\theta,\theta}\big(\theta, F_n(x_j^-), F_n(x_j), G_n(y_j^-), G_n(y_j)\big).$$
It follows from equation (3.2) that $n^{1/2}(\hat{\theta}_n - \theta) \approx -n^{1/2} A_n / B_n$. From the theory of multivariate rank statistics, one obtains the following limiting behaviors:
$$B_n \to \beta = E\big(l_{\theta,\theta}\{\theta, F(X^-), F(X), G(Y^-), G(Y)\}\big) \qquad (3.3)$$
almost surely, and $n^{1/2} A_n$ is asymptotically normal with zero mean and variance $\sigma^2$, whose explicit form will be provided in Section 3.4. Thus, we have

Theorem 3.1.1. Under suitable regularity conditions, the semi-parametric estimator $\hat{\theta}_n$ is consistent and $n^{1/2}(\hat{\theta}_n - \theta)$ is asymptotically normal with mean 0 and variance $\varrho = \sigma^2/\beta^2$.

We start with the proof of consistency in Section 3.2 and the asymptotic properties of $B_n$ and $A_n$ in Sections 3.3 and 3.4, respectively, on which Theorem 3.1.1 is based, and complete this chapter by proving Theorem 3.1.1 in Section 3.5.

Here is some notation which will be used later in this chapter. Let $S_{Fh} = [F(D_{Fh}^-), F(D_{Fh})]$ for $h = 1, 2, \ldots, d_F$, and $S_{Gl} = [G(D_{Gl}^-), G(D_{Gl})]$ for $l = 1, 2, \ldots, d_G$. In the situation where both $F$ and $G$ have only one jump point, these notations simplify to $D_F$, $D_G$, $S_F = [F(D_F^-), F(D_F)]$, and $S_G = [G(D_G^-), G(D_G)]$, respectively.
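In practice, Theorem 3.1.1 is used to construct Wald-type confidence intervals for $\theta$. With $\hat{\varrho}$ denoting a consistent estimator of $\varrho = \sigma^2/\beta^2$ (the estimator proposed in Chapter 4), a standard construction of an approximate 95% interval is
$$\hat{\theta}_n \pm z_{0.975}\, \sqrt{\hat{\varrho}/n}, \qquad z_{0.975} \approx 1.96,$$
which is the type of interval whose empirical coverage is examined in the simulation studies of Chapter 7.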

3.2 Consistency of $\hat{\theta}_n$

Noting that $\hat{\theta}_n$ satisfies (3.1), and letting $J = l_\theta$, which is a function on $(0,1)^2$, we only need to show that
$$R_n = \frac{1}{n}\sum_{j=1}^{n} J(F_n(X_j), G_n(Y_j)) \to 0 \quad \text{a.s.} \qquad (3.4)$$
Since $H_\theta$ involves one or more jump points, $R_n$ can be written as
$$R_n = \frac{1}{n}\sum_{j=1}^{n} J\big(F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j)\big). \qquad (3.5)$$
We introduce some notation for the subsequent presentation. Let $\{cc\}$, $\{cd\}$, $\{dc\}$, and $\{dd\}$ be the events $\{(X, Y) : X \in \mathcal{C}(F), Y \in \mathcal{C}(G)\}$, $\{(X, Y) : X \in \mathcal{C}(F), Y \in \mathcal{J}(G)\}$, $\{(X, Y) : X \in \mathcal{J}(F), Y \in \mathcal{C}(G)\}$ and $\{(X, Y) : X \in \mathcal{J}(F), Y \in \mathcal{J}(G)\}$, respectively. Further, let $A_{cc} = \{j : (X_j, Y_j) \in \{cc\}\}$; similarly, one can define $A_{cd}$, $A_{dc}$ and $A_{dd}$. Note that the sets $A_S$ for $S = \{cc\}, \{cd\}, \{dc\}, \{dd\}$ form a partition of the set of integers $\{1, 2, \ldots, n\}$. We consider the decomposition of $R_n$ based on these four partition sets, namely, $R_n = R_n^{cc} + R_n^{cd} + R_n^{dc} + R_n^{dd}$, where
$$R_n^S = \frac{1}{n}\sum_{j \in A_S} J^S\big(F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j)\big)$$
for $S = \{cc\}, \{cd\}, \{dc\}, \{dd\}$, and $J^S$ is given by
$$J^S(u_1, u_2, v_1, v_2) = \begin{cases} J^{cc}(u_2, v_2) & \text{if } S = \{cc\}, \\ J^{cd}(u_2, v_1, v_2) & \text{if } S = \{cd\}, \\ J^{dc}(u_1, u_2, v_2) & \text{if } S = \{dc\}, \\ J^{dd}(u_1, u_2, v_1, v_2) & \text{if } S = \{dd\}. \end{cases} \qquad (3.6)$$
The functions $J^S$, $S = \{cc\}, \{cd\}, \{dc\}, \{dd\}$, are assumed to be continuous on their respective domains. For example, $J^{cc}(u_2, v_2)$ is assumed to be continuous on $[0,1]^2$; this is a reasonable assumption since the candidates for $J^{cc}$ in this chapter will be either $\frac{\partial}{\partial \theta}\log c_\theta(u_2, v_2)$ or

$\frac{\partial^2}{\partial \theta^2}\log c_\theta(u_2, v_2)$, both of which are continuous functions of $u_2$ and $v_2$. $J^{cd}(u_2, v_1, v_2)$ is assumed to be continuous on $[0,1]^3$; a particular candidate for $J^{cd}$ is
$$J^{cd}(u_2, v_1, v_2) = \frac{\partial}{\partial \theta} \log \int_{[v_1, v_2]} c_\theta(u_2, v)\, dv,$$
which is continuous in $u_2$, $v_1$ and $v_2$. Similarly for $J^{dc}$ and $J^{dd}$. Almost sure convergence of $R_n$ is established in the following theorem:

Theorem 3.2.1. Let $J = l_\theta$, $r(u) = u(1-u)$, $\delta > 0$, and let $p$ and $q$ be positive numbers satisfying $1/p + 1/q = 1$. Let $a$ and $b$ be given by $a = (-1+\delta)/p$ and $b = (-1+\delta)/q$. Consider the conditions

(C1) $|J^{cc}(u_2, v_2)| \le M_1\, r(u_2)^a\, r(v_2)^b$,

(C2) $|J^{cd}(u_2, v_1, v_2)| \le M_2\, r(u_2)^a$ (independent of $v_1$ and $v_2$),

(C3) $|J^{dc}(u_1, u_2, v_2)| \le M_3\, r(v_2)^b$ (independent of $u_1$ and $u_2$), and

(C4) in a small neighborhood of each $(F(D_{Fh}^-), F(D_{Fh}), G(D_{Gl}^-), G(D_{Gl})) \in [0,1]^4$, for $h = 1, 2, \ldots, d_F$ and $l = 1, 2, \ldots, d_G$, $J^{dd}(u_1, u_2, v_1, v_2)$ is finite.

Under conditions (C1)-(C4), $R_n \to \kappa\,(= 0)$ almost surely, where
$$\kappa = E\big[N^{cc}(X, Y) + N^{cd}(X, Y) + N^{dc}(X, Y) + N^{dd}(X, Y)\big], \qquad (3.7)$$
with $E[N^{cc}(X, Y)]$, $E[N^{cd}(X, Y)]$, $E[N^{dc}(X, Y)]$ and $E[N^{dd}(X, Y)]$ given, respectively, by
$$\int_{\{(0,1) \cap (\bigcup_{h=1}^{d_F} S_{Fh})^c\} \times \{(0,1) \cap (\bigcup_{l=1}^{d_G} S_{Gl})^c\}} J^{cc}(u, v)\, c_\theta(u, v)\, du\, dv,$$
$$\sum_{l=1}^{d_G} \int_{(0,1) \cap (\bigcup_{h=1}^{d_F} S_{Fh})^c} J^{cd}\big(u, G(D_{Gl}^-), G(D_{Gl})\big) \Big\{\int_{S_{Gl}} c_\theta(u, v)\, dv\Big\}\, du,$$
$$\sum_{h=1}^{d_F} \int_{(0,1) \cap (\bigcup_{l=1}^{d_G} S_{Gl})^c} J^{dc}\big(F(D_{Fh}^-), F(D_{Fh}), v\big) \Big\{\int_{S_{Fh}} c_\theta(u, v)\, du\Big\}\, dv,$$
$$\sum_{h,l} J^{dd}\big(F(D_{Fh}^-), F(D_{Fh}), G(D_{Gl}^-), G(D_{Gl})\big) \int_{S_{Fh} \times S_{Gl}} c_\theta(u, v)\, du\, dv.$$

23 Remark: In the case of t-copulas (as defined in (6.1)), a and b can be chosen such that a 2 ν, b 2 ν, a = 1 + δ, b = 1 + δ p q for some p, q and δ. Proof: Without loss of generality, we consider the case that there is only one point of discontinuity in both F and G, i. e., D F and D G. Let Cn(u, v) be the empirical copula defined by the sample Cn(u, v) = 1 n I n {F n(x j ) u,gn(y j ) v}. j=1 We consider the decomposition of the empirical copula measure into 4 sub-measures C cc n, C cd n, C dc n, and C dd n, where for S = {cc}, {cd}, {dc} and {dd}, and Cn S (u, v) = 1 n I n {F n(x j ) u,gn(y j ) v}, j A S Cn(u, v) = S {cc,cd,dc,dd} C S n (u, v). Then by the Glivenko-Cantelli Theorem, C cc n (u, v) converges uniformly to C 1 (u, v), C cd n (u, v) converges uniformly to C 2 (u, v), C dc n (u, v) converges uniformly to C 3 (u, v), and C dd n (u, v) converges uniformly to C 4 (u, v) respectively, and C i (u, v) for i = 1, 2, 3, 4, are given by C 1 (u, v) = P (F (X) u and G(Y ) v) if (u, v) (0, F (D F )) (0, G(D G )) P (F (X) (0, u) S F c and G(Y ) v) if (u, v) [F (D F ), 1) (0, G(D G )) P (F (X) u and G(Y ) (0, v) S G c ) if (u, v) (0, F (D F )) [G(D G ), 1) P (F (X) (0, u) S F c and G(Y ) (0, v) Sc G ) if (u, v) [F (D F ), 1) [G(D G ), 1) C 2 (u, v) = 0 if v (0, G(D G )) P (F (X) u and Y = D G ) if (u, v) (0, F (D F )) [G(D G ), 1) P (F (X) (0, u) S F c and Y = D G ) if (u, v) [F (D F ), 1) [G(D G ), 1) 15

24 C 3 (u, v) = 0 if u (0, F (D F )) P (X = D F and G(Y ) v) if (u, v) [F (D F ), 1) (0, G(D G )) P (X = D F and G(Y ) (0, v) S G c ) if (u, v) [F (D F ), 1) [G(D G ), 1) and C 4 (u, v) only put mass P (X = D F and Y = D G ) on a single point {F (D F )} {G(D G )}. Note that C i (u, v), i = 1,..., 4, are not probability measures and 4 C θ (u, v) = C i (u, v). i=1 To obtain the almost sure convergence, it takes four steps. Step 1. Let Rn cc = (0,1) 2 Jcc (u, v)dcn cc (u, v). Now we prove that Rn cc µ cc E[N cc (X, Y )] almost surely. We show that J cc is uniformly integrable with respect to the measures dcn cc (u, v) by showing (0,1) 2 Jcc (u, v) 1+ɛ dcn cc (u, v) is bounded for some ɛ > 0. Using the Hölder s inequality and the assumption (C1), one can derive the following chain of inequalities, noting that dcn cc (u, v) putting mass 1 on each continuous n point: (0,1) 2 Jcc (u, v) 1+ɛ dc cc n (u, v) M 1 (0,1) 2 r(u)a(1+ɛ) r(v) b(1+ɛ) dc cc n (u, v) M 1 { (0,1) 2 r (u)a(1+ɛ)p dc cc n (u, v) } 1/p { 1 ( ) 1/p j a(1+ɛ)p 1 M 1 r n n + 1 n j Acc j Acc = M ( ) 1 j ( 1+δ)(1+ɛ) r n n + 1 j Acc 1 1 M 1 0 {u(1 u) (1 δ)(1+ɛ) du <, } 16 } 1/q (0,1) 2 r (u)b(1+ɛ)q dcn cc (u, v) ( j r n + 1 ) 1/q b(1+ɛ)q

25 for ɛ < δ. C 1 (u, v), we have Combining the result above with the fact that C cc n (u, v) uniformly converges to Rn cc (0,1) 2 Jcc (u, v)dc 1 (u, v) = ((0,1) S F c ) ((0,1) Sc G ) Jcc (u, v)c θ (u, v) dudv almost surely, which completes step 1. Step 2. Let Rn cd = (0,1) 2 Jcd (u, v 1, v 2 )dcn cd (u, v). Now we prove that R cd n µ cd E[N cd (X, Y )] almost surely. We show that J cd (u, v 1, v 2 ) is uniformly integrable with respect to the measures dcn cd (u, v) by showing (0,1) 2 Jcd (u, v 1, v 2 ) 1+ɛ dcn cd (u, v) is bounded for some ɛ > 0. Using the Hölder s inequality and the assumption (C2), one can derive the following chain of inequalities, noting that dc cd n (u, v) putting mass 1 n on each point on ((0, 1) Sc F ) G n(d G ): (0,1) 2 Jcd (u, v 1, v 2 ) 1+ɛ dc cd n (u, v) M 2 (0,1) 2 r(u)a(1+ɛ) dc cd n (u, v) M 2 { (0,1) 2 r (u)a(1+ɛ)p dc cd n (u, v) 1 M 2 r n j A cd 1 = M 2 r n j A cd M 2 { 1 <, 0 ( j n + 1 ( j n + 1 1/p ) a(1+ɛ)p } 1/p { 1 1/p ) ( 1+δ)(1+ɛ) 1 {u(1 u) (1 δ)(1+ɛ) } du } 1/p } 1/q (0,1) 2 1(1+ɛ)q dcn cd (u, v) if ɛ < δ. Combining the result above with the fact that C cd n (u, v) uniformly converges to C 2 (u, v), 17

26 we have R cd n (0,1) 2 Jcd (u, G(D G ), G(D G ))dc2 (u, v) = ((0,1) S F c ) Jcd (u, G(D G ), G(D G )) c θ (u, v) dv du S G almost surely, which completes step 2. Step 3. Under the assumption (C3), the proof for R dc n µ dc E[N dc (X, Y )] is similar to that in step 2, so it is omitted. Step 4. The convergence of R dd n can be established using the SLLN. Let Rn dd = (0,1) 2 Jdd (u 1, u 2, v 1, v 2 )dcn dd (u, v). Now we prove that R dd n µ dd E[N dd (X, Y )] almost surely. We show that J dd (u 1, u 2, v 1, v 2 ) is uniformly integrable with respect to the measures dcn dd (u, v) by showing (0,1) 2 Jdd (u 1, u 2, v 1, v 2 ) 1+ɛ dcn dd (u, v) is bounded for some ɛ > 0. Noting that dcn dd (u, v) putting mass n dd n on the single point (F n(d F ), Gn(D G )), where n dd is the number of observations of (X j, Y j ) such that X j = D F and Y j = D G : (0,1) 2 Jdd (u 1, u 2, v 1, v 2 ) 1+ɛ dc dd n (u, v) = n dd n Jdd (u 1, u 2, v 1, v 2 ) 1+ɛ <, as long as J dd (u 1, u 2, v 1, v 2 ) is finite, which is the assumption (C4). Combining the result above with the fact that Cn dd (u, v) uniformly converges to C 4 (u, v), we have Rn dd (0,1) 2 Jdd (u 1, u 2, v 1, v 2 )dc 4 (u, v) = J dd (F (D F ), F (D F ), G(D G ), G(D G )) S F S G c θ (u, v) du dv almost surely, which completes step 4. 18
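To visualize the decomposition of the empirical copula measure into the four sub-measures $C_n^{cc}$, $C_n^{cd}$, $C_n^{dc}$ and $C_n^{dd}$ used in the proof above, the following Python sketch (illustrative only; the single jump point per marginal is assumed known, as in the simplified setting of the proof) partitions a simulated sample into the index sets $A_{cc}$, $A_{cd}$, $A_{dc}$, $A_{dd}$ and evaluates each sub-measure at a point $(u, v)$; their sum equals the empirical copula $C_n(u, v)$.

```python
# Sketch: partitioning observations into A_cc, A_cd, A_dc, A_dd and evaluating
# the four sub-measures of the empirical copula (one known jump point per
# marginal, as in the simplified setting of the proof).  Values are illustrative.
import numpy as np

rng = np.random.default_rng(2)
n, D_F, D_G = 1000, -1.0, 0.0
z = rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=n)
x = np.where(z[:, 0] < D_F, D_F, z[:, 0])      # jump of F at D_F (left-censoring)
y = np.where(z[:, 1] < D_G, D_G, z[:, 1])      # jump of G at D_G

Fn = lambda t: np.sum(x <= t) / (n + 1.0)      # rescaled empirical cdfs
Gn = lambda t: np.sum(y <= t) / (n + 1.0)

x_disc, y_disc = (x == D_F), (y == D_G)        # which component sits on a jump
A = {"cc": ~x_disc & ~y_disc,                  # X continuous, Y continuous
     "cd": ~x_disc & y_disc,                   # X continuous, Y on a jump
     "dc": x_disc & ~y_disc,                   # X on a jump, Y continuous
     "dd": x_disc & y_disc}                    # both on jumps

def C_sub(u, v, S):
    """C_n^S(u, v): empirical sub-measure restricted to the index set A_S."""
    idx = A[S]
    Fx = np.array([Fn(t) for t in x[idx]])
    Gy = np.array([Gn(t) for t in y[idx]])
    return np.sum((Fx <= u) & (Gy <= v)) / n

u, v = 0.7, 0.7
print({S: round(C_sub(u, v, S), 3) for S in A})      # the four pieces
print(round(sum(C_sub(u, v, S) for S in A), 3))      # equals C_n(u, v)
```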

In summary, noting that $R_n = R_n^{cc} + R_n^{cd} + R_n^{dc} + R_n^{dd}$, Theorem 3.2.1 is true.

3.3 Asymptotic Behavior of $B_n$

In the case when $H_\theta$ and $J$ are both continuous, by letting $J = l_{\theta,\theta}$ in [11], the quantity of interest is the asymptotic behavior of the statistic $R_n$ defined by
$$R_n = \frac{1}{n}\sum_{j=1}^{n} J(F_n(X_j), G_n(Y_j)), \qquad (3.8)$$
where $J(\cdot, \cdot)$ is a function on $(0,1)^2$. Genest et al. [11] have shown that $B_n \to E(l_{\theta,\theta}\{\theta, F(X), G(Y)\})$ as $n \to \infty$ almost surely. In the present case, the appropriate statistic is similar in form to $R_n$, but with a significant difference: $H_\theta$ involves one or more jump points. Therefore, $R_n$ is written in the form (3.5), with $J = l_{\theta,\theta}$. Almost sure convergence of $R_n$ is established in the following theorem:

Theorem 3.3.1. Let $J = l_{\theta,\theta}$, $r(u) = u(1-u)$, $\delta > 0$, and let $p$ and $q$ be positive numbers satisfying $1/p + 1/q = 1$. Let $a$ and $b$ be given by $a = (-1+\delta)/p$ and $b = (-1+\delta)/q$. Consider the conditions

(C1) $|J^{cc}(u_2, v_2)| \le M_1\, r(u_2)^a\, r(v_2)^b$,

(C2) $|J^{cd}(u_2, v_1, v_2)| \le M_2\, r(u_2)^a$ (independent of $v_1$ and $v_2$),

(C3) $|J^{dc}(u_1, u_2, v_2)| \le M_3\, r(v_2)^b$ (independent of $u_1$ and $u_2$), and

(C4) in a small neighborhood of each $(F(D_{Fh}^-), F(D_{Fh}), G(D_{Gl}^-), G(D_{Gl})) \in [0,1]^4$, for $h = 1, 2, \ldots, d_F$ and $l = 1, 2, \ldots, d_G$, $J^{dd}(u_1, u_2, v_1, v_2)$ is finite.

Under conditions (C1)-(C4), $R_n \to \beta$ almost surely, where
$$\beta = E\big[N^{cc}(X, Y) + N^{cd}(X, Y) + N^{dc}(X, Y) + N^{dd}(X, Y)\big], \qquad (3.9)$$

28 with E[N S (X, Y )] defined as in (3.7), and S = {cc}, {cd}, {dc}. Since the proof is similar to that in Theorem 3.2.1, it is omitted here. 3.4 Asymptotic Behavior of A n By taking J = l θ in (3.5), we have the following Theorem: Theorem Let r(u) = u(1 u), δ > 0, p and q be as in Theorem Let Ju S and Jv S be the partial derivatives of J S with respect to u and v, respectively, for S = {cc}, {cd}, {dc}, and {dd}. Also, let a and b be numbers given by a = ( δ)/p and b = ( δ)/q. Consider the conditions (D1) (D2) J cc (u 2, v 2 ) M 1 r(u 2 ) a r(v 2 ) b, with partial derivatives satisfying J cc u 2 (u 2, v 2 ) M 2 r(u 2 ) a 1 r(v 2 ) b and J cc v 2 (u 2, v 2 ) M 3 r(u 2 ) a r(v 2 ) b 1, J cd (u 2, v 1, v 2 ) M 4 r(u 2 ) a with Ju cd (u 2 2, v 1, v 2 ) M 5 r(u 2 ) a 1, v2 c θ (u 2, v)dv M 6 r(u 2 ) a, v 1 { } (0,1) { h S F h } c Jη cd (u 2, G(D Gl ), G(D Gl S )) c θ (u 2, v)dv du 2 < l Gl with η = v 1 or v 2, Ju cd (u 2 2, v 1, v 2 ) is continuous w.r.t. u 2 on C(F ) almost surely, J cd v 1 (u 2, v 1, v 2 ) and J cd v 2 (u 2, v 1, v 2 ) are continuous w.r.t. v 1 and v 2 respectively in the small neighborhoods of rectangles C(F ) (G(D Gl ), G(D Gl )] almost surely for l = 1,..., D G, 20

29 (D3) J dc (u 1, u 2, v 2 ) M 7 r(v 2 ) b with Jv dc (u 2 1, u 2, v 2 ) M 8 r(v 2 ) b 1, u2 c θ (u, v 2 )du M 9 r(v 2 ) b, u 1 { } (0,1) { l S Gl } c Jη dc (F (D F h ), F (D F h ), v 2 S ) c θ (u, v 2 )du dv 2 < h F h with η = u 1 or u 2, Jv dc (u 2 1, u 2, v 2 ) is continuous w.r.t. v 2 on C(G) almost surely, J dc u 1 (u 1, u 2, v 2 ) and J dc u 2 (u 1, u 2, v 2 ) are continuous w.r.t. u 1 and u 2 respectively in the small neighborhoods of rectangles (F (D F h ), F (D F h )] C(G) almost surely for h = 1,..., D F, (D4) J dd u 1 (u 1,,, ), J dd u 2 (, u 2,, ), J dd v 1 (,, v 1, ), and J dd v 2 (,,, v 2 ) are continuous w.r.t. u 1, u 2, v 1 and v 2, respectively, in a small neighborhood of (F (D F h ), F (D F h ), G(D Gl ), G(D Gl )) for any h = 1,..., D F and l = 1,..., D G. Under conditions (D1-D4), n 1/2 (Rn κ) N(0, σ 2 ) in distribution as n, where κ is defined in (3.7), and σ 2 [ = var M cc (X, Y ) + M cd (X, Y ) + M dc (X, Y ) + M dd ] (X, Y ), (3.10) with M cc (x, y), M cd (x, y), M dc (x, y) and M dd (x, y) are given as below, respectively, J cc (F (x), G(y))I {x C(F ),y C(G)} + {R\{ h D F h }} {R\{ l D Gl }} I {x x } Jcc u (F (x ), G(y ))dh 2 θ (x, y ) + {R\{ h D F h }} {R\{ l D Gl }} I {y y } Jcc v (F (x ), G(y ))dh 2 θ (x, y ), (3.11) 21

30 d G J cd (F (x), G(D Gl ), G(D Gl ))I {x C(F ), y=d Gl } l=1 d G + R\{ l=1 h D F h } I {x x } Jcd u (F (x ), G(D 2 Gl ), G(D Gl ))dh θ (x, D Gl ) d G + R\{ l=1 h D F h } I {y<d Gl } Jcd v (F (x ), G(D 1 Gl ), G(D Gl ))dh θ (x, D Gl ) d G + l=1 R\{ h D F h } I {y D Gl } Jcd v (F (x ), G(D 2 Gl ), G(D Gl ))dh θ (x, D Gl ), d F J dc (F (D F h ), F (D F h ), G(y))I {y C(G)} h=1 d F + h=1 R\{ l D Gl } I {y y } Jdc v (F (D 2 F h ), F (D F h ), G(y ))dh θ (D F h, y ) d F + R\{ h=1 l D Gl } I {x<d F h } Jdc u (F (D 1 F h ), F (D F h ), G(y ))dh θ (D F h, y ) d F + h=1 R\{ l D Gl } I {x D F h } Jdc u (F (D 2 F h ), F (D F h ), G(y ))dh θ (D F h, y ), J dd (F (D F h ), F (D F h ), G(D Gl ), G(D Gl ))I {x=d F h, y=d Gl } h,l + [ I {y<dgl } Jdd v (F (D 1 F h ), F (D F h ), G(D Gl ), G(D Gl )) h,l + I {y DGl } Jdd v 2 (F (D F h ), F (D F h ), G(D Gl ), G(D Gl )) + I {x<df h } Jdd u 1 (F (D F h ), F (D F h ), G(D Gl ), G(D Gl )) + I {x DF h } Jdd u 2 (F (D F h ), F (D F h ), G(D Gl ), G(D Gl )) ] P (X = D F h, Y = D Gl ). In the case when both H θ and J are continuous, Genest et al. [11] showed that the statistic Rn 22

was a special case of multivariate rank order statistics whose asymptotic behavior was thoroughly studied by Ruymgaart et al. [41], Ruymgaart [42] and Rüschendorf [40]. For continuous $H_\theta$ and $J$, Genest et al. [11] proposed regularity conditions ensuring almost sure convergence and asymptotic normality. In the present case, $H_\theta$ involves one or more jump points, which causes the $J$ function to be discontinuous on $(0,1)^2$ as well. In the rest of this chapter, we provide a sketch of the proof of the above theorem. Without loss of generality, we show the theorem is true in the case where there is only one discontinuity point in both $F$ and $G$; this is carried out in the subsequent sections, using arguments similar to those in [41].

We shall need the following empirical processes:
$$U_n(F(x)) = n^{1/2}(F_n(x) - F(x)), \qquad U_n(F(x^-)) = n^{1/2}(F_n(x^-) - F(x^-)),$$
$$V_n(G(y)) = n^{1/2}(G_n(y) - G(y)), \qquad V_n(G(y^-)) = n^{1/2}(G_n(y^-) - G(y^-)).$$
Note that
$$P(\Omega_0) = P\big(\{\omega : F_n(F^{-1}(F(x))) = F_n(x),\; G_n(G^{-1}(G(y))) = G_n(y), \text{ for all } x, y \text{ and } n\}\big) = 1. \qquad (3.12)$$
The above identities, $F_n(F^{-1}(F)) = F_n$ and $G_n(G^{-1}(G)) = G_n$, are true even in the case when there are jump points. At the jump point of $F$, one can see that $F^{-1}(F(D_F)) = D_F$; a similar result holds for $G$ as well.

Asymptotic normality of $R_n^{cc}$

For small positive $\gamma$ define the set
$$\Omega_{\gamma n} = \{\omega : \sup_x |F_n - F| < \gamma/2,\; \sup_y |G_n - G| < \gamma/2\}. \qquad (3.13)$$

32 For ω Ω 0 Ωγn, the Mean Value Theorem gives n 1/2 J cc (Fn, ) = n 1/2 J cc (F, ) + Un(F )J cc u 2 (Φn, ), for all x C(F ). In the formula above Φn is defined by Φn = F + η(fn F ), where η = η(ω, x, n) is a number between 0 and 1. Let F = [X 1n, D F ) (D F, Xnn], G = [Y 1n, D G ) (D G, Ynn], where X 1n ( or Y 1n ) = min 1 j n X j ( or Y j ), Xnn( or Ynn) = max 1 j n X j ( or Y j ). Note that where n 1/2 (R cc n µ cc ) = 3 4 A in + Bγ in + B 5n + Cn, i=1 i=1 µ cc = E[N cc (X, Y )] with l θ,θ replaced by l θ in (3.9), A 1n = A 2n = A 3n = B γ 1n = n 1/2 {R\D F } {R\D G } Jcc (F (x), G(y))d[Hn(x, y) H θ (x, y)], {R\D F } {R\D G } U n(f (x))j cc u 2 (F (x), G(y))dH θ (x, y), {R\D F } {R\D G } V n(g(y))j cc v 2 (F (x), G(y))dH θ (x, y), χ(ω c γn ){n1/2 {R\D F } {R\D G } [Jcc (Fn(x), G(y)) J cc (F (x), G(y))]dHn(x, y) A 2n }, 24

33 B γ 2n = χ(ωγn ) {R\D F } {R\D G } U n(f (x))[ju cc (Φn(x), G(y)) 2 Ju cc ] (F (x), G(y)) dhn(x, y), 2 B γ 3n = χ(ωγn ) { F G } {{R\D F } {R\D G }} U n(f (x)) J cc u 2 (F (x), G(y))d[Hn(x, y) H θ (x, y)], B γ 4n = χ(ωγn ) { { F G } {{R\D F } {R\D G }} U n(f (x)) J cc u 2 (F (x), G(y))dH θ (x, y) A 2n }, B 5n = n 1/2 {R\D F } {R\D G } [Jcc (F (x), Gn(y)) J cc (F (x), G(y))] dhn(x, y) A 3n, Cn = n 1/2 {R\D F } {R\D G } [Jcc (Fn(x), Gn(y)) J cc (Fn(x), G(y)) J cc (F (x), Gn(y)) + J cc (F (x), G(y))] dhn(x, y), where χ(ωγn ) denotes the indicator function of Ω γn ; H n(x, y) is the empirical cumulative distribution of H θ (x, y). Note that with a further decomposition of Cn, one has C γ 1n = χ(ω c γn )n1/2 {R\D F } {R\D G } [Jcc (Fn(x), Gn(y)) J cc (Fn(x), G(y)) C γ 2n = J cc (F (x), Gn(y)) + J cc (F (x), G(y))] dhn(x, y), χ(ωγn ) {R\D F } {R\D G } U n(f (x))[ju cc (Φn(x), Gn(y)) 2 Ju cc ] (Φn(x), G(y)) dhn(x, y). 2 We start with a few lemmas to be used in the proof. 25

34 Lemma For any ξ 0 and function r(u) = u(1 u), the function r(u) ξ is symmetric about 1 ( 2, decreasing on 0, 1 ] and has the property that for each β in (0, 1) there exits a constant 2 M = M β such that and Proof of Lemma can be found in [41]. r(βs) ξ Mr(s) ξ for 0 < s 1 2, r(1 β(1 s)) ξ Mr(s) ξ for 1 2 < s < 1. Lemma Let Φn and Ψn be functions on n1 and n2, where [X 1n, Xnn], if Xnn D df n1 = [X 1n, Xnn), if Xnn = D df and respectively, satisfying n2 = [Y 1n, Ynn], if Ynn D dg [Y 1n, Ynn), if Ynn = D dg min(f, Fn) Φn max(f, Fn) and where defined. Then uniformly for n = 1, 2,, min(g, Gn) Ψn max(g, Gn) Proof: (i) (ii) sup r(φn) ξ r(f ) ξ = Op(1), for each ξ 0, n1 sup r(ψn) η r(g) η = Op(1), for each η 0. n2 It suffices to prove (i). From Lemma A.3 in [45] and (3.12), it follows that one needs to show that for each ɛ > 0, there exists a constant β = βɛ in (0, 1) such that P (Ωn) = P ({ω : βf Fn 1 β(1 F ) any x(ω) on n1 }) > 1 ɛ, (3.14) 26

35 for all n and F. Let A = {ω : βf Fn 1 β(1 F ) any x(ω) n1 C(F )}, and B = {ω : βf Fn 1 β(1 F ) any x(ω) n1 J (F )}. Note that in our case, P ({ω : βf Fn 1 β(1 F ) any x(ω) on n1 }) = P (A B) = P (A) + P (B) = P ({ω : x(ω) n1 C(F )}) P ({ω : βf Fn 1 β(1 F ) x(ω) n1 C(F )}) +P ({ω : x(ω) n1 J (F )}) P ({ω : βf Fn 1 β(1 F ) x(ω) n1 J (F )}). In Lemma 6.1 in [41] (3.14) has been verified for all continuous F for a constant β = β ɛ, which gives us P ({ω : βf Fn 1 β(1 F ) x(ω) C(F ) n1 }) > 1 ɛ. Noting that P ({ω : x(ω) n1 C(F )}) + P ({ω : x(ω) n1 J (F )}) = 1, to prove (3.14) we only need to show when x(ω) n1 J (F ), P ({ω : βf Fn 1 β(1 F ) any x(ω) n1 J (F )}) > 1 ɛ is true for a constant β = β ɛ. This is a direct result of the Glivenko-Cantelli Theorem, which completes the proof. Lemma Uniformly in all F, we have (i) (ii) sup Un(F ) Un(F ) r(f ) ρ 1/2 p 0, as n, for each ρ 0, n1 sup (, )\{ h D F h } Un(F ) r(f ) ρ 1/2 = Op(1), as n, for each ρ 0, where Un(F (x)) = n 1/2 ( ˆFn(x) F (x)), and ˆFn is the empirical distribution of F. Note that Fn n was defined to be n + 1 ˆFn. Proof: (i) Note that Un(F ) Un(F ) r(f ) ρ 1/2 = n 1/2 ˆFn n + 1 r(f )ρ 1/2, 27

36 and that for any fixed β (0, 1), we have r ( ) β ρ 1/2 ( = r 1 β ρ 1/2 = O(n n n) ρ+1/2 ). Since F (X i ) are i.i.d. uniform random variables, given any arbitrary ɛ > 0, we can choose a β = βɛ in (0, 1) such that P ( β n F (X 1n ) F (X nn) 1 β ) > 1 ɛ n for all n and all F with F (D F ) 1, which is obvious since = = ( β P n F (X 1n ) F (X nn) 1 β ) n ( 1 β n β ) n n ( 1 2β ) n e 2β > 1 ɛ, n for sufficiently small β. Combining with all above results, we proved (i). (ii) follows from (i) and (iii) of Lemma 4.2 in [41]. Proof completed. Proof of Asymptotic normality of R cc n. (1) Note that A 1n can be written as A 1n = n 1/2 n A 1jn, j=1 where A 1jn = J cc (F (X j ), G(Y j )) I {j A cc} µcc. The A 1jn are i.i.d. with mean zero. By 28

37 the assumption (D1) and the Hölder s inequity, we have {R\D F } {R\D G } Jcc (F (x), G(y)) 2+δ 0dH θ (x, y) {R\D F } {R\D G } M2+δ 0 1 r(f (x)) a(2+δ 0 ) r(g(y)) b(2+δ 0 ) dh θ (x, y) [ ] 1/p0 [ ] 1/q0 M 2+δ 0 1 (0,1) r(u)a(2+δ 0 )p 0du (0,1) r(v)b(2+δ 0 )q 0dv [ ] 1/p0 [ ] 1/q0 = M 2+δ 0 1 (0,1) r(u)( 1 2 +δ)(2+δ 0 ) du (0,1) r(v)( 1 2 +δ)(2+δ 0 ) dv <, for some selected p 0, q 0 and δ satisfying (1 δ)(2 + δ 0 ) < 1, which means that A 1jn has a finite absolute moment of order 2+δ 0 for some δ 0 > 0. Moreover, the term on the left hand is uniformly bounded above of order 2 + δ 0. Using the CLT, we get the asymptotic normality of A 1n. (2) Note that A 2n can be written as A 2n = n 1/2 {R\D F } {R\D G } ( ˆFn(x) F (x))j cc u 2 (F (x), G(y))dH θ (x, y) +n 1/2 {R\D F } {R\D G } (F n(x) ˆFn(x))J cc u 2 (F (x), G(y))dH θ (x, y) = A 2n1 + A 2n2, where ˆFn(x) is the empirical distribution function of X. Let 0 if x < X j φ Xj (x) = 1 if x X j Then A 2n1 can be written as n 1/2 n A 2jn, where A 2jn = j=1 {R\D F } {R\D G } (φ X (x) F (x))j cc j u (F (x), G(y))dH 2 θ (x, y) are i.i.d. with mean zero. Note that φ Xj (x) F (x) 1. Under the assumption (D1), we have A 2jn M 2 {R\D F } {R\D G } ra 1 (F (x))r b (G(y))dH θ (x, y). 29

38 Note that {R\D F } {R\D G } ra 1 (F (x))r b (G(y))dH θ (x, y) < uniformly as long as we can find some p 1 and q 1 satisfying 1/p 1 + 1/q 1 = 1 and (a 1)p 1 > 1 and bq 1 > 1. Thus, we have shown that A 2n1 has an absolute moment of order 2 + δ 1, which leads to the asymptotic normality of A 2n1. Note that with the same p 1 and q 1 A 2n2 = n1/2 n + 1 {R\D F } {R\D G } n1/2 n + 1 sup x ˆFn(x) = op(1). Therefore, we have the asymptotic normality of A 2n. (3) Similarly, note that A 3n can be written as ˆFn(x)J cc u 2 (F (x), G(y))dH θ (x, y) {R\D F } {R\D G } Jcc u 2 (F (x), G(y))dH θ (x, y) A 3n = n 1/2 {R\D F } {R\D G } (Ĝn(y) G(y))J cc v 2 (F (x), G(y))dH θ (x, y) +n 1/2 {R\D F } {R\D G } (G n(y) = A 3n1 + A 3n2, Ĝn(y))J cc v 2 (F (x), G(y))dH θ (x, y) where Ĝn(y) is the empirical distribution function of Y. Similarly, the A 3n1 can be written as n 1/2 n A 3jn, where A 3jn = j=1 {R\D F } {R\D G } (φ Y (y) j G(y))Jv cc (F (x), G(y))dH 2 θ (x, y) are i.i.d. with mean zero. The same asymptotic conclusion can be drawn for A 3n. n (4) Note that the A 1n + A 2n + A 3n can be written as n 1/2 (A 1jn + A 2jn + A 3jn ) + j=1 A n 2n2 + A 3n2 n 1/2 A 1jn + A 2n2 + A 3n2, where the A 1jn = A 1jn + A 2jn + A 3jn j=1 depends on (X j, Y j ) only, hence are i.i.d. with mean zero, and the terms A 2n2 p 0 and A 3n2 p 0 as n. Using the CLT, we have that the A 1n + A 2n + A 3n is asymptotically normally distributed with mean 0. The asymptotic negligibility of the B and C terms will be stated as corollaries. 30

39 Corollary 1. For fixed γ, B γ1n p 0 and C γ1n p 0 as n. Proof: Note that P (Ω c γn ) 0 in (3.13) for any H θ by the Glivenko-Cantelli Theorem, and because the distribution of sup x Fn(x) F (x) does not depend on H θ. This completes the proof. Corollary 2. For fixed γ, B γ2n p 0 and C γ2n p 0 as n. Proof: According to Lemma (ii) with ρ = 1, for given ɛ > 0, there exists a constant 2 M such that P (Ω 1n ) = P (ω : {sup x Un(F ) M and x C(F )}) > 1 ɛ (3.15) for all n and H θ, and M such that P (Ω 2n ) = P (ω : { sup n1 r ξ (Φn)r ξ (F ) M }) > 1 ɛ, which gives P (Ω 1n Ω 2n ) > 1 2ɛ. Also, χ(ω 1 n Ω 2n ) B γ2n M J sup cc F G u (Φn(x), G(y)) J cc 2 u (F (x), G(y)) 2. +χ(ω 1 n Ω 2n ) {(,X 1n ) (Xnn, )} {(,Y 1n ) (Ynn, )} U n(f (x)) [J cc u 2 (Φn(x), G(y)) J cc u 2 (F (x), G(y))]dHn(x, y) By the Glivenko-Cantelli Theorem, P ({(, X 1n ) (Xnn, )} {(, Y 1n ) (Ynn, )}) 0 for any H θ, so the second term on the right side of the inequality above converges to 0, as n. The function J cc u 2 (u 2, v 2 ) is uniformly continuous on (0, 1) 2. Since Φn F Fn F where Φn is defined, the Glivenko-Cantelli Theorem yields J sup cc F G u (Φn(x), G(y)) J cc 2 u (F (x), G(y)) 2 p 0 uniformly for H θ. Therefore, we have shown that B γ2n p 0 as n. A similar argument may be used for C γ2n. Proof completed. 31

40 Corollary 3. For fixed γ, B γ3n p 0 as n. Proof: Noting that { F G } {{R\D F } {R\D G }} = F G, B γ3n = χ(ωγn ) = χ(ωγn Ω 1n ) +χ(ωγn Ωc 1n ) F G Un(F (x))j cc u 2 (F (x), G(y))d[Hn(x, y) H θ (x, y)] F G Un(F (x))j cc u 2 (F (x), G(y))d[Hn(x, y) H θ (x, y)] F G Un(F (x))j cc u 2 (F (x), G(y))d[Hn(x, y) H θ (x, y)] (3.16) where Ω 1n is as defined in (3.15). Note that the second term in (3.16) converges to 0 in probability, and χ(ωγn Ω 1n ) F G Un(F (x))j cc u 2 (F (x), G(y))d[Hn(x, y) H θ (x, y)] χ(ωγn Ω 1n )M F G J cc u 2 (F (x), G(y))d[Hn(x, y) H θ (x, y)]. χ(ωγn Ω 1n )M M F G d[hn(x, y) H θ (x, y)] where M J = sup cc F G u (F (x), G(y)) 2. From the Theorem 1.9, (i), from [43], the above integral converges to 0 in probability as n. Proof completed. Corollary 4. For fixed γ, B γ4n p 0 as n. Proof: Note that B γ4n = χ(ωγn Ω 1n ) { F G Un(F (x))j cc u 2 (F (x), G(y))dH θ (x, y) A 2n } +χ(ωγn Ωc 1n ) { F G Un(F (x))j cc u 2 (F (x), G(y))dH θ (x, y) A 2n } 32

41 = χ(ωγn Ω 1n ) {(,X 1n ) (Xnn, )} {(,Y 1n ) (Ynn, )} U n(f (x)) J cc u 2 (F (x), G(y)) dh θ (x, y) +χ(ωγn Ωc 1n ) { F G Un(F (x))j cc u 2 (F (x), G(y))dH θ (x, y) A 2n } where Ω 1n is as defined in (3.15). By the Glivenko-Cantelli Theorem, P ({(, X 1n ) (Xnn, )} {(, Y 1n ) (Ynn, )}) 0 for any H θ. Combining with the fact that J cc u 2 is uniformly continuous on (0, 1) 2, the righthand side converges to 0, as n. To see B 5n converging to 0 in probability as n, let us notice that 4 B γin = n 1/2 i=1 {R\D F } {R\D G } [Jcc (F (x), Gn(y)) J cc (F (x), G(y))]dHn(x, y) A 2n. In summary, we have shown that all these B-terms and C-term are negligible. Combining with the results of these A-terms, we have established the asymptotic normality of R cc n Asymptotic normality of R cd n and R dc n It suffices to show the asymptotic normality of R cd n. Note that n 1/2 (R cd n µ cd ) = 4 6 A in + B in, i=1 i=1 where 33

42 µ cd = E[N cd (X, Y )], A 1n = n 1/2 {R\D F } Jcd (F (x), G(D G ), G(D G ))d[h n(x, D G ) H θ (x, D G )]; A 2n = A 3n = A 4n = B 1n = B 2n = B 3n = B 4n = B 5n = B 6n = {R\D F } U n(f (x))ju cd (F (x), G(D 2 G ), G(D G ))dh θ (x, D G ); {R\D F } V n(g(d G ))Jcd v (F (x), G(D 1 G ), G(D G ))dh θ (x, D G ); {R\D F } V n(g(d G ))J cd v 2 (F (x), G(D G ), G(D G ))dh θ (x, D G ); {R\D F } U n(f (x))ju cd (Φn(x), Gn(D 2 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] {R\D F } V n(g(d G ))Jcd v (F (x), Θn(D 1 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] {R\D F } V n(g(d G ))J cd v 2 (F (x), G(D G ), Ψ n(d G ))d[hn(x, D G ) H θ (x, D G )] {R\D F } U n(f (x))[j cd u 2 (Φn(x), Gn(D G ), G n(d G )) J cd u 2 (F (x), G(D G ), G(D G )) ] dh θ (x, D G ) {R\D F } V n(g(d G ))[Jcd v 1 (F (x), Θn(D G ), G n(d G )) J cd v 1 (F (x), G(D G ), G(D G )) ] dh θ (x, D G ) {R\D F } V n(g(d G ))[J cd v 2 (F (x), G(D G ), Ψ n(d G )) J cd v 2 (F (x), G(D G ), G(D G )) ] dh θ (x, D G ) where Φn( ), Θn( ), and Ψn( ) are defined by the Mean Value Theorem. Next, we will show that the A terms are asymptotic normal and the B terms converge to 0 in probability. 34

43 (1) Note that A 1n can be written as A 1n = n 1/2 n A 1jn, j=1 where A 1jn = J cd (F (X j ), G(D G ), G(DG )) I {j Acd } µcd. Note that A 1jn are i.i.d. with mean zero. Since J cd (u, v 1, v 2 ) = v2 θ log c θ (u, v)dv, v 1 using assumption (D2), we have {R\D F } Jcd (F (x), G(D G ), G(DG )) 2+δ 0dH θ (x, D G ) { } 1/p0 {R\D F } Jcd (F (x), G(D G ), G(DG )) (2+δ 0 )p 0dH θ (x, D G ) { } 1/q0 {R\D F } 1(2+δ 0 )q 0dH θ (x, D G ) { (0,1) S F c J cd (u, G(D G ), G(DG )) (2+δ 0 )p } 1/p0 0 c θ (u, v)dv du. S G Under the assumption (D2), (0,1) S F c J cd (u, v 1, v 2 ) (2+δ 0 )p 0 c θ (u, v)dv du S G M 4 (0,1) S F c r(u) a(2+δ 0 )p 0 c θ (u, v)dv du S G M 4 M 6 (0,1) S F c r(u) a(2+δ 0 )p 0r(u) a du <, if a(2 + δ 0 )p 0 + a > 1. By the CLT, we have the asymptotic normality of A 1n. 35

44 (2) Note that A 2n can be written as A 2n = n 1/2 {R\D F } ( ˆFn(x) F (x))j cd u 2 (F (x), G(D G ), G(DG ))dh θ (x, D G ) +n 1/2 {R\D F } (F n(x) ˆFn(x))J cd u 2 (F (x), G(D G ), G(DG ))dh θ (x, D G ) = A 2n1 + A 2n2. Furthermore, A 2n1 can be written as n 1/2 n j=1 A 2jn, where A 2jn = {R\D F } (φ X (x) j F (x))j cd u 2 (F (x), G(D G ), G(DG ))dh θ (x, D G ) are i.i.d. with mean 0. Using the assumption (D2), we have A 2jn M 5 (0,1) S F c M 5 M 6 (0,1) S c F r a 1 (u) c θ (u, v)dv du S G r 2a 1 (u)du < as long as (2a 1) > 1. Thus we have the asymptotic normality of A 2n1. Using the assumption (D2), we have A n 2n2 = ˆFn(x)J n + 1 u cd (F (x), G(D {R\D F } 2 G ), G(DG ))dh θ (x, D G ) n n + 1 sup x ˆFn(x) {R\D F } Jcd u (F (x), G(D 2 G ), G(DG ))dh θ (x, D G ) = op(1). Therefore, the asymptotic normality of A 2n has been established. (3) Note that A 3n can be written as A 3n = n 1/2 {R\D F } [Ĝn(D G ) G(DG )]J cd v 1 (F (x), G(D G ), G(DG ))dh θ (x, D G ) +n 1/2 {R\D F } [G n(d G ) Ĝn(D G )]J cd v 1 (F (x), G(D G ), G(DG ))dh θ (x, D G ) = A 3n1 + A 3n2. Similarly, A 3n1 can be written as n 1/2 n j=1 A 3jn, where A 3jn = {R\D F } (ψ Y (D j G ) 36

45 Gn(D G ))J cd v (F (x), G(D 1 G ), G(DG ))dh θ (x, D G ) are i.i.d. with mean zero. and 1 if Y j < y, ψ Yj (y) = 0 if Y j y. Noting that the absolute value of the random part ψ Yj (D G ) Gn(D G ) in A3jn is bounded above by 1, we have A 3jn (0,1) S F c Jv cd (u, G(D 1 G ), G(DG )) c θ (u, v)dv du. S G By the assumption (D2), the integral above is finite. Therefore, A 3jn has an absolute moment of order 2 + δ 1 for some δ 1 > 0, which leads to the asymptotic normality of A 3n1. Using argument similar to that used for A 2n2 as in (2), one can show that A 3n2 = o p(1). In summary, we have the asymptotic normality of A 3n. (4) Result of A 4n can be obtained in a similar way by noting that A 4n can be written as A 4n = n 1/2 {R\D F } [Ĝn(D G ) G(D G )]J cd v 2 (F (x), G(D G ), G(DG ))dh θ (x, D G ) +n 1/2 {R\D F } [G n(d G ) Ĝn(D G )]J cd v 2 (F (x), G(D G ), G(DG ))dh θ (x, D G ) = A 4n1 + A 4n2, and A n 4n1 can be written as n 1/2 A 4jn, where A 4jn = j=1 {R\D F } (φ Y (D j G ) Gn(D G ))Jv cd (F (x), G(D 2 G ), G(DG ))dh θ (x, D G ) are i.i.d. with mean zero. Using arguments similar to those in (3), we can get that A 4n is asymptotically normally distributed with mean 0. (5) Finally, we show that the sum of A in, i = 1,..., 4, converges to a normal random variable n with mean 0. Note that the sum can be written as n 1/2 (A 1jn + A 2jn + A 3jn + A 4jn ) + j=1 A n 2n2 + A 3n2 + A 4n2 n 1/2 A 2jn + A 2n2 + A 3n2 + A 4n2, where A 2jn = A 1jn + j=1 A 2jn + A 3jn + A 4jn depends on (X j, Y j ) only, hence are i.i.d. with mean 0 as shown before 37

46 (Similarly, we can define A 3jn for Rdc n ), and A 2n2, A 3n2 and A 4n2 are negligible. Using the CLT, we have that the sum of A in, i = 1,..., 4, is asymptotically normally distributed with mean 0. We will finish the proof of R cd n by a few corollaries. Corollary 5. B 1n p 0 as n. Proof: Note that B 1n = χ(ω 1n ) {R\D F } U n(f (x))ju cd (Φn(x), Gn(D 2 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] +χ(ω c 1n ) H θ (x, D G )] {R\D F } U n(f (x))j cd u 2 (Φn(x), Gn(D G ), G n(d G ))d[hn(x, D G ) where Ω 1n is defined in (3.15), which tells us the second term on the righthand side of the equality above goes to 0 in probability as n. Also, χ(ω 1n ) {R\D F } U n(f (x))ju cd (Φn(x), Gn(D 2 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] χ(ω 1n ) {R\D F } M Ju cd (Φn(x), Gn(D 2 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] = χ(ω 1n ) M Ju cd (Φn(x), Gn(D 2 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] + F χ(ω 1n ) {(,X 1n ) (Xnn, )} M Ju cd (Φn(x), Gn(D 2 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )], the second term on the righthand side of the inequality goes to 0 in probability as n by the Glivenko-Cantelli theorem and the assumption (D2). Since J cd u 2 (Φn(x), Gn(D G ), G n(d G )) is bounded above almost surely when x F and Hn(x, y) converges to H θ (x, y) in distribution as n, using Theorem 1.9 (i), [43], once again, we have the first term on the righthand side of the inequality above converges to 0. In summary, we have shown that B 1n p 0 as n. 38

47 Corollary 6. B 2n p 0 and B 3n p 0 as n. Proof: It suffices to show the result for B 2n. By the CLT, Vn(G(D G )) converges to a normal random variable as n, so we can define a set Ω 2n such that P (Ω 2n ) = P ({ω : sup ω Vn(G(D G )) M }) > 1 ɛ. Therefore, B 2n = χ(ω 2n ) {R\D F } V n(g(d G ))J cd v (F (x), Θn(D 1 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] +χ(ω c 2n ) H θ (x, D G )], {R\D F } V n(g(d G ))J cd v 1 (F (x), Θn(D G ), G n(d G ))d[hn(x, D G ) and the second term on the righthand side of the equality above goes to 0 in probability as n. Also, χ(ω 2n ) {R\D F } V n(g(d G ))J cd v (F (x), Θn(D 1 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] χ(ω 2n ) {R\D F } M Jv cd (F (x), Θn(D 1 G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] = χ(ω 2n ) χ(ω 2n ) H θ (x, D G )], F M J cd v 1 (F (x), Θn(D G ), G n(d G ))d[hn(x, D G ) H θ (x, D G )] + {(,X 1n ) (Xnn, )} M J cd v 1 (F (x), Θn(D G ), G n(d G ))d[hn(x, D G ) the second term on the righthand side of the inequality goes to 0 in probability as n by the Glivenko-Cantelli theorem and the assumption (D2). From the assumption (D2), combining with the definition of Θn( ) and the fact that J cd u 2 (u 2, v 1, v 2 ) is continuous with respect to u 2, v 1, and v 2 on (0, 1) 3, we know that 39

48 J cd u 2 (F (x), Θn(D G ), G n(d G )) is bounded above almost surely for x F. Noting that Hn(x, y) converges to H θ (x, y) in distribution as n, using Theorem 1.9 (i), [43], once again, we have the first term on the righthand side of the inequality above converges to 0. In summary, we have shown that B 2n p 0 as n. Corollary 7. B 4n p 0 as n. Proof: Note that B 4n = χ(ω 1n ) {R\D F } U n(f (x))[ju cd (Φn(x), Gn(D 2 G ), G n(d G )) Ju cd (F (x), Gn(D ] 2 G ), G n(d G )) dh θ (x, D G ) +χ(ω c 1n ) {R\D F } U n(f (x))[j cd u 2 (Φn(x), Gn(D G ), G n(d G )) J cd u 2 (F (x), Gn(D G ), G n(d G )) ] dh θ (x, D G ) where Ω 1n is defined in (3.15). Therefore, the second term on the righthand side of the equality above goes to 0 in probability as n. Also, χ(ω 1n ) {R\D F } U n(f (x))[ju cd (Φn(x), Gn(D 2 G ), G n(d G )) Ju cd (F (x), Gn(D ] 2 G ), G n(d G )) dh θ (x, D G ) χ(ω 1n ) {R\D F } M Ju cd (Φn(x), Gn(D 2 G ), G n(d G )) Ju cd (F (x), Gn(D 2 G ), G n(d G )) dh θ (x, D G ) = χ(ω 1n ) M Ju cd (Φn(x), Gn(D 2 G ), G n(d G )) F Ju cd (F (x), Gn(D 2 G ), G n(d G )) dh θ (x, D G ) + χ(ω 1n ) {(,X 1n ) (Xnn, )} M Ju cd (Φn(x), Gn(D 2 G ), G n(d G )) Ju cd (F (x), Gn(D 2 G ), G n(d G )) dh θ (x, D G ), 40

49 the second term on the righthand side of the inequality goes to 0 in probability as n by the Glivenko-Cantelli theorem and the assumption (D2). Since J cd u 2 (Φn(x), Gn(D G ), G n(d G )) J cd u 2 (F (x), Gn(D G ), G n(d G )) is bounded above almost surely when x F and Hn(x, y) converges to H θ (x, y) in distribution as n, using Theorem 1.9 (i), [43], once again, we have the first term on the righthand side of the inequality above converges to 0. In summary, we have shown that B 4n p 0 as n. Corollary 8. B 5n p 0 and B 6n p 0 as n. Proof: Using similar arguments to those used in the proof of Corollary 7 and Corollary 8, one can see this is true, so proof omitted. In summary, we have the asymptotic normality of R cd n Asymptotic normality of R dd n Note that where n 1/2 (R dd n µ dd ) = µ dd = E[N dd (X, Y )], 5 4 A kn + B kn, k=1 k=1 A 1n = A 2n = A 3n = n 1/2 J dd (F (D F ), F (D F ), G(D G ), G(D G ))[dhdd n dh dd ], Vn(G(D G ))J dd v 2 (F (D F ), F (D F ), G(D G ), G(D G ))dhdd, Un(F (D F ))J dd u 2 (F (D F ), F (D F ), G(D G ), G(D G ))dhdd, A 4n = Vn(G(D G ))Jdd v 1 (F (D F ), F (D F ), G(D G ), G(D G ))dhdd, A 5n = Un(F (D F ))Jdd u 1 (F (D F ), F (D F ), G(D G ), G(D G ))dhdd, 41

50 B 1 = Vn(G(D G ))[Jv dd (F (D 2 F ), F (D F ), G(D G ), Ψ n)dhn dd J dd v 2 (F (D F ), F (D F ), G(D G ), G(D G ))dhdd ], B 2 = Un(F (D F ))[Ju dd (F (D 2 F ), Φ n, G(D G ), G(D G ))dhdd n J dd u 2 (F (D F ), F (D F ), G(D G ), G(D G ))dhdd ], B 3 = Vn(G(D G ))[Jdd v (F (D 1 F ), F (D F ), Θ n, G(D G ))dhn dd J dd u 1 (F (D F ), F (D F ), G(D G ), G(D G ))dhdd ], B 4 = Un(F (D F ))[Jdd u (Γn, F (D 1 F ), G(D G ), G(D G ))dhdd n J dd u 1 (F (D F ), F (D F ), G(D G ), G(D G ))dhdd ], with n dd = {the number of observations such that X j = D F and Y j = D G } for k = 1, 2,..., n, and dh dd n = n dd n dh dd = P (X = D F and Y = D G ). We need to show that the A-terms are asymptotically normal, and the B-terms converge to 0 in probability as n 0. By the SLLN, it is obvious that dh dd n dh dd almost surely as n, and by the CLT, in distribution Un(F (D F )) N(0, [F (D F )(1 F (D F ))]) Un(F (D F )) N(0, [F (D F )(1 F (D F ))]) Vn(G(D G )) N(0, [G(D G )(1 G(D G ))]) Vn(G(D G )) N(0, [G(D G )(1 G(D G ))]) n 1/2 [dh dd n dh dd ] N(0, [dh dd (1 dh dd )]) 42

51 as n. Note that J dd (F (D F ), F (D F ), G(D G ), G(D G )), J dd v 2 (F (D F ), F (D F ), G(D G ), G(D G )), J dd u 2 (F (D F ), F (D F ), G(D G ), G(D G )), Jv dd (F (D 1 F ), F (D F ), G(D G ), G(D G )), and Ju dd (F (D 1 F ), F (D F ), G(D G ), G(D G )), in A 1n, A 2n, A 3n, A 4n and A 5n, respectively, are fixed numbers. Therefore, the A-terms are asymptotically normally distributed with mean 0. To show the sum of the A-terms is still normal, one only need to notice that Un(F (D F )) = n1/2 ( ˆFn(D F ) F (D F )) + n1/2 (Fn(D F ) ˆFn(D F )) = n 1/2 n (ψ Xj (D F ) F (D F )) + o p(1) j=1 = n 1/2 n H 1kn + o p(1) j=1 Un(F (D F )) = n 1/2 ( ˆFn(D F ) F (D F )) + n 1/2 (Fn(D F ) ˆFn(D F )) = n 1/2 n (φ Xj (D F ) F (D F )) + op(1) j=1 = n 1/2 n H 2kn + o p(1) j=1 Vn(G(D G )) = n1/2 (Ĝn(D G ) G(D G )) + n1/2 (Gn(D G ) Ĝn(D G )) = n 1/2 n (ψ Yj (D G ) G(D G )) + o p(1) j=1 = n 1/2 n H 3kn + o p(1) j=1 Vn(G(D G )) = n 1/2 (Ĝn(D G ) G(D G )) + n 1/2 (Gn(D G ) Ĝn(D G )) = n 1/2 n (φ Yj (D G ) G(D G )) + op(1) j=1 = n 1/2 n H 4kn + o p(1) j=1 43

52 and n 1/2 [dh dd n dh dd ] = n 1/2 n j=1 ( I {j Add } dhdd) = n 1/2 n H 5kn, j=1 where H ikn are i.i.d. and depend on (X j, Y j ) only, for k = 1,..., 5. The A 1n + A 2n + A 3n + n A 4n + A 5n can be written as n 1/2 A 4jn, where A 4jn are i.i.d. sum of products of H ikn j=1 and some certain fixed number, and four negligible terms. Using the CLT one more time, we have that the sum of A-terms is asymptotically normally distributed with mean 0. Under the assumption (D4), B i converges to 0 as n, for i = 1,..., 4, in probability. We have shown that the R dd n is asymptotically normally distributed with mean Asymptotic normality of R n To show that n 1/2 Rn = n 1/2 (R cc n + R cd n + R dc n + R dd n ) is asymptotic normal, we need the following notations: n 1/2 (R cc n µ cc ) R cc N(0, var(m cc (X, Y ))), n 1/2 (R cd n µ cd ) R cd N(0, var(m cd (X, Y ))), Noting that n 1/2 (R dc n µ dc ) R dc N(0, var(m dc (X, Y ))), n 1/2 (R dd n µ dd ) R dd N(0, var(m dd (X, Y ))). n 1/2 (R cc n µ cc ) = n 1/2 (R cd n µ cd ) = n 1/2 (R dc n µ dc ) = n 1/2 (R dd n µ dd ) = n j=1 n j=1 n j=1 n j=1 n 1/2 A 1jn + a few negligible terms, n 1/2 A 2jn + a few negligible terms, n 1/2 A 3jn + a few negligible terms, n 1/2 A 4jn + a few negligible terms, 44

by the CLT, we have
$$n^{1/2}(R_n - \mu) = n^{-1/2}\sum_{j=1}^{n}\bigl(A_{1jn} + A_{2jn} + A_{3jn} + A_{4jn}\bigr) + \text{a few negligible terms} \;\xrightarrow{d}\; N(0, \sigma^2),$$
where $\mu = \mu^{cc} + \mu^{cd} + \mu^{dc} + \mu^{dd}$ and $\sigma^2 = \mathrm{var}\bigl[M^{cc}(X,Y) + M^{cd}(X,Y) + M^{dc}(X,Y) + M^{dd}(X,Y)\bigr]$.

3.5 Proof of the Main Result

We finish this chapter by providing the proof of the main theorem stated in Section 3.1.

Proof: As shown in Section 3.3, $B_n \to \beta$ almost surely, and, as shown in Section 3.4, $A_n$ is asymptotically normal with mean 0 and variance $\sigma^2$ as $n\to\infty$. Using Slutsky's Theorem, $n^{1/2}(\hat\theta_n - \theta) = n^{1/2}A_n/B_n$ is asymptotically normal with mean 0 and variance $\varrho = \sigma^2/\beta^2$. In summary, we have shown the consistency and asymptotic normality of the semi-parametric estimator $\hat\theta_n$.

CHAPTER 4

A VARIANCE ESTIMATOR OF BIVARIATE DISTRIBUTIONS

In Chapter 3, we provided an explicit formula for $\varrho = \sigma^2/\beta^2$. Suppose estimators $\hat\sigma^2$ and $\hat\beta^2$ could be found for $\sigma^2$ and $\beta^2$, respectively. A rough-and-ready estimator of the asymptotic variance $\varrho$ would then be given by $\hat\varrho = \hat\sigma^2/\hat\beta^2$. If the variables given in (3.9) and (3.10) could be observed, one could simply estimate $\sigma^2$ and $\beta$ by the respective sample variance and sample mean. As this is not possible, the corresponding pseudo-observations can be used instead; they are defined in terms of $C_n$, the re-scaled empirical copula function of the bivariate sample, namely
$$C_n(u,v) = \frac{1}{n}\sum_{j=1}^{n} I_{\{F_n(X_j)\le u,\; G_n(Y_j)\le v\}}.$$
Let $\ell^{cc}_{\theta,\theta}$, $\ell^{cd}_{\theta,\theta}$, $\ell^{dc}_{\theta,\theta}$, and $\ell^{dd}_{\theta,\theta}$ be the decomposed components of $\ell_{\theta,\theta}$ with respect to $c_\theta$ in (2.7) under the different situations. Similarly, one can define $\ell^{cc}_{\theta}$, $\ell^{cd}_{\theta}$, $\ell^{dc}_{\theta}$, and $\ell^{dd}_{\theta}$ for $\ell_{\theta}$. From (3.9), we have the following:
$$\hat\beta = \frac{1}{n}\sum_{j=1}^{n}\Bigl[\hat N^{cc}_j + \hat N^{cd}_j + \hat N^{dc}_j + \hat N^{dd}_j\Bigr], \qquad (4.1)$$

where
$$\hat N^{cc}_j = I_{\{j\in A_{cc}\}}\,\ell^{cc}_{\theta,\theta}\bigl(\hat\theta_n, F_n(X_j), G_n(Y_j)\bigr), \qquad \hat N^{cd}_j = I_{\{j\in A_{cd}\}}\,\ell^{cd}_{\theta,\theta}\bigl(\hat\theta_n, F_n(X_j), G_n(Y_j^-), G_n(Y_j)\bigr),$$
$$\hat N^{dc}_j = I_{\{j\in A_{dc}\}}\,\ell^{dc}_{\theta,\theta}\bigl(\hat\theta_n, F_n(X_j^-), F_n(X_j), G_n(Y_j)\bigr), \qquad \hat N^{dd}_j = I_{\{j\in A_{dd}\}}\,\ell^{dd}_{\theta,\theta}\bigl(\hat\theta_n, F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j)\bigr).$$

To get $\hat\sigma^2$ for $\sigma^2$ in (3.11), we need more notation. Note that rearranging $\{(X_j, Y_j)\}$, $j=1,\dots,n$, does not change the value of $\hat\sigma^2$; we may therefore assume that the sample is ordered so that the $X_j$'s are non-decreasing. Let $R_{X(j)} = \sum_i I_{\{X_i \le X_j\}}$ be the rank of $X_j$ within the $X$ sequence (similarly, define $R_{Y(j)}$ for the $Y$'s), and let $R^-_{X(j)} = \sum_i I_{\{X_i < X_j\}}$ be the next lower rank to $R_{X(j)}$ within the $X$ sequence; $R^-_{Y(j)}$ is defined analogously within the $Y$ sequence. Under the current ordering, for any $i < j$ the following always holds: $R_{X(i)} \le R_{X(j)}$.

j            1          2          3          4          5          6          7
(X_j, Y_j)   (2.1,1.0)  (2.5,0.5)  (3.2,1.0)  (3.2,1.5)  (3.2,2.0)  (3.5,2.5)  (3.7,2.5)
R_X(j)       1          2          5          5          5          6          7
R_Y(j)       3          1          3          4          5          7          7

Table 4.1. Relationship among $(X_j, Y_j)$, $R_{X(j)}$, and $R_{Y(j)}$.
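Since the rescaled pseudo-observations used below are just ranks divided by $n+1$, they are simple to compute. The following is a minimal sketch (the function name is ours, not part of the dissertation) of the rank definitions just given, assuming NumPy is available; the printed values agree with the worked example around Table 4.1.

```python
import numpy as np

def ranks_and_pseudo_obs(x):
    """R_X(j) = #{i : X_i <= X_j}, the next lower rank R^-_X(j) = #{i : X_i < X_j},
    and the rescaled pseudo-observations R_X(j)/(n+1) and R^-_X(j)/(n+1)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    r_upper = np.array([np.sum(x <= xj) for xj in x])   # maximal rank under ties
    r_lower = np.array([np.sum(x < xj) for xj in x])    # next lower rank
    return r_upper, r_lower, r_upper / (n + 1), r_lower / (n + 1)

# X-coordinates of the example in Table 4.1:
x = [2.1, 2.5, 3.2, 3.2, 3.2, 3.5, 3.7]
print(ranks_and_pseudo_obs(x)[:2])   # upper ranks (1,2,5,5,5,6,7), lower ranks (0,1,2,2,2,5,6)
```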

56 Table 4.1 gives a simple example showing the relationship among (X j, Y j ), R X(j), and R Y (j). As shown in the table, R X(3) = R X(4) = R X(5) = 5, and R X(5) = R X(4) = R X(3) = R X(2) = 2 by definition. Now we can work on the details. Note that in (3.10), σ 2 is the variance of the sum of M S (X, Y ), where S = {cc}, {cd}, {dc}, or{dd}. For a given sample (X j, Y j ), σ 2 can be estimated by the sample variance of the following items: ˆM cc j ˆM cd j ˆM dc j = l θ cc(ˆθn, Fn(X j ), Gn(Y j ))I {j A cc} + 1 n l n θ,u cc 2 k=j + ( ) R Y (k) R l cc R X(k) R Y (k) Y (j) θ,v ˆθn, 2 n + 1, n + 1 d G = l=1 + 1 d G n n l=1 k=j d G + l=1 + d G l=1 d F = h=1 + 1 d F n h=1 ( ) R X(k) R Y (k) ˆθn, n + 1, n + 1 I {k A cc} l cd θ (ˆθn, Fn(X j ), Gn(Y j ), G n(y j ))I {j Acd, Y j =D Gl } l cd θ,u 2 R Y (k) >R Y (j) l cd θ,v 1 R Y (k) R Y (j) l cd θ,v 2 R R X(k) ˆθn, n + 1, Y (k) R Y (k) n + 1, I n + 1 {k Acd, Y k =D Gl } R R X(k) ˆθn, n + 1, Y (k) R Y (k) n + 1, I n + 1 {k Acd, Y k =D Gl } R R X(k) ˆθn, n + 1, Y (k) R Y (k) n + 1, I n + 1 {k Acd, Y k =D Gl } l dc θ (ˆθn, Fn(X j ), F n(x j ), Gn(Y j ))I {j Adc, X j =D F h } l dc R X(k) R X(k) θ,v ˆθn, 2 n + 1, R Y (k) R Y (j) R Y (k) n + 1, n + 1 I {k Adc, X k =D F h } 48

57 ˆM dd j = d F + l dc R X(k) R X(k) R Y (k) θ,u ˆθn, 1 n + 1, n + 1, n + 1 h=1 R X(k) >R X(j) I {k Adc, X k =D F h } d F n + l θ,u dc ˆθn, 2 h=1 k=j d F d G h=1 d F d G + h=1 l=1 R X(k) n + 1, R X(k) R Y (k) n + 1, I n + 1 {k Adc, X k =D F h } l θ dd (ˆθn, Fn(X j ), F n(x j ), Gn(Y j ), G n(y j ))I {Xj =D F h, Y j =D Gl } l=1 I {Xk =D F h, Y k =D Gl } d F d G + h=1 l=1 n l θ,v dd 1 R Y (k) >R Y (j) n l θ,v dd 2 R Y (k) R Y (j) I {Xk =D F h, Y k =D Gl } d F d G + h=1 l=1 n l θ,u dd 1 R X(k) >R X(j) I {Xk =D F h, Y k =D Gl } ˆθn, R n + 1, R X(k) n + 1, R X(k) Y (k) R Y (k) n + 1, n + 1 R X(k) R R X(k) ˆθn, n + 1, n + 1, Y (k) R Y (k) n + 1, n + 1 R X(k) R R X(k) ˆθn, n + 1, n + 1, Y (k) R Y (k) n + 1, n + 1 d F d G n + l θ,u2 ˆθn, h=1 l=1 k=j I {Xk =D F h, Y k =D Gl } n hl n, R n + 1, R X(k) n + 1, R X(k) Y (k) R Y (k) n + 1, n + 1 where n hl = {the number of observations (X j, Y j ) : X j = D F h, Y j = D Gl }. Next, to establish the consistency of ˆϱn, it suffices to show that ˆσ 2 n and ˆβ 2 n are themselves consistent. To prove that ˆβn β converges almost surely, for example, express the difference as 1 n n Jˆθn (F n(x j ), F n(x j ), Gn(Y j ), G n(y j )) E[J θ (F (X ), F (X), G(Y ), G(Y ))] j=1 49

in terms of $J_\theta \equiv \ell_{\theta,\theta}$. By the triangle inequality, this quantity is no greater than
$$\frac{1}{n}\sum_{j=1}^{n}\Bigl| J_{\hat\theta_n}\bigl(F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j)\bigr) - J_{\theta}\bigl(F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j)\bigr)\Bigr|$$
$$\;+\;\Bigl|\frac{1}{n}\sum_{j=1}^{n} J_{\theta}\bigl(F_n(X_j^-), F_n(X_j), G_n(Y_j^-), G_n(Y_j)\bigr) - E\bigl[J_{\theta}\bigl(F(X^-), F(X), G(Y^-), G(Y)\bigr)\bigr]\Bigr|.$$
Assuming that $J_\theta$ is bounded by an integrable function in a neighborhood of the true value of $\theta$, the convergence of the maximum likelihood estimator and the Dominated Convergence Theorem imply that the first summand converges to zero. Since the second term vanishes asymptotically by an argument similar to that used in Chapter 3, it follows that $\hat\beta_n \to \beta$ almost surely. The argument for $\hat\sigma_n$ is similar, though somewhat more involved, and is not presented here. In summary, we have provided a variance estimator $\hat\varrho$ of $\varrho$ and illustrated its consistency.
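To make the plug-in construction of this chapter concrete, here is a minimal sketch of the final ratio $\hat\varrho = \hat\sigma^2/\hat\beta^2$. It assumes, as inputs, the per-observation quantities already defined above: the sums of the four $\hat N$-components in (4.1) and the sums of the four $\hat M$-components; the copula-specific derivative evaluations that produce them are not shown here.

```python
import numpy as np

def plug_in_asymptotic_variance(N_hat, M_hat):
    """rho_hat = sigma_hat^2 / beta_hat^2, with beta_hat the sample mean of the
    per-observation N-terms of (4.1) and sigma_hat^2 the sample variance of the
    per-observation sums of the M-terms."""
    beta_hat = np.mean(N_hat)            # N_hat[j]: sum of the four N-hat components for observation j
    sigma2_hat = np.var(M_hat, ddof=1)   # M_hat[j]: sum of the four M-hat components for observation j
    return sigma2_hat / beta_hat ** 2
```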

CHAPTER 5

EXTENSIONS TO HIGHER DIMENSIONS

The previous developments extend more or less automatically to situations where it is desired to estimate a multidimensional dependence structure semi-parametrically. Let a boldfaced letter such as $x$ denote a $D$-tuple ($D\ge 2$) of real numbers, that is, $x \equiv (x_1,\dots,x_D)^T$ where $x_k\in\mathbb{R}$ for $k=1,\dots,D$, and let $F_1,\dots,F_D$ denote $D$ univariate mixed marginals on the real line given by
$$F_k(x_k) = \sum_{h=1}^{d_k} p_{kh}\, I_{\{D_{kh}\le x_k\}} + \Bigl(1 - \sum_{h=1}^{d_k} p_{kh}\Bigr)\int_{w\le x_k} f_k(w)\,dw,$$
where $D_{kh}$ is the $h$-th jump point of $F_k$ with $P(X_k = D_{kh}) = p_{kh}$, for $h=1,\dots,d_k$, and $f_k(\cdot)$ is a continuous density function with support on the real line. When restricted to a particular copula family, say $\{C_\theta : \theta = (\theta_1,\dots,\theta_q)^T\in\mathcal{A}\subseteq\mathbb{R}^q\}$, a multivariate joint distribution function for $(x_1,\dots,x_D)^T$ can be defined as follows:
$$F^D(x_1,\dots,x_D) = C_\theta\bigl(F_1(x_1),\dots,F_D(x_D)\bigr). \qquad (5.1)$$

Result 1: $F^D(x_1,\dots,x_D)$ defined in (5.1) is a valid joint distribution function on $\mathbb{R}^D$.

Proof: For $F^D(x_1,\dots,x_D)$ to be a valid distribution function on $\mathbb{R}^D$, the following four conditions should be satisfied:

(1) $0 \le F^D(x_1,\dots,x_D) \le 1$ for all $(x_1,\dots,x_D)$;

(2) $F^D(x_1,\dots,x_D) \to 0$ as $\max(x_1,\dots,x_D)\to-\infty$;

(3) $F^D(x_1,\dots,x_D) \to 1$ as $\min(x_1,\dots,x_D)\to+\infty$; and

(4) for every pair $(x_1,\dots,x_D)$ and $(y_1,\dots,y_D)$ in $\mathbb{R}^D$ with $x_k\le y_k$ for $k=1,\dots,D$, defining the $D$-box $V = [x_1,y_1]\times[x_2,y_2]\times\cdots\times[x_D,y_D]$, we have $\sum_{c}\mathrm{sgn}(c)\,F^D(c)\ge 0$, where the sum is taken over all vertices $c$ of $V$ and $\mathrm{sgn}(\cdot)$ is defined as in (1.3).

Note that since
$$F^D(x_1,\dots,x_D) = C_\theta\bigl(F_1(x_1),\dots,F_D(x_D)\bigr) = P\bigl(U_1\le F_1(x_1),\dots,U_D\le F_D(x_D)\bigr),$$
(1) follows. To prove (2), note that $F_k(x_k)\to 0$ for all $k=1,\dots,D$ when $\max(x_1,\dots,x_D)\to-\infty$; hence $F^D(x_1,\dots,x_D)\to 0$ from the properties of a cumulative distribution function. (3) follows similarly. (4) is a direct consequence of (1.3) and (5.1). This completes the proof.

Define $C(F_k)$ to be the collection of all points of continuity of $F_k$ and $\mathcal{J}(F_k) = \{D_{k1},\dots,D_{k d_k}\}$ to be the collection of all jump points of $F_k$. For a fixed $x\in\mathbb{R}^D$, every component of $x$ corresponds to either a point of continuity or a point of discontinuity of the corresponding marginal distribution. Suppose the indexes $c_1, c_2,\dots,c_H\in\{1,2,\dots,D\}$ correspond to $x_{c_h}\in C(F_{c_h})$ (i.e., $x_{c_h}$ belongs to the set of continuity points of $F_{c_h}$), and the remaining indexes, say $d_1, d_2,\dots,d_L$, with $H+L=D$, are the indexes of marginals such that $x_{d_l} = D_{d_l, i_{d_l}}$, $l=1,2,\dots,L$. When $\{c_1,c_2,\dots,c_H\} = \{1,2,\dots,D\}$, the density of $F^D$ is given by
$$dF^D = c_\theta\bigl(F_1(x_1),\dots,F_D(x_D)\bigr)\prod_{k=1}^{D} f_k(x_k),$$

and if $\{c_1,c_2,\dots,c_H\}\ne\{1,2,\dots,D\}$, the density of $F^D$ is given by
$$dF^D = \int\!\cdots\!\int_{\prod_{l=1}^{L}\bigl[F_{d_l}(D_{d_l,i_{d_l}}^-),\,F_{d_l}(D_{d_l,i_{d_l}})\bigr]} c_\theta(u_1,\dots,u_D)\, du_{d_1}\cdots du_{d_L}\;\prod_{h=1}^{H} f_{c_h}(x_{c_h}),$$
where $u_{c_h} = F_{c_h}(x_{c_h})$. Note that the above can be written as
$$c_\theta(F_1,\dots,F_D)(x)\,\prod_{h=1}^{H} f_{c_h}(x_{c_h}).$$
Letting $X_j = (X_{1j}, X_{2j},\dots,X_{Dj})^T$, $j=1,\dots,n$, represent a random sample from $F^D$, the semi-parametric estimator $\hat\theta_n$ of $\theta$ would then be obtained as a solution of the system
$$\frac{1}{n}\sum_{j=1}^{n}\frac{\partial}{\partial\theta_i}\log\bigl[dF^D(X_{1j},\dots,X_{Dj})\bigr] = 0, \qquad 1\le i\le q. \qquad (5.2)$$
Since $\theta$ is the only vector of parameters of interest, the component of the likelihood that matters is
$$\frac{1}{n}\sum_{j=1}^{n}\frac{\partial}{\partial\theta_i}\log\bigl[c_\theta(F_1,\dots,F_D)(X_{1j},\dots,X_{Dj})\bigr] = 0, \qquad 1\le i\le q. \qquad (5.3)$$
Using the same techniques as in Chapter 3, the main theorem of Chapter 3 can be extended to the $D$-dimensional case. Furthermore, the limiting variance-covariance matrix of $n^{1/2}(\hat\theta_n-\theta)$ is then $B^{-1}\Sigma B^{-1}$, where $B$ is the information matrix associated with $C_\theta$. To give the explicit form of $\Sigma$, we need some notation. Let
$$F^p_k(x_k) = \begin{cases} F_k(x_k) \equiv u_{k2}, & \text{if } x_k\in C(F_k),\\ \bigl(F_k(x_k^-), F_k(x_k)\bigr) \equiv (u_{k1}, u_{k2}), & \text{if } x_k\in\mathcal{J}(F_k). \end{cases}$$
Note that in the present case, the relevant statistic in (5.3) can be written as, for $1\le i\le q$,
$$R_{i,n} = \frac{1}{n}\sum_{j=1}^{n} J_i\bigl(F_1(X_{1j}^-), F_1(X_{1j}),\dots,F_D(X_{Dj}^-), F_D(X_{Dj})\bigr) = \frac{1}{n}\sum_{j=1}^{n}\sum_{Q} J^Q_i\bigl(F^p_1(X_{1j}),\dots,F^p_D(X_{Dj})\bigr),$$

62 where Q = {Q 1,..., Q D } is a D-sequence consisting of letters of {c} or {d}, with the following properties: (i) when Q is combined with a point x and the marginals F 1,..., F D, say x Q(F 1,..., F D ), a c at the k-th component of Q refers to x k belonging to C(F k ), and a d at the k-th component refers to x k belonging to J (F k ), for k = 1,..., D; (ii) when Q is combined with J i or J i s derivative with respect to u k1 or u k2, a c at the k-th component refers to F p k (X kj ) = F k (X kj ), and a d at the k-th component refers to F p k (X kj ) = (F k (X kj ), F k (X kj )), for k = 1,..., D. Using the same techniques as those used in Chapter 3, one can see that Σ is the variancecovariance matrix of the q-dimensional random vector whose i-th component is given by var( Q M Q i (X 1,..., X D )); the following is a general form of M Q i (X 1,..., X D )). Assuming Q = {c 1, c 2,..., c H } {d 1, d 2,..., d L }, and L and H can be any value among {0,..., D}, such that H + L = D. Then M Q i (x 1,..., x D ) =... J Q i (F 1 p (x 1 ),,..., F p D (x D )) H I {x c i d1 i h C(Fc )} h dl h=1 L D l=1 I {x dl =D dl i } + L ik. dl k=1 The set of jump points for F dl is {D dl 1,..., D d l d dl }, and L ik is given as below: (i) if the k-th component in Q = c; L ik = i d1... i dl {R\{ d c 1 n=1 D c 1 n}} J Q iu k2 (F p 1 (x 1 ),..., F p D (x D ))df (x 1,..., x D )... {R\{ d I(x c H k x n=1 D k ) c H n}} with x d l = D dl i dl ; 54

63 (ii) if the k-th component in Q = d; L ik =... i d1 i dl {R\{ d c 1 n=1 D c 1 n}}... {R\{ d I(x c H k < x n=1 D k ) c H n}} J Q iu (F p k1 1 (x 1 ),..., F p D (x D ))df (x 1,..., x D ) +... i d1 i {R\{ d c... 1 dl n=1 D c 1 n}} {R\{ d c I(x H k x n=1 D k ) c H n}} J Q iu k2 (F p 1 (x 1 ),..., F p D (x D ))df (x 1,..., x D ), with x d = D l dl i. dl As the parallel with the case q = 1 and D = 2 described earlier in Chapter 2 and Chapter 3, an estimator of the variance-covariance matrix of ˆθn could be found by repeating the procedure described in Chapter 4. 55
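As a small illustration of the mixed marginals $F_k$ defined at the beginning of this chapter, the following sketch evaluates a univariate mixed distribution function with a finite set of jump points and a continuous part. The function name and the standard normal default are ours, used only for illustration.

```python
import numpy as np
from scipy.stats import norm

def mixed_marginal_cdf(x, jump_points, jump_probs, cont_cdf=norm.cdf):
    """F(x) = sum_h p_h * I{D_h <= x} + (1 - sum_h p_h) * integral_{w <= x} f(w) dw."""
    jump_points = np.asarray(jump_points, dtype=float)
    jump_probs = np.asarray(jump_probs, dtype=float)
    discrete = np.sum(jump_probs * (jump_points <= x))
    return discrete + (1.0 - jump_probs.sum()) * cont_cdf(x)

# A marginal with a single atom of mass 0.3 at 0.2 and a standard normal continuous part:
print(mixed_marginal_cdf(0.5, [0.2], [0.3]))
```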

CHAPTER 6

JOINT DISTRIBUTIONS USING t-copula

6.1 Joint Distributions Using t-copula

In this chapter, we develop joint distributions based on the family of t-copulas; see [34] and [8]. We choose t-copulas not only because they generalize the Gaussian copulas, but also because they possess the tail dependence property. This property allows us to study the limiting association between random variables $X$ and $Y$ as both $x$ and $y$ go to their boundaries. Our finding is that this limiting association is governed by the correlation coefficient $\rho$ together with the degrees of freedom $\nu$, as made precise in the tail-dependence lemma of this section. In practice, there are situations where random variables remain associated to some degree even in the tails. For instance, in biometric recognition, the genuine and imposter distributions generated from the same biometric trait, or from different biometric traits in a multimodal biometric system, tend to have similar, though not identical, tail behavior, as shown in Section 7.2 of Chapter 7. t-copula models can accommodate such situations more appropriately than copula families that lack tail dependence. Before defining the t-copula, we introduce some notation. Let $\Sigma$ denote a positive definite matrix of dimension $D\times D$. For such a $\Sigma$, the $D$-dimensional t density with $\nu$

degrees of freedom will be denoted by
$$f^D_{\nu,\Sigma}(x) \equiv \frac{\Gamma\bigl(\frac{\nu+D}{2}\bigr)}{(\pi\nu)^{D/2}\,\Gamma\bigl(\frac{\nu}{2}\bigr)\,|\Sigma|^{1/2}}\Bigl(1+\frac{x^T\Sigma^{-1}x}{\nu}\Bigr)^{-\frac{\nu+D}{2}},$$
with corresponding cumulative distribution function
$$t^D_{\nu,\Sigma}(x) \equiv \int_{w\le x} f^D_{\nu,\Sigma}(w)\,dw.$$
A matrix $\Sigma$ with unit diagonal entries corresponds to a correlation matrix and will be denoted by $R$. The $D$-dimensional t-copula function is given by
$$C_{\nu,R}(u) \equiv \int_{w\le t_\nu^{-1}(u)} f^D_{\nu,R}(w)\,dw, \qquad (6.1)$$
where $t_\nu^{-1}(u) \equiv (t_\nu^{-1}(u_1),\dots,t_\nu^{-1}(u_D))^T$, $t_\nu^{-1}$ is the inverse of the cumulative distribution function of the univariate t with $\nu$ degrees of freedom, and $R$ is a $D\times D$ correlation matrix. Note that $C_{\nu,R}(u) = P\{X\le t_\nu^{-1}(u)\}$ for $X = (X_1,\dots,X_D)^T$ distributed as $t^D_{\nu,R}$, demonstrating that $C_{\nu,R}(u)$ is a distribution function on $(0,1)^D$ with uniform marginals. The density corresponding to $C_{\nu,R}(u)$ is given by
$$c_{\nu,R}(u) = \frac{\partial^D C_{\nu,R}(u)}{\partial u_1\,\partial u_2\cdots\partial u_D} = \frac{f^D_{\nu,R}\{t_\nu^{-1}(u)\}}{\prod_{k=1}^{D} f_\nu\{t_\nu^{-1}(u_k)\}}, \qquad (6.2)$$
where $f_\nu$ in (6.2) is the density of the univariate t distribution with $\nu$ degrees of freedom. We consider joint distributions of the form
$$F^D_{\nu,R}(x) = C_{\nu,R}\{F_1(x_1),\dots,F_D(x_D)\}, \qquad (6.3)$$
with $C_{\nu,R}$ defined as in (6.1). It follows from the properties of a copula function that $F^D_{\nu,R}$ is a valid multivariate distribution function on $\mathbb{R}^D$. The identifiability of the marginal distributions $F_k$, $k=1,\dots,D$, the correlation matrix $R$, and the degrees of freedom parameter $\nu$ is established in the following theorem:

Theorem 6.1.1. Let $F^D_{\nu_1,R_1}$ and $G^D_{\nu_2,R_2}$ denote two distribution functions on $\mathbb{R}^D$ obtained from equation (6.3) with marginal distributions $F_k$, $k=1,\dots,D$, and $G_k$, $k=1,\dots,D$, respectively. Suppose we have $F^D_{\nu_1,R_1}(x) = G^D_{\nu_2,R_2}(x)$ for all $x$. Then $F_k(x) = G_k(x)$ for all $k$, $\nu_1 = \nu_2$, and $R_1 = R_2$.

In order to prove the identifiability of $(\nu, R)$ in Theorem 6.1.1, we first state and prove several lemmas.

Lemma 6.1.1. Fix $\nu$. Let $F^D_{\nu,R_1}$ and $G^D_{\nu,R_2}$ denote two cumulative distribution functions on $\mathbb{R}^D$ obtained from equation (6.3) with marginal distributions $F_k$, $k=1,\dots,D$, and $G_k$, $k=1,\dots,D$, respectively. Suppose we have
$$F^D_{\nu,R_1}(x) = G^D_{\nu,R_2}(x) \qquad (6.4)$$
for all $x$. Then $F_k(x) = G_k(x)$ for all $k$ and $R_1 = R_2$.

Proof: By taking $x_i$, $i\ne k$, tending to $\infty$, we get that $F^D_{\nu,R_1}(x)\to F_k(x_k)$ and $G^D_{\nu,R_2}(x)\to G_k(x_k)$. It follows that $F_k(x) = G_k(x)$ for all $k$. Next we show that the correlation matrices $R_1$ and $R_2$ are equal. We first prove this result for $D=2$. Note that when $D=2$, $R_1$ and $R_2$ are each determined by a single correlation parameter, namely $\rho_1$ and $\rho_2$, respectively, so we have
$$F^2_{\nu,\rho_1}(x) = C_{\nu,\rho_1}\{F_1(x_1), F_2(x_2)\} \equiv \int_{w\le t_\nu^{-1}(v)} f^2_{\nu,\rho_1}(w)\,dw \qquad (6.5)$$
and
$$G^2_{\nu,\rho_2}(x) = C_{\nu,\rho_2}\{F_1(x_1), F_2(x_2)\} \equiv \int_{w\le t_\nu^{-1}(v)} f^2_{\nu,\rho_2}(w)\,dw, \qquad (6.6)$$
where $v = (F_1(x_1), F_2(x_2))^T$; thus one only needs to prove that $\rho_1 = \rho_2$. Now, for any real numbers $a$ and $b$, the bivariate t cumulative distribution function $t^2_{\nu,\rho}(a,b)$, with $\nu$ degrees of freedom and correlation parameter $\rho$, is a strictly increasing function of $\rho$. Note that
$$t^2_{\nu,\rho}(a,b) = \int_{-\infty}^{a}\int_{-\infty}^{b} f^2_{\nu,\rho}(m,n)\,dm\,dn = \int_{-\infty}^{a}\int_{-\infty}^{b}\int_{0}^{\infty}\phi(m,n,\rho,w)\,g_\nu(w)\,dw\,dm\,dn,$$

where
$$\phi(m,n,\rho,w) = \frac{1}{2\pi w (1-\rho^2)^{1/2}}\exp\Bigl\{-\frac{m^2 - 2\rho m n + n^2}{2w(1-\rho^2)}\Bigr\}$$
and $g_\nu(w)$ is the probability density function of the inverse Gamma distribution defined as
$$g_\nu(\cdot) \equiv IG\Bigl(\frac{\nu}{2}, \frac{2}{\nu}\Bigr). \qquad (6.7)$$
Differentiating $t^2_{\nu,\rho}(a,b)$ with respect to $\rho$, we get
$$\frac{\partial t^2_{\nu,\rho}(a,b)}{\partial\rho} = \int_{-\infty}^{a}\int_{-\infty}^{b}\int_{0}^{\infty}\frac{\partial\phi(m,n,\rho,w)}{\partial\rho}\,g_\nu(w)\,dw\,dm\,dn = \int_{-\infty}^{a}\int_{-\infty}^{b}\int_{0}^{\infty} w\,\frac{\partial^2\phi(m,n,\rho,w)}{\partial m\,\partial n}\,g_\nu(w)\,dw\,dm\,dn = \int_{0}^{\infty} w\,\phi(a,b,\rho,w)\,g_\nu(w)\,dw > 0,$$
using the fact that $\partial\phi(m,n,\rho,w)/\partial\rho = w\,\partial^2\phi(m,n,\rho,w)/\partial m\,\partial n$. Note that (6.5) and (6.6) can be re-written as $F^2_{\nu,\rho_1}(x) = t^2_{\nu,\rho_1}(a,b)$ and $G^2_{\nu,\rho_2}(x) = t^2_{\nu,\rho_2}(a,b)$ with $a = t_\nu^{-1}(F_1(x_1))$ and $b = t_\nu^{-1}(F_2(x_2))$. So $F^2_{\nu,\rho_1}(x) = G^2_{\nu,\rho_2}(x)$ gives $t^2_{\nu,\rho_1}(a,b) = t^2_{\nu,\rho_2}(a,b)$. Since, for fixed $\nu$ and any $a$ and $b$, $t^2_{\nu,\rho}(a,b)$ is a strictly increasing function of $\rho$, the equality above implies that $\rho_1 = \rho_2$ must hold. This proves the case $D=2$.

Now, we prove Lemma 6.1.1 for general $D$. Let $R_1 = ((\rho_{kk',1}))$ and $R_2 = ((\rho_{kk',2}))$ be the representations of the correlation matrices $R_1$ and $R_2$ in terms of their entries. Note that the condition (6.4) can be reduced to the case $D=2$ by taking $x_i$, $i\ne k, k'$, tending to infinity. Thus we have $F^2_{\nu,\rho_{kk',1}}(x_k, x_{k'}) = G^2_{\nu,\rho_{kk',2}}(x_k, x_{k'})$, and this implies $\rho_{kk',1} = \rho_{kk',2}$ from the case $D=2$.

Next, we state and prove two more relevant lemmas.

Lemma 6.1.2. For $a>0$, $t_\nu\{-(a\nu)^{1/2}\}$ is a decreasing function of $\nu$.

Proof: Let $Y\sim t_\nu(\cdot)$. Then $Z = Y/\nu^{1/2}$ has pdf $f_\nu(z) = \bigl[\Gamma\{(\nu+1)/2\}/\{\Gamma(\nu/2)\,\pi^{1/2}\}\bigr](1+z^2)^{-(\nu+1)/2}$. One can see that if $\nu_1\le\nu_2$, then $\{f_{\nu_2}(|z|)/f_{\nu_1}(|z|)\}$ is a strictly decreasing function of $|z|$. Since $I_{\{|Z|\ge a^{1/2}\}}$ is a nondecreasing function of $|Z|$, mimicking the

proof of Lemma 2 in [28], one can prove that $E_\nu\bigl(I_{\{|Z|\ge a^{1/2}\}}\bigr)$ is a decreasing function of $\nu$. It follows that
$$t_\nu\{-(a\nu)^{1/2}\} = P\{Y\le -(a\nu)^{1/2}\} = P\{Z\le -a^{1/2}\} = \tfrac{1}{2}\,E_\nu\bigl(I_{\{|Z|\ge a^{1/2}\}}\bigr)$$
is decreasing in $\nu$.

Lemma (tail dependence). For the bivariate t-copula $C_{\nu,\rho}(u,v)$, let
$$\lambda = \lim_{v\to 0^+}\frac{\partial C_{\nu,\rho}(u,v)}{\partial v}\Big|_{u=v} \qquad\text{and}\qquad \mu = \lim_{v\to 0^+}\frac{\partial C_{\nu,\rho}(u,v)}{\partial v}\Big|_{u=1-v}.$$
Then, it follows that
$$\lambda = t_{\nu+1}\Bigl[-\Bigl\{(\nu+1)\,\frac{1-\rho}{1+\rho}\Bigr\}^{1/2}\Bigr] \qquad\text{and}\qquad \mu = t_{\nu+1}\Bigl[\Bigl\{(\nu+1)\,\frac{1+\rho}{1-\rho}\Bigr\}^{1/2}\Bigr].$$

Proof: It is not hard to see that if $(X,Y)^T\sim t^2_{\nu,\rho}$, then, given $Y=y$,
$$\Bigl(\frac{\nu+1}{\nu+y^2}\Bigr)^{1/2}\,\frac{X-\rho y}{(1-\rho^2)^{1/2}} \sim t_{\nu+1}. \qquad (6.8)$$
Therefore,
$$C_{\nu,\rho}(u,v) = P\bigl(X\le t_\nu^{-1}(u),\, Y\le t_\nu^{-1}(v)\bigr) = \int_{-\infty}^{t_\nu^{-1}(v)} t_{\nu+1}\Bigl[\Bigl(\frac{\nu+1}{\nu+y^2}\Bigr)^{1/2}\,\frac{t_\nu^{-1}(u)-\rho y}{(1-\rho^2)^{1/2}}\Bigr]\,f_\nu(y)\,dy.$$
Taking the derivative with respect to $v$, we get
$$\frac{\partial C_{\nu,\rho}(u,v)}{\partial v} = t_{\nu+1}\Bigl[\Bigl\{\frac{\nu+1}{\nu+t_\nu^{-1}(v)^2}\Bigr\}^{1/2}\,\frac{t_\nu^{-1}(u)-\rho\,t_\nu^{-1}(v)}{(1-\rho^2)^{1/2}}\Bigr] = t_{\nu+1}\Bigl[\Bigl\{\frac{\nu+1}{\nu+t_\nu^{-1}(v)^2}\Bigr\}^{1/2}\,\frac{t_\nu^{-1}(v)\,(1-\rho)}{(1-\rho^2)^{1/2}}\Bigr], \quad\text{putting } u=v,$$
$$\;\longrightarrow\; t_{\nu+1}\Bigl[-\Bigl\{(\nu+1)\,\frac{1-\rho}{1+\rho}\Bigr\}^{1/2}\Bigr]$$

as $v\to 0^+$. The other expression can be derived similarly, which completes the proof.

Remark: The lemma above describes the tail dependence property of t-copulas.

Proof of Theorem 6.1.1: Using similar arguments as before, we only need to prove the theorem for $D=2$. It easily follows, by taking $x_i\to\infty$ for $i\ne k$, that $F_k(x_k) = G_k(x_k)$ for $k=1,2$. It then follows that
$$C_{\nu_1,\rho_1}(u,v) = C_{\nu_2,\rho_2}(u,v) \qquad (6.9)$$
for all pairs $(u,v)$ of the form $(F_1(x_1), F_2(x_2))$. Thus, (6.9) need not hold for all values of $(u,v)$ in $(0,1)^2$, as it would when both marginals are continuous. Nevertheless, since both marginals have only a finite number of discontinuities, there are infinitely many values of $(u,v)$ of this form for which the limits and derivatives in the tail-dependence lemma can be applied to obtain $\lambda$ and $\mu$. Since (6.9) holds, we must have $\lambda_1 = \lambda_2$ and $\mu_1 = \mu_2$. Now, without loss of generality, assume $\rho_1\ge\rho_2$. From the equality $\lambda_1 = \lambda_2$ and Lemma 6.1.2, we must have $\nu_1\ge\nu_2$. On the other hand, the equality $\mu_1 = \mu_2$ gives $\nu_1\le\nu_2$. Hence $\rho_1\ge\rho_2$ implies that $\nu_1 = \nu_2 = \nu$, say. Now, using Lemma 6.1.1, we get $\rho_1 = \rho_2$. The proof of the theorem is completed.

Remark: Note that as the degrees of freedom $\nu\to\infty$, the t-distribution converges to a Gaussian distribution, and so does the t-copula. For the Gaussian copulas, the counterpart of Theorem 6.1.1 also holds. Therefore, we have the following lemma:

Lemma. Let $F^D_{R_1}$ and $G^D_{R_2}$ denote two distribution functions on $\mathbb{R}^D$ in the Gaussian

70 Note that which gives us Φ 2,ρ (a, b) = ρ = = a b Φ 2,ρ (a, b) = φ ρ(z 1, z 2 )dz 1 dz 2, a { b φ ρ(z 1, z 2 )dz 1 dz 2 a b [ 1 2π (1 ρ 2 ) exp a z2 1 2ρz 1 z 2 +z2 2 2(1 ρ 2 ) b } ρ 1 ρ 2 + (z 1 ρ z 2 )(z 2 ρ z 1 ) (1 ρ 2 ) 2 dz 1 dz 2 2 φρ(z 1, z 2 ) z 1 z 2 dz 1 dz 2 ] = φρ(a, b) which is always positive. Therefore, we have proved the uniqueness of ρ. The proof of Lemma is completed. Remark: It is not hard to see that for the Gaussian copulas, the tail dependence between random variables no longer exists. Using notations defined in Chapter 5, the density of F D ν,r at a fixed point x RD, df D ν,r (x), is a function on R D that satisfies F ν,r D w x (x) = df ν,r D (w), (6.10) where df ν,r D (x) is given by... L l=1 [F d (D ),F l d l,i dl (D dl,i )] c ν,r (u 1,..., u D )du d,..., du 1 dl dl dl H fc h (xc ), h h=1 where uc h = Fc h (xc h ), and H and L could be any value among {0, 1,..., D} such that H + L = D. Note that the above density can be generally written as c H (ν, R, F 1,..., F D )(x) fc h (xc ). (6.11) h h=1 62

71 6.2 Estimation of R and ν Estimation of R for fixed ν Let X 1,..., Xn be n independent and identically distributed D-dimensional random vectors arising from the joint distribution F R,ν D in (6.3), ˆF kn denote the empirical distribution function of F k and F kn = n ˆF kn /(n + 1). From (6.11), the log-likelihood function corresponding to the n i.i.d. observations X j, j = 1,..., n, is given by τ(ν, R) = = n log df ν,r D (X j ) j=1 n log c (ν, R, F 1,..., F D )(X j ) j=1 (6.12) For fixed ν, our estimator of R is taken to be the maximizer of ˆτ(ν, R), that is, ˆR(ν) = arg max R ˆτ(ν, R). (6.13) Note that there are two main challenges in maximizing the likelihood ˆτ(ν, R): Firstly, ˆτ(ν, R) involves several integrals corresponding to discrete components in X j, j = 1,..., n; c is not available in a closed form. Secondly, ˆτ(ν, R) needs to be maximized over all D D correlation matrices. The space of all correlation matrices is a compact space of [ 1, 1] D2 since all entries of a correlation matrix lie between 1 and 1. However, this space is not easy to maximize over due to the constraint of positive definiteness placed on correlation matrices. The EM Algorithm The difficulties mentioned can be overcome with the use of the EM algorithm; see, for example, [31]. The EM algorithm is a well-known algorithm used to find the maximum likelihood estimator (MLE) of a parameter θ based on data y distributed according to the likelihood l obs ( y; θ). In 63

72 many situations, obtaining the MLE, ˆθ MLE = arg max θ l obs ( y; θ), via maximization of l obs ( y; θ) over θ turns out to be difficult. In such cases, the observed likelihood can usually be expressed in terms of an integral over missing components of a complete likelihood; in other words, if z and lcom( y, z; θ), respectively, denote the missing observations and the complete likelihood corresponding to (y, z), it follows that l obs ( y; θ) = Z(y) l com( y, z; θ) dz where Z(y) is the range of integration of z subject to the observed data being y. The EM algorithm is an iterative procedure that can be formulated in two steps: First, (i) the E-step, where the quantity Q(θ (N), θ) = E π( z θ (N),y) {log lcom( y, z; θ)} (6.14) is formed; the expectation in (6.14) is taken with respect to the conditional distribution of Z given y when evaluated at a particular value, θ (N), and the conditional distribution of Z given y is given by π( z θ, y) = l com( y, z; θ). (6.15) l obs (y; θ) Second, (ii) the M-step, where Q(θ (N), θ) is maximized with respect to θ to obtain θ (N+1), that is, θ (N+1) = arg max θ Q(θ (N), θ). Starting from an initial value θ (0), the sequence θ (N), N 0 converges to ˆθ MLE under suitable regularity conditions. The integrals in ˆτ(ν, R) corresponding to the discrete components can be formulated as missing components of a complete likelihood. Recall that the notations c h and d l were used to denote the discrete and continuous components in the vector x. We now extend this notation to represent discrete and continuous components in the j-th observation vector X j, j = 1,..., n. For j = 64

73 1,..., n, let d lj, l = 1,..., L j and c hj, h = 1,..., H j, respectively, denote the discrete and continuous components of X j with respect to the corresponding marginal distributions. Next, define the vector u j in the following way: The d lj -th component of u j, u dlj, is a number that is allowed to vary between [F dlj (x d lj ), F dlj (x dlj )], that is, u lj [F dlj (x d lj ), F dlj (x dlj )] S lj, say, for l = 1,..., L j. The c hj -th component of u j, u c hj, is taken to be uc hj = Fc hj (xc hj ) for h = 1,..., H j. In that case, we have c (ν, R, F 1,..., F D )(X j ) =... S 1j S L j c D ν,r (u j ) du d 1j... du dl j Making the transformation z j = t 1 ν (u j ), the likelihood corresponding to the j-th observation in (6.12) can be written as c (ν, R, F 1,..., F D )(X j ) = NUM j /DENOM j, where NUM j = t 1 ν (S 1j )... t 1 ν (S L ) fd ν,r (z j ) dz d... dz 1j dl j j and H j DENOM j = fν(zc hj ), h=1 where zc = t 1 hj ν {Fc hj (xc )} and t 1 hj ν (S lj ) = [t 1 ν {F dlj (x )}, t d 1 ν {F dlj (x dlj )}]. We lj make several important observations. First, where the maximization of R is concerned for fixed ν, it is enough to consider the NUM j terms. Thus, in the EM framework, we define NUM j to be the observed likelihood corresponding to the x j : l obs (x j ; ν, R) = t 1 ν (S 1j )... t 1 ν (S L ) fd ν,r (z j ) dz d... dz 1j dl. (6.16) j j If the variables z dlj j = 1,..., L j in (6.16) are treated as missing, the complete likelihood corresponding to the j-th observation becomes L lcom(x j, Z j ; ν, R) = f ν,r D (z j ) j I {zdlj t 1 l=1 ν (S lj )}, 65

74 where Z j = { z dlj, j = 1,..., L j }. Next, we note that the t density, fd ν,r, is an infinite scale mixture of Gaussian densities, namely, where f D ν,r (z j ) = φ D R,σ j (z) = σ 2 j =0 φd R,σ j (z j ) gν(σ 2 j ) dσ2 j 1 (2π) D 2 σ D j R 1 2 exp zt R 1 z 2σ 2 j and the mixing distribution on σ j is the inverse Gamma distribution as defined in (6.7). In other words, an extra missing component can be added into the E-step, namely, the mixing parameter, σ j. The complete likelihood specification for the j-th observation now becomes L lcom(x j, Z j, σ j ; ν, R) = φ D R,σ (z j j ) gν(σ j 2 ) j I {zdlj t 1 l=1 ν (S lj )}. (6.17) The conditional distribution π required to obtain Q in (6.14) is defined as where n π(z j, σ j, j = 1,..., n; ν, R ) = π j (Z j, σ j ; ν, R ) j=1 π j (Z j, σ j ; ν, R ) = l com(x j, Z j, σ j ; ν, R ) ; (6.18) l obs (x j ; ν, R ) see (6.15). Two distributions derived from (6.18) will be used subsequently: (i) the conditional distribution of σ j given Z j, given by π j (σ j 2 Z ν + D j, ν, R) = IG 2, ν + zt j R 1 z j 2 and (ii) the marginal of Z j s (after integrating out σ j ), given by 1, f ν,r D (z j ) L j l=1 I {z dlj t 1 ν (t 1 ν (S lj )} π j (Z j ; ν, R) =. t 1 ν (S 1j ) t 1 ν (S L ) fd ν,r (z j ) dz d dz 1j dl j j 66
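The scale-mixture representation above also gives a convenient way to simulate from $t^D_{\nu,R}$, which can be handy for checking the E-step computations by Monte Carlo. A minimal sketch (not part of the dissertation's procedure) follows; it uses the fact that $X = \sigma Z$ with $Z\sim N_D(0,R)$ and $\nu/\sigma^2\sim\chi^2_\nu$ reproduces the inverse-Gamma mixing distribution in (6.7).

```python
import numpy as np

def rmvt_scale_mixture(n, R, nu, seed=None):
    """Draw n vectors from t^D_{nu,R} as a Gaussian scale mixture:
    X = sigma * Z, Z ~ N_D(0, R), with sigma^2 following the mixing law in (6.7)."""
    rng = np.random.default_rng(seed)
    D = R.shape[0]
    z = rng.multivariate_normal(np.zeros(D), R, size=n)
    sigma2 = nu / rng.chisquare(df=nu, size=n)     # nu / sigma^2 ~ chi^2_nu
    return z * np.sqrt(sigma2)[:, None]

sample = rmvt_scale_mixture(5, np.array([[1.0, 0.5], [0.5, 1.0]]), nu=4, seed=0)
print(sample)
```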

75 The E-step entails that the expected value of the logarithm of the complete likelihood (6.17) is taken with respect to the missing components, Z j, and the N-th iterate of R, R (N) : E π(zj,σ j ; ν,r (N) ) {logl com(x j, Z j, σ j ; ν, R ) } E j {loglcom(x j, Z j, σ j ; ν, R )}, where π j (Z j, σ j ; ν, R ) is as defined in (6.18). The expected value can be simplified to z T j E j R 1 z j 2σ j 2 1 log R (6.19) 2 plus other terms that do not involve the correlation matrix R, and hence are irrelevant for the subsequent M-step. The expectation in (6.19) is taken in two steps, namely, E j = E j1 E j2 where E j2 is the conditional expectation of σ j given Z j, and E j1 is the expectation with respect to the marginal of Z j. On taking E j2, the expression in (6.19) simplifies to ν + D 2 where W (N) = ((w (N) kk z j T R 1 z j E j1 ν + z j T (R(N) ) 1 z j = ν + D 2 ( tr R 1 W (N)), )) is a D D matrix with entries z kj z k j ν + z j T (R(N) ) 1 z π j (Z j ; ν, R(N) ) dz j j w (N) kk = for z j = (z 1j,..., z Dj ) T. The last integral is approximated by numerical integration on a grid. We partition each interval t 1 ν (S lj ), l = 1,..., L j into a large number of subintervals and evaluate the integrand on the partition points. Finally, a Reimann sum is obtained as an approximation to the integral. The M-step consists of maximizing the complete likelihood function (6.19), z T j E j R 1 z j 2σ j D log R = ν tr(r 1 W (N) ) 1 log R, (6.20) with respect to the correlation matrix R. Since we have to maximize the objective function in the space of all correlation matrices, this is somewhat a difficult task. We adopt the methodology 67

76 presented by Barnard et al. [1] where an iterative procedure to maximize (6.20) is developed by considering the maximization of one element of R, say ρ, each time. In order to preserve the positive definiteness of R, one can show (see Barnard et al. [1] for details) that ρ should lie in an interval [ρ l, ρu]. The lower and upper limits of this interval are derived from the fact that in order for R to be positive definite, it is both necessary and sufficient that all the principal submatrices of R are positive definite. This is equivalent to a non-negativity condition on the determinant of each principal submatrix, which translates to an interval for the range of values of ρ. This procedure is repeated for the other elements of R and cycled until convergence before going to the (N + 1)-st iteration of the EM algorithm Selection of the Degrees of Freedom, ν The above procedure is carried out for a collection of degrees of freedom ν A, where A is finite. For each fixed ν, we obtain the estimator of the correlation matrix ˆR(ν) based on the EM algorithm above. We select the degrees of freedom in the following way: Select ˆν such that ˆν = arg max ν A 1 n ˆτ{ν, ˆR(ν)}. (6.21) 6.3 Summary In this Chapter, we introduced the t-copula family in R D, and showed that for any joint distribution F D in R D, there exists a unique pair (ν, R) such that F ν,r D (x) = C ν,r (F 1 (x 1 ),..., F D (x D )) within this family. Furthermore, we developed a semi-parametric technique to estimate the unknown pair (ν, R) using EM algorithm. The application of this approach will be presented in the next chapter. 68
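As a small illustration of the positive-definiteness constraint handled in Section 6.2.1, the sketch below finds, by brute force, the admissible interval for a single off-diagonal entry of a correlation matrix with the remaining entries held fixed. Barnard et al. [1] derive this interval analytically from the principal-submatrix conditions; the grid scan here (function name ours) is only a numerical stand-in for checking such bounds.

```python
import numpy as np

def pd_interval(R, k, l, grid=2001):
    """Approximate the interval [rho_l, rho_u] of values for the (k, l) entry of R
    that keep R positive definite, by scanning a fine grid and checking eigenvalues."""
    rhos = np.linspace(-0.999, 0.999, grid)
    keep = []
    for rho in rhos:
        Rt = R.copy()
        Rt[k, l] = Rt[l, k] = rho
        if np.linalg.eigvalsh(Rt).min() > 0.0:
            keep.append(rho)
    return (min(keep), max(keep)) if keep else None

R = np.array([[1.0, 0.2, 0.7],
              [0.2, 1.0, 0.4],
              [0.7, 0.4, 1.0]])
print(pd_interval(R, 0, 1))   # admissible range for the (1,2) entry given the other entries
```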

CHAPTER 7

SIMULATION AND REAL DATA RESULTS

7.1 Simulation Results

The results in this section are based on simulated data from four experiments. The first three concern the bivariate case, whereas the fourth concerns a trivariate distribution. The true distributions for the observations $X_j$, $j=1,\dots,n$, in all experiments are of the type defined in (6.3). In Experiment 1, $\nu_0 = \infty$ in (6.3), corresponding to the Gaussian copula function, and we fix a true value of $\rho$. The two mixed marginals, $F_1$ and $F_2$, have the following cumulative distribution

functions:
$$F_1(x_1) = 0.3\,I_{\{x_1\ge 0.2\}} + 0.7\,\Phi(x_1) \qquad\text{and}\qquad F_2(x_2) = 0.2\,I_{\{x_2\ge 0.1\}} + 0.8\,\Phi(x_2),$$
where $\Phi(\cdot)$ is the standard normal distribution function. Thus, we have $d_1 = d_2 = 1$ with $D_{11} = 0.2$ and $D_{21} = 0.1$, with probabilities $p_{11} = 0.3$ and $p_{21} = 0.2$, respectively. The sets of jump points are $\mathcal{J}(F_1) = \{0.2\}$ and $\mathcal{J}(F_2) = \{0.1\}$, with continuous components of $F_1$ and $F_2$ corresponding to $f_1(x) = f_2(x) = \phi(x)$, the standard normal density. The sample size $n$ is taken from $n=500$ to $n=2500$ in increments of 500. In each trial, we simulate $n$ observations $(x_1, x_2)$ from $F$ and estimate $\rho$ within the Gaussian copula family using the methodology presented in Section 6.2.

Table 7.1 contains the simulation results for Experiment 1. For each sample size, the experiment was repeated 1,000 times. The average absolute bias and relative average absolute bias of the estimator $\hat\rho_n$ are reported, together with the empirical coverage of the approximate 95% confidence interval for $\rho$ based on the asymptotic normality of $\hat\rho_n$.

Sample size n    Average absolute bias    Relative average absolute bias    Coverage
500              -                        -                                 92.4%
1000             -                        -                                 91.6%
1500             -                        -                                 92.3%
2000             -                        -                                 93.2%
2500             -                        -                                 94.7%

Table 7.1. Simulation results for Experiment 1 with $\nu_0=\infty$. The absolute bias and relative absolute bias of the estimator $\hat\rho_n$ are provided, together with the empirical coverage of the approximate 95% confidence interval for $\rho$ based on the asymptotic normality of $\hat\rho_n$.

Experiment 2 consists of the following choices: $\nu_0 = \infty$ in (6.3), corresponding to the Gaussian copula function, and a fixed true value $\rho_0$. The two mixed marginals, $F_1$ and $F_2$, have the following cumulative distribution functions:
$$F_1(x_1) = 0.25\,I_{\{x_1\ge 0.2\}} + 0.6\,t_{10}(x_1) + 0.15\,I_{\{x_1\ge 0.7\}} \qquad\text{and}\qquad F_2(x_2) = 0.2\,I_{\{x_2\ge 0.3\}} + 0.5\,t_{10}(x_2) + 0.3\,I_{\{x_2\ge 0.6\}},$$
where $t_{10}(\cdot)$ denotes the cumulative distribution function of the t distribution with 10 degrees of freedom. Thus, we have $d_1 = d_2 = 2$ with $D_{11} = 0.2$, $D_{12} = 0.7$, $D_{21} = 0.3$, and $D_{22} = 0.6$, with probabilities $p_{11} = 0.25$, $p_{12} = 0.15$, $p_{21} = 0.2$, and $p_{22} = 0.3$, respectively. The sets of jump points are $\mathcal{J}(F_1) = \{0.2, 0.7\}$ and $\mathcal{J}(F_2) = \{0.3, 0.6\}$, with continuous components of $F_1$ and $F_2$ given by the density of the t distribution with 10 degrees of freedom.
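For concreteness, here is one way data of the Experiment 1 type could be generated and the empirical coverage of Table 7.1 checked. This is only a sketch: it assumes a Gaussian copula with the two one-jump marginals just described, leaves rho as a free parameter (the true value used in the dissertation is not given here, so 0.5 below is purely illustrative), and the coverage helper assumes a Wald-type interval $\hat\rho\pm z_{0.975}\sqrt{\hat\varrho/n}$, which is one natural reading of "interval based on asymptotic normality". All function names are ours.

```python
import numpy as np
from scipy.stats import norm

def mixed_quantile(u, jump, p):
    """Generalised inverse of F(x) = p * I{x >= jump} + (1 - p) * Phi(x) (one jump point)."""
    u = np.asarray(u, dtype=float)
    lo = (1.0 - p) * norm.cdf(jump)                 # F(jump-)
    x = np.full_like(u, float(jump))                # u falling inside the jump maps to the atom
    below, above = u < lo, u >= lo + p
    x[below] = norm.ppf(u[below] / (1.0 - p))
    x[above] = norm.ppf((u[above] - p) / (1.0 - p))
    return x

def simulate_experiment1_like(n, rho, seed=None):
    """Gaussian copula with the Experiment 1 marginals: atoms of mass 0.3 at 0.2 and 0.2 at 0.1."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=n)
    u = norm.cdf(z)
    return mixed_quantile(u[:, 0], 0.2, 0.3), mixed_quantile(u[:, 1], 0.1, 0.2)

def wald_ci_covers(rho_hat, var_hat, n, rho_true, level=0.95):
    """Does rho_hat +/- z * sqrt(var_hat / n) cover rho_true?"""
    half = norm.ppf(0.5 + level / 2.0) * np.sqrt(var_hat / n)
    return (rho_hat - half) <= rho_true <= (rho_hat + half)

x1, x2 = simulate_experiment1_like(1000, rho=0.5, seed=1)
```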

Figure 7.1. Density curves for the t distribution with degrees of freedom $\nu = 3, 5, 10, 15, 20, 25$ and the normal distribution. (For interpretation of the references to color in this and all other figures, the reader is referred to the electronic version of this dissertation.)

For now, we select the set of candidate degrees of freedom for the t-copula to be $A = \{3, 5, 10, \infty\}$ for illustrative purposes. Note that we choose the $\nu_0$ values in $A$ so that the corresponding t-distributions are significantly different from one another. Figure 7.1 gives the t-densities for several $\nu_0$ values, including the values in $A$. There exist significant gaps between the t-density curves corresponding to $\nu_0 \in \{3, 5, 10\}$, but relatively smaller gaps for $\nu_0 \in \{15, 20, 25, \infty\}$. Table 7.2 provides the $L_1$-distances between some of the t distributions in Figure 7.1.

The sample size $n$ is taken from $n=500$ to $n=2500$ in increments of 500. In each trial, we

simulate $n$ observations $(x_1, x_2)$ from $F$ and estimate $\rho_0$ and $\nu_0$ based on the methodology presented in Section 6.2. The trial is repeated 50 times. The simulation results, including the percentage of times (out of 50) that the true value of $\nu_0$ is chosen, the mean of $\hat\rho(\hat\nu)$, and the MSE of $\hat\rho(\hat\nu)$, are presented in Table 7.3.

In Experiment 3, we took $\nu_0 = 10$. The two generalized marginal distributions are the same as in Experiment 2. The correlation parameter $\rho_0$ was selected to be 0.20 and 0.75, respectively. The results are presented in Table 7.4. From the entries of Table 7.3 and Table 7.4, we see that the estimation procedure is more effective in selecting the true degrees of freedom when $\nu_0 = \infty$ than when $\nu_0 = 10$. The reason is that the distribution corresponding to $\nu_0 = \infty$ is further away from all the other candidate distributions in $A$. Also, the estimation procedure is less affected by the value of $\rho_0$, as illustrated by the percentage of times $\hat\nu = \nu_0$ in Table 7.4.

Table 7.2. $L_1$ distances for paired values of $\nu_0$, computed for two values of $\rho_0$, one of which is 0.20.

Table 7.3. Simulation results for Experiment 2 with $\nu_0 = \infty$ and the chosen $\rho_0$ (columns: sample size $n$; percentage of times $\hat\nu = \nu_0$; mean $\hat\rho(\hat\nu)$; MSE of $\hat\rho(\hat\nu)$).

Table 7.4. Simulation results for Experiment 3 with $\nu_0 = 10$; the two correlation values considered are $\rho_0 = 0.2$ and $\rho_0 = 0.75$ (columns: sample size $n$; percentage of times $\hat\nu = \nu_0$; mean $\hat\rho(\hat\nu)$; MSE of $\hat\rho(\hat\nu)$).

Table 7.5. Simulation results for Experiment 4 (columns: sample size; percentage of times the true $\nu$ is obtained; means of $\hat\rho_1(\hat\nu)$, $\hat\rho_2(\hat\nu)$, $\hat\rho_3(\hat\nu)$; total MSE).

In Experiment 4, we took $D = 3$ and $\nu_0 = 10$. The first two marginal distributions are the same as before. The third marginal distribution is taken to be the t distribution with 10 degrees of freedom (thus having no points of discontinuity). We took the correlation matrix $R_0$ to be of the form
$$R_0 = \begin{pmatrix} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_3 \\ \rho_2 & \rho_3 & 1 \end{pmatrix}, \text{ say.} \qquad (7.1)$$
For the different sample sizes, the experiment was repeated 20 times (instead of 50) to reduce computational time. The estimators $\hat\rho_i$ of $\rho_i$, $i=1,2,3$, were obtained based on the iterative procedure outlined in Section 6.2. Since the maximization step involves another loop within the M-step, the objective function was maximized over the $\rho$-intervals in steps of 0.01 to reduce computational time. The iterative procedure within the M-step is not required when $D=2$, which enabled us to maximize the objective function over a finer grid in that case. The results are given in Table 7.5; note that (i) the estimators converge and (ii) the MSE decreases as $n$ tends to infinity.

7.2 Application to Multimodal Fusion in Biometric Recognition

7.2.1 Introduction

Biometric recognition refers to the automatic identification of individuals based on their biological or behavioral characteristics [20]. In recent years, recognition of an individual based on his or her biometric traits has become an increasingly important method of verifying that you are who you claim you are; see, for example, [18] and [29]. Biometric recognition is more reliable than traditional approaches, such as password-based or token-based approaches, since biometric traits cannot be easily stolen or forgotten. Some examples of biometric traits include fingerprint, face, signature, voice and hand geometry (see Figure 7.2). A number of commercial recognition systems based on

these traits have been deployed and are currently in use. Biometric technology has now become a viable alternative in traditional government applications (e.g., the US-VISIT program [48] and the proposed biometric passport, which is capable of storing the owner's biometric information in a chip inside the passport). With increasing applications involving human-computer interaction, there is a growing need for recognition techniques that are reliable and secure.

Figure 7.2. Some examples of biometric traits: (a) fingerprint, (b) iris scan, (c) face scan, (d) signature, (e) voice, (f) hand geometry, (g) retina, and (h) ear.

Recognition of an individual can be viewed as a test of statistical hypotheses. Based on the biometric input $Q$ and a claimed identity $I_c$, we would like to test
$$H_0: I_t = I_c \quad\text{vs.}\quad H_1: I_t \ne I_c, \qquad (7.2)$$
where $I_t$ is the true identity of the user. The testing in (7.2) is performed by a matcher which computes a similarity measure, $S(Q,T)$, based on $Q$ and $T$; large (respectively, small) values of $S$ indicate that $T$ and $Q$ are close to (far


More information

Semiparametric Gaussian Copula Models: Progress and Problems

Semiparametric Gaussian Copula Models: Progress and Problems Semiparametric Gaussian Copula Models: Progress and Problems Jon A. Wellner University of Washington, Seattle European Meeting of Statisticians, Amsterdam July 6-10, 2015 EMS Meeting, Amsterdam Based on

More information

A PRACTICAL WAY FOR ESTIMATING TAIL DEPENDENCE FUNCTIONS

A PRACTICAL WAY FOR ESTIMATING TAIL DEPENDENCE FUNCTIONS Statistica Sinica 20 2010, 365-378 A PRACTICAL WAY FOR ESTIMATING TAIL DEPENDENCE FUNCTIONS Liang Peng Georgia Institute of Technology Abstract: Estimating tail dependence functions is important for applications

More information

Statistics: Learning models from data

Statistics: Learning models from data DS-GA 1002 Lecture notes 5 October 19, 2015 Statistics: Learning models from data Learning models from data that are assumed to be generated probabilistically from a certain unknown distribution is a crucial

More information

1. Introduction In many biomedical studies, the random survival time of interest is never observed and is only known to lie before an inspection time

1. Introduction In many biomedical studies, the random survival time of interest is never observed and is only known to lie before an inspection time ASYMPTOTIC PROPERTIES OF THE GMLE WITH CASE 2 INTERVAL-CENSORED DATA By Qiqing Yu a;1 Anton Schick a, Linxiong Li b;2 and George Y. C. Wong c;3 a Dept. of Mathematical Sciences, Binghamton University,

More information

A Very Brief Summary of Statistical Inference, and Examples

A Very Brief Summary of Statistical Inference, and Examples A Very Brief Summary of Statistical Inference, and Examples Trinity Term 2009 Prof. Gesine Reinert Our standard situation is that we have data x = x 1, x 2,..., x n, which we view as realisations of random

More information

Section 8: Asymptotic Properties of the MLE

Section 8: Asymptotic Properties of the MLE 2 Section 8: Asymptotic Properties of the MLE In this part of the course, we will consider the asymptotic properties of the maximum likelihood estimator. In particular, we will study issues of consistency,

More information

Score Normalization in Multimodal Biometric Systems

Score Normalization in Multimodal Biometric Systems Score Normalization in Multimodal Biometric Systems Karthik Nandakumar and Anil K. Jain Michigan State University, East Lansing, MI Arun A. Ross West Virginia University, Morgantown, WV http://biometrics.cse.mse.edu

More information

Smooth simultaneous confidence bands for cumulative distribution functions

Smooth simultaneous confidence bands for cumulative distribution functions Journal of Nonparametric Statistics, 2013 Vol. 25, No. 2, 395 407, http://dx.doi.org/10.1080/10485252.2012.759219 Smooth simultaneous confidence bands for cumulative distribution functions Jiangyan Wang

More information

Citation for published version (APA): Susyanto, N. (2016). Semiparametric copula models for biometric score level fusion

Citation for published version (APA): Susyanto, N. (2016). Semiparametric copula models for biometric score level fusion UvA-DARE (Digital Academic Repository) Semiparametric copula models for biometric score level fusion Susyanto, N. Link to publication Citation for published version (APA): Susyanto, N. (2016). Semiparametric

More information

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued

Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Introduction to Empirical Processes and Semiparametric Inference Lecture 09: Stochastic Convergence, Continued Michael R. Kosorok, Ph.D. Professor and Chair of Biostatistics Professor of Statistics and

More information

Behaviour of multivariate tail dependence coefficients

Behaviour of multivariate tail dependence coefficients ACTA ET COMMENTATIONES UNIVERSITATIS TARTUENSIS DE MATHEMATICA Volume 22, Number 2, December 2018 Available online at http://acutm.math.ut.ee Behaviour of multivariate tail dependence coefficients Gaida

More information

Estimation of Dynamic Regression Models

Estimation of Dynamic Regression Models University of Pavia 2007 Estimation of Dynamic Regression Models Eduardo Rossi University of Pavia Factorization of the density DGP: D t (x t χ t 1, d t ; Ψ) x t represent all the variables in the economy.

More information

Semi-parametric predictive inference for bivariate data using copulas

Semi-parametric predictive inference for bivariate data using copulas Semi-parametric predictive inference for bivariate data using copulas Tahani Coolen-Maturi a, Frank P.A. Coolen b,, Noryanti Muhammad b a Durham University Business School, Durham University, Durham, DH1

More information

Chapter 3. Point Estimation. 3.1 Introduction

Chapter 3. Point Estimation. 3.1 Introduction Chapter 3 Point Estimation Let (Ω, A, P θ ), P θ P = {P θ θ Θ}be probability space, X 1, X 2,..., X n : (Ω, A) (IR k, B k ) random variables (X, B X ) sample space γ : Θ IR k measurable function, i.e.

More information

Random Matrix Eigenvalue Problems in Probabilistic Structural Mechanics

Random Matrix Eigenvalue Problems in Probabilistic Structural Mechanics Random Matrix Eigenvalue Problems in Probabilistic Structural Mechanics S Adhikari Department of Aerospace Engineering, University of Bristol, Bristol, U.K. URL: http://www.aer.bris.ac.uk/contact/academic/adhikari/home.html

More information

large number of i.i.d. observations from P. For concreteness, suppose

large number of i.i.d. observations from P. For concreteness, suppose 1 Subsampling Suppose X i, i = 1,..., n is an i.i.d. sequence of random variables with distribution P. Let θ(p ) be some real-valued parameter of interest, and let ˆθ n = ˆθ n (X 1,..., X n ) be some estimate

More information

Graduate Econometrics I: Asymptotic Theory

Graduate Econometrics I: Asymptotic Theory Graduate Econometrics I: Asymptotic Theory Yves Dominicy Université libre de Bruxelles Solvay Brussels School of Economics and Management ECARES Yves Dominicy Graduate Econometrics I: Asymptotic Theory

More information

Statistical Inference with Monotone Incomplete Multivariate Normal Data

Statistical Inference with Monotone Incomplete Multivariate Normal Data Statistical Inference with Monotone Incomplete Multivariate Normal Data p. 1/4 Statistical Inference with Monotone Incomplete Multivariate Normal Data This talk is based on joint work with my wonderful

More information

Robust Backtesting Tests for Value-at-Risk Models

Robust Backtesting Tests for Value-at-Risk Models Robust Backtesting Tests for Value-at-Risk Models Jose Olmo City University London (joint work with Juan Carlos Escanciano, Indiana University) Far East and South Asia Meeting of the Econometric Society

More information

Maximum Likelihood Estimation

Maximum Likelihood Estimation University of Pavia Maximum Likelihood Estimation Eduardo Rossi Likelihood function Choosing parameter values that make what one has observed more likely to occur than any other parameter values do. Assumption(Distribution)

More information

Overview of Extreme Value Theory. Dr. Sawsan Hilal space

Overview of Extreme Value Theory. Dr. Sawsan Hilal space Overview of Extreme Value Theory Dr. Sawsan Hilal space Maths Department - University of Bahrain space November 2010 Outline Part-1: Univariate Extremes Motivation Threshold Exceedances Part-2: Bivariate

More information

Estimation of multivariate critical layers: Applications to rainfall data

Estimation of multivariate critical layers: Applications to rainfall data Elena Di Bernardino, ICRA 6 / RISK 2015 () Estimation of Multivariate critical layers Barcelona, May 26-29, 2015 Estimation of multivariate critical layers: Applications to rainfall data Elena Di Bernardino,

More information

Additive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535

Additive functionals of infinite-variance moving averages. Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Additive functionals of infinite-variance moving averages Wei Biao Wu The University of Chicago TECHNICAL REPORT NO. 535 Departments of Statistics The University of Chicago Chicago, Illinois 60637 June

More information

Modelling and Estimation of Stochastic Dependence

Modelling and Estimation of Stochastic Dependence Modelling and Estimation of Stochastic Dependence Uwe Schmock Based on joint work with Dr. Barbara Dengler Financial and Actuarial Mathematics and Christian Doppler Laboratory for Portfolio Risk Management

More information

Chapter 6. Convergence. Probability Theory. Four different convergence concepts. Four different convergence concepts. Convergence in probability

Chapter 6. Convergence. Probability Theory. Four different convergence concepts. Four different convergence concepts. Convergence in probability Probability Theory Chapter 6 Convergence Four different convergence concepts Let X 1, X 2, be a sequence of (usually dependent) random variables Definition 1.1. X n converges almost surely (a.s.), or with

More information

A Goodness-of-fit Test for Copulas

A Goodness-of-fit Test for Copulas A Goodness-of-fit Test for Copulas Artem Prokhorov August 2008 Abstract A new goodness-of-fit test for copulas is proposed. It is based on restrictions on certain elements of the information matrix and

More information

P (A G) dp G P (A G)

P (A G) dp G P (A G) First homework assignment. Due at 12:15 on 22 September 2016. Homework 1. We roll two dices. X is the result of one of them and Z the sum of the results. Find E [X Z. Homework 2. Let X be a r.v.. Assume

More information

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap

The bootstrap. Patrick Breheny. December 6. The empirical distribution function The bootstrap Patrick Breheny December 6 Patrick Breheny BST 764: Applied Statistical Modeling 1/21 The empirical distribution function Suppose X F, where F (x) = Pr(X x) is a distribution function, and we wish to estimate

More information

Inference For High Dimensional M-estimates. Fixed Design Results

Inference For High Dimensional M-estimates. Fixed Design Results : Fixed Design Results Lihua Lei Advisors: Peter J. Bickel, Michael I. Jordan joint work with Peter J. Bickel and Noureddine El Karoui Dec. 8, 2016 1/57 Table of Contents 1 Background 2 Main Results and

More information

Lecture 2: CDF and EDF

Lecture 2: CDF and EDF STAT 425: Introduction to Nonparametric Statistics Winter 2018 Instructor: Yen-Chi Chen Lecture 2: CDF and EDF 2.1 CDF: Cumulative Distribution Function For a random variable X, its CDF F () contains all

More information

1 Probability theory. 2 Random variables and probability theory.

1 Probability theory. 2 Random variables and probability theory. Probability theory Here we summarize some of the probability theory we need. If this is totally unfamiliar to you, you should look at one of the sources given in the readings. In essence, for the major

More information

1 The Glivenko-Cantelli Theorem

1 The Glivenko-Cantelli Theorem 1 The Glivenko-Cantelli Theorem Let X i, i = 1,..., n be an i.i.d. sequence of random variables with distribution function F on R. The empirical distribution function is the function of x defined by ˆF

More information

Bandits : optimality in exponential families

Bandits : optimality in exponential families Bandits : optimality in exponential families Odalric-Ambrym Maillard IHES, January 2016 Odalric-Ambrym Maillard Bandits 1 / 40 Introduction 1 Stochastic multi-armed bandits 2 Boundary crossing probabilities

More information

DISTRIBUTION OF EIGENVALUES OF REAL SYMMETRIC PALINDROMIC TOEPLITZ MATRICES AND CIRCULANT MATRICES

DISTRIBUTION OF EIGENVALUES OF REAL SYMMETRIC PALINDROMIC TOEPLITZ MATRICES AND CIRCULANT MATRICES DISTRIBUTION OF EIGENVALUES OF REAL SYMMETRIC PALINDROMIC TOEPLITZ MATRICES AND CIRCULANT MATRICES ADAM MASSEY, STEVEN J. MILLER, AND JOHN SINSHEIMER Abstract. Consider the ensemble of real symmetric Toeplitz

More information

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2

n! (k 1)!(n k)! = F (X) U(0, 1). (x, y) = n(n 1) ( F (y) F (x) ) n 2 Order statistics Ex. 4. (*. Let independent variables X,..., X n have U(0, distribution. Show that for every x (0,, we have P ( X ( < x and P ( X (n > x as n. Ex. 4.2 (**. By using induction or otherwise,

More information

Bootstrap, Jackknife and other resampling methods

Bootstrap, Jackknife and other resampling methods Bootstrap, Jackknife and other resampling methods Part III: Parametric Bootstrap Rozenn Dahyot Room 128, Department of Statistics Trinity College Dublin, Ireland dahyot@mee.tcd.ie 2005 R. Dahyot (TCD)

More information

11 Survival Analysis and Empirical Likelihood

11 Survival Analysis and Empirical Likelihood 11 Survival Analysis and Empirical Likelihood The first paper of empirical likelihood is actually about confidence intervals with the Kaplan-Meier estimator (Thomas and Grunkmeier 1979), i.e. deals with

More information

Modelling Dependence with Copulas and Applications to Risk Management. Filip Lindskog, RiskLab, ETH Zürich

Modelling Dependence with Copulas and Applications to Risk Management. Filip Lindskog, RiskLab, ETH Zürich Modelling Dependence with Copulas and Applications to Risk Management Filip Lindskog, RiskLab, ETH Zürich 02-07-2000 Home page: http://www.math.ethz.ch/ lindskog E-mail: lindskog@math.ethz.ch RiskLab:

More information

Probability Distribution And Density For Functional Random Variables

Probability Distribution And Density For Functional Random Variables Probability Distribution And Density For Functional Random Variables E. Cuvelier 1 M. Noirhomme-Fraiture 1 1 Institut d Informatique Facultés Universitaires Notre-Dame de la paix Namur CIL Research Contact

More information

Laplace s Equation. Chapter Mean Value Formulas

Laplace s Equation. Chapter Mean Value Formulas Chapter 1 Laplace s Equation Let be an open set in R n. A function u C 2 () is called harmonic in if it satisfies Laplace s equation n (1.1) u := D ii u = 0 in. i=1 A function u C 2 () is called subharmonic

More information

Non-parametric Inference and Resampling

Non-parametric Inference and Resampling Non-parametric Inference and Resampling Exercises by David Wozabal (Last update. Juni 010) 1 Basic Facts about Rank and Order Statistics 1.1 10 students were asked about the amount of time they spend surfing

More information

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS

SOME CONVERSE LIMIT THEOREMS FOR EXCHANGEABLE BOOTSTRAPS SOME CONVERSE LIMIT THEOREMS OR EXCHANGEABLE BOOTSTRAPS Jon A. Wellner University of Washington The bootstrap Glivenko-Cantelli and bootstrap Donsker theorems of Giné and Zinn (990) contain both necessary

More information

FULL LIKELIHOOD INFERENCES IN THE COX MODEL

FULL LIKELIHOOD INFERENCES IN THE COX MODEL October 20, 2007 FULL LIKELIHOOD INFERENCES IN THE COX MODEL BY JIAN-JIAN REN 1 AND MAI ZHOU 2 University of Central Florida and University of Kentucky Abstract We use the empirical likelihood approach

More information

Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing

Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing Quantitative Introduction ro Risk and Uncertainty in Business Module 5: Hypothesis Testing M. Vidyasagar Cecil & Ida Green Chair The University of Texas at Dallas Email: M.Vidyasagar@utdallas.edu October

More information

GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS

GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS Statistica Sinica 20 (2010), 441-453 GOODNESS-OF-FIT TESTS FOR ARCHIMEDEAN COPULA MODELS Antai Wang Georgetown University Medical Center Abstract: In this paper, we propose two tests for parametric models

More information

Product measure and Fubini s theorem

Product measure and Fubini s theorem Chapter 7 Product measure and Fubini s theorem This is based on [Billingsley, Section 18]. 1. Product spaces Suppose (Ω 1, F 1 ) and (Ω 2, F 2 ) are two probability spaces. In a product space Ω = Ω 1 Ω

More information

Stein s Method and Characteristic Functions

Stein s Method and Characteristic Functions Stein s Method and Characteristic Functions Alexander Tikhomirov Komi Science Center of Ural Division of RAS, Syktyvkar, Russia; Singapore, NUS, 18-29 May 2015 Workshop New Directions in Stein s method

More information

STAT 7032 Probability Spring Wlodek Bryc

STAT 7032 Probability Spring Wlodek Bryc STAT 7032 Probability Spring 2018 Wlodek Bryc Created: Friday, Jan 2, 2014 Revised for Spring 2018 Printed: January 9, 2018 File: Grad-Prob-2018.TEX Department of Mathematical Sciences, University of Cincinnati,

More information

On the Power of Tests for Regime Switching

On the Power of Tests for Regime Switching On the Power of Tests for Regime Switching joint work with Drew Carter and Ben Hansen Douglas G. Steigerwald UC Santa Barbara May 2015 D. Steigerwald (UCSB) Regime Switching May 2015 1 / 42 Motivating

More information

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017

Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Ph.D. Qualifying Exam Friday Saturday, January 6 7, 2017 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Let X 1, X 2,, X n be a sequence of i.i.d. observations from a

More information

Lecture 21: Convergence of transformations and generating a random variable

Lecture 21: Convergence of transformations and generating a random variable Lecture 21: Convergence of transformations and generating a random variable If Z n converges to Z in some sense, we often need to check whether h(z n ) converges to h(z ) in the same sense. Continuous

More information

Elements of Financial Engineering Course

Elements of Financial Engineering Course Elements of Financial Engineering Course Baruch-NSD Summer Camp 0 Lecture Tai-Ho Wang Agenda Methods of simulation: inverse transformation method, acceptance-rejection method Variance reduction techniques

More information