Adaptive estimation of the copula correlation matrix for semiparametric elliptical copulas

1 Adaptive estimation of the copula correlation matrix for semiparametric elliptical copulas. Department of Mathematics and Department of Statistical Science, Cornell University. London, January 7, 2016.

2 Joint work with Yue Zhao. Supported by NSF Grant DMS.

3 Outline. 1. Introduction; 2. Bounding $\|\hat T - T\|_2$, bounding $\|\hat\Sigma - \Sigma\|_2$; 3. A factor model for $\Sigma$.

4 Outline. 1. Introduction; 2. Bounding $\|\hat T - T\|_2$, bounding $\|\hat\Sigma - \Sigma\|_2$; 3. A factor model for $\Sigma$.

5 Copula. Let the continuous random vector $X = (X_1, \dots, X_d) \in \mathbb{R}^d$ have marginal distribution functions $F_j(x) = P\{X_j \le x\}$. The copula function $C(u_1, \dots, u_d)$ is the joint cumulative distribution function of the transformation $U = (F_1(X_1), \dots, F_d(X_d))$, i.e., for $u = (u_1, \dots, u_d) \in [0,1]^d$, $C(u_1, \dots, u_d) = P\{F_1(X_1) \le u_1, \dots, F_d(X_d) \le u_d\}$.

6 Copula. The copula $C$ always describes a distribution on the unit cube $[0,1]^d$. The copula is invariant under strictly increasing transformations $T_j$ of the marginals $X_j$; this allows separate specification of the marginals and the dependence structure. In this talk we mostly study semiparametric copula models: the dependence structure is given by a parametric copula, while the marginals are left unspecified (nonparametric).

7 Elliptical distribution. A random vector $X \in \mathbb{R}^d$ has an elliptical distribution if its characteristic function $E(\exp(i t^\top X))$ can be written as $e^{i t^\top \mu}\, \Psi(t^\top \Sigma t)$, with parameters $\mu \in \mathbb{R}^d$, covariance matrix $\Sigma \in \mathbb{R}^{d \times d}$, and characteristic generator $\Psi : [0, \infty) \to \mathbb{R}$. For instance, when $\Psi(t) = e^{-t/2}$, $X \sim N(\mu, \Sigma)$.

8 Semiparametric elliptical distributions and the copula correlation matrix. Elliptical copulas are the copulas of elliptical distributions. An elliptical copula is characterized by the copula parameters: $\Psi$ and the copula correlation matrix $\Sigma$, obtained from the covariance matrix by normalizing, $\Sigma_{ij} \mapsto \Sigma_{ij}/\sqrt{\Sigma_{ii}\,\Sigma_{jj}}$.

9 Semiparametric Gaussian graphical copula model. Let $X \sim N(0, \Sigma)$, and let $\Omega = \Sigma^{-1}$ be the precision matrix. If $[\Omega]_{kl} = 0$, then $X_k \perp X_l \mid X_{\setminus\{k,l\}}$. An accurate estimator of $\Omega$ can be obtained by feeding the sample covariance matrix $\hat\Sigma$ into the graphical lasso, CLIME, the graphical Dantzig selector, etc. Assume instead that we merely have $f(X) \sim N(0, \Sigma)$ for some strictly increasing, unknown function $f$ (i.e., $X$ has a Gaussian copula with copula correlation matrix $\Sigma$), and again let $\Omega = \Sigma^{-1}$. Now, if $[\Omega]_{kl} = 0$, then $f(X_k) \perp f(X_l) \mid f(X)_{\setminus\{k,l\}}$, which is equivalent to $X_k \perp X_l \mid X_{\setminus\{k,l\}}$.

10 Semiparametric Gaussian graphical copula model. Using a rank-based estimator $\hat\Sigma^{\text{rank}}$ of $\Sigma$ (based on Kendall's tau or Spearman's rho), Liu et al 2012 and Xue & Zou 2012 recover the same optimal parametric rate (obtained under the Gaussian model) under the semiparametric model. This crucially depends on estimating $\Sigma$ so that $\|\hat\Sigma^{\text{rank}} - \Sigma\|_{\max}$ is of order $\sqrt{\log(d)/n}$ (w.h.p.), which is in turn based on Hoeffding's classical bound for scalar U-statistics.
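As an illustration of this pipeline, here is a minimal sketch, not the authors' code, that forms a rank-based correlation estimate from Spearman's rho via the Gaussian-copula transform $2\sin(\pi\hat\rho_S/6)$ and feeds it to scikit-learn's graphical lasso; the eigenvalue-clipping step guards against the non-positive-semidefiniteness issue discussed later in the talk.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.covariance import graphical_lasso

def rank_based_sigma(X):
    """Rank-based copula correlation estimate from Spearman's rho."""
    rho, _ = spearmanr(X)                    # d x d matrix of pairwise Spearman correlations
    S = 2.0 * np.sin(np.pi * rho / 6.0)      # Gaussian-copula transform of Spearman's rho
    np.fill_diagonal(S, 1.0)
    return S

rng = np.random.default_rng(0)
Z = rng.multivariate_normal(np.zeros(3), [[1.0, 0.5, 0.2],
                                          [0.5, 1.0, 0.4],
                                          [0.2, 0.4, 1.0]], size=500)
X = np.exp(Z)                                # unknown increasing marginal maps: copula unchanged
S = rank_based_sigma(X)
w, V = np.linalg.eigh(S)                     # clip eigenvalues so the solver sees a PSD matrix
S_psd = (V * np.clip(w, 1e-6, None)) @ V.T
cov_est, precision_est = graphical_lasso(S_psd, alpha=0.1)   # sparse precision estimate
```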

11 Kendall's tau. Let $\tilde X \in \mathbb{R}^d$ be an independent copy of $X$. Kendall's tau between the $k$th and $l$th coordinates is $\tau_{kl} = E\bigl[\operatorname{sgn}\bigl((X_k - \tilde X_k)(X_l - \tilde X_l)\bigr)\bigr]$. Let $T$ be the matrix of Kendall's taus: $[T]_{kl} = \tau_{kl}$.

12 Kendall's tau. Kendall's tau statistic is, for samples $X^1, \dots, X^n$ that are independent copies of $X$, $\hat\tau_{kl} = \frac{2}{n(n-1)} \sum_{1 \le i < j \le n} \operatorname{sgn}\bigl((X^i_k - X^j_k)(X^i_l - X^j_l)\bigr)$. Let $\hat T$ be the matrix of Kendall's tau statistics: $[\hat T]_{kl} = \hat\tau_{kl}$.

13 Kendall's tau matrix as plug-in estimator. For elliptical copulas, independently of the characteristic generator, $\Sigma = \sin\bigl(\frac{\pi}{2} T\bigr)$, with the sine function acting component-wise (Kruskal 1958, Kendall & Gibbons 1990, Fang, Fang & Kotz 2002). Hence, the natural plug-in estimator for $\Sigma$ is $\hat\Sigma = \sin\bigl(\frac{\pi}{2} \hat T\bigr)$.
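For illustration, a minimal sketch of $\hat T$ and the plug-in $\hat\Sigma$, assuming SciPy's kendalltau; the $O(d^2)$ pairwise loop is the obvious implementation, not code from the talk.

```python
import numpy as np
from scipy.stats import kendalltau

def kendall_tau_matrix(X):
    """Sample Kendall's tau matrix T_hat from an (n, d) data matrix X."""
    n, d = X.shape
    T = np.eye(d)
    for k in range(d):
        for l in range(k + 1, d):
            tau, _ = kendalltau(X[:, k], X[:, l])
            T[k, l] = T[l, k] = tau
    return T

def plug_in_sigma(X):
    """Plug-in copula correlation estimator: entrywise sine transform of T_hat."""
    return np.sin(np.pi / 2.0 * kendall_tau_matrix(X))
```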

14 Outline. 1. Introduction; 2. Bounding $\|\hat T - T\|_2$, bounding $\|\hat\Sigma - \Sigma\|_2$; 3. A factor model for $\Sigma$.

15 For the rest of the talk, we assume: $X \in \mathbb{R}^d$ follows a semiparametric elliptical distribution; the copula correlation matrix of $X$ is $\Sigma \in \mathbb{R}^{d \times d}$; and the plug-in estimator $\hat\Sigma = \hat\Sigma_n$ of $\Sigma$ is constructed from $n$ independent copies of $X$ (e.g., Demarta & McNeil 2004, Klüppelberg & Kuhn 2009).

16 Some recent advances in elliptical/Gaussian copulas. Klüppelberg & Kuhn 2009 study the properties of $\hat\Sigma$ in the classical asymptotic regime, i.e., $d$ fixed and $n \to \infty$. Liu et al 2012 and Xue & Zou 2012 study estimation of the precision matrix $\Sigma^{-1}$ of Gaussian copulas in high dimensions; a sharp bound on $\|\hat\Sigma - \Sigma\|_{\max}$ is a key step of that study.

17 Proposed Research: Part I. First, we establish a sharp bound on the operator norm $\|\hat\Sigma - \Sigma\|_2$. (See also Han & Liu 2013.) It has often been observed (e.g., Demarta & McNeil 2004, Klüppelberg & Kuhn 2009) that the plug-in estimator $\hat\Sigma$ is not always positive semidefinite. A bound on $\|\hat\Sigma - \Sigma\|_2$ quantifies how severe this failure of positive semidefiniteness can be. If $\hat\Sigma$ is not positive semidefinite, we can project it onto the set of correlation matrices in some specific way to obtain $\hat\Sigma^+$. Furthermore, $\|\hat\Sigma^+ - \Sigma\|_2 \le 2\,\|\hat\Sigma - \Sigma\|_2$.
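The talk leaves the projection unspecified; the following is only one simple choice, sketched under that caveat: clip negative eigenvalues to zero and rescale the diagonal back to one. It is not claimed to be the projection used in the talk.

```python
import numpy as np

def project_to_correlation(S, eps=1e-12):
    """PSD surrogate for S: eigenvalue clipping followed by diagonal rescaling."""
    w, V = np.linalg.eigh((S + S.T) / 2.0)       # symmetrize, then eigen-decompose
    S_psd = (V * np.clip(w, eps, None)) @ V.T    # zero out negative eigenvalues
    d_inv = 1.0 / np.sqrt(np.diag(S_psd))        # rescale so the diagonal is exactly 1
    return S_psd * np.outer(d_inv, d_inv)
```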

18 Proposed Research: Part II. Later, we study a factor model for $\Sigma$. The factor model assumes that $\Sigma = \Theta + V$ for a low-rank (or nearly low-rank) matrix $\Theta$ and a diagonal matrix $V$. Equivalently, $X$ and $[L, V^{1/2}]\,Y$ have the same copula for some elliptical distribution $Y$ with $\Sigma = I_{r+d}$; here $\Theta = LL^\top$ for some $L \in \mathbb{R}^{d \times r}$. Within this context, the effective number of parameters is $O(rd)$, for $r = \operatorname{rank}(\Theta)$, instead of $O(d^2)$ under the generic model. We propose a refined estimator $\check\Sigma$ of $\Sigma$ using a penalized least squares procedure. The bound on $\|\hat\Sigma - \Sigma\|_2$ serves to scale the penalty term.

19 Outline for bounding $\|\hat\Sigma - \Sigma\|_2$. First, bound $\|\hat T - T\|_2$ (a matrix counterpart of Hoeffding's 1963 classical bound for scalar U-statistics). Then, bound $\|\hat\Sigma - \Sigma\|_2$ in terms of $\|\hat T - T\|_2$ (a matrix counterpart of the Lipschitz property).

20 Outline. 1. Introduction; 2. Bounding $\|\hat T - T\|_2$, bounding $\|\hat\Sigma - \Sigma\|_2$; 3. A factor model for $\Sigma$.

21 Matrix U-statistic. Theorem. With probability at least $1 - \alpha$, we have
$$\|\hat T - T\|_2 \;<\; \sqrt{\|T\|_2\, f_{n,d,\alpha}^2 + f_{n,d,\alpha}^4/4} \;+\; f_{n,d,\alpha}^2/2 \;\le\; \max\bigl\{\sqrt{\|T\|_2}\, f_{n,d,\alpha},\; f_{n,d,\alpha}^2\bigr\} + f_{n,d,\alpha}^2,$$
with
$$f_{n,d,\alpha} = \sqrt{\frac{3\, d \log(2\alpha^{-1} d)}{n}}.$$
The above bounds hold for all $n$, $d$, $\alpha$.
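As a quick numerical illustration, the sketch below evaluates $f_{n,d,\alpha}$ and the first bound of the theorem; the function names and the example numbers are mine, and the formula for $f_{n,d,\alpha}$ follows my reading of the display above.

```python
import numpy as np

def f_nda(n, d, alpha):
    """The quantity f_{n,d,alpha} = sqrt(3 d log(2 d / alpha) / n) from the theorem."""
    return np.sqrt(3.0 * d * np.log(2.0 * d / alpha) / n)

def kendall_op_norm_bound(T_norm, n, d, alpha):
    """High-probability upper bound on ||T_hat - T||_2, given ||T||_2 = T_norm."""
    f = f_nda(n, d, alpha)
    return np.sqrt(T_norm * f**2 + f**4 / 4.0) + f**2 / 2.0

# Example: d = 100 coordinates, n = 10000 samples, alpha = 0.05.
print(kendall_op_norm_bound(T_norm=50.0, n=10_000, d=100, alpha=0.05))
```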

22 Matrix U-statistic: proof. We need a Bernstein-type inequality, e.g., Tropp (2012), to bound the tail probability $P\{\|\hat T - T\|_2 \ge t\}$. That theorem, like many other results bounding the tail probability of the operator norm of a sum of random matrices, requires that the summands be independent. The U-statistic $\hat T$ does not satisfy this condition. Many such tail bounds rely on the Laplace transform technique to convert the tail probability into an expectation of some convex function of $\hat T - T$. A technique pointed out by Hoeffding (1963) allows us to convert the problem of bounding $\|\hat T - T\|_2$ into a problem involving a sum of independent random matrices.

23 Outline. 1. Introduction; 2. Bounding $\|\hat T - T\|_2$, bounding $\|\hat\Sigma - \Sigma\|_2$; 3. A factor model for $\Sigma$.

24 Matrix Lipschitz property. Theorem. We have, with probability at least $1 - \alpha$, that $\|\hat\Sigma - \Sigma\|_2 \le \pi \|\hat T - T\|_2 + \frac{3\pi^2}{16} f_{n,d,\alpha}^2$. Combining this with the earlier bound for $\|\hat T - T\|_2$, we have, with probability at least $1 - 2\alpha$, that $\|\hat\Sigma - \Sigma\|_2 < \pi \max\bigl\{\sqrt{\|T\|_2}\, f_{n,d,\alpha},\; f_{n,d,\alpha}^2\bigr\} + \frac{3\pi^2}{16} f_{n,d,\alpha}^2$. In addition, $\frac{2}{\pi}\|\Sigma\|_2 \le \|T\|_2 \le \|\Sigma\|_2$.

25 Potential improvement. We can obtain from the corollary that, for $n \gtrsim d$, $E\|\hat\Sigma - \Sigma\|_2 \lesssim \sqrt{\|\Sigma\|_2\, d \log(d)/n}$. In contrast, for Gaussian $X$ and $\tilde\Sigma$ its sample covariance matrix, Koltchinskii & Lounici 2015 show that, for $n \gtrsim d$, $E\|\tilde\Sigma - \Sigma\|_2 \asymp \|\Sigma\|_2 \sqrt{d/n}$.

26 Outline. 1. Introduction; 2. Bounding $\|\hat T - T\|_2$, bounding $\|\hat\Sigma - \Sigma\|_2$; 3. A factor model for $\Sigma$.

27 A factor model for $\Sigma$. From now on, we assume that the copula correlation matrix $\Sigma$ satisfies a factor model: $\Sigma = \Theta + V$ for some low-rank (or nearly low-rank), positive semidefinite, symmetric matrix $\Theta$ and a diagonal matrix $V$.

28 Outline. 1. Introduction; 2. Bounding $\|\hat T - T\|_2$, bounding $\|\hat\Sigma - \Sigma\|_2$; 3. A factor model for $\Sigma$.

29 Prelude: an elementary factor model for $\Sigma$. In an elementary factor model for $\Sigma = \Theta + V$, we assume that $\operatorname{rank}(\Theta) = r < d$ and $V = \sigma^2 I_d$. Let the plug-in estimator $\hat\Sigma$ have the eigen-decomposition $\hat\Sigma = \sum_{k=1}^d \hat\lambda_k \hat u_k \hat u_k^\top$, with $\hat\lambda_1 \ge \cdots \ge \hat\lambda_d$.

30 Prelude: an elementary factor model for $\Sigma$. Construct the following closed-form estimators for $r$, $\sigma^2$, $\Theta$ and $\Sigma$, based on some threshold $\mu$: $\hat r = \sum_{k=1}^d \mathbf{1}\{\hat\lambda_k \ge \hat\lambda_d + \mu\}$, $\hat\sigma^2 = \frac{1}{d - \hat r} \sum_{j > \hat r} \hat\lambda_j$, $\hat\Theta = \sum_{k=1}^{\hat r} (\hat\lambda_k - \hat\sigma^2)\, \hat u_k \hat u_k^\top$, and $\check\Sigma = \hat\Theta - \operatorname{diag}(\hat\Theta) + I$.
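In code, these estimators amount to a single eigen-decomposition; a small sketch follows (the function name and the guard for $\hat r = d$ are mine).

```python
import numpy as np

def elementary_factor_fit(Sigma_hat, mu):
    """Closed-form estimators (r_hat, sigma2_hat, Theta_hat, Sigma_check) from the slide."""
    lam, U = np.linalg.eigh(Sigma_hat)
    lam, U = lam[::-1], U[:, ::-1]                   # sort eigenvalues in decreasing order
    d = lam.size
    r_hat = int(np.sum(lam >= lam[-1] + mu))         # count eigenvalues clearing the threshold
    sigma2_hat = lam[r_hat:].mean() if r_hat < d else 0.0
    U_r = U[:, :r_hat]
    Theta_hat = (U_r * (lam[:r_hat] - sigma2_hat)) @ U_r.T
    Sigma_check = Theta_hat - np.diag(np.diag(Theta_hat)) + np.eye(d)
    return r_hat, sigma2_hat, Theta_hat, Sigma_check
```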

31 Prelude: an elementary factor model for $\Sigma$, continued. Theorem. Assume $\lambda_r(\Theta) \ge 2\mu$. On the event $\{2\|\hat\Sigma - \Sigma\|_2 < \mu\}$, we have $\hat r = r$, $\max\bigl\{\|\check\Sigma - \Sigma\|_F^2,\; \|\hat\Theta - \Theta\|_F^2\bigr\} \le 8 r\, \|\hat\Sigma - \Sigma\|_2^2$, and $|\hat\sigma^2 - \sigma^2| \le \|\hat\Sigma - \Sigma\|_2$. Here, $\|\cdot\|_F$ denotes the Frobenius norm.

32 Prelude: an elementary factor model for $\Sigma$, continued. Corollary. Set $\mu = 8\sqrt{\|\hat T\|_2\, f_{n,d,\alpha}^2 + f_{n,d,\alpha}^4/4} + 8 f_{n,d,\alpha}^2 \le \bar\mu = 8\sqrt{\|\hat T\|_2}\, f_{n,d,\alpha} + 12 f_{n,d,\alpha}^2$. Assume $\lambda_r(\Theta) \ge 2\bar\mu$ and $\|\hat T\|_2 \ge f_{n,d,\alpha}^2$. Then $\|\check\Sigma - \Sigma\|_F^2 \le 2 r \bar\mu^2$ holds with probability exceeding $1 - 2\alpha$.

33 Outline. 1. Introduction; 2. Bounding $\|\hat T - T\|_2$, bounding $\|\hat\Sigma - \Sigma\|_2$; 3. A factor model for $\Sigma$.

34 The refined estimator $\check\Sigma$ of $\Sigma = \Theta + V$. In the factor model we can write $\Sigma = \Theta - \operatorname{diag}(\Theta) + I_d =: \Theta^o + I_d$. This motivates us to estimate the low-rank matrix $\Theta$ by $\hat\Theta$, the solution to the convex program
$$\hat\Theta = \operatorname*{argmin}_{\Theta \in \mathbb{R}^{d \times d}} \Bigl\{ \tfrac12 \|\Theta^o - \hat\Sigma^o\|_F^2 + \mu \|\Theta\|_* \Bigr\}.$$
Here, $\|\cdot\|_*$ denotes the nuclear norm. Finally, we let the refined estimator $\check\Sigma$ of the copula correlation matrix $\Sigma$ be $\check\Sigma = \hat\Theta^o + I_d$.
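The talk does not specify how this convex program is solved; one standard option is proximal gradient descent (ISTA), where the smooth term has a 1-Lipschitz gradient supported off the diagonal and the proximal map of $\mu\|\cdot\|_*$ soft-thresholds the singular values. A hedged sketch, not the authors' implementation:

```python
import numpy as np

def off_diag(A):
    """Zero out the diagonal of A (the map Theta -> Theta^o)."""
    return A - np.diag(np.diag(A))

def svd_soft_threshold(A, mu):
    """Prox of mu * nuclear norm: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - mu, 0.0)) @ Vt

def refined_sigma(Sigma_hat, mu, n_iter=500):
    d = Sigma_hat.shape[0]
    Theta = np.zeros((d, d))
    for _ in range(n_iter):
        grad = off_diag(Theta - Sigma_hat)            # gradient of the smooth term
        Theta = svd_soft_threshold(Theta - grad, mu)  # step size 1 (gradient is 1-Lipschitz)
    return off_diag(Theta) + np.eye(d)                # refined estimator of Sigma
```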

35 Comparison of $\check\Sigma$ with existing estimators. Saunderson et al 2012 study a factor model for $\Sigma$ in the noiseless setting using minimum trace factor analysis: given $\Sigma = \Theta + V$, solve $(\hat\Theta, \hat V) = \operatorname{argmin} \operatorname{tr}(\Theta')$ subject to $\Theta' \succeq 0$, $V'$ diagonal, $\Theta' + V' = \Sigma$. If $\Theta$ has column space $U$ such that its coherence satisfies $\max_{1 \le i \le d} \|P_U e_i\|^2 < 1/2$, then the above algorithm recovers $\Theta$ and $V$ exactly.
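The displayed semidefinite program translates almost verbatim into cvxpy; this sketch assumes cvxpy with an SDP-capable solver installed, and mirrors the formulation rather than any code from Saunderson et al 2012.

```python
import cvxpy as cp
import numpy as np

def mtfa(Sigma):
    """Minimum trace factor analysis: split Sigma into PSD Theta plus diagonal V."""
    d = Sigma.shape[0]
    Theta = cp.Variable((d, d), PSD=True)   # low-rank part, constrained PSD
    v = cp.Variable(d)                      # diagonal entries of V
    constraints = [Theta + cp.diag(v) == Sigma]
    cp.Problem(cp.Minimize(cp.trace(Theta)), constraints).solve()
    return Theta.value, np.diag(v.value)
```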

36 Comparison of $\check\Sigma$ with existing estimators, continued. For $\Theta$ with effective low rank and without sparse corruption, Lounici 2013 proposes $\hat\Theta = \operatorname{argmin}_{\Theta \in \mathbb{R}^{d \times d}} \bigl\{ \frac12 \|\Theta - \tilde\Theta\|_F^2 + \mu \|\Theta\|_* \bigr\}$ for an unbiased initial estimator $\tilde\Theta$ of $\Theta$ (which we lack). For $\Sigma = \Theta + S$ with low-rank matrix $\Theta$ and generic sparse matrix $S$, Chandrasekaran et al 2012 and Hsu et al 2011 propose $(\hat\Theta, \hat S) = \operatorname{argmin}_{\Theta, S \in \mathbb{R}^{d \times d}} \bigl\{ \frac12 \|\Theta + S - \hat\Sigma\|_F^2 + \mu \|\Theta\|_* + \lambda \|S\|_{\ell_1} \bigr\}$. This formulation requires $\lambda > 0$.

37 Comparison of $\check\Sigma$ with existing estimators, continued. We are in between the above scenarios. We will extend the result of the former and relax the condition of the latter with a more precise analysis.

38 Oracle inequality for the refined estimator $\check\Sigma$. Theorem. Let $\Theta_r$ be the best rank-$r$ approximation to $\Theta$, with reduced SVD $\Theta_r = U_r D_r U_r^\top$. Set $\mu = 10\sqrt{\|\hat T\|_2\, f_{n,d,\alpha}^2 + f_{n,d,\alpha}^4/4} + 10 f_{n,d,\alpha}^2 \le \bar\mu = 10 \max\bigl\{\sqrt{\|\hat T\|_2}\, f_{n,d,\alpha},\; f_{n,d,\alpha}^2\bigr\} + 15 f_{n,d,\alpha}^2$. Then, with probability at least $1 - 2\alpha$, $\|\check\Sigma - \Sigma\|_F^2 \le \sum_{j > r} \lambda_j^2(\Theta) + 18 r \bar\mu^2$ for all $r$ with $\gamma_r = \|U_r U_r^\top\|_{\max} \le 1/9$.

39 Outline. 1. Introduction; 2. Bounding $\|\hat T - T\|_2$, bounding $\|\hat\Sigma - \Sigma\|_2$; 3. A factor model for $\Sigma$.

40 Simulation result. The object of interest is the ratio $\|\hat\Sigma - \Sigma\|_F^2 / \|\check\Sigma - \Sigma\|_F^2$, which our theory predicts to be roughly proportional to $d/r$ as $n \to \infty$. We (somewhat arbitrarily) set $C = 0.1$ and $\alpha = 0.1$, and forgo the $\log(d)$ factor when choosing the regularization parameter. We estimate this ratio by simulation.

41 Simulation setup. [Figure: the ratio $\|\hat\Sigma - \Sigma\|_F^2 / \|\check\Sigma - \Sigma\|_F^2$ (larger is better) plotted against sample size, for d=128, r=1; d=32, r=1; d=128, r=4; d=8, r=1; d=32, r=…; d=8, r=….]

42 Summary. We have obtained a sharp bound on the operator norm of $\hat\Sigma - \Sigma$, for the plug-in estimator $\hat\Sigma$ of the copula correlation matrix $\Sigma$. We have applied this bound in the setting of a factor model for $\Sigma$ to obtain a refined estimator $\check\Sigma$; a sharp oracle inequality for $\check\Sigma$ is established.

43 Thanks!
