Copulas MOU Lili December, 2014
Outline Preliminary Introduction Formal Definition Copula Functions Estimating the Parameters Example Conclusion and Discussion
Preliminary MOU Lili SEKE Team 3/30
Probability on Continous Variables Probability density function (pdf) f f(x) = Cumulative distribution function (cdf) F P (x ɛ X x + ɛ) 2ɛ F (x) = P (X x) Relationship between pdf and cdf f(x) = F (x) = x F (x) x f(t)dt MOU Lili SEKE Team 4/30
Introduction MOU Lili SEKE Team 5/30
From Marginal to Joint Probability We have random variables X = (X 1, X 2,, X d ) If we know the marginal distribution F 1(x 1),, F d (x d ), Say X i U[0, 1] What is the joint distribution? Is the joint distribution unique? No, because we do not know the dependencies among the variables. MOU Lili SEKE Team 6/30
Basic Idea Life will be easier, if... we have some effective way of modeling dependencies between variables. (Copula) Copula + Marginal distribution Joint distribution MOU Lili SEKE Team 7/30
Formal Definition MOU Lili SEKE Team 8/30
Copulas Let U = (U 1,, U d ) be d dimensional random variables, with marginal distribution U i U[0, 1] Define a copula function as a joint cdf on U C(u 1,, u d ) = P (U 1 u 1,, U d u d ) Essentially, a copula is defined as a joint distribution on a unit hyper-cube. MOU Lili SEKE Team 9/30
Sklar s Theorem Theorem For a multivariate distribution F (x 1,, x n), with marginal distribution F i(x i), a copula C s.t. C(F 1(x 1),, F d (x d )) = F (x 1,, x d ) Furthermore, if the marginal distributions are continuous, the copula is unique. MOU Lili SEKE Team 10/30
Proof 1. To prove F (X) U[0, 1], Let y = F X(x) and x = F 1 X (y) = H(y) f Y (y) = H (y) f X[H(y)] = dh(y) f X[H(y)] = dx dy df fx(x) = 1 X(x) 2. By the definition of copula C(u 1,, u d ) = P (U 1 u 1,, U d u d ), we have C(F 1(x 1),, F d (x d )) = P (F 1(X 1) F 1(x 1),, F d (X d ) F d (X d )) = P (X 1 x 1,, X d x d ) = F X (x 1,, x d ) MOU Lili SEKE Team 11/30
Joint Probability Density Function Through differentiation of C(F 1(x 1),, F d (x d )) = F (x 1,, x d ) We obtain d f(x 1,, x n) = c(f 1(x 1),, F d (x d )) f i(x i) i=1 where c(u 1,, u d ) = d C(u 1,, u d ) u 1 u d MOU Lili SEKE Team 12/30
Importance of Sklar s Theorem Instead of specifying directly an odd joint distribution on X 1,, X d, we Model the marginal distribution, which is easy, and Specify a coplua C, modeling nontrivial dependencies between variables. Sklar s Theorem guarantees: Copula(Marginal)=Joint MOU Lili SEKE Team 13/30
Copula Functions MOU Lili SEKE Team 14/30
Archimedean Copula Family C φ (u 1, u d ) = φ 1 ( d i=1 φ(u i) ) Constraints on Φ Verify the marginal distribution F (u 1) = C φ (u 1, +,, + ) = φ 1 φ(u 1) = u 1 U i U[0, 1] MOU Lili SEKE Team 15/30
Examples Glayton, φ(x) = x α 1, α > 0, Gumbel, ( log(x)) α, α > 1, Frank, φ(x) = log ( ) exp( αx) 1, α (, + ), α 0 exp( α) 1 The parameters are learned with training examples. Specially, if α = 1 in Gumbel copula, then variables are independent. MOU Lili SEKE Team 16/30
Glayton φ(x) = x α 1, α > 0 MOU Lili SEKE Team 17/30
"Correct" Copula Substitute u i = F i(x i) into C(F 1(x 1),, F d (x d )) = F (x 1,, x d ) We obtain C(u i,, u d ) = F (F 1 1 (u 1), F 1 d (u d ) Note that for any valid multivariate cumulative function C is valid regardless of F and F i. C(u 1, 1,, 1) = F (F 1 1 (u 1),,, ) = P (F 1 1 (U 1) < F 1 1 (u 1)) = P (U 1 < u 1) = u 1 Let F i to be standard norm distribution, and F to be norm with 0 mean and covariance matrix Σ. MOU Lili SEKE Team 18/30
Example ρ =.4 [Source: http://en.wikipedia.org/wiki/copula_(probability_theory)] MOU Lili SEKE Team 19/30
Estimating the Parameters MOU Lili SEKE Team 20/30
Semi-Parametric Estimation Log-likelihood l(α, θ 1,, θ d ) = ( ) n ( ) n d log c α F θ1 (x (j) 1 ),, F θ d (x (j) d ) + log(f θ1 (x (j) i )) j=1 j=1 i=1 Inference for Margins (IFM), 2 two-stage procedure: 1. Estimate marginal distributions 2. Estimate copula parameters l(α) = log ( c α ( ˆF1(x (j) 1 ),, ˆF d (x (j) d ) )) MOU Lili SEKE Team 21/30
Semi-Parametric Estimation The marginal distribution is estimated empirically. F i(y) = P (X i y) 1 n n j=1 1{x (j) i y} The log-likelihood ( ( l(α) = log c α F1(x (j) 1 ),, F )) d (x (j) d ) MOU Lili SEKE Team 22/30
Example MOU Lili SEKE Team 23/30
Teany Tiny Baby Example Bivariate normal distribution µ = (00) T Σ = ( 1.25 0.43 0.43 1.75 ) 1000 training data samples Frank Archimedean copula semiparametric version of IFM MOU Lili SEKE Team 24/30
Results: Density MOU Lili SEKE Team 25/30
Results: Squared Error MOU Lili SEKE Team 26/30
Results: Marginal Distribution MOU Lili SEKE Team 27/30
Conclusion and Discussion MOU Lili SEKE Team 28/30
Conclusion A copula is defined as a joint cdf on a unit hypercude with marginal distributions U[0, 1]. Copulas link joint distribution with marginal distributions. Copula ( Marginals ) = Joint We can specify copulars explicitly, or via a (joint, marginals) pair Parameters are learned by MLE or its variations. + Copulas provide a means of modeling non-trivial dependencies. - Choosing copulas is a $64,000,000 question. Inappropriate copulas, which fail to model true dependencies in data, may also cause considerable financial losses. MOU Lili SEKE Team 29/30
Discussion Q&A Thanks for listening! MOU Lili SEKE Team 30/30