MFM Practitioner Module: Risk & Asset Allocation September 11, 2013
Before we define dependence, it is useful to define independence. Random variables X and Y are independent iff
\[ F_{(X,Y)}(x, y) = F_X(x)\,F_Y(y) \qquad \text{for all } x, y \tag{$*$} \]
We can differentiate ($*$) to see that
\[ f_{(X,Y)}(x, y) = f_X(x)\,f_Y(y) \]
It is also true that
\[ \varphi_{(X,Y)}(t_X, t_Y) = \varphi_X(t_X)\,\varphi_Y(t_Y) \]
In particular, $\operatorname{E}(XY) = (\operatorname{E}X)(\operatorname{E}Y)$.
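As a quick numerical illustration (my own sketch, not from the slides; numpy assumed available, margins chosen arbitrarily), independent draws factor in expectation while dependent ones need not:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000

x = rng.standard_normal(n)
y = rng.exponential(1.0, n)          # independent of x

# Independence implies E(XY) = E(X) E(Y); both sides are near 0 here.
print(np.mean(x * y), np.mean(x) * np.mean(y))

# A dependent pair breaks the factorization: with W = X + noise,
# E(XW) = E(X^2) = 1 while E(X) E(W) = 0.
w = x + 0.1 * rng.standard_normal(n)
print(np.mean(x * w), np.mean(x) * np.mean(w))
```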
Marginal Density
From Fubini's theorem, it is generally possible to derive marginal densities from a joint density, regardless of any dependence.
\[ f_X(x) = \int f_{(X,Y)}(x, y)\,dy \qquad f_Y(y) = \int f_{(X,Y)}(x, y)\,dx \]
Of course, if X and Y are independent, then $f_{(X,Y)}(x, y) = f_X(x)\,f_Y(y)$, but this need not be true in general.
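A minimal numerical sketch (my own illustration, assuming scipy is available and taking a standard bivariate normal with $\rho = 0.5$): marginalize the joint density on a grid and compare with the known standard normal margin.

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rho = 0.5
grid = np.linspace(-6, 6, 601)
dx = grid[1] - grid[0]
xx, yy = np.meshgrid(grid, grid, indexing="ij")

# Joint density of the standard bivariate normal on the grid
joint = multivariate_normal(mean=[0, 0], cov=[[1, rho], [rho, 1]])
f_xy = joint.pdf(np.dstack([xx, yy]))

# f_X(x) = integral of f_{(X,Y)}(x, y) dy, here by the trapezoid rule
f_x = np.trapz(f_xy, dx=dx, axis=1)

# The margin of a standard bivariate normal is standard normal
print(np.max(np.abs(f_x - norm.pdf(grid))))  # ~1e-9
```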
Conditional Density
Conditioning a random variable is a powerful concept! The marginal characterization of a dependent variable is adequate if we do not know or care about the value of any potentially related variables. Conditioning, on the other hand, allows us to incorporate new information. Say we know the joint density of (X, Y), and we have learned that an event, say Y = y, is true. We can adjust the marginal density of X to account for this fact:
\[ f_{X \mid Y}(x) = f_X(x)\,\frac{f_{(X,Y)}(x, y)}{f_X(x)\,f_Y(y)} = \frac{f_{(X,Y)}(x, y)}{f_Y(y)} \]
Note the analogy here to the Radon-Nikodým change of measure.
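A numerical sketch of this formula (my own, on a standard bivariate normal slice; the known closed form of the conditional is used only as a check):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

rho, y0 = 0.5, 1.0
grid = np.linspace(-6, 6, 601)

# Joint density f_{(X,Y)}(x, y0) along the slice Y = y0
joint = multivariate_normal([0, 0], [[1, rho], [rho, 1]])
f_slice = joint.pdf(np.column_stack([grid, np.full_like(grid, y0)]))

# Conditional density f_{X|Y}(x) = f_{(X,Y)}(x, y0) / f_Y(y0)
f_cond = f_slice / norm.pdf(y0)

# Known result: X | Y = y0 is N(rho * y0, 1 - rho^2)
target = norm.pdf(grid, rho * y0, np.sqrt(1 - rho**2))
print(np.max(np.abs(f_cond - target)))  # ~0
```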
Conditional Expectation
A natural application of conditioning is the conditional expectation of a random variable.
\[ \operatorname{E}(X \mid Y) = \int x\, f_{X \mid Y}(x)\,dx \]
Tower Property
Sometimes it is useful to condition on an event whose outcome is not yet known. Averaged over all possible outcomes, the conditional expectation is the same as the unconditional expectation:
\[ \operatorname{E}\left(\operatorname{E}(X \mid Y)\right) = \operatorname{E} X \]
The lesson here is that conditioning has to exclude some outcomes in order to be consequential.
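A small simulation sketch (my own illustration): for the standard bivariate normal, $\operatorname{E}(X \mid Y) = \rho Y$, and averaging it over Y recovers $\operatorname{E} X$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, rho = 1_000_000, 0.5

y = rng.standard_normal(n)
x = rho * y + np.sqrt(1 - rho**2) * rng.standard_normal(n)  # corr(x, y) = rho

# For the standard bivariate normal, E(X | Y) = rho * Y
cond_exp = rho * y

# Tower property: E(E(X | Y)) == E(X); both sample means are near 0
print(np.mean(cond_exp), np.mean(x))
```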
Bivariate Normal
The bivariate normal has five parameters: two means, two standard deviations, and a correlation.
Standard Bivariate
The standard version has only one parameter, $-1 < \rho < 1$. Its density is
\[ f_{(X,Y)}(x, y) = \frac{1}{2\pi\sqrt{1-\rho^2}}\, e^{-\frac{x^2 - 2\rho x y + y^2}{2(1-\rho^2)}} \]
Exercises
1. Evaluate the marginal density of X
2. Calculate its entropy
3. Evaluate the density of X | Y for the event Y = 1
4. Show that conditioning has reduced its entropy
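A numerical check of the exercise answers (my own sketch; it relies on standard facts: the margin is N(0, 1), the conditional X | Y = 1 is N(ρ, 1 − ρ²), and a normal with variance $\sigma^2$ has differential entropy $\tfrac{1}{2}\ln(2\pi e \sigma^2)$):

```python
import numpy as np

rho = 0.5

# 1. & 2. The margin of the standard bivariate normal is N(0, 1);
#    a normal with variance s2 has differential entropy 0.5*ln(2*pi*e*s2).
h_marginal = 0.5 * np.log(2 * np.pi * np.e * 1.0)

# 3. & 4. Conditioning on Y = 1 gives X | Y=1 ~ N(rho, 1 - rho^2),
#    whose entropy depends only on the (smaller) variance 1 - rho^2.
h_conditional = 0.5 * np.log(2 * np.pi * np.e * (1 - rho**2))

print(h_marginal, h_conditional)   # 1.419 > 1.275 for rho = 0.5
print(h_marginal - h_conditional)  # entropy reduction: -0.5*ln(1 - rho^2)
```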
Copula
To the extent that the joint density is not just a product of the marginal densities, there is dependence.
Factorization
This ratio can be expressed as
\[ \frac{f_{(X_1, X_2, \ldots)}(x_1, x_2, \ldots)}{f_{X_1}(x_1)\,f_{X_2}(x_2)\cdots} = f_U\!\left(F_{X_1}(x_1), F_{X_2}(x_2), \ldots\right) \]
Sklar's theorem says this is always possible. Independence means $f_U \equiv 1$. More generally, $f_U : [0, 1]^N \to \mathbb{R}_+$ is a density function that characterizes a new random variable, U, that encapsulates the dependence structure of X. Random vectors that have the same copula share the same dependence structure.
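A brief sketch (my own, margins chosen arbitrarily): pushing each coordinate through its own distribution function produces the grades $U_i = F_{X_i}(X_i)$, which are uniform on [0, 1] whatever the margins; the joint law of the grades is the copula.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, rho = 100_000, 0.7

# Correlated normals, then two very different margins built from them
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)
x1 = np.exp(z[:, 0])              # lognormal margin
x2 = z[:, 1] ** 3                 # skewed polynomial margin

# Grades U_i = F_{X_i}(X_i), here via the known CDFs
u1 = stats.norm.cdf(np.log(x1))   # F_{X1}(x) = Phi(log x)
u2 = stats.norm.cdf(np.cbrt(x2))  # F_{X2}(x) = Phi(x^(1/3))

# Each grade is uniform on [0, 1] ...
print(u1.mean(), u1.var())        # ~0.5, ~1/12
# ... but the grades are not independent: their copula is Gaussian
print(stats.pearsonr(u1, u2)[0])
```

Monotone transformations of the margins leave the grades, and hence the copula, unchanged; that is what makes the copula a pure description of dependence.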
(Gaussian) Copula
When the dependence can be described pair-wise, the normal copula can be appropriate. For $U \in [0, 1]^2$ this has density
\[ f_U(u) = \frac{1}{\sqrt{1-\rho^2}} \exp\!\left[-\frac{\rho}{1-\rho^2}\left(\rho\,\operatorname{erfc}^{-1}(2u_1)^2 - 2\,\operatorname{erfc}^{-1}(2u_1)\,\operatorname{erfc}^{-1}(2u_2) + \rho\,\operatorname{erfc}^{-1}(2u_2)^2\right)\right] \]
[Figure: copula density for $\rho = \tfrac{1}{2}$]
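A sketch (my own) implementing this density with scipy.special.erfcinv and checking it against the ratio definition, joint density over product of margins:

```python
import numpy as np
from scipy.special import erfcinv
from scipy.stats import multivariate_normal, norm

def gaussian_copula_density(u1, u2, rho):
    """Density of the bivariate Gaussian copula at (u1, u2)."""
    e1, e2 = erfcinv(2 * u1), erfcinv(2 * u2)
    q = rho * e1**2 - 2 * e1 * e2 + rho * e2**2
    return np.exp(-rho * q / (1 - rho**2)) / np.sqrt(1 - rho**2)

rho = 0.5
x, y = 0.3, -1.2

# Cross-check: f_U(F_X(x), F_Y(y)) should equal joint / (margin * margin)
joint = multivariate_normal([0, 0], [[1, rho], [rho, 1]]).pdf([x, y])
ratio = joint / (norm.pdf(x) * norm.pdf(y))
print(gaussian_copula_density(norm.cdf(x), norm.cdf(y), rho), ratio)
```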
Tail Dependence
Upper & Lower Tail
Tail dependence is a pair-wise measure of the concordance of extreme outcomes.
\[ \lambda_U = \lim_{p \to 1} \operatorname{P}\left\{X > Q_X(p) \mid Y > Q_Y(p)\right\} \qquad \lambda_L = \lim_{p \to 0} \operatorname{P}\left\{X \le Q_X(p) \mid Y \le Q_Y(p)\right\} \]
The normal copula fails to exhibit tail dependence: extreme outcomes are asymptotically independent. This is a problem, because in practice an extreme outcome in one dimension often acts to cause extreme outcomes in other dimensions. Developing practical alternatives that include this contagion effect is an active area of research.
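A simulation sketch (my own illustration) estimating the upper-tail conditional probability at finite thresholds: for the Gaussian copula it decays toward zero as p → 1, while a Student-t copula (one standard alternative, not discussed on the slide) keeps it bounded away from zero.

```python
import numpy as np

rng = np.random.default_rng(3)
n, rho = 2_000_000, 0.5
cov = [[1, rho], [rho, 1]]

# Gaussian copula sample (margins are irrelevant for tail dependence)
z = rng.multivariate_normal([0, 0], cov, size=n)

# Student-t copula sample: divide the same normals by a common chi factor
nu = 4
chi = np.sqrt(rng.chisquare(nu, size=n) / nu)
t = z / chi[:, None]

for p in (0.95, 0.99, 0.999):
    for name, s in (("gaussian", z), ("t", t)):
        qx, qy = np.quantile(s[:, 0], p), np.quantile(s[:, 1], p)
        lam = np.mean(s[:, 0][s[:, 1] > qy] > qx)  # P{X > Qx | Y > Qy}
        print(p, name, round(lam, 3))
```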
Measures of Concordance
Several measures of concordance have been developed. Their definitions are motivated by the properties of their estimators, which we will not discuss just yet. Each ranges from $-1$ to $1$, with $0$ for independence. In order of generality, we have
1. Pearson's rho. This is the classical correlation measure, which we will discuss below.
2. Spearman's rho. This is correlation applied to the grades, $F_X(X)$. It is a simple measure of dependence that is not sensitive to the margins.
3. Kendall's tau. This is a pure copula measure. It is based on rank-order correlations.
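All three are available in scipy.stats; a short sketch (my own) comparing them on a simulated Gaussian sample whose second margin is then distorted:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, rho = 100_000, 0.6
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=n)

# A monotone, nonlinear change of margin: Pearson's rho moves,
# Spearman's rho and Kendall's tau do not (they only see the copula).
x, y = z[:, 0], np.exp(z[:, 1])

print("pearson ", stats.pearsonr(x, y)[0])
print("spearman", stats.spearmanr(x, y)[0])
print("kendall ", stats.kendalltau(x, y)[0])
```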
Kendall's tau
Kendall's tau can be defined as
\[ \tau = 4\,\operatorname{E} F_U(U_1, U_2) - 1 \]
where $F_U$ is the distribution function characterizing the copula of X. It is the probability of concordance minus the probability of discordance for two independent draws of X.
Relationship with other measures
In general, Spearman's rho is bounded in terms of Kendall's tau; for $\tau \ge 0$,
\[ \frac{3\tau - 1}{2} \;\le\; \rho_S \;\le\; \frac{1 + 2\tau - \tau^2}{2} \]
with the symmetric bounds (reverse the signs of $\tau$ and $\rho_S$) for $\tau \le 0$. For a Gaussian copula, Pearson's rho is
\[ \rho = \sin\left(\tfrac{\pi}{2}\tau\right) \]
One can use this to define the pseudo-correlation.
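A quick simulation sketch (my own) checking the Gaussian-copula identity $\rho = \sin(\tfrac{\pi}{2}\tau)$:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
rho = 0.6
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=50_000)

tau = stats.kendalltau(z[:, 0], z[:, 1])[0]
print(np.sin(np.pi * tau / 2), rho)  # sample estimate vs. true rho
```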
Kendall's tau
The relationship between Kendall's tau and Spearman's and Pearson's rho is illustrated by this graph.
[Figure: bounds on Spearman's rho and the Gaussian-copula Pearson's rho as functions of Kendall's tau]
For a given level of Kendall's tau, Spearman's rho is bounded by the two outer curves. Pearson's rho for a Gaussian copula is the curve through the origin.
Bayes' rule tells us how to reverse the roles in conditional probability:
\[ f_{Y \mid X = x}(y) \propto f_Y(y)\, f_{X \mid Y = y}(x) \]
With Bayes' rule we can update the density of Y taking into account the information contained in a revelation about X. Unless X is independent of Y, the posterior density of Y will, on average, have a lower entropy than the prior density of Y.
Bayesian Statistics
When we interpret Y as an (unknown) parameter associated with the characterization of X, and the event X = x as an observation, Bayes' rule provides an important foundation for estimation.
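A minimal sketch (my own, using the standard normal-normal conjugate pair; all parameter values are arbitrary): prior Y ~ N(0, 1), observation X | Y ~ N(Y, σ²); the posterior is normal with smaller variance, hence lower entropy.

```python
import numpy as np

# Prior: Y ~ N(mu0, s0^2); likelihood: X | Y = y ~ N(y, s^2)
mu0, s0 = 0.0, 1.0
s = 0.5
x_obs = 1.2

# Conjugate normal-normal update (precision-weighted average)
prec_post = 1 / s0**2 + 1 / s**2
mu_post = (mu0 / s0**2 + x_obs / s**2) / prec_post
var_post = 1 / prec_post

def entropy(var):
    return 0.5 * np.log(2 * np.pi * np.e * var)

print(mu_post, var_post)                  # 0.96, 0.2
print(entropy(s0**2), entropy(var_post))  # posterior entropy is lower
```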
Covariance
The covariance summarizes all pair-wise second central moments into a symmetric matrix:
\[ \operatorname{cov} X = \operatorname{E}\left(X X'\right) - (\operatorname{E} X)(\operatorname{E} X)' \]
The diagonal entries are the marginal variances: $\operatorname{var} X = \operatorname{diag} \operatorname{cov} X$.
Correlation
If the margins have standard location and dispersion, the covariance is called the correlation. The diagonal entries of a correlation matrix are ones. The $N(N-1)/2$ correlations completely specify a Gaussian copula for $N = \dim X$. But not all dependence is pair-wise!
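In numpy (a sketch; the covariance values are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(10)
x = rng.multivariate_normal([0, 0, 0],
                            [[2.0, 0.6, 0.2],
                             [0.6, 1.0, 0.3],
                             [0.2, 0.3, 1.5]], size=200_000)

c = np.cov(x, rowvar=False)       # pair-wise second central moments
print(np.diag(c))                 # marginal variances

r = np.corrcoef(x, rowvar=False)  # covariance of the standardized margins
print(np.diag(r))                 # ones on the diagonal
```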
Linear Transformations
Affine equivariance can be extended to the multivariate setting. In particular, because of the linearity of the expectation operator,
\[ \operatorname{E}(AX + b) = A\,(\operatorname{E} X) + b \qquad \operatorname{cov}(AX + b) = A\,(\operatorname{cov} X)\,A' \]
for a constant matrix A and vector b.
Quadratic Form
An important special case is a univariate random variable defined by $Y = a'X$ for a vector $a$ of weights. The standard deviation of Y is
\[ \operatorname{std} Y = \sqrt{a'\,(\operatorname{cov} X)\,a} \]
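A short numpy sketch (my own; the covariance, A, b, and weights are arbitrary) checking both identities on simulated data:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500_000

# Simulate X with a known covariance
sigma = np.array([[1.0, 0.3, 0.1],
                  [0.3, 2.0, 0.4],
                  [0.1, 0.4, 1.5]])
x = rng.multivariate_normal([1.0, -2.0, 0.5], sigma, size=n)

A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, -1.0]])
b = np.array([3.0, -1.0])

# cov(AX + b) = A cov(X) A'
y = x @ A.T + b
print(np.cov(y, rowvar=False))
print(A @ sigma @ A.T)

# Quadratic form: std of the weighted sum a'X
a = np.array([0.5, 0.3, 0.2])
print(np.std(x @ a), np.sqrt(a @ sigma @ a))
```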
In the univariate setting, we take the square root of the variance to get the standard deviation, which is a useful measure of dispersion. We can do something analogous in the multivariate setting.
Cholesky
The Cholesky decomposition of a symmetric positive-definite matrix $\Sigma$ is the unique lower-triangular matrix $L$ such that
\[ \Sigma = LL' \]
The decomposition provides a recipe for constructing correlated normals from independent normals:
\[ \operatorname{cov} Z = I \implies \operatorname{cov}(LZ) = L\,(\operatorname{cov} Z)\,L' = \Sigma \]
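A minimal sketch of this recipe with numpy (the target covariance is made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
L = np.linalg.cholesky(sigma)  # lower-triangular, sigma = L @ L.T

# Independent standard normals, cov Z = I
z = rng.standard_normal((2, 100_000))

# Correlated normals: cov(L Z) = L L' = sigma
x = L @ z
print(np.cov(x))
```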
Dimension Reduction
The spectral theorem gives us a way of defining a latent factor model that approximates X by a transformation of a lower-dimensional random variable. Since the covariance matrix is positive-definite, we can write
\[ \operatorname{cov} X = E \Lambda E' \]
where $\Lambda$ is a diagonal matrix of (non-negative) eigenvalues and $E$ is orthogonal. By cutting off the eigenvalues at some threshold, we can form truncated versions $\tilde{E}$ and $\tilde{\Lambda}$ that have fewer columns than the originals, and define a new random vector Z with $\dim Z < \dim X$, $\operatorname{E} Z = 0$, and $\operatorname{cov} Z = I$. Then
\[ \tilde{X} = \operatorname{E} X + \tilde{E}\,\tilde{\Lambda}^{1/2} Z \]
approximates X.
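A sketch of the truncation using numpy.linalg.eigh (my own; the covariance matrix is made up so that two factors dominate):

```python
import numpy as np

# A 4-dimensional covariance dominated by two factors
sigma = np.array([[4.0, 2.0, 1.2, 0.8],
                  [2.0, 3.0, 0.9, 0.6],
                  [1.2, 0.9, 1.0, 0.3],
                  [0.8, 0.6, 0.3, 1.0]])

# Spectral decomposition: sigma = E diag(lam) E'
lam, E = np.linalg.eigh(sigma)  # eigh returns ascending eigenvalues
order = np.argsort(lam)[::-1]
lam, E = lam[order], E[:, order]

# Keep the k largest eigenvalues
k = 2
E_t, lam_t = E[:, :k], lam[:k]

# Covariance implied by the truncated factor model
approx = E_t @ np.diag(lam_t) @ E_t.T
print(np.linalg.norm(sigma - approx), lam)  # small if lam[k:] are small
```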
Mahalanobis Distance
Location and dispersion can be generalized to the multivariate setting. An important application of this is the inverse problem: how many standard deviations is an observation away from the mean?
\[ \operatorname{Ma}(x) = \sqrt{(x - \operatorname{E} X)'\,(\operatorname{cov} X)^{-1}\,(x - \operatorname{E} X)} \]
This is sometimes also called the z-score. The Mahalanobis distance is invariant under an affine transformation.
The Chebyshev inequality tells us that large values of the Mahalanobis distance are unlikely:
\[ \operatorname{P}\{\operatorname{Ma}(X) \ge k\} \le \frac{\dim X}{k^2} \]
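A sketch (my own; mean and covariance are arbitrary) computing the distance and comparing the empirical tail against the Chebyshev bound:

```python
import numpy as np

rng = np.random.default_rng(9)
n = 1_000_000

mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = rng.multivariate_normal(mu, sigma, size=n)

# Mahalanobis distance of each observation from the mean
prec = np.linalg.inv(sigma)
d = x - mu
ma = np.sqrt(np.einsum("ij,jk,ik->i", d, prec, d))

# Chebyshev bound: P{Ma(X) >= k} <= dim(X) / k^2
for k in (2.0, 3.0, 4.0):
    print(k, np.mean(ma >= k), 2 / k**2)
```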