Log Covariance Matrix Estimation


1 Log Covariance Matrix Estimation
Xinwei Deng, Department of Statistics, University of Wisconsin-Madison
Joint work with Kam-Wah Tsui (Univ. of Wisconsin-Madison)

2 Outline
Background and Motivation
The Proposed Log-ME Method
Simulation and Real Example
Summary and Discussion

3 Background
Covariance matrix estimation is important in multivariate analysis and many statistical applications.
Suppose x_1, ..., x_n are i.i.d. p-dimensional random vectors from N(0, Σ). Let S = (1/n) Σ_{i=1}^n x_i x_i' be the sample covariance matrix. The negative log-likelihood function is proportional to
L_n(Σ) = -log|Σ^{-1}| + tr(Σ^{-1} S).   (1)
Recent interest focuses on settings where p is large or p ≥ n. There S is not a stable estimate:
The largest eigenvalues of S overestimate the true eigenvalues.
When p > n, S is singular and its smallest eigenvalue is zero.
How to estimate Σ^{-1}?

4 Recent Estimation Methods for Σ or Σ^{-1}
Sparse estimates: reduce the number of nonzero entries in the estimate of Σ or Σ^{-1}.
Σ: Bickel and Levina (2008), thresholding.
Σ^{-1}: Yuan and Lin (2007), l_1 penalty on Σ^{-1}; Friedman et al. (2008), graphical lasso; Meinshausen and Buhlmann (2006), reformulation as regression.
Shrinkage estimates of the covariance matrix:
Ledoit and Wolf (2006), ρS + (1-ρ)µI, shrinking S toward a scaled identity.
Won et al. (2009), control the condition number (largest eigenvalue / smallest eigenvalue).

5 Motivation
Any estimate of Σ or Σ^{-1} needs to be positive definite. This mathematical restriction makes the covariance matrix estimation problem challenging.
Any positive definite Σ can be expressed as the matrix exponential of a real symmetric matrix A:
Σ = exp(A) = I + A + A²/2! + ...
Expressing the likelihood function in terms of A = log(Σ) removes the positive-definiteness restriction.
Consider the spectral decomposition Σ = T D T' with D = diag(d_1, ..., d_p). Then A = T M T' with M = diag(log(d_1), ..., log(d_p)).
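To make the transformation concrete, here is a minimal numpy sketch; the 3x3 matrix is an arbitrary positive definite example, not from the talk.

    import numpy as np

    # Hypothetical 3x3 positive definite covariance matrix, for illustration only.
    Sigma = np.array([[2.0, 0.5, 0.0],
                      [0.5, 1.0, 0.3],
                      [0.0, 0.3, 1.5]])

    # Spectral decomposition Sigma = T D T'.
    d, T = np.linalg.eigh(Sigma)

    # Matrix logarithm A = log(Sigma) = T diag(log d_1, ..., log d_p) T';
    # A is real symmetric and completely unconstrained.
    A = T @ np.diag(np.log(d)) @ T.T

    # Exponentiating any symmetric A gives back a positive definite matrix.
    m, U = np.linalg.eigh(A)
    Sigma_back = U @ np.diag(np.exp(m)) @ U.T
    assert np.allclose(Sigma_back, Sigma)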

6 Idea of the Proposed Method
Leonard and Hsu (1992) used this log-transformation to estimate Σ, approximating the likelihood via the Volterra integral equation. Their approximation is based on S being nonsingular, hence not applicable when p ≥ n.
We extend the likelihood approximation to the case of singular S.
Regularize the largest and smallest eigenvalues of Σ simultaneously.
An efficient iterative quadratic programming algorithm estimates A = log(Σ).
Call the resulting estimate Log-ME: Logarithm-transformed Matrix Estimate.

7 A Simple Example
Experiment: simulate x_i from N(0, I), i = 1, ..., n, where n = 50. For each p varying from 5 to 100, consider the largest and smallest eigenvalues of the covariance matrix estimate.
For each p, repeat the experiment 100 times and compute the average of the largest eigenvalues and the average of the smallest eigenvalues for:
the sample covariance matrix;
the Log-ME covariance matrix estimate.

8 [Figure: the averages of the largest and smallest eigenvalues of the covariance matrix estimates over the dimension p. The true eigenvalues are all equal to 1.]
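A short sketch reproducing the sample-covariance half of this experiment (the Log-ME column requires the estimator developed later in the talk):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 50
    for p in (5, 25, 50, 100):
        largest, smallest = [], []
        for _ in range(100):
            X = rng.standard_normal((n, p))   # rows x_i ~ N(0, I_p)
            S = X.T @ X / n                   # sample covariance matrix
            eig = np.linalg.eigvalsh(S)
            largest.append(eig[-1])
            smallest.append(eig[0])
        # True eigenvalues are all 1; the largest inflates and, once p
        # approaches n, the smallest collapses toward 0.
        print(p, round(np.mean(largest), 2), round(np.mean(smallest), 4))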

9 The Transformed Log-Likelihood
In terms of the covariance matrix logarithm A, the negative log-likelihood function in (1) becomes
L_n(A) = tr(A) + tr[exp(-A) S].   (2)
The problem of estimating a positive definite matrix Σ now becomes one of estimating a real symmetric matrix A.
Because of the matrix exponential term exp(-A)S, estimating A by directly minimizing L_n(A) is nontrivial.
Our approach: approximate exp(-A)S using the Volterra integral equation (valid even when S is singular).
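Evaluating (2) numerically is straightforward with scipy's matrix exponential; a small sketch with a hypothetical 2x2 sample covariance:

    import numpy as np
    from scipy.linalg import expm

    def neg_loglik_logscale(A, S):
        # L_n(A) = tr(A) + tr(exp(-A) S), with A = log(Sigma).
        return np.trace(A) + np.trace(expm(-A) @ S)

    S = np.array([[1.2, 0.4],
                  [0.4, 0.8]])
    A = np.zeros((2, 2))                  # corresponds to Sigma = I
    print(neg_loglik_logscale(A, S))      # equals tr(S) = 2.0 since exp(0) = I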

10 The Volterra Integral Equation
The Volterra integral equation (Bellman, 1970, page 175) is
exp(At) = exp(A_0 t) + ∫_0^t exp(A_0(t-s)) (A - A_0) exp(As) ds.   (3)
Repeatedly applying (3) leads to
exp(At) = exp(A_0 t) + ∫_0^t exp(A_0(t-s)) (A - A_0) exp(A_0 s) ds
        + ∫_0^t ∫_0^s exp(A_0(t-s)) (A - A_0) exp(A_0(s-u)) (A - A_0) exp(A_0 u) du ds
        + cubic and higher order terms,   (4)
where A_0 = log(Σ_0) and Σ_0 is an initial estimate of Σ.
The expression for exp(-A) is obtained by letting t = 1 in (4) and replacing A, A_0 with -A, -A_0.
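A quick numerical check of identity (3) at t = 1, using a trapezoidal rule on an arbitrary symmetric pair (A, A_0); the matrices here are random illustrations:

    import numpy as np
    from scipy.linalg import expm

    rng = np.random.default_rng(1)
    A = rng.standard_normal((3, 3));  A = (A + A.T) / 2     # arbitrary symmetric A
    A0 = rng.standard_normal((3, 3)); A0 = (A0 + A0.T) / 2  # arbitrary symmetric A_0

    t = 1.0
    s = np.linspace(0.0, t, 2001)
    vals = np.array([expm(A0 * (t - si)) @ (A - A0) @ expm(A * si) for si in s])
    integral = (vals[:-1] + vals[1:]).sum(axis=0) * ((s[1] - s[0]) / 2)  # trapezoid

    lhs = expm(A * t)
    rhs = expm(A0 * t) + integral
    print(np.max(np.abs(lhs - rhs)))   # ~0, up to quadrature error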

11 Approximation to the Log-Likelihood
The term tr[exp(-A)S] can be written as
tr[exp(-A)S] = tr(S Σ_0^{-1}) - ∫_0^1 tr[(A - A_0) Σ_0^{-s} S Σ_0^{s-1}] ds
             + ∫_0^1 ∫_0^s tr[(A - A_0) Σ_0^{u-s} (A - A_0) Σ_0^{-u} S Σ_0^{s-1}] du ds
             + cubic and higher order terms.   (5)
Leaving out the higher order terms in (5), we approximate L_n(A) by l_n(A):
l_n(A) = tr(S Σ_0^{-1}) + tr(A) - ∫_0^1 tr[(A - A_0) Σ_0^{-s} S Σ_0^{s-1}] ds
       + ∫_0^1 ∫_0^s tr[(A - A_0) Σ_0^{u-s} (A - A_0) Σ_0^{-u} S Σ_0^{s-1}] du ds.   (6)

12 Explicit Form of l_n(A)
The integrals in l_n(A) can be solved analytically through the spectral decomposition Σ_0 = T_0 D_0 T_0'.
Notation: D_0 = diag(d_1^(0), ..., d_p^(0)), with the d_i^(0) the eigenvalues of Σ_0; T_0 = (t_1^(0), ..., t_p^(0)), with t_i^(0) the eigenvector corresponding to d_i^(0). Let B = T_0'(A - A_0)T_0 = (b_ij)_{p×p} and S̃ = T_0' S T_0 = (s̃_ij)_{p×p}.
Then l_n(A) can be written as a function of the b_ij:
l_n(A) = Σ_{i=1}^p (1/2) ξ_ii b_ii² + Σ_{i<j} ξ_ij b_ij² + 2 Σ_{i=1}^p Σ_{j≠i} τ_ij b_ii b_ij + Σ_{k=1}^p Σ_{i<j, i≠k, j≠k} η_kij b_ik b_kj - [Σ_{i=1}^p β_ii b_ii + 2 Σ_{i<j} β_ij b_ij],   (7)
up to a constant. Getting B gives A, via A = T_0 B T_0' + A_0.

13 Some Details
For the linear term,
β_ii = s̃_ii/d_i^(0) - 1,
β_ij = s̃_ij (d_i^(0) - d_j^(0)) / [d_i^(0) d_j^(0) (log d_i^(0) - log d_j^(0))].
For the quadratic term,
ξ_ii = s̃_ii/d_i^(0),
ξ_ij = (s̃_ii/d_i^(0) - s̃_jj/d_j^(0)) / (log d_j^(0) - log d_i^(0))
     + [(d_i^(0)/d_j^(0) - 1) s̃_ii/d_i^(0) + (d_j^(0)/d_i^(0) - 1) s̃_jj/d_j^(0)] / (log d_j^(0) - log d_i^(0))²,
τ_ij = [(1/d_j^(0) - 1/d_i^(0)) / (log d_j^(0) - log d_i^(0))² + (1/d_i^(0)) / (log d_j^(0) - log d_i^(0))] s̃_ij,
η_kij = [(1/d_j^(0) - 1/d_i^(0)) / (log(d_k^(0)/d_j^(0)) log(d_j^(0)/d_i^(0)))
       + (1/d_i^(0) - 1/d_j^(0)) / (log(d_k^(0)/d_i^(0)) log(d_i^(0)/d_j^(0)))
       + (2/d_k^(0) - 1/d_i^(0) - 1/d_j^(0)) / (log(d_k^(0)/d_i^(0)) log(d_k^(0)/d_j^(0)))] s̃_ij.

14 The Log-ME Method
We propose a regularized method to estimate Σ using the approximate log-likelihood function l_n(A).
Consider the penalty function ||A||_F² = tr(A²) = Σ_{i=1}^p (log(d_i))², where the d_i are the eigenvalues of the covariance matrix Σ.
If d_i goes to zero or diverges to infinity, log(d_i) goes to infinity in absolute value in both cases. Such a penalty can therefore simultaneously regularize the largest and smallest eigenvalues of the covariance matrix estimate.
Estimate Σ, or equivalently A, by minimizing
l_{n,λ}(A) = l_n(A) + λ tr(A²),   (8)
where λ is a tuning parameter.
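A tiny numerical illustration of why this penalty controls both extremes of the spectrum (the diagonal matrices are arbitrary examples):

    import numpy as np

    def log_penalty(Sigma):
        # tr(A^2) = sum_i (log d_i)^2 for A = log(Sigma).
        d = np.linalg.eigvalsh(Sigma)
        return float(np.sum(np.log(d) ** 2))

    # The penalty is 0 at the identity and blows up symmetrically as an
    # eigenvalue shrinks toward 0 or grows toward infinity:
    for scale in (1e-3, 1.0, 1e3):
        print(scale, log_penalty(np.diag([scale, 1.0])))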

15 An Iterative Algorithm
The objective l_{n,λ}(B) depends on an initial estimate Σ_0, or equivalently A_0. We propose to apply l_{n,λ}(B) iteratively, obtaining its minimizer ˆB at each step:
Algorithm:
Step 1: Set an initial covariance matrix estimate Σ_0, a positive definite matrix (e.g., Σ_0 = S + εI).
Step 2: Compute the spectral decomposition Σ_0 = T_0 D_0 T_0' and set A_0 = log(Σ_0).
Step 3: Compute ˆB by minimizing l_{n,λ} in (10). Then obtain Â = T_0 ˆB T_0' + A_0, and update the estimate of Σ by ˆΣ = exp(Â) = exp(T_0 ˆB T_0' + A_0).
Step 4: If ||ˆΣ - Σ_0||_F² is less than a pre-specified positive tolerance, stop. Otherwise, set Σ_0 = ˆΣ and go back to Step 2.
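An illustrative sketch of the estimator: for brevity it minimizes the exact penalized objective L_n(A) + λ tr(A²) over symmetric A with a generic optimizer, instead of solving the quadratic program in (10), so it is a stand-in for, not an implementation of, the authors' algorithm.

    import numpy as np
    from scipy.linalg import expm
    from scipy.optimize import minimize

    def log_me_sketch(S, lam=0.1):
        # Minimize tr(A) + tr(exp(-A) S) + lam * tr(A^2) over symmetric A,
        # then return Sigma_hat = exp(A_hat).
        p = S.shape[0]
        iu = np.triu_indices(p)            # free parameters: upper triangle of A

        def unpack(theta):
            A = np.zeros((p, p))
            A[iu] = theta
            return A + np.triu(A, 1).T     # symmetrize

        def objective(theta):
            A = unpack(theta)
            return (np.trace(A) + np.trace(expm(-A) @ S)
                    + lam * np.sum(A * A))  # tr(A^2) = sum of squared entries

        theta0 = np.zeros(len(iu[0]))       # start from A = 0, i.e. Sigma_0 = I
        res = minimize(objective, theta0, method="Nelder-Mead",
                       options={"maxiter": 50000, "maxfev": 50000,
                                "xatol": 1e-8, "fatol": 1e-10})
        return expm(unpack(res.x))

    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 3))
    S = X.T @ X / 20
    print(log_me_sketch(S, lam=0.5))

For realistic p, the closed-form quadratic minimization over B on the slides is what makes the method efficient; the generic optimizer above only scales to toy sizes.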

16 Simulation Study
Six covariance models Σ = (σ_ij)_{p×p} are used for comparison, including:
Model 1: homogeneous model with Σ = I.
Model 2: MA(1) model with σ_ii = 1 and nonzero σ_{i,i-1} = σ_{i-1,i}.
Model 3: circle model with σ_ii = 1, σ_{i,i-1} = σ_{i-1,i} = 0.3, and σ_{1,p} = σ_{p,1} = 0.3.
Compare Log-ME with four estimation methods: the banding estimate (Bickel and Levina, 2008), the LW estimate (Ledoit and Wolf, 2006), the Glasso estimate (Yuan and Lin, 2007), and the CN estimate (Won et al., 2009).
Two loss functions evaluate the performance of each method:
KL = -log|ˆΣ^{-1}| + tr(ˆΣ^{-1} Σ) - (-log|Σ^{-1}| + p),
Δ_1 = ˆd_1/ˆd_p - d_1/d_p,
where d_1 and d_p are the largest and smallest eigenvalues of Σ, and ˆd_1 and ˆd_p are their estimates.
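Both losses are easy to compute; a sketch (using log|ˆΣ| = -log|ˆΣ^{-1}| to avoid an explicit inverse):

    import numpy as np

    def kl_loss(Sigma_hat, Sigma):
        # KL = log|Sigma_hat| + tr(Sigma_hat^{-1} Sigma) - log|Sigma| - p.
        p = Sigma.shape[0]
        _, logdet_hat = np.linalg.slogdet(Sigma_hat)
        _, logdet_true = np.linalg.slogdet(Sigma)
        return logdet_hat + np.trace(np.linalg.solve(Sigma_hat, Sigma)) - logdet_true - p

    def cond_loss(Sigma_hat, Sigma):
        # Delta_1 = d1_hat/dp_hat - d1/dp, a difference of condition numbers.
        dh = np.linalg.eigvalsh(Sigma_hat)
        d = np.linalg.eigvalsh(Sigma)
        return dh[-1] / dh[0] - d[-1] / d[0]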

17 Simulation Results
[Table: averages and standard errors of the KL and Δ_1 losses from 100 runs (n = 50, p = 50) for Log-ME, Banding, LW, Glasso, and CN under Models 1-3. A value marked with * is affected by matrix singularity.]

18 Portfolio Optimization of Stock Data
Apply the Log-ME method to an application in portfolio optimization.
In mean-variance optimization, the risk of a portfolio w = (w_1, ..., w_p)' is measured by the standard deviation sqrt(w' Σ w), where w_i ≥ 0 and Σ_{i=1}^p w_i = 1.
The estimated minimum variance portfolio optimization problem is
min_w w' ˆΣ w   s.t. Σ_{i=1}^p w_i = 1,   (9)
where ˆΣ is an estimate of the true covariance matrix Σ.
An accurate covariance matrix estimate ˆΣ can lead to a better portfolio strategy.
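With only the budget constraint (dropping w_i ≥ 0 for simplicity), problem (9) has the familiar closed form w = ˆΣ^{-1}1 / (1' ˆΣ^{-1} 1); a sketch:

    import numpy as np

    def min_variance_weights(Sigma_hat):
        # argmin_w w' Sigma_hat w  s.t.  sum_i w_i = 1  (nonnegativity ignored).
        ones = np.ones(Sigma_hat.shape[0])
        z = np.linalg.solve(Sigma_hat, ones)   # Sigma_hat^{-1} 1
        return z / (ones @ z)                  # w = Sigma_hat^{-1} 1 / (1' Sigma_hat^{-1} 1)

Keeping the nonnegativity constraints w_i ≥ 0 turns (9) into a quadratic program that needs a QP solver rather than a closed form.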

19 The Set-up
Consider the weekly returns of the p = 30 components of the Dow Jones Industrial Index from January 8th, 2007 to June 28th. Use the first n = 50 observations as the training set, the next 50 observations as the validation set, and the remaining 83 observations as the test set.
Let X_ts be the test set and S_ts the sample covariance matrix of X_ts. The performance of a portfolio w is measured by the realized return and the realized risk:
R(w) = Σ_{x ∈ X_ts} w' x,   σ(w) = sqrt(w' S_ts w).
The optimal portfolio ˆw is computed with ˆΣ estimated by the Log-ME method, the CN method (Won et al., 2009), and the sample covariance S, separately.
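These two metrics in code; the layout of X_test (test-set return vectors as rows) is an assumption of this sketch:

    import numpy as np

    def realized_metrics(w, X_test):
        # X_test: test-set return vectors as rows (assumed layout).
        S_ts = np.cov(X_test, rowvar=False)    # sample covariance of the test set
        R = float(np.sum(X_test @ w))          # realized return: sum over x of w'x
        sigma = float(np.sqrt(w @ S_ts @ w))   # realized risk: sqrt(w' S_ts w)
        return R, sigma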

20 The Comparison Results
[Table 1: the realized return R(ˆw) and the realized risk σ(ˆw) for Log-ME, CN, and S.]
The Log-ME method produced a portfolio with a larger realized return but smaller realized risk.

21 Comparison in Different Periods
Compare the portfolio strategies based on the different covariance matrix estimation methods over different periods.
Given a starting week, use the first 50 observations as the training set, the next 50 observations as a validation set, and the third 50 observations as a test set.
Shift the starting week one week ahead each time, and evaluate the portfolio strategies over 33 different consecutive test periods.
The optimal portfolio ˆw is computed with ˆΣ estimated by the Log-ME method, the CN method, and the sample covariance matrix, separately.

22 The Realized Returns
[Figure: realized returns over the 33 test periods for Log-ME, CN, and S.]
The proposed Log-ME covariance matrix estimate can lead to higher returns.

23 The Realized Risks
[Figure: realized risks over the 33 test periods for Log-ME, CN, and S.]
The Log-ME method has relatively higher risks than the CN method, but it provides much larger realized returns.

24 Summary
We estimate the covariance matrix through its matrix logarithm, based on a penalized likelihood function.
The Log-ME method regularizes the largest and smallest eigenvalues simultaneously by imposing a convex penalty.
Other penalty functions can be considered to improve the estimation from different perspectives.
Future work: extend to Bayesian covariance matrix estimation for the large-p-small-n problem.

25 Thank you!

26 The Log-ME Method (Cont'd)
Note that tr(A²) = tr((T_0 B T_0' + A_0)²) equals tr(B²) + 2 tr(BΓ) up to a constant, where Γ = (γ_ij)_{p×p} = T_0' A_0 T_0.
In terms of B, the function l_{n,λ}(A) becomes
l_{n,λ}(B) = Σ_{i=1}^p (1/2) ξ_ii b_ii² + Σ_{i<j} ξ_ij b_ij² + 2 Σ_{i=1}^p Σ_{j≠i} τ_ij b_ii b_ij + Σ_{k=1}^p Σ_{i<j, i≠k, j≠k} η_kij b_ik b_kj - [Σ_{i=1}^p β_ii b_ii + 2 Σ_{i<j} β_ij b_ij] + λ [Σ_{i=1}^p b_ii² + 2 Σ_{i<j} b_ij² + 2 Σ_{i=1}^p γ_ii b_ii + 4 Σ_{i<j} γ_ij b_ij].   (10)
The objective l_{n,λ}(B) is still a quadratic function of B = (b_ij).

27 The CN Method
The CN method estimates Σ with a constraint on its condition number (Won et al., 2009). It takes
ˆΣ = T diag(û_1^{-1}, ..., û_p^{-1}) T',
where T comes from the spectral decomposition S = T diag(l_1, ..., l_p) T'. The û_1, ..., û_p are obtained by solving the constrained optimization
min_{u, u_1, ..., u_p} Σ_{i=1}^p (l_i u_i - log u_i)   s.t. u ≤ u_i ≤ κ_max u, i = 1, ..., p,
where κ_max is a tuning parameter, computed through an independent validation set.
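For fixed u, each u_i minimizing the convex term l_i u_i - log u_i over [u, κ_max u] is the unconstrained minimizer 1/l_i clipped to that interval, so a crude grid search over u gives a workable sketch; this is an illustrative reimplementation, not Won et al.'s algorithm.

    import numpy as np

    def cn_estimate(S, kappa_max=10.0, n_grid=2000):
        l, T = np.linalg.eigh(S)                       # S = T diag(l_1, ..., l_p) T'
        l = np.maximum(l, 1e-12)                       # guard against exact zeros

        def obj(u):
            u_i = np.clip(1.0 / l, u, kappa_max * u)   # clipped unconstrained minimizers
            return np.sum(l * u_i - np.log(u_i))

        grid = np.geomspace(1.0 / (kappa_max * l.max()), 1.0 / l.min(), n_grid)
        u_best = min(grid, key=obj)
        u_i = np.clip(1.0 / l, u_best, kappa_max * u_best)
        return T @ np.diag(1.0 / u_i) @ T.T            # Sigma_hat = T diag(u_i^{-1}) T'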
