1 Smooth Common Principal Component Analysis Michal Benko Wolfgang Härdle Center for Applied Statistics and Economics benko@wiwi.hu-berlin.de Humboldt-Universität zu Berlin
Motivation 1-1 Volatility Surface Volatility Surface (3500.0,0.0,0.5) Vola (3500.0,0.0,0.5) 0.4 Maturit t 0.5 Vola Maturit t Moneyness 0.4 (3500.0,1.0,0.3) 0.8 0.3 0.5 0.2 (3500.0,0.0,0.3) 4375.0 5250.0 6125.0 (7000.0,0.0,0.3) 0.4 Moneyness (3500.0,1.0,0.3) 0.8 0.3 0.5 0.2 (3500.0,0.0,0.3) 4375.0 5250.0 6125.0 (7000.0,0.0,0.3) Figure 1: Implied Volatility Surfaces: t 1 = 990104 and t 2 = 990201, ODAX <film> XFGiv02.xpl
Motivation 1-2 Time Series of PCs: 1 month 0 50 100 150 200 250 Time Figure 2: 1st, 2nd and 3rd principal component of the 1 months maturity
Motivation 1-3 Parallel Coordinate Plot: 1. Eigenvector Factor loading -0.2 0 0.2 0.4 0.6 1 2 3 4 5 6 Index of moneyness Figure 3: 1 st eigenvectors (sep. PCA) for 1, 2 and 3 months maturity index 1 to 6 is κ {0.85, 0.90, 0.95, 1.00, 1.05, 1.10}
Motivation 1-4 Parallel Coordinate Plot: 2. Eigenvector Factor loading -0.5 0 0.5 1 2 3 4 5 6 Index of moneyness Figure 4: 2 nd eigenvectors (sep. PCA) for 1, 2 and 3 months maturity index 1 to 6 is κ {0.85, 0.90, 0.95, 1.00, 1.05, 1.10}
Motivation 1-5 Simulated CPC Model -10-5 0 5 10-5 0 5 10 Figure 5: Simulated CPC model as observable in vole data; compare Flury (1988)
Motivation 1-6 Factor loading CPC Coordinate Plot: First three Eigenvectors -0.6-0.4-0.2 0 0.2 0.4 0.6 1 2 3 4 5 6 Index of eigenvectors Figure 6: First three eigenvectors under CPC 1, 2 and 3 months maturity index 1 to 6 is κ {0.85, 0.90, 0.95, 1.00, 1.05, 1.10} XFGiv06.xpl
Motivation 1-7 + CPC for IV yields the desired dimension reduction + trading signals and strategies can be obtained from CPC model - the problem thought is of functional nature - eigenvectors are rough? it is possible to (elegantly) combine CPC & FDA
Motivation 1-8 Outline 1. Motivation 2. Functional Data Analysis 3. Functional Data Example 4. Smooth Functional Principal Components Analysis 5. Smooth Common Principal Components 6. Application
Functional Data Analysis 2-1 Functional Data Analysis - Problem Analyze (random) functions on an interval J: X(t), t J Using n-observations of X: X 1 (t),..., X n (t)
Functional Data Analysis 2-2 Moments X µ (t) = E X(t), t J Var(X) = E{X(t) E X(t)} 2, t J Γ(s, t) = E{X(s) E X(s)}{X(t) E X(t)}, s, t J In order to simplify the notation, assume X µ (t) = 0, t J.
Functional Data Analysis 2-3 Summary Statistics X(t) = 1 n σ(t) = 1 n Γ(s, t) = 1 n n X i (t) i=1 n { Xi (t) X(t) } 2 i=1 n { Xi (s) X(s) } { X i (t) X(t) } i=1
Functional PCA 3-1 Functional Principal Components Analysis: Analogue to Multivariate PCA in Functional Space: arg max (var < γ k, X >) (1) <γ l,γ k >=δ lk,l k Where < f, g > def = J f(t)g(t)dt Results are eigenfunctions γ k
Functional PCA 3-2 Theoretical Aspects of FDA Eigenfunctions are solutions of the eigenequation: (homogenous Fredholm equation of the second kind) Γ(, t)γ(t)dt = λγ( ) (2) Ramsay, J.,& Silverman, B., (1997)
Functional PCA 3-3 Eigenelements: f k - Principal Components f k =< γ k, X > γ k (t) - eigenfunctions λ k - eigenvalues = Var(f k ) = Var{ J γ k (t)x(t)dt} = γ k (s)γ(s, t)γ k (t)dsdt
Functional PCA 3-4 How to Estimate Functional Principal Components? Replace Γ in (2) by sample covariance function ˆΓ: ˆΓ(, t)ˆγ(t)dt = ˆλˆγ( ) (3) Resulting ˆλ k, ˆγ k are consistent estimators of λ k, γ k
Functional PCA 3-5 Basis Function Expansion Assume that X i (t) can be expanded in terms of a function basis Φ = (Φ 1,..., Φ L ) : L X i (t) c il Φ l (t) (4) l=1 Simultaneous expansion of all X i, i = 1,..., n: X CΦ Summary Statistics: X(t) CΦ(t) (5) ˆΓ(s, t) Φ(s) Cov(C)Φ(t) (6)
Functional PCA 3-6 Fourier Basis Well known example of functional Basis is a Fourier Basis: (1, sinωt, cosωt, sin2ωt, cos2ωt,...) where ω is frequency and determines the period J = 2π ω Functional expansion using Fourier basis: X i (t) a L 0i 2 + (a li cosωlt + b li sinωlt) l=1
Functional PCA 3-7 The eigenequation (2) can then be expressed: where Φ(t) Cov(C)Wb = λφ(t) b γ(t) b = (b 1, b 2,..., b L ) L b l φ l (t) l=1 W, w l1,l 2 def = < φ l1, φ l2 > Note: < γ l, γ k > corresponds with b l Wb k.
Functional PCA 3-8 Remark In the most practical applications even X i (t) can be observed only on discrete grids: Data Function X i is n-times observed only on discrete grid (t 1i,..., t pi ) J (X i (t 1i ), X i (t 2i ),..., X i (t pi )), i = 1,..., n
Functional PCA 3-9 The X i (t) itself need to be estimated from (X i (t 1i ), X i (t 2i ),..., X i (t pi )), i.e. for Basis expansion technique we need to estimate Ĉ this can be done by standard numerical or statistical tools.
Functional PCA 3-10 Functional Data Example Temperature Functions Y -30-20 -10 0 10 20 0 100 200 300 X XCSfdaex01.xpl Canadian temperature dataset using FFT with L = 31
Functional PCA 3-11 Mean and Variance Function Temperature Functions Temperature Functions Y -15-10 -5 0 5 10 15 0 100 200 300 X Y 20 40 60 80 0 100 200 300 X XCSfdaMom.xpl
Functional PCA 3-12 Covariance Function Covariance Function XCSfdaCOV.xpl
Smooth Principal Components 4-1 Smooth Principal Components Analysis Eigenfunctions are usually too rough for clear interpretation, e.g. Functional PCA Y*E-2-10 -5 0 5 0 1 2 3 X*E2 Eigenfunctions for temperature dataset, using Fourier expansion with L = 31. XCSfda02.xpl
Smooth Principal Components 4-2 IDEA: smooth eigenfunctions: 1. Plug the roughness penalty into objective function - SPCA 1 2. Plug the roughness penalty into scalar product - SPCA 2
Smooth Principal Components 4-3 Roughness Penalty in Objective Function ) arg max (var(< γ k, X >) α k < γ k, γ k > <γ l,γ k >=δ lk,l k (7) Idea: penalize rough functions direct in the maximization criterion Smoothing parameter α k has to be chossen (e.g. CV criterion)
Smooth Principal Components 4-4 Change inner product arg max (var < γ k, X >) (8) (γ l,γ k )=δ lk,l k (.,.) is of Sobolev type, e.g. (f, g) =< f, g > +α k < f g > Idea is to insert the roughness penalty in to the inner product
Smooth Principal Components 4-5 Remark Changing the Objective Function max var < γ k, X > +α k < γ k, γ k > < γ k, γ k > Changing the inner product var < γ k, X > max < γ k, γ k > +α k < γ k, γ k >
Smooth Principal Components 4-6 Problems How to choose α in roughness penalty? Is e.g. Cross-Validation appropriate? How to choose Φ and L?
Smooth Principal Components 4-6 Example Roughness Penalty by Changing the Norm Smoothed eigenfunctions (Fourier), 2000 Y*E-2-10 -5 0 5 0 1 2 3 X*E2 XCSfda03.xpl
Common Principal Components 5-1 Common Principal Components k-sample problem in the multivariate framework: k random vectors: X (1), X (2),..., X (k) R p. Model: Ψ j = Cov(X (j) ) = ΓΛ j Γ i.e. eigenvectors same across samples, eigenvalues (variances) differs. Let us denote sample covariance by S j.
Common Principal Components 5-2 Maximum Likelihood Estimation Further: L(Ψ 1, Ψ 2,..., Ψ k ) = C. S i W p (n i, Ψ i /n i ) k exp i=1 { ( tr n )} i 2 Ψ is i (Ψ i ) n i/2 Maximization of L is equivalent to: k [ det diag(γ ] S i Γ) Φ(Γ) = det(γ S i Γ) i=1 (9)
Common Principal Components 5-3 Testing the CPC Hypothesis Test Statistics χ 2 CP C = 2log L( ˆΨ 1, ˆΨ 2,..., ˆΨ k ) L(S 1, S 2,..., S k ) = n i log det ˆΨ i detŝ i χ (k 1)p(p 1)/2
Common Principal Components 5-4 Estimation SCPC SPCA are implemented through Spectral Analysis of Matrices Multivariate CPC criterion (9) - F G algorithm may be employed Remark: In the view of minimization property of FG algorithm this is reasonable approach, however obtained estimator may not be a ML estimator.
Common Principal Components 5-5 Testing the SCPC hypothesis Idea motivated from application: test the distribution of PC (f j i ) Takeda, Y., & Sugiyama, T., (2003) use this idea for testing λ 1 1 = λ 2 1... Small Simulation indicates poor behavior for SCPC hypothesis
Common Principal Components 5-6 Simulated SCPC Estimation Example } (x 500)2 Y 1 (t) = X 1,1 exp { 50 2 } (x 500)2 Y 2 (t) = X 1,2 exp { 50 2 } (x 600)2 + X 2,1 exp { 25 2 } (x 600)2 + X 2,2 exp { 25 2 where X 1,1 N(7, 0.5), X 2,1 N(3, 0.5) X 1,2 N(7, 0.75), X 2,2 N(3, 0.5) See Caglar, H., & Caglar, N., (2003)
Common Principal Components 5-7 Simulation Setup Functions X i simulated on discrete grid Basis: Fourier Basis L = 21 Coefficient estimated using FFT T = {350, 351,... 700} Number of Simulated Processes - n = 30
Common Principal Components 5-8 Simulated Processes Simulated Processes Y1 Simulated Processes Y2 0 2 4 6 8 Y 0 2 4 6 8 Y 0 100 200 300 X 0 100 200 300 X Simulated Processes {Y 1i (t), } n i=1, {Y 2i(t)} n i=1 t T, blue - true function
Common Principal Components 5-9 Fourier Expansion of the Simulated Processes Estimated Functions Y1 Estimated Functions Y2 0 2 4 6 Y 0 2 4 6 Y 0 100 200 300 X 0 100 200 300 X Fourier Expansion of the Simulated Processes {Ŷ1i(t)} n i=1, {Ŷ2i(t)} n i=1 (Fast Fourier Transformation, L = 21)
Common Principal Components 5-10 Principal Components Y*E-2 0 5 10 0 1 2 3 X*E2 1st components of simulated process, α = 10000 Process 1, Process 2, FG estimator (CPC)
Common Principal Components 5-11 Principal Components Y*E-2-10 -5 0 5 0 1 2 3 X*E2 2st components of simulated process, α = 10000 Process 1, Process 2, FG estimator (CPC)
Common Principal Components 5-12 Challenges in Implied Volatility Modelling 1. Data Preparation MD*Base 2. Strings 3. Time Shift of the strings
Common Principal Components 5-13 Summary and Outlook estimation of SCPC using simultaneous diagonalization statistical properties of estimation procedure proper functional bases and diagonalization method for the applications testing the SCPC hypothesis
References 6-1 References Dauxois, J., Pousse, A.,& Romain, Y., (1982). Asymptotic Theory for the Principal Component Analysis of a Vector Random Function: Some Applications to Statistical Inference, Journal of Multivariate Analysis 12,p. 136-154 Rice, J.,& Silverman, B., (1991). Estimating the Mean and Covariance Structure Nonparametrically when the Data are Curves, Journal of Royal Statistical Society, Ser. B 53,p. 233-243 Flury, B., (1988). Common Principal Components and Related Models, Wiley, New York Ramsay, J.,& Silverman, B., (1997). Functional Data Analysis, Springer, New York Härdle, W., (1990). Applied Nonparametric Regression, Econometric Society Monographs
References 6-2 Fengler, M., & Härdle, W., (2003). Voles, Volas, Values, Talk CASE, HU-Berlin Fengler, M., Härdle, W., & Villa, C., (2001). The Dynamics of Implied Volatilities: A CPC Approach, under submission Fengler, M., Härdle, W., & Mammen, E., (2003). Implied Volatitlity String Dynamics, under submission Takeda, Y., & Sugiyama, T., (2003). Some tests in Functional Principal Components Analysis, unpublished, Chuo University Caglar, H., & Caglar, N., (2003). The PCA of Continous Sample Curves with Higher-order B-spline Functions, unpublished, Istanbul University