Functional principal component analysis of financial time series


Vichi M., Monari P., Mignani S., Montanari A. (Eds.), New Developments in Classification and Data Analysis, Springer-Verlag, Berlin, 2005.

Salvatore Ingrassia and G. Damiana Costanzo
Dipartimento di Economia e Statistica, Università della Calabria, Arcavacata di Rende (CS), Italy
s.ingrassia@unical.it, dm.costanzo@unical.it

Abstract. We introduce functional principal component techniques for the statistical analysis of a set of financial time series from an exploratory point of view. We show that this approach highlights some relevant statistical features of such related datasets. A case study is considered concerning the daily traded volumes of the shares in the MIB30 basket from January 3rd, 2000 to December 30th, 2002. Moreover, since the first functional principal component accounts for 89.4% of the whole variability, this approach suggests the construction of new financial indices based on functional indicators.

1 Introduction

The functional domain supports many recent methodologies for the statistical analysis of data coming from measurements of continuous phenomena; such techniques nowadays constitute a new branch of statistics named functional data analysis, see Ramsay and Silverman (1997, 2002). Financial markets offer an appealing field of application since the dealing phase is continuous and, therefore, the share prices, as well as other related quantities, are updated at a very high frequency. This paper focuses on a functional principal component based approach to the statistical analysis of financial data. In finance, principal component based techniques have occasionally been considered, e.g. for the construction of uncorrelated indices in multi-index models, see Elton and Gruber (1973); moreover, they have been suggested for high frequency trading models by Dunis et al. (1998).
Here we show that the functional version provides a useful tool for the statistical analysis of a set of financial series from an exploratory perspective. Furthermore, we point out how this approach suggests the possibility of constructing stock market indices based on functional indicators. The analysis is illustrated by considering the daily traded volumes of the 30 shares listed in the MIB30 basket in the period from January 3rd, 2000 to December 30th, 2002.

The rest of the paper is organised as follows. In the next section we outline functional data modeling and give some details about functional principal component analysis; in Section 3 we introduce the MIB30 basket dataset and present the main results of our analysis; finally in Section 4 we discuss further methodological aspects

and open a problem concerning the construction of new stock market indices on the basis of the results obtained.

2 Functional PCA

Functional data are essentially curves and trajectories; the basic rationale is that we should think of observed data functions as single entities rather than merely as a sequence of individual observations. Even though functional data analysis (FDA) often deals with temporal data, its scope and objectives are quite different from those of time series analysis. While time series analysis focuses mainly on modeling data, or on predicting future observations, the techniques developed in FDA are essentially exploratory in nature: the emphasis is on trajectories and shapes; moreover, unequally spaced observations and/or different numbers of observations can be taken into account, as well as series of observations with missing values.

From a practical point of view, functional data are usually observed and recorded discretely. Let $\{\omega_1, \dots, \omega_n\}$ be a set of $n$ units and let $y_i = (y_i(t_1), \dots, y_i(t_p))$ be a sample of measurements of a variable $Y$ taken at $p$ times $t_1, \dots, t_p \in T = [a, b]$ in the $i$-th unit $\omega_i$ ($i = 1, \dots, n$). As remarked above, such data $y_i$ ($i = 1, \dots, n$) are regarded as functional because they are considered as single entities rather than merely as sequences of individual observations, so they are called raw functional data; indeed, the term functional refers to the intrinsic structure of the data rather than to their explicit form. In order to convert raw functional data into a suitable functional form, a smooth function $x_i(t)$ is assumed to lie behind $y_i$, which is referred to as the true functional form; this implies, in principle, that we can evaluate $x_i$ at any point $t \in T$. The set $X_T = \{x_1(t), \dots, x_n(t)\}_{t \in T}$ is the functional dataset. In functional data analysis the statistical techniques posit a vector space of real-valued functions defined on a closed interval for which the integral of their squares is finite.
If attention is confined to functions having finite norms, then the resulting space is a Hilbert space; however, a stronger assumption is often required, so we assume $H$ to be a reproducing kernel Hilbert space (r.k.h.s.), see Wahba (1990), that is, a Hilbert space of real-valued functions on $T$ with the property that, for each $t \in T$, the evaluation functional $L_t$, which associates $f$ with $f(t)$, $L_t : f \mapsto f(t)$, is a bounded linear functional. In such spaces the objective of principal component analysis of functional data is the orthogonal decomposition of the variance function:

$$v(t, u) := \frac{1}{n-1} \sum_{i=1}^{n} \{x_i(t) - \bar{x}(t)\}\{x_i(u) - \bar{x}(u)\} \tag{1}$$

(which is the counterpart of the covariance matrix of a multidimensional dataset) in order to isolate the dominant components of functional variation, see e.g. also Pezzulli (1994). In analogy with the multivariate case, the functional PCA problem is characterized by the following decomposition of the variance function:

$$v(t, u) = \sum_{j} \lambda_j \xi_j(t) \xi_j(u) \tag{2}$$

where $\lambda_j$, $\xi_j(t)$ satisfy the eigenequation:

$$\langle v(s, \cdot), \xi_j \rangle_H = \lambda_j \xi_j(s) \tag{3}$$

and the eigenvalues:

$$\lambda_j := \int_T \int_T \xi_j(t)\, v(t, u)\, \xi_j(u)\, dt\, du$$

are positive and non-increasing in $j$, while the eigenfunctions must satisfy the constraints:

$$\int_T \xi_j(t)^2\, dt = 1 \quad \text{and} \quad \int_T \xi_j(t)\, \xi_i(t)\, dt = 0 \quad (i < j).$$

The $\xi_j$'s are usually called principal component weight functions. Finally, the principal component scores (on $\xi_j(t)$) of the units in the dataset are the values $w_i^{(j)}$ given by:

$$w_i^{(j)} := \langle x_i, \xi_j \rangle = \int_T \xi_j(t)\, x_i(t)\, dt. \tag{4}$$

The decomposition (2) defined by the eigenequation (3) permits a reduced rank least squares approximation to the covariance function $v$. Thus, the leading eigenfunctions $\xi_j$ define the principal components of variation among the sample functions $x_i$.

3 An explorative analysis of the MIB30 basket dataset

The data considered here consist of the total value of the traded volumes of the shares composing the MIB30 index in the period from January 3rd, 2000 to December 30th, 2002, see also Costanzo (2003) for details. An important characteristic of this basket is that it is open, in that the composition of the index is normally updated twice a year, in the months of March and September (ordinary revisions). However, in response to extraordinary events, or for technical reasons, ordinary revisions may be brought forward or postponed with respect to the scheduled date; furthermore, in the interval between two consecutive revisions, shares may be excluded from the basket for particular reasons, see the website for further details. Raw data have been collected in a matrix. There are 21 companies which have remained in the basket for the three years: Alleanza, Autostrade, Banca Fideuram, Banca Monte Paschi Siena, Banca Naz.
Lavoro, Enel, Eni, Fiat, Finmeccanica, Generali, Mediaset, Mediobanca, Mediolanum, Olivetti, Pirelli Spa, Ras, San Paolo Imi, Seat Pagine Gialle, Telecom Italia, Tim, Unicredito Italiano; the other 9 places in the basket have been shared by a set of other companies which have remained in the basket for shorter periods. Such mixed trajectories will be called here homogeneous piecewise components of the functional data set and will be referred to as $T_1, \dots, T_9$. An example, concerning $T_1, T_2, T_3$, is given in Table 1. Due to the connection among the international financial markets, data concerning the closing days (such as week-ends and holidays) are regarded here as missing data. In the literature functional PCA is usually performed on the original data $(x_{ij})$; here we preferred to work on the daily standardized raw functional data:

$$z_{ij} := \frac{x_{ij} - \bar{x}_j}{s_j} \quad (i = 1, \dots, 30,\ j = 1, \dots, 758), \tag{5}$$

where $\bar{x}_j$ and $s_j$ are respectively the daily mean and standard deviation of the e.e.v. of the shares in the basket. We shall show later how this transformation helps in understanding the PC trajectories.

Date        T1                 T2                          T3
03/01/2000  AEM                Banca Commerciale Italiana  Banca di Roma
04/04/2000  AEM                Banca Commerciale Italiana  Banca di Roma
18/09/2000  AEM                Banca Commerciale Italiana  Banca di Roma
02/01/2001  AEM                Banca Commerciale Italiana  Banca di Roma
19/03/2001  AEM                Italgas                     Banca di Roma
02/05/2001  AEM                Italgas                     Banca di Roma
24/08/2001  AEM                Italgas                     Banca di Roma
24/09/2001  AEM                Italgas                     Banca di Roma
18/03/2002  Snam Rete Gas      Italgas                     Banca di Roma
01/07/2002  AEM                Italgas                     Capitalia
15/07/2002  AEM                Italgas                     Capitalia
23/09/2002  Banca Antonveneta  Italgas                     Capitalia
04/12/2002  Banca Antonveneta  Italgas                     Capitalia

Table 1. The homogeneous piecewise components $T_1, T_2, T_3$.

The functional dataset has been obtained from such data according to the procedure illustrated in Ramsay (2001). The trajectories of the first two functional principal components are plotted in Figure 1; they show the way in which such a set of functional data varies from its mean and, in terms of these modes of variability, quantify the discrepancy from the mean of each individual functional datum. The analysis showed that the first PC alone accounts for 89.4% and the second PC for 6.9% of the whole variability. Interpreting functional principal components is a more complicated task than in the usual multidimensional analysis; however, the following interpretation emerges here:

i. The first functional PC is always positive; hence shares with large scores on this component during the considered period have a large traded volume as compared with the mean value of the basket; it can be interpreted as a long term trend component.
ii.
The second functional PC changes sign at $t = 431$, which corresponds to September 11th, 2001, and the final values are greater, in absolute value, than the initial values: this means that shares performing well (badly) before September 11th, 2001 went down (rose) after this date; it can be interpreted as a shock component.

This interpretation is confirmed by the following analysis of the raw data. As concerns the first PC, for each company we considered its minimum standardized value over the three years, $z_i^{(\min)} = \min_{j=1,\dots,758} z_{ij}$ ($i = 1, \dots, 30$). In particular, $z_i^{(\min)}$ is positive when the traded volumes of the $i$-th share are always greater than the mean value of the MIB30 basket during the three years, while a negative $z_i^{(\min)}$ means that they fall below the mean on at least one day. As for the second PC, let $\bar{x}_{Bi}$ be the average of the traded volumes of the $i$-th company over the days $1, \dots, 431$ (i.e. before September 11th, 2001) and $\bar{x}_{Ai}$ be the

corresponding mean value after September 11th, 2001. Let us consider the percentage variation:

$$\delta_i := \frac{\bar{x}_{Ai} - \bar{x}_{Bi}}{\bar{x}_{Bi}} \cdot 100\%, \quad i = 1, \dots, 30.$$

If $\delta_i$ is positive (negative) then the $i$-th company increased (decreased) its mean e.e.v. after September 11th, 2001. Finally, consider the scores on the first two PCs given in (4), respectively $w_i^{(1)}$ and $w_i^{(2)}$ (see Figure 2). We observe that: i) companies with a large positive (negative) value of $w_i^{(1)}$ present values larger (smaller) than the mean during the entire period considered, i.e. a large (small) value of $z_i^{(\min)}$, see Table 2; ii) companies with a large positive (negative) value of $w_i^{(2)}$ show a large decrement (increment) after September 11th, 2001 (Day = 431), i.e. of $\delta_i$, see Table 3. Further details are given in Costanzo and Ingrassia (2004).

Fig. 1. Plot of the first 2 functional principal components: PCA function 1 (percentage of variability 89.4) and PCA function 2 (percentage of variability 6.9).

4 Further remarks and methodological perspectives

The results illustrate the capability of functional PCs to highlight statistical features of a set of financial time series, as the subsequent analysis of the raw data has also confirmed. As remarked above, the functional data set has been constructed here using the standardized data $(z_{ij})$ defined in (5) rather than the original data $(x_{ij})$; Figure 1 shows how this approach clarifies the contribution of the PC trajectories with respect to the mean trajectory. For the sake of completeness, we point out that the first two PCs computed on the non-standardized data $(x_{ij})$ explained respectively 88.9% and 7.1% of the whole variability; the plot of the scores on these two harmonics is practically the same as the one given in Figure 2.
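The steps described above (the standardization (5), the variance function (1), its decomposition (2)-(3), the scores (4), and the indicators $z_i^{(\min)}$ and $\delta_i$) can be sketched on a discrete grid as follows. This is only a minimal illustration under explicit assumptions: the data are synthetic and are not the MIB30 volumes; the curves are used directly on the daily grid instead of being smoothed with the procedure of Ramsay (2001); integrals over $T$ are approximated by trapezoidal quadrature; the standard deviation uses the $n-1$ divisor; and the names `trapezoid_weights` and `functional_pca` are hypothetical.

```python
import numpy as np


def trapezoid_weights(t):
    """Weights w_k such that sum_k w_k f(t_k) approximates the integral of f over T."""
    w = np.empty(len(t))
    w[1:-1] = (t[2:] - t[:-2]) / 2.0
    w[0] = (t[1] - t[0]) / 2.0
    w[-1] = (t[-1] - t[-2]) / 2.0
    return w


def functional_pca(x, t, n_components=2):
    """Discretized functional PCA of n curves sampled on a common grid t.

    x has shape (n, p): row i holds x_i(t_1), ..., x_i(t_p).
    Returns eigenvalues, eigenfunctions (n_components, p), scores (n, n_components)
    and the fraction of variability explained by each component.
    """
    n = x.shape[0]
    w = trapezoid_weights(t)
    xc = x - x.mean(axis=0)                  # deviations from the mean curve
    v = xc.T @ xc / (n - 1)                  # variance function v(t_k, t_l), eq. (1)

    # Symmetrized eigenproblem: W^(1/2) V W^(1/2) u = lambda u, with xi = W^(-1/2) u,
    # which is the quadrature version of the eigenequation (3).
    sw = np.sqrt(w)
    lam, u = np.linalg.eigh(sw[:, None] * v * sw[None, :])
    order = np.argsort(lam)[::-1][:n_components]
    xi = (u[:, order] / sw[:, None]).T       # eigenfunctions with unit L2 norm
    scores = (xc * w) @ xi.T                 # quadrature version of eq. (4)
    explained = lam[order] / lam.sum()
    return lam[order], xi, scores, explained


# Synthetic stand-in for the data (assumption): 30 curves over 758 days, built from
# a level effect plus a component that changes sign after day 431, plus noise.
rng = np.random.default_rng(0)
n, p = 30, 758
t = np.arange(1.0, p + 1.0)
level = rng.normal(size=(n, 1))
shock = rng.normal(size=(n, 1))
x = 50.0 + 3.0 * level + 0.8 * shock * np.where(t <= 431, 1.0, -1.5)
x = x + 0.3 * rng.normal(size=(n, p))

# Daily standardization across the units, eq. (5)
z = (x - x.mean(axis=0)) / x.std(axis=0, ddof=1)

lam, xi, scores, explained = functional_pca(z, t)

# Raw-data indicators used in the text to check the interpretation of the two PCs
z_min = z.min(axis=1)                        # z_i^(min)
x_before = x[:, :431].mean(axis=1)           # mean over days 1, ..., 431
x_after = x[:, 431:].mean(axis=1)            # mean over the remaining days
delta = 100.0 * (x_after - x_before) / x_before

print("fraction of variability explained:", explained)
```

On the standardized data every daily mean is zero by construction, so the centring inside `functional_pca` leaves `z` unchanged and the computed scores agree with (4). The sign of each eigenfunction is arbitrary, so the scores are determined only up to a common sign change per component.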

Fig. 2. Scores on the first two harmonics (axes: scores on harmonic 1 and scores on harmonic 2; points labelled by company and by the components $T_1, \dots, T_9$).

Table 2. Comparison between $z_i^{(\min)}$ and $w_i^{(1)}$ for some companies. Columns: $z_i^{(\min)}$, Company, $w_i^{(1)}$. Companies listed: Eni, Telecom, Tim, Enel, Generali, Olivetti, Unicredito, T, Mediaset, Seat Pagine Gialle (numeric entries missing).

In our opinion, the results obtained open some methodological perspectives for the construction of new financial indices having suitable statistical properties. As a matter of fact, the construction of some existing stock market indices has been criticized by several authors, see e.g. Elton and Gruber (1995). For example, it is well known that the famous U.S. Dow Jones presents some statistical flaws but, despite these drawbacks in the methodology used in its computation, it continues to be widely employed.

Table 3. Comparison between $\delta_i$ and $w_i^{(2)}$ for some companies. Columns: $\delta_i$, Company, $w_i^{(2)}$. Companies listed: Seat Pagine Gialle ($\delta_i$ = 80.20%), Olivetti, Enel, Unicredito, Autostrade, T (remaining numeric entries missing).

In Italy, the MIB30 share basket is summarized by the MIB30 index, which is calculated according to the formula:

$$\mathrm{MIB30} = \left( \sum_{i=1}^{30} \frac{p_{it}}{p_{i0}}\, w_{i0} \right) r_0 \cdot 10{,}000 \quad \text{with} \quad w_{i0} = \frac{p_{i0}\, q_{i0}}{\sum_{i=1}^{30} p_{i0}\, q_{i0}} \tag{6}$$

where $p_{it}$ is the current price of the $i$-th share at time $t$; $p_{i0}$ is the base price of the $i$-th share, which is the opening price on the day on which the updating of the index takes effect (multiplied, where appropriate, by an adjustment coefficient calculated by the Italian Exchange in the event of actions involving the $i$-th company's capital); $q_{i0}$ is the base number of shares in circulation of the $i$-th stock. The weight $w_{i0}$ of the $i$-th share in (6) is given by the ratio of the company's market capitalisation to the market capitalisation of all the companies in the basket. Finally, $r_0$ is a factor with the base set equal to one, used to maintain the continuity of the index when the basket is updated, and the value 10,000 is the base of the index on December 31st, [year missing in source].

However, such indices do not take into account the variability of the share prices (or of the traded volumes, or other related quantities) during any time interval (e.g. between two consecutive updates of the basket composition). Given the results presented above, the shares' scores on the first harmonic seem to constitute a good ingredient for a new family of financial indices trying to capture as much as possible of the variability of the prices in the share basket. This provides ideas for further developments of functional principal component techniques in the financial field.

Acknowledgments

The dataset used in this paper has been collected by the Italian Stock Exchange. The authors thank Research & Development DBMS (Borsa Italiana).

References

CERIOLI, A., LAURINI, F. and CORBELLINI, A.
(2003): Functional cluster analysis of financial time series. In: M. Vichi and P. Monari (Eds.): Book of Short Papers of Cladag 2003.

COSTANZO, G.D. (2003): A graphical analysis of the dynamics of the MIB30 index in the period by a functional data approach. In: Atti Riunione Scientifica SIS 2003, Rocco Curto Editore, Napoli.
COSTANZO, G.D. and INGRASSIA, S. (2004): Analysis of the MIB30 basket in the period 2000-2002 by functional PC's. In: J. Antoch (Ed.): Proceedings of Compstat 2004, Prague, August 23-27, 2004.
DUNIS, C., GAVRIDIS, M., HARRIS, A., LEONG, S. and NACASKUL, P. (1998): An application of genetic algorithms to high frequency trading models: a case study. In: C. Dunis and B. Zhou (Eds.): Nonlinear Modeling of High Frequency Financial Time Series. John Wiley & Sons, New York.
ELTON, E.J. and GRUBER, M.J. (1973): Estimating the dependence structure of share prices. Implications for portfolio selection. Journal of Finance.
ELTON, E.J. and GRUBER, M.J. (1995): Modern Portfolio Theory and Investment Analysis. John Wiley & Sons, New York.
PEZZULLI, S. (1994): L'analisi delle componenti principali quando i dati sono funzioni. Tesi di Dottorato.
RAMSAY, J.O. (2001): Matlab and S-PLUS functions for Functional Data Analysis. McGill University.
RAMSAY, J.O. and SILVERMAN, B.W. (1997): Functional Data Analysis. Springer-Verlag, New York.
RAMSAY, J.O. and SILVERMAN, B.W. (2002): Applied Functional Data Analysis. Springer-Verlag, New York.
WAHBA, G. (1990): Spline Models for Observational Data. CBMS-NSF Regional Conference Series in Applied Mathematics, SIAM, Philadelphia.
