Journal of Mathematics and Statistics, 9 (1): 65-71, 2013 ISSN 1549-3644 2013 Science Publications doi:10.3844/jmssp.2013.65.71 Published Online 9 (1) 2013 (http://www.thescipub.com/jmss.toc)

FORECASTING THE FINANCIAL RETURNS FOR USING MULTIPLE REGRESSION BASED ON PRINCIPAL COMPONENT ANALYSIS

Nop Sopipan
Program of Mathematics and Applied Statistics, Faculty of Science and Technology, Nakhon Ratchasima Rajabhat University, Nakhon Ratchasima, Thailand

Received 2012-08-30; Revised 2013-01-11; Accepted 2013-04-17

ABSTRACT
The aim of this study was to forecast the returns for the Stock Exchange of Thailand (SET) Index by adding some explanatory variables and a stationary autoregressive process of order p (AR(p)) to the mean equation of returns. In addition, we used Principal Component Analysis (PCA) to remove possible complications caused by multicollinearity. The results showed that multiple regression based on PCA has the best performance.

Keywords: SET Index, Forecasting, Principal Component Analysis, Multicollinearity

1. INTRODUCTION
In order to forecast the return $r_t$ for specific purposes, many researchers have made different assumptions about $\mu_t$, which appears in Equation (1). Kiymaz and Berument (2001) assume $\mu_t$ to be a regression model with a one-week delay; Supoj (2003) assumes $\mu_t$ to be an autoregressive process; Ozturk (2008) assumes $\mu_t$ to be a constant; and Sattayatham et al. (2012) assume $\mu_t$ to be an ARMA process with a one-week delay. The financial returns $r_t = 100\ln(P_t/P_{t-1})$, for $t = 1, 2, \ldots, T-1$, with $P_t$ denoting the financial price at time $t$, depend concurrently and dynamically on many economic and financial variables. Since the returns themselves have statistically significant autocorrelation, lagged returns might be useful in predicting future returns. In order to model these financial returns, we assume that $r_t$ follows a simple time series model, such as a stationary AR(p) model with some explanatory variables $x_{it}$.
In other words, $r_t$ satisfies Equation (1):

$$r_t = \mu_t + \varepsilon_t, \qquad \mu_t = \mu_0 + \sum_{i=1}^{n} \alpha_i x_{it} + \sum_{j=1}^{p} \beta_j r_{t-j} \qquad (1)$$

where, by Equation (2),

$$x_{it} = 100 \ln\left(\frac{P_{it}}{P_{i(t-1)}}\right) \qquad (2)$$

Here $P_{it}$ denotes the financial price of asset $i$, for $i = 1, 2, \ldots, n$, at time $t$; $r_{t-j}$, $j = 1, 2, \ldots, p$, is the return at lag $j$; $\varepsilon_t$ represents errors assumed to be a white noise series, i.i.d. with mean zero and constant variance $\sigma_\varepsilon^2$; $\mu_0$, $\alpha_i$ and $\beta_j$ are constants; and $n$, $p$ are positive integers. Note that the variance of the errors $\varepsilon_t$ in model (1) is assumed to be constant; some authors use this assumption in modeling ground-level ozone (Agirre-Basurko et al., 2006; Pires et al., 2008).

The objective of this study is to forecast returns for the SET Index by using model (1). We vary the process $\mu_t$ using four different types and compare their performance. In the next section, we present the basics of principal component analysis, which we use to remove possible complications caused by multicollinearity among the explanatory variables. The empirical study and methodology are discussed in Section 3, forecasting the returns is described in Section 4 and the conclusions are presented in Section 5.
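Model (1) is linear in its parameters, so it can be estimated by ordinary least squares. The sketch below is a minimal illustration, assuming NumPy and assuming that the explanatory returns $x_{it}$ are already aligned in time with $r_t$; the function names (`log_returns`, `design_matrix`, `fit_ols`) are hypothetical, and the paper does not state which estimation software was used.

```python
import numpy as np


def log_returns(prices):
    """Percentage log returns 100 * ln(P_t / P_{t-1}), as in Equation (2)."""
    prices = np.asarray(prices, dtype=float)
    return 100.0 * np.diff(np.log(prices), axis=0)


def design_matrix(r, x, p):
    """Regressors of model (1): intercept, x_{1t..nt} and lags r_{t-1..t-p}.

    r: (T,) returns of the target index; x: (T, n) explanatory returns,
    already aligned with r. The first p rows are dropped so every lag exists.
    """
    r = np.asarray(r, dtype=float)
    x = np.asarray(x, dtype=float)
    T = r.shape[0]
    # Column j-1 holds r_{t-j} for t = p, ..., T-1.
    lags = np.column_stack([r[p - j:T - j] for j in range(1, p + 1)])
    X = np.column_stack([np.ones(T - p), x[p:], lags])
    return X, r[p:]


def fit_ols(X, y):
    """Least-squares estimates ordered as (mu_0, alpha_1..alpha_n, beta_1..beta_p)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

On noise-free simulated data the sketch recovers the generating coefficients exactly; on real returns the residuals `y - X @ coef` play the role of $\varepsilon_t$.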
2. PRINCIPAL COMPONENT ANALYSIS
An important topic in multivariate time series analysis is the study of the covariance (or correlation) structure of the series. For example, the covariance structure of a vector return series plays an important role in portfolio selection. In what follows, we discuss some statistical methods useful in studying the covariance structure of a vector time series.

Given an $m$-dimensional random vector $R = (x_{1t}, x_{2t}, \ldots, x_{nt}, r_{t-1}, \ldots, r_{t-p})'$ (so that $m = n + p$) with covariance matrix $\Sigma_R$, Principal Component Analysis (PCA) is concerned with using a few linear combinations of $R$ to explain the structure of $\Sigma_R$. If $R$ denotes the monthly log returns of $m$ assets, then PCA can be used to study the sources of variation of these $m$ asset returns. Here the keyword is few, so that simplification can be achieved in multivariate analysis. PCA applies to either the covariance matrix $\Sigma_R$ or the correlation matrix $\rho_R$ of $R$. Since the correlation matrix is the covariance matrix of the standardized random vector $R^* = S^{-1}R$, where $S$ is the diagonal matrix of standard deviations of the components of $R$, we use the covariance matrix in our theoretical discussion.

Let $\delta_i = (\delta_{i1}, \ldots, \delta_{im})'$ be an $m$-dimensional vector, where $i = 1, \ldots, m$. Then $Z_i = \delta_i' R = \sum_{j=1}^{m} \delta_{ij} R_j$ is a linear combination of the random vector $R$. If $R$ consists of the simple returns of $m$ stocks, then $Z_i$ is the return of a portfolio that assigns weight $\delta_{ij}$ to the $j$th stock. Since multiplying $\delta_i$ by a constant does not affect the proportion of allocation assigned to the $j$th stock, we standardize the vector $\delta_i$ so that $\delta_i'\delta_i = \sum_{j=1}^{m} \delta_{ij}^2 = 1$. Using properties of a linear combination of random variables, we have

$$\mathrm{Var}(Z_i) = \delta_i' \Sigma_R \delta_i, \qquad \mathrm{Cov}(Z_i, Z_j) = \delta_i' \Sigma_R \delta_j, \qquad i, j = 1, 2, \ldots, m.$$

The idea of PCA is to find linear combinations $\delta_i$ such that $Z_i$ and $Z_j$ are uncorrelated for $i \neq j$ and the variances of $Z_i$ are as large as possible. More specifically:

The first principal component of $R$ is the linear combination $Z_1 = \delta_1' R$ that maximizes $\mathrm{Var}(Z_1)$ subject to the constraint $\delta_1'\delta_1 = 1$.
The second principal component of $R$ is the linear combination $Z_2 = \delta_2' R$ that maximizes $\mathrm{Var}(Z_2)$ subject to the constraints $\delta_2'\delta_2 = 1$ and $\mathrm{Cov}(Z_1, Z_2) = 0$.

The $i$th principal component of $R$ is the linear combination $Z_i = \delta_i' R$ that maximizes $\mathrm{Var}(Z_i)$ subject to the constraints $\delta_i'\delta_i = 1$ and $\mathrm{Cov}(Z_i, Z_j) = 0$ for $j = 1, \ldots, i-1$.

Since the covariance matrix $\Sigma_R$ is non-negative definite, it has a spectral decomposition. Let $(\lambda_1, e_1), \ldots, (\lambda_m, e_m)$ be the eigenvalue-eigenvector pairs of $\Sigma_R$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge 0$. We then have the following statistical result. The $i$th principal component of $R$ is $Z_i = e_i' R = \sum_{j=1}^{m} e_{ij} R_j$ for $i = 1, \ldots, m$. Moreover,

$$\mathrm{Var}(Z_i) = e_i' \Sigma_R e_i = \lambda_i, \quad i = 1, \ldots, m, \qquad \mathrm{Cov}(Z_i, Z_j) = e_i' \Sigma_R e_j = 0, \quad i \neq j.$$

If some eigenvalues $\lambda_i$ are equal, the choices of the corresponding eigenvectors $e_i$, and hence of $Z_i$, are not unique. In addition, we have

$$\sum_{i=1}^{m} \mathrm{Var}(R_i) = \mathrm{tr}(\Sigma_R) = \sum_{i=1}^{m} \lambda_i = \sum_{i=1}^{m} \mathrm{Var}(Z_i).$$

This result says that

$$\frac{\mathrm{Var}(Z_i)}{\sum_{j=1}^{m} \mathrm{Var}(R_j)} = \frac{\lambda_i}{\lambda_1 + \cdots + \lambda_m}.$$

Consequently, the proportion of the total variance in $R$ explained by the $i$th principal component is simply the ratio between the $i$th eigenvalue and the sum of all eigenvalues of $\Sigma_R$. One can also compute the cumulative proportion of the total variance explained by the first $i$ principal components, i.e., $\sum_{j=1}^{i} \lambda_j / \sum_{j=1}^{m} \lambda_j$. In practice, one selects a small $i$ such that the resulting cumulative proportion is large.

In order to cope with the problem of multicollinearity, we transform the explanatory variables in model (1) into principal components. The new model for forecasting $r_t$ is then

$$r_t = \mu_0 + \sum_{i=1}^{m} \alpha_i Z_{it} + \varepsilon_t, \qquad (3)$$

where $Z_{it}$, $i = 1, 2, \ldots, m$, is the $i$th principal component of the explanatory variables at time $t$.
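The spectral-decomposition route to PCA and the regression of Equation (3) can be sketched in a few lines. This is a minimal illustration assuming NumPy; the function names are hypothetical. The `standardize` option decomposes the correlation matrix $\rho_R$ instead of $\Sigma_R$; the eigenvalues reported in Table 5 sum to about 14, the number of variables, which is consistent with the correlation matrix having been used there. Eigenvector signs are arbitrary, so component scores may differ in sign from other software.

```python
import numpy as np


def principal_components(R, standardize=False):
    """PCA of the columns of R through the spectral decomposition of its covariance.

    With standardize=True the columns are scaled to unit variance first, so the
    decomposition is of the correlation matrix rho_R instead of Sigma_R.
    Returns eigenvalues lambda_1 >= ... >= lambda_m, unit eigenvectors e_i
    (columns of E) and the component scores Z = R_centered @ E.
    """
    R = np.asarray(R, dtype=float)
    centered = R - R.mean(axis=0)
    if standardize:
        centered = centered / centered.std(axis=0, ddof=1)
    sigma = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(sigma)   # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    lam, E = eigvals[order], eigvecs[:, order]
    return lam, E, centered @ E


def explained_variance(lam):
    """Share lambda_i / sum(lambda) of each component and the cumulative share."""
    share = lam / lam.sum()
    return share, np.cumsum(share)


def pc_regression(y, Z):
    """OLS of y on an intercept and the selected component scores, as in Equation (3)."""
    X = np.column_stack([np.ones(len(y)), Z])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef
```

Selecting components (for example, by the cumulative share from `explained_variance`, or by significance in the regression) is left to the caller; passing only those columns of `Z` to `pc_regression` gives the reduced model.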
We follow Tsay (2005) in assuming that the asset return series $r_t$ is a weakly stationary process.

3. EMPIRICAL STUDIES AND METHODOLOGY
Naturally, the Thai stock market has unique characteristics, so the factors influencing the prices of stocks traded in this market differ from the factors influencing other stock markets (Chaigusin et al., 2008). Examples of factors that influence the Thai stock market, together with the researchers who have studied these factors in forecasting the SET Index, are shown in Table 1.

3.1. Data
The data sets used in this study are the daily returns on the closing prices of the SET Index at time $t$ (the dependent variable) and the daily returns on the closing prices of twelve factors (the explanatory, or independent, variables). These twelve factors are the following:
- The Dow Jones Index at time $t-1$ (DJIA)
- The Financial Times 100 Index at time $t-1$ (FSTE)
- The S&P 500 Index at time $t-1$ (SP)
- The Nikkei 225 Index at time $t$ (NI)
- The Hang Seng Index at time $t$ (HSKI)
- The Singapore Straits Times Industrial Index at time $t$ (SES900)
- The Taiwan Stock Weighted Index at time $t$ (TWII)
- The South Korea Stock Exchange Index at time $t$ (KOSPI)
- The oil price on the New York Mercantile Exchange at time $t$ (OIL)
- The gold price on the New York Mercantile Exchange at time $t$ (GOLD)
- The currency exchange rate in Thai baht for one US dollar at time $t$ (THB/USD)
- The currency exchange rate in Thai baht for one Hong Kong dollar at time $t$ (THB/HKD)

The actual closing prices for these twelve factors were obtained from http://www.efinancethai.com. We used data from April 5, 2000, to July 5, 2012, divided into two disjoint sets. The first set, from April 5, 2000, to December 30, 2011, was used as the in-sample set (2,873 observations). The second set, from January 3, 2012, to July 5, 2012, was used as the out-of-sample set (125 observations). The SET Index closing prices and returns are plotted in Fig. 1. Descriptive statistics and the correlation matrix are given in Tables 2 and 3, respectively.
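The data preparation described above involves two mechanical steps: lagging DJIA, FSTE and SP by one day, as the factor list specifies, and splitting the sample chronologically at December 30, 2011. A minimal sketch, assuming NumPy and assuming the series have already been merged on common trading days; `shift_back` and `chronological_split` are hypothetical helper names, not part of the original study.

```python
import numpy as np


def shift_back(x, k=1):
    """Place the value observed at t-k on row t; the first k rows become NaN."""
    if k < 1:
        raise ValueError("k must be at least 1")
    x = np.asarray(x, dtype=float)
    out = np.full_like(x, np.nan)
    out[k:] = x[:-k]
    return out


def chronological_split(dates, values, last_in_sample):
    """Split time-ordered rows into in-sample (dates <= cut-off) and out-of-sample."""
    dates = np.asarray(dates, dtype="datetime64[D]")
    values = np.asarray(values)
    in_sample = dates <= np.datetime64(last_in_sample, "D")
    return values[in_sample], values[~in_sample]
```

Rows whose lagged values are NaN (the first row after `shift_back`) would be dropped before estimation; a cut-off of `"2011-12-30"` reproduces the in-sample/out-of-sample boundary stated above.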
As can be seen from Table 3, there are highly significant correlations (p < 0.01) between the dependent variable and each of the explanatory variables. Therefore, these explanatory variables were used to predict the SET Index. There are also highly significant correlations (p < 0.01) among many of the explanatory variables. From Table 4, there are significant correlations between the SET returns and their first and second lags. These correlations measure the linear relations between pairs of variables, and the correlations among the explanatory variables also indicate the existence of multicollinearity. Moreover, multiple regression analysis on this dataset confirms a multicollinearity problem, with variance inflation factors of 5.0 or more (14.58 for DJIA and 15.20 for SP), as shown in Table 2. One approach to avoid this problem is PCA. Hence, we used the twelve explanatory variables together with the first two lagged SET returns (RT1 and RT2) to find the principal components. The eigenvalues and weights of the Principal Components (PCs) are shown in Table 5, and the regression on the selected PCs is shown in Table 6.

Table 1. Impact factors on the Stock Exchange of Thailand Index (SET Index), as studied by researchers 1 to 8
Factors:
- The Nasdaq Index
- The Dow Jones Index
- The S&P 500 Index
- The Nikkei Index
- The Hang Seng Index
- The Straits Times Industrial Index
- The currency exchange rate in Thai baht to one US dollar
- The currency exchange rate in Thai baht to 100 Japanese yen
- The currency exchange rate in Thai baht to one Hong Kong dollar
- The currency exchange rate in Thai baht to one Singapore dollar
- Gold prices
- Oil prices
- Minimum loan rates
* Selected in multiple regression
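The variance inflation factor used above as the multicollinearity criterion (VIF of 5.0 or more) can be computed by regressing each explanatory variable on all the others. A minimal sketch, assuming NumPy; `variance_inflation_factors` is a hypothetical name. The result equals the diagonal of the inverse sample correlation matrix, which gives a quick cross-check.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing column j of X
    on an intercept and all the other columns."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs
```

Applied to the twelve explanatory return series, columns such as DJIA and SP, whose pairwise correlation is 0.96 in Table 3, are the ones expected to exceed the threshold.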
Fig. 1. Graph of the SET Index (a) and returns of the SET Index (b)

3.2. Results of Principal Component Analysis
Bartlett's sphericity test, which tests the null hypothesis that the correlation matrix is an identity matrix, was used to verify the applicability of PCA. The Bartlett's sphericity statistic for the SET data was 18,167.07 (df = 66, p < 0.001), which implies that PCA is applicable to our datasets (Table 2). Moreover, the Kaiser-Meyer-Olkin measure of sampling adequacy was computed as 0.788, which indicates that the data are adequate for applying PCA. The results for PCA (Table 5) indicate that there are twelve Principal Components (PCs) for multiple regression analysis.
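Both diagnostics can be computed from the sample correlation matrix. The sketch below assumes NumPy and uses the standard forms: Bartlett's statistic $\chi^2 = -\left[(N-1) - \frac{2k+5}{6}\right]\ln|\mathbf{R}|$ with $k(k-1)/2$ degrees of freedom (66 for the twelve variables, matching Table 2), and the overall KMO measure comparing squared correlations with squared partial correlations. Function names are hypothetical, and the p-value lookup is omitted to avoid a SciPy dependency.

```python
import numpy as np


def bartlett_sphericity(X):
    """Bartlett's test of H0: the correlation matrix is the identity.

    Returns the chi-square statistic and its k(k - 1)/2 degrees of freedom.
    """
    X = np.asarray(X, dtype=float)
    N, k = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)        # ln|R|, numerically stable
    chi2 = -((N - 1) - (2 * k + 5) / 6.0) * logdet
    return chi2, k * (k - 1) // 2


def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    Rinv = np.linalg.inv(R)
    d = np.sqrt(np.diag(Rinv))
    partial = -Rinv / np.outer(d, d)        # partial correlations (off-diagonal)
    off = ~np.eye(R.shape[0], dtype=bool)   # use off-diagonal entries only
    r2 = np.sum(R[off] ** 2)
    a2 = np.sum(partial[off] ** 2)
    return r2 / (r2 + a2)
```

KMO values closer to 1 mean the partial correlations are small relative to the ordinary correlations, which is the situation in which a few components can summarize the variables.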
Table 2. Descriptive statistics of the SET Index and explanatory variables
Variable   Mean     Std. Dev.  Skewness  Kurtosis  Corr. with SET close  VIF
SET        0.0373   1.4644     -0.690     9.194     1.000
DJIA       0.0047   1.2792     -0.017     7.626     0.219**              14.5810
FSTE      -0.0043   1.3280     -0.169     5.718     0.166**               1.5270
SP        -0.0031   1.3647     -0.128     7.764     0.239**              15.1970
NI        -0.0273   1.5986     -0.499     7.609     0.369**               2.0100
HSKI       0.0053   1.6593     -0.067     8.960     0.495**               2.4050
SES900     0.0122   1.3011     -0.337     7.674     0.507**               2.1500
TWII      -0.0096   1.5716     -0.202     3.348     0.351**               1.6180
KOSPI      0.0272   1.7733     -0.867     9.737     0.410**               2.1520
OIL        0.0413   2.5662      0.087     7.578     0.119**               1.0570
GOLD       0.0581   1.1831      0.137     6.383     0.077**               1.0680
THB/USD   -0.0063   0.4258      0.511    20.223    -0.152**               2.1970
THB/HKD   -0.0059   0.5304      0.570    32.596    -0.107**               2.1750
Jarque-Bera normality test on SET close: 10741.72**
Augmented Dickey-Fuller test on SET close: -52.76**
Kaiser-Meyer-Olkin measure of sampling adequacy: 0.79
Bartlett's sphericity test: approx. chi-square = 18167.07342, df = 66, sig. = 0.000
**Significant at the 0.01 level (2-tailed)

Table 3. Correlation matrix of the SET Index and explanatory variables
          SET     DJIA    FSTE    SP      NI      HSKI    SES900  TWII    KOSPI   OIL     GOLD    THB/USD THB/HKD
SET       1.00
DJIA      0.22**  1.00
FSTE      0.17**  0.55**  1.00
SP        0.24**  0.96**  0.56**  1.00
NI        0.37**  0.45**  0.39**  0.47**  1.00
HSKI      0.50**  0.37**  0.29**  0.40**  0.59**  1.00
SES900    0.51**  0.33**  0.20**  0.35**  0.53**  0.70**  1.00
TWII      0.35**  0.30**  0.23**  0.32**  0.45**  0.49**  0.47**  1.00
KOSPI     0.41**  0.31**  0.26**  0.34**  0.59**  0.61**  0.57**  0.57**  1.00
OIL       0.12**  0.01   -0.01    0.01    0.06**  0.10**  0.11**  0.06**  0.06**  1.00
GOLD      0.08**  0.04*   0.03    0.05**  0.07**  0.09**  0.07**  0.02    0.07**  0.20**  1.00
THB/USD  -0.15** -0.07** -0.05** -0.08** -0.08** -0.12** -0.12** -0.10** -0.13** -0.04*  -0.13**  1.00
THB/HKD  -0.11**  0.00   -0.01   -0.02    0.00   -0.07** -0.10** -0.11** -0.08** -0.12** -0.02   -0.10**  1.00
*, **Correlation significant at the 0.05 and 0.01 levels (2-tailed), respectively

Table 4. Correlation matrix of the SET Index and lagged returns of the SET
          SET      SET-1    SET-2    SET-3    SET-4
SET       1.00
SET-1     0.036*   1.00
SET-2     0.073**  0.036*   1.00
SET-3     0.007    0.073**  0.036*   1.00
SET-4    -0.018    0.007    0.073**  0.036*   1.00
*, **Correlation significant at the 0.05 and 0.01 levels (2-tailed), respectively

4. FORECASTING THE RETURNS OF THE SET INDEX BY MEAN EQUATIONS
In this section, we forecast the returns for the SET Index ($r_t = \mu_t + \varepsilon_t$) using three mean equations ($\mu_t$): a constant, AR(2) and multiple regression based on PCA. We then compare the forecast errors using two loss functions, Mean Square Error (MSE) and Mean Absolute Error (MAE). The parameters of the mean equations for forecasting the SET Index and the values of the loss functions are shown in Table 6. We found that the mean equation based on multiple regression on the PCs (Table 6) has the best performance (MSE = 0.8886, MAE = 0.7463). We therefore use this mean equation for forecasting the returns for the SET Index.
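The two loss functions used in this comparison are straightforward to compute. The sketch below, assuming NumPy, also shows one way the AR(2) benchmark could produce out-of-sample forecasts: one step ahead, using the realized returns at $t-1$ and $t-2$. That forecasting scheme, the function names and any parameter values passed in are assumptions for illustration; the paper does not spell out how its forecasts were generated.

```python
import numpy as np


def mse(actual, forecast):
    """Mean Square Error of the forecasts."""
    e = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.mean(e ** 2))


def mae(actual, forecast):
    """Mean Absolute Error of the forecasts."""
    e = np.asarray(actual, dtype=float) - np.asarray(forecast, dtype=float)
    return float(np.mean(np.abs(e)))


def ar2_one_step(r_in, r_out, mu0, a1, a2):
    """One-step-ahead AR(2) forecasts of each out-of-sample return, using the
    realised returns at t-1 and t-2 (the last two in-sample returns start it)."""
    full = np.concatenate([np.asarray(r_in, dtype=float)[-2:],
                           np.asarray(r_out, dtype=float)])
    return mu0 + a1 * full[1:-1] + a2 * full[:-2]
```

For the constant-mean benchmark the forecast is simply `np.full(len(r_out), np.mean(r_in))`; each candidate's forecasts are then scored with `mse` and `mae` against `r_out`, and the lowest values identify the preferred mean equation.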
Table 5. Descriptive statistics of selected PCs
(a) Initial eigenvalues and weights for the PCs (DJIA to SES900)
PC   Total   % of Var.  Cum. %    DJIA    FSTE    SP      NI      HSKI    SES900
1    4.285   30.606     30.606    0.171   0.134   0.176   0.181   0.181   0.170
2    1.743   12.449     43.055    0.141   0.122   0.136   0.045  -0.014  -0.033
3    1.487   10.625     53.680   -0.365  -0.307  -0.353   0.104   0.223   0.254
4    1.169    8.350     62.030    0.066   0.051   0.066  -0.036  -0.037  -0.046
5    1.001    7.149     69.180   -0.047  -0.015  -0.053  -0.053  -0.023  -0.012
6    0.954    6.812     75.992   -0.040  -0.028  -0.030   0.002  -0.007   0.056
7    0.789    5.633     81.624   -0.076   0.012  -0.062   0.087   0.051  -0.015
8    0.606    4.331     85.956   -0.240   0.606  -0.242   0.031  -0.379  -0.596
9    0.570    4.070     90.026    0.407  -0.794   0.376  -0.446  -0.261  -0.056
10   0.448    3.198     93.225   -0.114   0.590  -0.122  -1.019   0.437   0.571
11   0.353    2.521     95.745   -0.117  -0.038  -0.109   0.842  -0.089   0.261
12   0.298    2.127     97.872   -0.025   0.222  -0.051   0.031  -1.358   1.125
13   0.263    1.882     99.754   -0.008  -0.047  -0.007  -0.014   0.253  -0.186
14   0.035    0.246    100.000    3.762   0.021  -3.848   0.079   0.085   0.003
(b) Weights for the PCs (TWII to RT2)
PC   TWII    KOSPI   OIL     GOLD    TH/US   TH/HK   RT1     RT2
1    0.153   0.173   0.025   0.028  -0.053  -0.050   0.023   0.029
2   -0.004  -0.032  -0.092  -0.163   0.490   0.484  -0.080   0.009
3    0.217   0.250   0.144   0.027   0.158   0.179  -0.076   0.039
4   -0.120  -0.095   0.602   0.573   0.129   0.156   0.249   0.146
5    0.040   0.029  -0.188  -0.219  -0.007   0.011   0.541   0.782
6    0.018   0.091  -0.049  -0.137   0.070   0.054   0.796  -0.609
7   -0.063   0.081  -0.781   0.782   0.067   0.090   0.055  -0.009
8    0.759   0.275   0.079   0.072  -0.017   0.010  -0.018  -0.035
9    0.723  -0.012  -0.063   0.139   0.004   0.043  -0.001  -0.016
10   0.319  -0.432  -0.072   0.060   0.001   0.044  -0.012  -0.043
11   0.558  -1.301  -0.051   0.024  -0.058   0.004   0.099  -0.012
12  -0.096   0.281  -0.012   0.060   0.229  -0.209  -0.077   0.046
13   0.090  -0.086   0.010   0.044   1.360  -1.351   0.024   0.030
14  -0.021   0.022  -0.011   0.032  -0.009   0.002   0.021  -0.037

Table 6. Mean equations for returns of the SET Index and loss functions
Model 1. Constant mean: $\mu_t = E[r_t]$; $\hat{\mu} = 0.0373$. MSE = 0.8914, MAE = 0.7576
Model 2. AR(2): $\mu_t = \mu_0 + \alpha_1 r_{t-1} + \alpha_2 r_{t-2}$; $\hat{\mu}_t = 0.34 r_{t-1} + 0.72 r_{t-2}$. MSE = 0.8900, MAE = 0.7570
Model 3. Multiple regression based on PCA: $\mu_t = \mu_0 + \sum_{i=1}^{n} \beta_i Z_{it}$; $\hat{\mu}_t = 0.718 Z_1 - 0.132 Z_2 + 0.319 Z_3 - 0.14 Z_8 + 0.141 Z_{10} - 0.063 Z_{13}$. MSE = 0.8886, MAE = 0.7463

5. CONCLUSION
We considered the problem of forecasting returns for the SET Index by using a stationary autoregressive model of order p (AR(p)) with some explanatory variables. After considering four types of mean equations, we transformed the AR terms and the explanatory variables into PCs. We found that multiple regression based on PCA has the best performance (MSE = 0.8886, MAE = 0.7463).

6. REFERENCES
Agirre-Basurko, E., G. Ibarra-Berastegi and I. Madariaga, 2006. Regression and multilayer perceptron-based models to forecast hourly O3 and NO2 levels in the Bilbao area. Environ. Model. Software, 21: 430-446. DOI: 10.1016/j.envsoft.2004.07.008
Chaigusin, S., C. Chirathamjaree and J. Clayden, 2008. Soft computing in the forecasting of the Stock Exchange of Thailand (SET). Proceedings of the 4th IEEE International Conference on Management of Innovation and Technology, Sep. 21-24, IEEE Xplore Press, Bangkok, pp: 1277-1281. DOI: 10.1109/ICMIT.2008.4654554
Kiymaz, H. and H. Berument, 2001. The day of the week effect on Stock Market Volatility. J. Econ. Finance, 25: 181-193.
Ozturk, M., 2008. Genetic aspects of hepatocellular carcinogenesis. Semin. Liver Dis., 19: 235-242. DOI: 10.1055/s-2007-1007113
Pires, J.C.M., F.G. Martins, S.I.V. Sousa, M.C.M. Alvim-Ferraz and M.C. Pereira, 2008. Selection and validation of parameters in multiple linear and principal component regressions. Environ. Model. Software, 23: 50-55. DOI: 10.1016/j.envsoft.2007.04.012
Sattayatham, P., N. Sopipan and B. Premanode, 2012. Forecasting the stock exchange of Thailand uses day of the week effect and Markov regime switching GARCH. Am. J. Econ. Bus. Admin., 4: 84-93. DOI: 10.3844/ajebasp.2012.84.93
Supoj, C., 2003. Investigation on Regime Switching in Stock Market. Thammasat University, Bangkok, Thailand.
Tsay, R.S., 2005. Analysis of Financial Time Series. 2nd Edn., Wiley, Hoboken, New Jersey, ISBN-10: 0471746185, pp: 605.