Analysis of Big Dependent Data in Economics and Finance


Analysis of Big Dependent Data in Economics and Finance
Ruey S. Tsay
Booth School of Business, University of Chicago
September 2016

Outline
1. Big data? Machine learning? Data science? What is in it for economics and finance?
2. Real-world data are often dynamically dependent
3. A simple example: methods for independent data may fail
4. Trade-off between simplicity and reality
5. Some methods useful for analyzing big dependent data in economics and finance
6. Examples
7. Concluding remarks

Big dependent data
1. Accurate information is the key to success in the competitive global economy: the information age.
2. What is big data? High dimension (many variables)? Large sample size? Both?
3. Not all big data sets are useful: confounding & noise.
4. We need methods to efficiently extract useful information from big data.
5. Know the limitations of big data.
6. Issues that emerge from big data: privacy? ethical issues?
7. The focus here is on methods for analyzing big dependent data in economics and finance.

What is available? Statistical methods:
1. Focus on sparsity (simplicity)
2. Various penalized regressions, e.g. lasso and its extensions
3. Various dimension-reduction methods and models
4. Common framework: independent observations, with limited extensions to stationary data
Real data are often dynamically dependent!
Some useful concepts in analyzing big data:
1. Parsimony vs sparsity: parsimony does not imply sparsity
2. Simplicity vs reality: trade-off between feasibility & sophistication

Parsimonious, not sparse
A simple example:
y_t = c + Σ_{i=1}^k β x_{it} + ε_t = c + β Σ_{i=1}^k x_{it} + ε_t,
where k is large, the x_{it} are not perfectly correlated, and the ε_t are iid N(0, σ²). The model has three parameters, so it is parsimonious, but it is not sparse because y depends on all of the explanatory variables. In some applications, Σ_{i=1}^k x_{it} is a close approximation to the first principal component. For example, the level of interest rates is important to an economy. The fused lasso can handle this difficulty in some situations.

What is LASSO regression?
Model (assume mean-adjusted data):
y_i = Σ_{j=1}^p β_j X_{j,i} + ε_i.
Matrix form: Y = Xβ + ε, where X is the design matrix.
Objective function:
β̂(λ) = argmin_β { ||Y − Xβ||₂²/T + λ ||β||₁ },
where λ ≥ 0 is a penalty parameter, ||β||₁ = Σ_{j=1}^p |β_j|, and ||Y − Xβ||₂² = Σ_{i=1}^T (y_i − X_i′β)². In particular, the criterion remains well defined when p > T.

What is the big deal? Sparsity
Using convexity, the lasso is equivalent to
β̂_opt(R) = argmin_{β: ||β||₁ ≤ R} ||Y − Xβ||₂²/T.
Old friend: ridge regression,
β̂_Ridge(λ) = argmin_β { ||Y − Xβ||₂²/T + λ ||β||₂² }, or β̂(R) = argmin_{β: ||β||₂² ≤ R} ||Y − Xβ||₂²/T.
Special case p = 2: ||Y − Xβ||₂²/T is quadratic; the constraint region ||β||₁ ≤ R is a diamond, whereas ||β||₂² ≤ R is a circle. The corners of the diamond are why the lasso leads to sparsity.
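A tiny numerical illustration of this geometric point may help; the sketch below (the toy data and the penalty value are our own choices, not from the slides) contrasts the exact zeros produced by the lasso with the mere shrinkage produced by ridge in glmnet.

    ## Toy comparison: lasso (alpha = 1) vs ridge (alpha = 0)
    library(glmnet)
    set.seed(2)
    X <- matrix(rnorm(100 * 10), 100, 10)
    y <- X[, 1] + rnorm(100)
    sum(coef(glmnet(X, y, alpha = 1, lambda = 0.1)) == 0)  # several exact zeros
    sum(coef(glmnet(X, y, alpha = 0, lambda = 0.1)) == 0)  # typically none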

Computation and extensions
1. Optimization: least angle regression (LARS) by Efron et al. (2004) makes the computation very efficient.
2. Extensions:
   - Group lasso: Yuan and Lin (2006). Subsets of X have a specific meaning, e.g. treatment.
   - Elastic net: Zou and Hastie (2005). Uses a combination of L₁ and L₂ penalties.
   - SCAD: Fan and Li (2001). Nonconcave penalized likelihood. [Smoothly clipped absolute deviation (SCAD).]
   - Various Bayesian methods: the penalty function is the prior.
3. Packages available in R: lars, glmnet, gamlr, gbm, and many others.

A simulated example
p = 300, T = 150, X iid N(0, 1), ε_i iid N(0, 0.25),
y_i = x_{3i} + 2(x_{4i} + x_{5i} + x_{7i}) − 2(x_{11,i} + x_{12,i} + x_{13,i} + x_{21,i} + x_{22,i} + x_{30,i}) + ε_i.
1. How? R demonstration (a sketch follows below).
2. Selection of λ? Cross-validation (10-fold), measuring prediction accuracy.
3. The commands lars and cv.lars of the package lars.
4. The commands glmnet and cv.glmnet of the package glmnet.
5. Relationship between the two packages (alpha = 0).
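A minimal sketch of this simulation, assuming the setup above (the nonzero pattern follows the displayed model; the seed and other details are our own):

    ## Simulate p = 300, T = 150 with 10 active predictors, then fit by 10-fold CV
    library(glmnet)
    set.seed(42)
    T <- 150; p <- 300
    X <- matrix(rnorm(T * p), T, p)
    beta <- rep(0, p)
    beta[3] <- 1
    beta[c(4, 5, 7)] <- 2
    beta[c(11, 12, 13, 21, 22, 30)] <- -2
    y <- drop(X %*% beta) + rnorm(T, sd = 0.5)   # Var(eps) = 0.25
    cvfit <- cv.glmnet(X, y, alpha = 1)          # alpha = 1 is the lasso
    coef(cvfit, s = "lambda.min")                # inspect the selected variables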

Lasso may fail for dependent data
1. Data-generating model: a scalar Gaussian autoregressive, AR(3), model
x_t = 1.9x_{t−1} − 0.8x_{t−2} − 0.1x_{t−3} + a_t, a_t ~ N(0, 1).
Generate 2000 observations. See Figure 1.
2. Big-data setup:
   - Dependent x_t: t = 11, ..., 2000.
   - Regressors: X_t = (x_{t−1}, x_{t−2}, ..., x_{t−10}, z_{1t}, ..., z_{10,t}), where the z_{it} are iid N(0, 1).
   - Dimension = 20, sample size 1990.
Run the lasso regression via the lars package of R. See Figure 2 for results. Lag 3, x_{t−3}, was not selected. The lasso fails in this case.

Figure: Time plot of the simulated AR(3) time series with 2000 observations.

Figure: Results of the lasso regression for the AR(3) series (standardized coefficient paths against |β|/max|β|).
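A sketch reproducing this experiment, under our own assumptions about the unstated details (the seed, and the use of stats::filter to generate the nonstationary AR(3)):

    ## Simulate the AR(3) with a double unit root; lasso on 10 lags + 10 noise series
    library(lars)
    set.seed(7)
    n <- 2000
    x <- as.numeric(filter(rnorm(n), c(1.9, -0.8, -0.1), method = "recursive"))
    idx <- 11:n
    XZ <- cbind(sapply(1:10, function(k) x[idx - k]),        # x_{t-1}, ..., x_{t-10}
                matrix(rnorm(length(idx) * 10), ncol = 10))  # z_{1t}, ..., z_{10,t}
    y <- x[idx]
    fit <- lars(XZ, y, type = "lasso")   # lars standardizes the columns by default
    plot(fit)                            # compare with Figure 2: lag 3 is not picked up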

OLS works if we entertain AR models
Run the linear regression using the first three variables of X_t. The fitted coefficients are close to the true values (e.g. 1.902 for lag 1). All estimates are statistically significant with very small p-values. The residuals are well behaved; e.g. the Ljung-Box statistic Q(10) has p-value 0.20 (after adjusting the degrees of freedom). A simple time series method works for dependent data.
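Continuing the sketch above, the OLS check is one line (the Ljung-Box call is our choice of residual diagnostic):

    ## OLS on the first three lags only
    fit.ols <- lm(y ~ XZ[, 1:3])
    summary(fit.ols)                 # estimates should be near (1.9, -0.8, -0.1)
    Box.test(residuals(fit.ols), lag = 10, type = "Ljung-Box")  # Q(10)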

Why does the lasso fail? Two possibilities:
1. Scaling effect: the lasso standardizes each variable in X_t. For unit-root nonstationary time series, standardization may wash out the dependence in the stationary part.
2. Multicollinearity: unit-root time series have strong serial correlations. [The ACF approaches 1 at all lags.]
This artificial example highlights the difference between independent and dependent data. We need to develop methods for big dependent data!

Possible solutions
1. Re-parameterization using time series properties.
2. Use different penalties for different parameters.
The first approach is easier. For this particular time series, define ∇x_t = (1 − B)x_t and ∇²x_t = (1 − B)²x_t. Then
x_t = 1.9x_{t−1} − 0.8x_{t−2} − 0.1x_{t−3} + a_t
    = x_{t−1} + ∇x_{t−1} − 0.1∇²x_{t−1} + a_t
    = double + single + stationary + a_t.
The coefficients of x_{t−1}, ∇x_{t−1}, and ∇²x_{t−1} are 1, 1, and −0.1, respectively.

Different frameworks for LASSO
The X-matrix of the conventional lasso consists of (x_{t−1}, x_{t−2}, ..., x_{t−10}, z_{1t}, ..., z_{10,t}), where the z_{it} are iid N(0, 1). Under the re-parameterization, the X-matrix becomes (x_{t−1}, ∇x_{t−1}, ∇²x_{t−1}, ..., ∇²x_{t−8}, z_{1t}, ..., z_{10,t}). These two X-matrices provide theoretically the same information. However, the first one has high multicollinearity, while the second does not, especially after standardization.
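In the sketch's notation, the re-parameterized design can be built as follows (the alignment of the differenced series is our own bookkeeping):

    ## Re-parameterized design: level, first difference, eight second differences, noise
    dx  <- c(NA, diff(x))          # ∇x_t
    d2x <- c(NA, diff(dx))         # ∇²x_t
    XR  <- cbind(x[idx - 1], dx[idx - 1],
                 sapply(1:8, function(k) d2x[idx - k]),
                 XZ[, 11:20])      # reuse the noise regressors
    fit2 <- lars(XR, y, type = "lasso")
    plot(fit2)   # the stationary ∇² terms now survive standardization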

Figure: Comparison of the β-estimates from the lars results under the two parameterizations.

Theoretical justification
Focus on the particular series x_t used. Some properties of the series are (up to scale constants):
1. T^{−4} Σ_{t=1}^T x_t² ⇒ ∫₀¹ W̄(s)² ds,
2. T^{−5/2} Σ_{t=1}^T x_t ⇒ ∫₀¹ W̄(s) ds,
3. T^{−3} Σ_{t=1}^T x_t ∇x_t ⇒ ∫₀¹ W̄(s) W(s) ds,
4. T^{−2} Σ_{t=1}^T (∇x_t)² ⇒ ∫₀¹ W(s)² ds,
where W(s) is the standard Brownian motion and W̄(s) = ∫₀^s W(u) du. Standardization may wash out the ∇x_{t−1} and ∇²x_{t−1} parts.

Examples of big dependent data
1. Daily returns of U.S. stocks.
2. Demand for electricity in 30-minute intervals.
3. Daily spreads of CDS (credit default swaps) of selected companies.
4. Monthly unemployment rates of the 50 U.S. states.
5. Interest rates of an economy.
6. Air pollution measurements at multiple locations and health risk; complex spatio-temporal data in general.

Figure: Sample sizes (number of stocks per day) of U.S. daily stock returns in 2012 and 2013: mean 6681, range (6593, 6774).

Figure: Densities of daily log returns of U.S. stocks in 2012 and 2013.

Figure: Empirical densities of electricity demand in 30-minute intervals, by day of the week (Monday through Sunday), Adelaide, Australia; sample period from July 6, 1997 to March 31.

Figure: Time plots of monthly state unemployment rates of the U.S.

Some statistical methods
Goal: extract useful information, including pooling.
1. Classification and cluster analysis: K-means, tree-based classification, model-based classification.
2. Factor models & extensions: orthogonal factor model, approximate factor model, dynamic factor model, constrained factor models (column, row constraints), X_t = R f_t C′ + e_t.
3. Generalizations of lasso methods to dependent data, e.g. lasso for nowcasting vs MIDAS.

Constrained factor models
Column (variable) constraints only: Tsai and Tsay (2010). Let z_t be a k-dimensional time series,
z_t = H ω f_t + ε_t, t = 1, ..., T,
where H is a known k × r matrix, f_t is an m-dimensional common factor, and ω holds the r × m unknown loading parameters. For the observed data in matrix form,
Z = F ω′ H′ + ε.
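A rough estimation sketch, and emphatically not the estimator of Tsai and Tsay (2010): given H, one can project the data onto the column space of H, extract principal components, and back out ω by least squares. Every name and step below is our own illustration.

    ## Crude constrained-factor estimate: loadings restricted to col(H)
    constrained_fm <- function(Z, H, m) {
      P  <- H %*% solve(crossprod(H), t(H))   # projector onto the column space of H
      pc <- prcomp(Z %*% P, center = TRUE)
      Fhat <- pc$x[, 1:m, drop = FALSE]       # estimated common factors
      L <- t(coef(lm(Z ~ Fhat))[-1, , drop = FALSE])   # k x m loadings on the factors
      omega <- solve(crossprod(H), t(H) %*% L)         # map the loadings through H
      list(factors = Fhat, omega = omega, loadings = H %*% omega)
    }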

A simple illustration
Monthly log returns of 10 stocks from 2001 on:
1. Semiconductor: TXN, MU, INTC, TSM
2. Pharmaceutical: PFE, MRK, LLY
3. Investment bank: JPM, MS, GS
The constraint matrix is H = [h₁, h₂, h₃], where
h₁ = (1, 1, 1, 1, 0, 0, 0, 0, 0, 0)′
h₂ = (0, 0, 0, 0, 1, 1, 1, 0, 0, 0)′
h₃ = (0, 0, 0, 0, 0, 0, 0, 1, 1, 1)′.

Table: Estimation results of the constrained factor model (loadings L = Hω) and the orthogonal factor model (PCA) for the 10 stocks: loadings L₁–L₃ and residual variances Σ_ε,i for each ticker (TXN, MU, INTC, TSM, PFE, MRK, LLY, JPM, MS, GS). Variability explained: 70.6% (constrained) versus 72.4% (orthogonal).

Both row and column constraints: Tsai et al. (2016)
With T observations and k variables, the data matrix form is
Z = F₁ω₁′H′ + GF₂ω₂′ + GF₃ω₃′H′ + E,
where G denotes a known T × m row-constraint matrix.


Figure: Time plots of monthly housing starts (in logarithms) of 9 U.S. divisions (New England, Middle Atlantic, East North Central, West North Central, South Atlantic, East South Central, West South Central, Mountain, Pacific).

Figure: Time series plots of the common factors (two each in F₁, F₂, F₃) of a DCF model of order (r, p, q) = (2, 2, 2), estimated by maximum likelihood.

Figure: Time series plots of GF₂ω₂′ from the fitted DCF model of order (2, 2, 2). Maximum likelihood estimation is used.

Figure: Time series plots of F₁ω₁′H′ from the fitted DCF model of order (2, 2, 2). Maximum likelihood estimation is used.

Figure: Time series plots of GF₃ω₃′H′ from the fitted DCF model of order (2, 2, 2). Maximum likelihood estimation is used.

Matrix-valued variables
Consider simultaneously n macroeconomic variables in k countries:

           U.S.      Italy     Spain     ...   Canada
  GDP      X_{11,t}  X_{12,t}  X_{13,t}  ...   X_{1k,t}
  Unem     X_{21,t}  X_{22,t}  X_{23,t}  ...   X_{2k,t}
  CPI      X_{31,t}  X_{32,t}  X_{33,t}  ...   X_{3k,t}
  ...
  M1       X_{n1,t}  X_{n2,t}  X_{n3,t}  ...   X_{nk,t}

Ongoing work: only preliminary results are available. See Chen et al. (2016).

Classification
A possible approach: use a two-step procedure.
1. Transform the dependent big data into functions, e.g. probability densities.
2. Apply classification methods to the functional data.
The density functions of daily log returns of U.S. stocks serve as an example. We can then classify the density functions to draw statistical inference.

Illustration of classification
Cluster analysis of density functions. Consider the time series of density functions {f_t(x)}. For simplicity, assume the densities are evaluated at equally spaced grid points {x₁ < x₂ < ... < x_N} ⊂ D with increment Δx. The data we have become {f_t(x_i) : t = 1, ..., T; i = 1, ..., N}. Using the Hellinger distance (HD), we consider two methods: K-means and tree-based classification.

Hellinger distance of two density functions
Let f(x) and g(x) be two density functions on a common domain D ⊂ R. Assume both density functions are absolutely continuous w.r.t. the Lebesgue measure. The Hellinger distance (HD) between f(x) and g(x) is defined by
H(f, g)² = (1/2) ∫_D (√f(x) − √g(x))² dx = 1 − ∫_D √(f(x) g(x)) dx.
Basic properties:
1. H(f, g) ≥ 0.
2. H(f, g) = 0 if and only if f(x) = g(x) almost surely.
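On a common grid, the integral reduces to a Riemann sum; a minimal sketch (the grid handling is our assumption):

    ## Squared Hellinger distance between two densities tabulated on the same grid
    hellinger2 <- function(f, g, dx) {
      0.5 * sum((sqrt(f) - sqrt(g))^2) * dx
    }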

K-means method
For a given K, the K-means method seeks a partition of the densities, say C₁, ..., C_K, such that
1. ∪_{k=1}^K C_k = {f_t(x)},
2. C_i ∩ C_j = ∅ for i ≠ j,
3. the sum of within-cluster variation V = Σ_{k=1}^K V(C_k) is minimized, where the within-cluster variation is V(C_k) = Σ_{t₁,t₂ ∈ C_k} H(f_{t₁}, f_{t₂})².
It turns out this can easily be done by applying the K-means method with squared Euclidean distance to the square-root densities {√f_t(x)}.
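So in practice one call suffices; in the sketch below, dens is a hypothetical T × N matrix with dens[t, i] = f_t(x_i):

    ## K-means under squared Hellinger distance = K-means on square-root densities
    km <- kmeans(sqrt(dens), centers = 4, nstart = 25)
    table(km$cluster)   # cluster sizes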

Example of K-means
Consider the 48 density functions of half-hour electricity demand on Mondays in Adelaide, Australia. With K = 4 clusters, we have:

  k   Elements (time index)          Calendar hours
  1   17 to 44                       8:00 AM to 10:00 PM
  2   15, 16, 45 to 48, 1, 2, 3      7:00-8:00 AM; 10:00 PM-1:30 AM
  3   4, 5, 13, 14                   1:30-2:30 AM; 6:00-7:00 AM
  4   6 to 12                        2:30-6:00 AM

Result: the clusters capture daily activities, namely (1) an active period, (2) a transition period, (3) a light-sleeping period, and (4) a sound-sleeping period.

Figure: Density functions of half-hour electricity demand on Monday in Adelaide, Australia. The sample period is from July 6, 1997 to March 31.

Figure: Results of the K-means cluster analysis based on squared Hellinger distance for electricity demand on Monday. Different colors denote different clusters.

Tree-based classification
Let Z_t = (z_{1t}, ..., z_{pt})′ denote p covariates. We use an iterative procedure to build a binary tree, starting with the root C₀ = {f_t(x)}.
1. For each covariate z_{it}, let z_{i(j)} be the jth order statistic.
   (a) Divide C₀ into two sub-clusters, C_{i,j,1} = {f_t(x) : z_{it} ≤ z_{i(j)}} and C_{i,j,2} = {f_t(x) : z_{it} > z_{i(j)}}.
   (b) Compute the sum of within-cluster variations, H(i, j) = V(C_{i,j,1}) + V(C_{i,j,2}).
   (c) Find the smallest j, say v_i, such that H(i, v_i) = min_j H(i, j).
2. Select i ∈ {1, ..., p}, say I, such that H(I, v_I) = min_i H(i, v_i).
3. Use covariate z_{It} with threshold v_I to grow two new leaves, i.e. C_{1,1} = C_{I,v_I,1} and C_{1,2} = C_{I,v_I,2}.

Tree-based procedure, continued
Next, consider C_{1,1} and C_{1,2} as the roots of branches and apply the same procedure with their associated covariates to find candidates for growth. The only modification is as follows: when considering C_{1,1}, we treat C_{1,2} as a leaf in computing the sum of within-cluster variations; similarly, when considering C_{1,2} for further division, we treat C_{1,1} as a leaf in computing the sum of within-cluster variations. This growth procedure is iterated until the number of clusters K is reached. A sketch of the first split appears below.
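A simplified sketch of the first split search, using the hypothetical dens matrix and grid step dx from above (the scoring details are our own reading of the procedure):

    ## Score a candidate split by the sum of within-cluster squared Hellinger variations
    within_var <- function(sq, dx) {
      if (nrow(sq) < 2) return(0)
      d2 <- 0.5 * dx * as.matrix(dist(sq))^2   # pairwise squared Hellinger distances
      sum(d2) / 2                              # each unordered pair counted once
    }
    best_split <- function(dens, z, dx) {      # z: one covariate, length T
      sq <- sqrt(dens)
      cuts <- sort(unique(z))
      scores <- sapply(cuts, function(cc)
        within_var(sq[z <= cc, , drop = FALSE], dx) +
        within_var(sq[z >  cc, , drop = FALSE], dx))
      cuts[which.min(scores)]                  # threshold minimizing H(i, j)
    }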

Example of tree-based classification
Consider the density functions of U.S. daily log stock returns in 2012 and 2013. Using the first-differenced VIX index as the explanatory variable and K = 4, we obtain 4 clusters: (−∞, −0.73], (−0.73, 0.39], (0.39, 1.19], and (1.19, ∞). The cluster sizes are 104, 259, 86, and 53, respectively. Note that positive z_t signifies an increase in market volatility (uncertainty).

What drove the U.S. financial market? The fear factor.
Figure: Time plots of the market fear factor (VIX index) and its change series, diff(vix).

Figure: Results of the tree-based cluster analysis for the daily densities of log returns of U.S. stocks in 2012 and 2013. The first-differenced series of the VIX index is used as the explanatory variable. The numbers of elements in the clusters are 53, 86, 259, and 104, respectively. The cluster classification is given in the heading of each panel (dvix > 1.19; 1.19 ≥ dvix > 0.39; 0.39 ≥ dvix > −0.73; dvix ≤ −0.73).

Model-based classification
Work directly on the observed multiple time series.
1. Postulate a general univariate model for all time series, e.g. an AR(p) model.
2. Time series in a cluster follow the same model: pool the data to estimate the common parameters.
3. Time series in different clusters follow different models.
4. May be estimated by Markov chain Monte Carlo methods.
5. May employ scale mixtures of normal innovations to handle outliers.
This approach has been widely studied, e.g. Wang et al. (2013) and Fruehwirth-Schnatter (2011), among others.

Application
1. Apply to the monthly unemployment rates of the 50 U.S. states.
2. Use out-of-sample predictions to compare with other methods, including the lasso.
3. For 1-step to 5-step-ahead predictions, the model-based method compares well. See Wang et al. (2013, JoF).

Table: Out-of-sample comparison (RMSE ×10⁴ and MAE ×10⁴ at horizons m = 1, ..., 4) of the methods UAR, VAR, Lasso (two variants), G-Lasso, LVAR, PLS (five variants), PCR (five variants), MBC, and rMBC.

Functional PCA: singular value decomposition
1. A tool to study the time evolution of the return distributions.
2. Data set: in this particular instance, each density function is evaluated at 512 points, and we have Y = [Y_{it} = f_t(x_i) : i = 1, ..., N; t = 1, ..., T].
3. Perform the singular value decomposition
   Ỹ = √(N − 1) U D V′,
   where Ỹ denotes the column-mean-adjusted data matrix, U is an N × N unitary matrix, D is an N × T rectangular diagonal matrix, and V is a T × T unitary matrix.
4. This is a simple form of functional PCA. [With large samples, smoothing of the PCs is not needed.]
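A minimal sketch via svd (the matrix name dens and the centering convention are our assumptions; the √(N − 1) scaling only rescales the singular values):

    ## Functional PCA of the density matrix by SVD
    Y  <- t(dens)                    # N x T: rows index grid points, columns index days
    Yc <- Y - rowMeans(Y)            # subtract the mean density function
    sv <- svd(Yc)
    pcfun  <- sv$u                   # principal component functions over x
    scores <- diag(sv$d) %*% t(sv$v) # day-by-day loadings on the PC functions
    plot(cumsum(sv$d^2) / sum(sv$d^2), type = "b")  # cumulative variance shares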

Figure: Scree plot of the PCA for the daily return densities in 2012 and 2013.

Figure: The first 6 PC functions for the daily log return densities in 2012 and 2013.

Figure: The 7th to 12th PC functions for the daily log return densities in 2012 and 2013.

Meaning of the PC functions? 1st
Figure: Mean density ± 1st PC: peak and tails; mean + standardized 1st PC in red.

Meaning of the PC functions? 2nd
Figure: Mean density ± 2nd PC: midrange returns.

Meaning of the PC functions? 3rd
Figure: Mean density ± 3rd PC: curvature.

Approximate factor models
f_t(x) = Σ_{i=1}^p λ_{t,i} g_i(x) + ε_t(x),
where g_i(x) denotes the ith common factor and ε_t(x) is the noise function.
1. A generalization of the orthogonal factor model that allows the error functions to be correlated.
2. Only asymptotically identified under some regularity conditions.
3. FPCA provides a way to estimate approximate factor models.
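Continuing the SVD sketch, a rank-p approximate-factor fit of the densities is immediate (p = 12 follows the later slides; the reconstruction step is our own):

    ## Rank-p reconstruction: sum of lambda_{t,i} g_i(x) over the leading triplets
    p   <- 12
    fit <- sv$u[, 1:p] %*% diag(sv$d[1:p]) %*% t(sv$v[, 1:p]) + rowMeans(Y)
    res <- Y - fit                  # pointwise approximation errors eps_t(x)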

Loadings of the first PC function
Figure: Scatter plot of the loadings versus changes in the VIX index. The red line denotes a lowess fit.

Functional PC via thresholding
1. Zero appears to be a reasonable and natural threshold.
2. Regime 1: dvix ≥ 0, with 244 days. [Volatile (bad) state]
3. Regime 2: dvix < 0, with 258 days. [Calm (good) state]
4. Perform PCA of the density functions for each regime.
5. The differences are clearly seen.
6. This leads to different approximate factor models for the density functions.

Figure: Scree plots of the PCA for each regime (dvix ≥ 0 and dvix < 0).

Figure: The first 6 PC functions for the daily log return densities in each regime; the red line is for the calm state, Regime 2.

Approximate factor models
1. Use approximate factor models with the first 12 principal component functions.
2. Compare the overall fits with and without thresholding.
3. For Regime 1 (positive dvix): randomly select day 17.
4. For Regime 2 (negative dvix): randomly select day 420.
5. Check (a) observed versus fitted densities and (b) the residuals with and without thresholding.
6. With 12 components both approaches fare well, but thresholding provides improvements.

Comparison: day 17 (in Regime 1)
Figure: Top: the day-17 density and its fits; observed (black), all-data fit (red), thresholded fit (blue). Bottom: errors of approximation; all-data (black), thresholded (red).

Comparison: day 420 (in Regime 2)
Figure: Top: the day-420 density and its fits; observed (black), all-data fit (red), thresholded fit (blue). Bottom: errors of approximation; all-data (black), thresholded (red).

Lasso and beyond
1. Need to exploit parsimony, beyond sparsity.
2. Need to take prior knowledge into account. We have accumulated a lot of knowledge in diverse scientific areas. How do we take advantage of this knowledge?
3. Variable selection is not sufficient. More importantly, what are the proper measurements to take? What questions can a given big data set answer?

An illustration
Every country has many interest rate series, which
1. have different maturities, and
2. serve different financial purposes.
What is the information embedded in those interest rate series? Consider U.S. weekly constant maturity interest rates:
1. from January 8, 1982 to October 30, 2015;
2. maturities 3m, 6m, 1y, 2y, 3y, 5y, 7y, 10y, and 30y.

Figure: Time plots of U.S. weekly interest rates with different maturities: 1/8/1982 to 10/30/2015.

Figure: Scree plot of the PCA of U.S. weekly interest rates.

Figure: Time plots of the first four principal components of U.S. weekly interest rates.

Implication?
In lasso-type analysis,
1. should we use the interest rate series directly, even with the group lasso? This leads to sparsity.
2. should we apply PCA first and then use the PCs? This leads to parsimony.
3. should we develop other possibilities: the fused lasso? factor models?

Concluding remarks
1. Big dependent data appear in many applications.
2. Methods developed for independent big data may fail.
3. Statistical methods for big dependent data are relatively underdeveloped.
4. New challenges emerge; new opportunities exist.
5. Simple modifications of traditional methods might work well.
6. Both theory and methods require further research.
