Analysis of Big Dependent Data in Economics and Finance


Analysis of Big Dependent Data in Economics and Finance
Ruey S. Tsay
Booth School of Business, University of Chicago
September 2016

Outline
1. Big data? Machine learning? Data science? What is in it for economics and finance?
2. Real-world data are often dynamically dependent
3. A simple example: methods for independent data may fail
4. Trade-off between simplicity and reality
5. Some methods useful for analyzing big dependent data in economics and finance
6. Examples
7. Concluding remarks

Big dependent data
1. Accurate information is the key to success in the competitive global economy: the information age.
2. What is big data? High dimension (many variables)? Large sample size? Both?
3. Not all big data sets are useful: confounding & noise.
4. We need methods to efficiently extract useful information from big data.
5. Know the limitations of big data.
6. Issues that emerge from big data: privacy? ethical issues?
7. The focus here is on methods for analyzing big dependent data in economics and finance.

What is available? Statistical methods:
1. Focus on sparsity (simplicity)
2. Various penalized regressions, e.g. lasso and its extensions
3. Various dimension-reduction methods and models
4. Common framework: independent observations, with limited extensions to stationary data
Real data are often dynamically dependent!
Some useful concepts in analyzing big data:
1. Parsimony vs sparsity: parsimony does not imply sparsity
2. Simplicity vs reality: trade-off between feasibility & sophistication

Parsimonious, not sparse
A simple example:
y_t = c + Σ_{i=1}^k β x_{it} + ε_t = c + β Σ_{i=1}^k x_{it} + ε_t,
where k is large, the x_{it} are not perfectly correlated, and the ε_t are iid N(0, σ²). The model has three parameters, so it is parsimonious, but it is not sparse because y depends on all of the explanatory variables. In some applications, Σ_{i=1}^k x_{it} is a close approximation to the first principal component. For example, the level of interest rates is important to an economy. The fused lasso can handle this difficulty in some situations.

What is LASSO regression?
Model (assume mean-adjusted data):
y_i = Σ_{j=1}^p β_j X_{j,i} + ε_i.
Matrix form: Y = Xβ + ε, where X is the design matrix.
Objective function:
β̂(λ) = argmin_β { ||Y − Xβ||₂²/T + λ ||β||₁ },
where λ ≥ 0 is a penalty parameter, ||β||₁ = Σ_{j=1}^p |β_j|, and ||Y − Xβ||₂² = Σ_{i=1}^T (y_i − X_i′β)². In particular, the criterion remains well defined when p > T.

What is the big deal? Sparsity
Using convexity, the lasso is equivalent to
β̂_opt(R) = argmin_{β: ||β||₁ ≤ R} ||Y − Xβ||₂²/T.
Old friend: ridge regression,
β̂_Ridge(λ) = argmin_β { ||Y − Xβ||₂²/T + λ ||β||₂² }, or β̂(R) = argmin_{β: ||β||₂² ≤ R} ||Y − Xβ||₂²/T.
Special case p = 2: ||Y − Xβ||₂²/T is quadratic; the constraint region ||β||₁ ≤ R is a diamond, whereas ||β||₂² ≤ R is a circle. The corners of the diamond are why the lasso leads to sparsity.
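A tiny numerical illustration of this geometric point may help; the sketch below (the toy data and the penalty value are our own choices, not from the slides) contrasts the exact zeros produced by the lasso with the mere shrinkage produced by ridge in glmnet.

    ## Toy comparison: lasso (alpha = 1) vs ridge (alpha = 0)
    library(glmnet)
    set.seed(2)
    X <- matrix(rnorm(100 * 10), 100, 10)
    y <- X[, 1] + rnorm(100)
    sum(coef(glmnet(X, y, alpha = 1, lambda = 0.1)) == 0)  # several exact zeros
    sum(coef(glmnet(X, y, alpha = 0, lambda = 0.1)) == 0)  # typically none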

Computation and extensions
1. Optimization: least angle regression (LARS) by Efron et al. (2004) makes the computation very efficient.
2. Extensions:
   - Group lasso: Yuan and Lin (2006). Subsets of X have a specific meaning, e.g. treatment.
   - Elastic net: Zou and Hastie (2005). Uses a combination of L₁ and L₂ penalties.
   - SCAD: Fan and Li (2001). Nonconcave penalized likelihood. [Smoothly clipped absolute deviation (SCAD).]
   - Various Bayesian methods: the penalty function is the prior.
3. Packages available in R: lars, glmnet, gamlr, gbm, and many others.

A simulated example
p = 300, T = 150, X iid N(0, 1), ε_i iid N(0, 0.25),
y_i = x_{3i} + 2(x_{4i} + x_{5i} + x_{7i}) − 2(x_{11,i} + x_{12,i} + x_{13,i} + x_{21,i} + x_{22,i} + x_{30,i}) + ε_i.
1. How? R demonstration (a sketch follows below).
2. Selection of λ? Cross-validation (10-fold), measuring prediction accuracy.
3. The commands lars and cv.lars of the package lars.
4. The commands glmnet and cv.glmnet of the package glmnet.
5. Relationship between the two packages (alpha = 0).
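A minimal sketch of this simulation, assuming the setup above (the nonzero pattern follows the displayed model; the seed and other details are our own):

    ## Simulate p = 300, T = 150 with 10 active predictors, then fit by 10-fold CV
    library(glmnet)
    set.seed(42)
    T <- 150; p <- 300
    X <- matrix(rnorm(T * p), T, p)
    beta <- rep(0, p)
    beta[3] <- 1
    beta[c(4, 5, 7)] <- 2
    beta[c(11, 12, 13, 21, 22, 30)] <- -2
    y <- drop(X %*% beta) + rnorm(T, sd = 0.5)   # Var(eps) = 0.25
    cvfit <- cv.glmnet(X, y, alpha = 1)          # alpha = 1 is the lasso
    coef(cvfit, s = "lambda.min")                # inspect the selected variables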

Lasso may fail for dependent data
1. Data-generating model: a scalar Gaussian autoregressive, AR(3), model
x_t = 1.9x_{t−1} − 0.8x_{t−2} − 0.1x_{t−3} + a_t, a_t ~ N(0, 1).
Generate 2000 observations. See Figure 1.
2. Big-data setup:
   - Dependent x_t: t = 11, ..., 2000.
   - Regressors: X_t = (x_{t−1}, x_{t−2}, ..., x_{t−10}, z_{1t}, ..., z_{10,t}), where the z_{it} are iid N(0, 1).
   - Dimension = 20, sample size 1990.
Run the lasso regression via the lars package of R. See Figure 2 for results. Lag 3, x_{t−3}, was not selected. The lasso fails in this case.

Figure: Time plot of the simulated AR(3) time series with 2000 observations.

Figure: Results of the lasso regression for the AR(3) series (standardized coefficient paths against |β|/max|β|).
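A sketch reproducing this experiment, under our own assumptions about the unstated details (the seed, and the use of stats::filter to generate the nonstationary AR(3)):

    ## Simulate the AR(3) with a double unit root; lasso on 10 lags + 10 noise series
    library(lars)
    set.seed(7)
    n <- 2000
    x <- as.numeric(filter(rnorm(n), c(1.9, -0.8, -0.1), method = "recursive"))
    idx <- 11:n
    XZ <- cbind(sapply(1:10, function(k) x[idx - k]),        # x_{t-1}, ..., x_{t-10}
                matrix(rnorm(length(idx) * 10), ncol = 10))  # z_{1t}, ..., z_{10,t}
    y <- x[idx]
    fit <- lars(XZ, y, type = "lasso")   # lars standardizes the columns by default
    plot(fit)                            # compare with Figure 2: lag 3 is not picked up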

OLS works if we entertain AR models
Run the linear regression using the first three variables of X_t. The fitted coefficients are close to the true values (e.g. 1.902 for lag 1). All estimates are statistically significant with very small p-values. The residuals are well behaved; e.g. the Ljung-Box statistic Q(10) has p-value 0.20 (after adjusting the degrees of freedom). A simple time series method works for dependent data.
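Continuing the sketch above, the OLS check is one line (the Ljung-Box call is our choice of residual diagnostic):

    ## OLS on the first three lags only
    fit.ols <- lm(y ~ XZ[, 1:3])
    summary(fit.ols)                 # estimates should be near (1.9, -0.8, -0.1)
    Box.test(residuals(fit.ols), lag = 10, type = "Ljung-Box")  # Q(10)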

Why does the lasso fail? Two possibilities:
1. Scaling effect: the lasso standardizes each variable in X_t. For unit-root nonstationary time series, standardization may wash out the dependence in the stationary part.
2. Multicollinearity: unit-root time series have strong serial correlations. [The ACF approaches 1 at all lags.]
This artificial example highlights the difference between independent and dependent data. We need to develop methods for big dependent data!

Possible solutions
1. Re-parameterization using time series properties.
2. Use different penalties for different parameters.
The first approach is easier. For this particular time series, define ∇x_t = (1 − B)x_t and ∇²x_t = (1 − B)²x_t. Then
x_t = 1.9x_{t−1} − 0.8x_{t−2} − 0.1x_{t−3} + a_t
    = x_{t−1} + ∇x_{t−1} − 0.1∇²x_{t−1} + a_t
    = double + single + stationary + a_t.
The coefficients of x_{t−1}, ∇x_{t−1}, and ∇²x_{t−1} are 1, 1, and −0.1, respectively.

Different frameworks for LASSO
The X-matrix of the conventional lasso consists of (x_{t−1}, x_{t−2}, ..., x_{t−10}, z_{1t}, ..., z_{10,t}), where the z_{it} are iid N(0, 1). Under the re-parameterization, the X-matrix becomes (x_{t−1}, ∇x_{t−1}, ∇²x_{t−1}, ..., ∇²x_{t−8}, z_{1t}, ..., z_{10,t}). These two X-matrices provide theoretically the same information. However, the first one has high multicollinearity, while the second does not, especially after standardization.
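In the sketch's notation, the re-parameterized design can be built as follows (the alignment of the differenced series is our own bookkeeping):

    ## Re-parameterized design: level, first difference, eight second differences, noise
    dx  <- c(NA, diff(x))          # ∇x_t
    d2x <- c(NA, diff(dx))         # ∇²x_t
    XR  <- cbind(x[idx - 1], dx[idx - 1],
                 sapply(1:8, function(k) d2x[idx - k]),
                 XZ[, 11:20])      # reuse the noise regressors
    fit2 <- lars(XR, y, type = "lasso")
    plot(fit2)   # the stationary ∇² terms now survive standardization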

Figure: Comparison of the β-estimates from the lars results under the two parameterizations.

Theoretical justification
Focus on the particular series x_t used. Some properties of the series are (up to scale constants):
1. T^{−4} Σ_{t=1}^T x_t² ⇒ ∫₀¹ W̄(s)² ds,
2. T^{−5/2} Σ_{t=1}^T x_t ⇒ ∫₀¹ W̄(s) ds,
3. T^{−3} Σ_{t=1}^T x_t ∇x_t ⇒ ∫₀¹ W̄(s) W(s) ds,
4. T^{−2} Σ_{t=1}^T (∇x_t)² ⇒ ∫₀¹ W(s)² ds,
where W(s) is the standard Brownian motion and W̄(s) = ∫₀^s W(u) du. Standardization may wash out the ∇x_{t−1} and ∇²x_{t−1} parts.

Examples of big dependent data
1. Daily returns of U.S. stocks.
2. Demand for electricity in 30-minute intervals.
3. Daily spreads of CDS (credit default swaps) of selected companies.
4. Monthly unemployment rates of the 50 U.S. states.
5. Interest rates of an economy.
6. Air pollution measurements at multiple locations and health risk; complex spatio-temporal data in general.

Figure: Sample sizes (number of stocks per day) of U.S. daily stock returns in 2012 and 2013: mean 6681, range (6593, 6774).

Figure: Densities of daily log returns of U.S. stocks in 2012 and 2013.

Figure: Empirical densities of electricity demand in 30-minute intervals, by day of the week (Monday through Sunday), Adelaide, Australia; sample period from July 6, 1997 to March 31.

Figure: Time plots of monthly state unemployment rates of the U.S.

Some statistical methods
Goal: extract useful information, including pooling.
1. Classification and cluster analysis: K-means, tree-based classification, model-based classification.
2. Factor models & extensions: orthogonal factor model, approximate factor model, dynamic factor model, constrained factor models (column, row constraints), X_t = R f_t C′ + e_t.
3. Generalizations of lasso methods to dependent data, e.g. lasso for nowcasting vs MIDAS.

Constrained factor models
Column (variable) constraints only: Tsai and Tsay (2010). Let z_t be a k-dimensional time series,
z_t = H ω f_t + ε_t, t = 1, ..., T,
where H is a known k × r matrix, f_t is an m-dimensional common factor, and ω holds the r × m unknown loading parameters. For the observed data in matrix form,
Z = F ω′ H′ + ε.
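A rough estimation sketch, and emphatically not the estimator of Tsai and Tsay (2010): given H, one can project the data onto the column space of H, extract principal components, and back out ω by least squares. Every name and step below is our own illustration.

    ## Crude constrained-factor estimate: loadings restricted to col(H)
    constrained_fm <- function(Z, H, m) {
      P  <- H %*% solve(crossprod(H), t(H))   # projector onto the column space of H
      pc <- prcomp(Z %*% P, center = TRUE)
      Fhat <- pc$x[, 1:m, drop = FALSE]       # estimated common factors
      L <- t(coef(lm(Z ~ Fhat))[-1, , drop = FALSE])   # k x m loadings on the factors
      omega <- solve(crossprod(H), t(H) %*% L)         # map the loadings through H
      list(factors = Fhat, omega = omega, loadings = H %*% omega)
    }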

A simple illustration
Monthly log returns of 10 stocks from 2001 on:
1. Semiconductor: TXN, MU, INTC, TSM
2. Pharmaceutical: PFE, MRK, LLY
3. Investment bank: JPM, MS, GS
The constraint matrix is H = [h₁, h₂, h₃], where
h₁ = (1, 1, 1, 1, 0, 0, 0, 0, 0, 0)′
h₂ = (0, 0, 0, 0, 1, 1, 1, 0, 0, 0)′
h₃ = (0, 0, 0, 0, 0, 0, 0, 1, 1, 1)′.

Table: Estimation results of the constrained factor model (loadings L = Hω) and the orthogonal factor model (PCA) for the 10 stocks: loadings L₁–L₃ and residual variances Σ_ε,i for each ticker (TXN, MU, INTC, TSM, PFE, MRK, LLY, JPM, MS, GS). Variability explained: 70.6% (constrained) versus 72.4% (orthogonal).

Both row and column constraints: Tsai et al. (2016)
With T observations and k variables, the data matrix form is
Z = F₁ω₁′H′ + GF₂ω₂′ + GF₃ω₃′H′ + E,
where G denotes a known T × m row-constraint matrix.


Figure: Time plots of monthly housing starts (in logarithms) of 9 U.S. divisions (New England, Middle Atlantic, East North Central, West North Central, South Atlantic, East South Central, West South Central, Mountain, Pacific).

Figure: Time series plots of the common factors (two each in F₁, F₂, F₃) of a DCF model of order (r, p, q) = (2, 2, 2), estimated by maximum likelihood.

Figure: Time series plots of GF₂ω₂′ from the fitted DCF model of order (2, 2, 2). Maximum likelihood estimation is used.

Figure: Time series plots of F₁ω₁′H′ from the fitted DCF model of order (2, 2, 2). Maximum likelihood estimation is used.

Figure: Time series plots of GF₃ω₃′H′ from the fitted DCF model of order (2, 2, 2). Maximum likelihood estimation is used.

Matrix-valued variables
Consider simultaneously n macroeconomic variables in k countries:

           U.S.      Italy     Spain     ...   Canada
  GDP      X_{11,t}  X_{12,t}  X_{13,t}  ...   X_{1k,t}
  Unem     X_{21,t}  X_{22,t}  X_{23,t}  ...   X_{2k,t}
  CPI      X_{31,t}  X_{32,t}  X_{33,t}  ...   X_{3k,t}
  ...
  M1       X_{n1,t}  X_{n2,t}  X_{n3,t}  ...   X_{nk,t}

Ongoing work: only preliminary results are available. See Chen et al. (2016).

Classification
A possible approach: use a two-step procedure.
1. Transform the dependent big data into functions, e.g. probability densities.
2. Apply classification methods to the functional data.
The density functions of daily log returns of U.S. stocks serve as an example. We can then classify the density functions to draw statistical inference.

Illustration of classification
Cluster analysis of density functions. Consider the time series of density functions {f_t(x)}. For simplicity, assume the densities are evaluated at equally spaced grid points {x₁ < x₂ < ... < x_N} ⊂ D with increment Δx. The data we have become {f_t(x_i) : t = 1, ..., T; i = 1, ..., N}. Using the Hellinger distance (HD), we consider two methods: K-means and tree-based classification.

Hellinger distance of two density functions
Let f(x) and g(x) be two density functions on a common domain D ⊂ R. Assume both density functions are absolutely continuous w.r.t. the Lebesgue measure. The Hellinger distance (HD) between f(x) and g(x) is defined by
H(f, g)² = (1/2) ∫_D (√f(x) − √g(x))² dx = 1 − ∫_D √(f(x) g(x)) dx.
Basic properties:
1. H(f, g) ≥ 0.
2. H(f, g) = 0 if and only if f(x) = g(x) almost surely.
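On a common grid, the integral reduces to a Riemann sum; a minimal sketch (the grid handling is our assumption):

    ## Squared Hellinger distance between two densities tabulated on the same grid
    hellinger2 <- function(f, g, dx) {
      0.5 * sum((sqrt(f) - sqrt(g))^2) * dx
    }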

K-means method
For a given K, the K-means method seeks a partition of the densities, say C₁, ..., C_K, such that
1. ∪_{k=1}^K C_k = {f_t(x)},
2. C_i ∩ C_j = ∅ for i ≠ j,
3. the sum of within-cluster variation V = Σ_{k=1}^K V(C_k) is minimized, where the within-cluster variation is V(C_k) = Σ_{t₁,t₂ ∈ C_k} H(f_{t₁}, f_{t₂})².
It turns out this can easily be done by applying the K-means method with squared Euclidean distance to the square-root densities {√f_t(x)}.
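So in practice one call suffices; in the sketch below, dens is a hypothetical T × N matrix with dens[t, i] = f_t(x_i):

    ## K-means under squared Hellinger distance = K-means on square-root densities
    km <- kmeans(sqrt(dens), centers = 4, nstart = 25)
    table(km$cluster)   # cluster sizes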

Example of K-means
Consider the 48 density functions of half-hour electricity demand on Mondays in Adelaide, Australia. With K = 4 clusters, we have:

  k   Elements (time index)          Calendar hours
  1   17 to 44                       8:00 AM to 10:00 PM
  2   15, 16, 45 to 48, 1, 2, 3      7:00-8:00 AM; 10:00 PM-1:30 AM
  3   4, 5, 13, 14                   1:30-2:30 AM; 6:00-7:00 AM
  4   6 to 12                        2:30-6:00 AM

Result: the clusters capture daily activities, namely (1) an active period, (2) a transition period, (3) a light-sleeping period, and (4) a sound-sleeping period.

Figure: Density functions of half-hour electricity demand on Monday in Adelaide, Australia. The sample period is from July 6, 1997 to March 31.

Figure: Results of the K-means cluster analysis based on squared Hellinger distance for electricity demand on Monday. Different colors denote different clusters.

Tree-based classification
Let Z_t = (z_{1t}, ..., z_{pt})′ denote p covariates. We use an iterative procedure to build a binary tree, starting with the root C₀ = {f_t(x)}.
1. For each covariate z_{it}, let z_{i(j)} be the jth order statistic.
   (a) Divide C₀ into two sub-clusters, C_{i,j,1} = {f_t(x) : z_{it} ≤ z_{i(j)}} and C_{i,j,2} = {f_t(x) : z_{it} > z_{i(j)}}.
   (b) Compute the sum of within-cluster variations, H(i, j) = V(C_{i,j,1}) + V(C_{i,j,2}).
   (c) Find the smallest j, say v_i, such that H(i, v_i) = min_j H(i, j).
2. Select i ∈ {1, ..., p}, say I, such that H(I, v_I) = min_i H(i, v_i).
3. Use covariate z_{It} with threshold v_I to grow two new leaves, i.e. C_{1,1} = C_{I,v_I,1} and C_{1,2} = C_{I,v_I,2}.

Tree-based procedure, continued
Next, consider C_{1,1} and C_{1,2} as the roots of branches and apply the same procedure with their associated covariates to find candidates for growth. The only modification is as follows: when considering C_{1,1}, we treat C_{1,2} as a leaf in computing the sum of within-cluster variations; similarly, when considering C_{1,2} for further division, we treat C_{1,1} as a leaf in computing the sum of within-cluster variations. This growth procedure is iterated until the number of clusters K is reached. A sketch of the first split appears below.
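A simplified sketch of the first split search, using the hypothetical dens matrix and grid step dx from above (the scoring details are our own reading of the procedure):

    ## Score a candidate split by the sum of within-cluster squared Hellinger variations
    within_var <- function(sq, dx) {
      if (nrow(sq) < 2) return(0)
      d2 <- 0.5 * dx * as.matrix(dist(sq))^2   # pairwise squared Hellinger distances
      sum(d2) / 2                              # each unordered pair counted once
    }
    best_split <- function(dens, z, dx) {      # z: one covariate, length T
      sq <- sqrt(dens)
      cuts <- sort(unique(z))
      scores <- sapply(cuts, function(cc)
        within_var(sq[z <= cc, , drop = FALSE], dx) +
        within_var(sq[z >  cc, , drop = FALSE], dx))
      cuts[which.min(scores)]                  # threshold minimizing H(i, j)
    }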

Example of tree-based classification
Consider the density functions of U.S. daily log stock returns in 2012 and 2013. Using the first-differenced VIX index as the explanatory variable and K = 4, we obtain 4 clusters: (−∞, −0.73], (−0.73, 0.39], (0.39, 1.19], and (1.19, ∞). The cluster sizes are 104, 259, 86, and 53, respectively. Note that positive z_t signifies an increase in market volatility (uncertainty).

What drove the U.S. financial market? The fear factor.
Figure: Time plots of the market fear factor (VIX index) and its change series, diff(vix).

Figure: Results of the tree-based cluster analysis for the daily densities of log returns of U.S. stocks in 2012 and 2013. The first-differenced series of the VIX index is used as the explanatory variable. The numbers of elements in the clusters are 53, 86, 259, and 104, respectively. The cluster classification is given in the heading of each panel (dvix > 1.19; 1.19 ≥ dvix > 0.39; 0.39 ≥ dvix > −0.73; dvix ≤ −0.73).

Model-based classification
Work directly on the observed multiple time series.
1. Postulate a general univariate model for all time series, e.g. an AR(p) model.
2. Time series in a cluster follow the same model: pool the data to estimate the common parameters.
3. Time series in different clusters follow different models.
4. May be estimated by Markov chain Monte Carlo methods.
5. May employ scale mixtures of normal innovations to handle outliers.
This approach has been widely studied, e.g. Wang et al. (2013) and Fruehwirth-Schnatter (2011), among others.

Application
1. Apply to the monthly unemployment rates of the 50 U.S. states.
2. Use out-of-sample predictions to compare with other methods, including the lasso.
3. For 1-step to 5-step-ahead predictions, the model-based method compares well. See Wang et al. (2013, JoF).

Table: Out-of-sample comparison (RMSE ×10⁴ and MAE ×10⁴ at horizons m = 1, ..., 4) of the methods UAR, VAR, Lasso (two variants), G-Lasso, LVAR, PLS (five variants), PCR (five variants), MBC, and rMBC.

Functional PCA: singular value decomposition
1. A tool to study the time evolution of the return distributions.
2. Data set: in this particular instance, each density function is evaluated at 512 points, and we have Y = [Y_{it} = f_t(x_i) : i = 1, ..., N; t = 1, ..., T].
3. Perform the singular value decomposition
   Ỹ = √(N − 1) U D V′,
   where Ỹ denotes the column-mean-adjusted data matrix, U is an N × N unitary matrix, D is an N × T rectangular diagonal matrix, and V is a T × T unitary matrix.
4. This is a simple form of functional PCA. [With large samples, smoothing of the PCs is not needed.]
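A minimal sketch via svd (the matrix name dens and the centering convention are our assumptions; the √(N − 1) scaling only rescales the singular values):

    ## Functional PCA of the density matrix by SVD
    Y  <- t(dens)                    # N x T: rows index grid points, columns index days
    Yc <- Y - rowMeans(Y)            # subtract the mean density function
    sv <- svd(Yc)
    pcfun  <- sv$u                   # principal component functions over x
    scores <- diag(sv$d) %*% t(sv$v) # day-by-day loadings on the PC functions
    plot(cumsum(sv$d^2) / sum(sv$d^2), type = "b")  # cumulative variance shares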

Figure: Scree plot of the PCA for the daily return densities in 2012 and 2013.

Figure: The first 6 PC functions for the daily log return densities in 2012 and 2013.

Figure: The 7th to 12th PC functions for the daily log return densities in 2012 and 2013.

Meaning of the PC functions? 1st
Figure: Mean density ± 1st PC: peak and tails; mean + standardized 1st PC in red.

Meaning of the PC functions? 2nd
Figure: Mean density ± 2nd PC: midrange returns.

Meaning of the PC functions? 3rd
Figure: Mean density ± 3rd PC: curvature.

Approximate factor models
f_t(x) = Σ_{i=1}^p λ_{t,i} g_i(x) + ε_t(x),
where g_i(x) denotes the ith common factor and ε_t(x) is the noise function.
1. A generalization of the orthogonal factor model that allows the error functions to be correlated.
2. Only asymptotically identified under some regularity conditions.
3. FPCA provides a way to estimate approximate factor models.
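Continuing the SVD sketch, a rank-p approximate-factor fit of the densities is immediate (p = 12 follows the later slides; the reconstruction step is our own):

    ## Rank-p reconstruction: sum of lambda_{t,i} g_i(x) over the leading triplets
    p   <- 12
    fit <- sv$u[, 1:p] %*% diag(sv$d[1:p]) %*% t(sv$v[, 1:p]) + rowMeans(Y)
    res <- Y - fit                  # pointwise approximation errors eps_t(x)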

Loadings of the first PC function
Figure: Scatter plot of the loadings versus changes in the VIX index. The red line denotes a lowess fit.

Functional PC via thresholding
1. Zero appears to be a reasonable and natural threshold.
2. Regime 1: dvix ≥ 0, with 244 days. [Volatile (bad) state]
3. Regime 2: dvix < 0, with 258 days. [Calm (good) state]
4. Perform PCA of the density functions for each regime.
5. The differences are clearly seen.
6. This leads to different approximate factor models for the density functions.

Figure: Scree plots of the PCA for each regime (dvix ≥ 0 and dvix < 0).

Figure: The first 6 PC functions for the daily log return densities in each regime; the red line is for the calm state, Regime 2.

Approximate factor models
1. Use approximate factor models with the first 12 principal component functions.
2. Compare the overall fits with and without thresholding.
3. For Regime 1 (positive dvix): randomly select day 17.
4. For Regime 2 (negative dvix): randomly select day 420.
5. Check (a) observed versus fitted densities and (b) the residuals with and without thresholding.
6. With 12 components both approaches fare well, but thresholding provides improvements.

Comparison: day 17 (in Regime 1)
Figure: Top: the day-17 density and its fits; observed (black), all-data fit (red), thresholded fit (blue). Bottom: errors of approximation; all-data (black), thresholded (red).

Comparison: day 420 (in Regime 2)
Figure: Top: the day-420 density and its fits; observed (black), all-data fit (red), thresholded fit (blue). Bottom: errors of approximation; all-data (black), thresholded (red).

Lasso and beyond
1. Need to exploit parsimony, beyond sparsity.
2. Need to take prior knowledge into account. We have accumulated a lot of knowledge in diverse scientific areas. How do we take advantage of this knowledge?
3. Variable selection is not sufficient. More importantly, what are the proper measurements to take? What questions can a given big data set answer?

An illustration
Every country has many interest rate series, which
1. have different maturities, and
2. serve different financial purposes.
What is the information embedded in those interest rate series? Consider U.S. weekly constant maturity interest rates:
1. from January 8, 1982 to October 30, 2015;
2. maturities 3m, 6m, 1y, 2y, 3y, 5y, 7y, 10y, and 30y.

Figure: Time plots of U.S. weekly interest rates with different maturities: 1/8/1982 to 10/30/2015.

Figure: Scree plot of the PCA of U.S. weekly interest rates.

Figure: Time plots of the first four principal components of U.S. weekly interest rates.

Implication?
In lasso-type analysis,
1. should we use the interest rate series directly, even with the group lasso? This leads to sparsity.
2. should we apply PCA first and then use the PCs? This leads to parsimony.
3. should we develop other possibilities: the fused lasso? factor models?

Concluding remarks
1. Big dependent data appear in many applications.
2. Methods developed for independent big data may fail.
3. Statistical methods for big dependent data are relatively underdeveloped.
4. New challenges emerge; new opportunities exist.
5. Simple modifications of traditional methods might work well.
6. Both theory and methods require further research.
