Analysis of Big Dependent Data in Economics and Finance
1 Analysis of Big Dependent Data in Economics and Finance
Ruey S. Tsay
Booth School of Business, University of Chicago
September 2016
Ruey S. Tsay Big Dependent Data 1 / 72
2 Outline
1. Big data? Machine learning? Data science? What is in it for economics and finance?
2. Real-world data are often dynamically dependent
3. A simple example: methods for independent data may fail
4. Trade-off between simplicity and reality
5. Some methods useful for analyzing big dependent data in economics and finance
6. Examples
7. Concluding remarks
3 Big dependent data
1. Accurate information is the key to success in the competitive global economy: the information age.
2. What is big data? High dimension (many variables)? Large sample size? Both?
3. Not all big data sets are useful: confounding & noise.
4. Need to develop methods to efficiently extract useful information from big data.
5. Know the limitations of big data.
6. Issues emerging from big data: privacy? Ethical issues?
7. The focus here is on methods for analyzing big dependent data in economics and finance.
4 What are available?
Statistical methods:
1. Focus on sparsity (simplicity)
2. Various penalized regressions, e.g. Lasso and its extensions
3. Various dimension reduction methods and models
4. Common framework used: independent observations, with limited extensions to stationary data
Real data are often dynamically dependent!
Some useful concepts in analyzing big data:
1. Parsimony vs sparsity: parsimony is not the same as sparsity
2. Simplicity vs reality: trade-off between feasibility & sophistication
5 Parsimonious, not sparse
A simple example:
y_t = c + Σ_{i=1}^k β x_{it} + ε_t = c + β Σ_{i=1}^k x_{it} + ε_t,
where k is large, the x_{it} are not perfectly correlated, and the ε_t are iid N(0, σ²). The model has three parameters, so it is parsimonious, but it is not sparse because y depends on all explanatory variables. In some applications, Σ_{i=1}^k x_{it} is a close approximation to the first principal component. For example, the level of interest rates is important to an economy. Fused Lasso can handle this difficulty in some situations.
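This can be checked numerically. Below is a minimal numpy sketch (simulated data with made-up values k = 50 and β = 0.3, not from the slide) showing that the two-parameter regression on the row sum recovers β, and that Σ_i x_{it} is nearly the first principal component score:

```python
import numpy as np

# Hypothetical illustration: the x_it share one common component f_t, so they
# are correlated but not perfectly so, and y depends on ALL of them via one beta.
rng = np.random.default_rng(1)
T, k = 500, 50
f = rng.standard_normal(T)                            # common component
X = f[:, None] + 0.5 * rng.standard_normal((T, k))    # x_it, i = 1..k
c, beta = 1.0, 0.3
y = c + beta * X.sum(axis=1) + 0.5 * rng.standard_normal(T)

# Parsimonious fit: only two free parameters (c, beta), despite k = 50 regressors.
s = X.sum(axis=1)
A = np.column_stack([np.ones(T), s])
c_hat, beta_hat = np.linalg.lstsq(A, y, rcond=None)[0]

# The row sum is nearly the first principal component score of X.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]
corr = abs(np.corrcoef(s, pc1)[0, 1])
```

The fit is parsimonious without being sparse: every x_{it} contributes, yet only two parameters are estimated.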
6 What is LASSO regression?
Model (assume mean-adjusted):
y_i = Σ_{j=1}^p β_j X_{j,i} + ε_i.
Matrix form, with X the design matrix:
Y = Xβ + ε.
Objective function:
β̂(λ) = argmin_β ( ||Y − Xβ||²₂ / T + λ ||β||₁ ),
where λ ≥ 0 is a penalty parameter, ||β||₁ = Σ_{j=1}^p |β_j|, and ||Y − Xβ||²₂ = Σ_{i=1}^T (y_i − X_i′β)². In particular, the method applies even if p > T.
7 What is the big deal? Sparsity
Using convexity, LASSO is equivalent to the constrained problem
β̂_opt(R) = argmin_{β: ||β||₁ ≤ R} ||Y − Xβ||²₂ / T.
Old friend: ridge regression,
β̂_Ridge(λ) = argmin_β ( ||Y − Xβ||²₂ / T + λ ||β||²₂ ),
or equivalently
β̂(R) = argmin_{β: ||β||²₂ ≤ R²} ||Y − Xβ||²₂ / T.
Special case p = 2: ||Y − Xβ||²₂ / T is quadratic. The constraint region ||β||₁ ≤ R is a diamond, whereas ||β||²₂ ≤ R² is a disk. The corners of the diamond sit on the axes, which is why LASSO leads to sparsity.
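The sparsity mechanism is easy to see in code. The sketch below uses plain coordinate descent for the lasso objective above (a simple stand-in, not the lars algorithm mentioned later) together with the closed-form ridge solution; on a toy sparse regression with made-up coefficients, the lasso estimates contain exact zeros while the ridge estimates do not:

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=500):
    """Coordinate descent for ||y - Xb||_2^2 / T + lam * ||b||_1."""
    T, p = X.shape
    b = np.zeros(p)
    col_ss = (X**2).sum(axis=0) / T
    for _ in range(n_iter):
        for j in range(p):
            r = y - X @ b + X[:, j] * b[j]      # partial residual excluding j
            rho = X[:, j] @ r / T
            # soft-thresholding step: produces exact zeros
            b[j] = np.sign(rho) * max(abs(rho) - lam / 2, 0.0) / col_ss[j]
    return b

rng = np.random.default_rng(7)
T, p = 200, 20
X = rng.standard_normal((T, p))
beta = np.zeros(p)
beta[[0, 3, 7]] = [2.0, -3.0, 1.5]              # sparse truth (toy values)
y = X @ beta + 0.5 * rng.standard_normal(T)

b_lasso = lasso_cd(X, y, lam=0.5)
# Ridge has a closed form: (X'X/T + lam*I) b = X'y/T -- shrinks, never zeroes.
b_ridge = np.linalg.solve(X.T @ X / T + 0.5 * np.eye(p), X.T @ y / T)
```

The diamond-corner geometry shows up as exact zeros in `b_lasso`, while `b_ridge` only shrinks coefficients toward zero.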
8 Computation and extensions
1. Optimization: least angle regression (lars) by Efron et al. (2004) makes the computation very efficient.
2. Extensions:
   - Group lasso: Yuan and Lin (2006). Subsets of X have a specific meaning, e.g. treatment.
   - Elastic net: Zou and Hastie (2005). Uses a combination of L1 and L2 penalties.
   - SCAD: Fan and Li (2001). Nonconcave penalized likelihood [smoothly clipped absolute deviation].
   - Various Bayesian methods: the penalty function is the prior.
3. Packages available in R: lars, glmnet, gamlr, gbm, and many others.
9 A simulated example
p = 300, T = 150, X iid N(0, 1), ε_i iid N(0, 0.25).
y_i = x_{3i} + 2(x_{4i} + x_{5i} + x_{7i}) − 2(x_{11,i} + x_{12,i} + x_{13,i} + x_{21,i} + x_{22,i} + x_{30,i}) + ε_i
1. How? R demonstration
2. Selection of λ? Cross-validation (10-fold), measuring prediction accuracy
3. The commands lars and cv.lars of the package lars
4. The commands glmnet and cv.glmnet of the package glmnet
5. Relationship between the two packages (alpha = 0)
10 Lasso may fail for dependent data
1. Data generating model: scalar Gaussian autoregressive, AR(3), model
   x_t = 1.9x_{t−1} − 0.8x_{t−2} − 0.1x_{t−3} + a_t, a_t ~ N(0, 1).
   Generate 2000 observations. See Figure 1.
2. Big data setup:
   - Dependent variable: x_t, t = 11, ..., 2000
   - Regressors: X_t = [x_{t−1}, x_{t−2}, ..., x_{t−10}, z_{1t}, ..., z_{10,t}], where the z_{it} are iid N(0, 1)
   - Dimension = 20, sample size = 1990
3. Run the Lasso regression via the lars package of R. See Figure 2 for the results. Lag 3, x_{t−3}, was not selected. Lasso fails in this case.
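A Python analogue of this setup (the slide uses R's lars; the sketch below only builds the data and exhibits the two symptoms discussed two slides later, a near-unit autocorrelation and an enormous scale gap between the lag columns and the irrelevant regressors):

```python
import numpy as np

rng = np.random.default_rng(42)
T = 2000
a = rng.standard_normal(T)
x = np.zeros(T)
for t in range(3, T):                       # AR(3) with a double unit root
    x[t] = 1.9*x[t-1] - 0.8*x[t-2] - 0.1*x[t-3] + a[t]

# Big-data design: 10 lags of x plus 10 irrelevant iid N(0,1) regressors.
lags = np.column_stack([x[10-i:T-i] for i in range(1, 11)])   # x_{t-1}..x_{t-10}
Z = rng.standard_normal((T - 10, 10))
X_design = np.column_stack([lags, Z])
y = x[10:]

# Symptoms of the unit roots that trip up Lasso:
acf1 = np.corrcoef(x[:-1], x[1:])[0, 1]     # lag-1 autocorrelation near 1
scale_gap = lags.std() / Z.std()            # lag columns dwarf the noise columns
```

After lasso's internal standardization, the huge-scale nonstationary lag columns and the unit-scale noise columns are forced onto the same footing, which is where the trouble starts.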
11 Figure: Time plot of the simulated AR(3) time series with 2000 observations.
12 Figure: Results of the Lasso regression for the AR(3) series (standardized coefficient paths plotted against |β|/max|β|).
13 OLS works if we entertain AR models
Run the linear regression using the first three variables of X_t. Fitted model:
x_t = 1.902x_{t−1} − …x_{t−2} − …x_{t−3} + ε_t, σ̂_ε = ….
All estimates are statistically significant with small p-values. The residuals are well behaved, e.g. the Ljung–Box statistic Q(10) has p-value 0.20 (after adjusting the degrees of freedom). The simple time series method works for dependent data.
14 Why does lasso fail?
Two possibilities:
1. Scaling effect: Lasso standardizes each variable in X_t. For unit-root nonstationary time series, standardization might wash out the dependence in the stationary part.
2. Multicollinearity: unit-root time series have strong serial correlations [the ACFs approach 1 at all lags].
This artificial example highlights the difference between independent and dependent data. We need to develop methods for big dependent data!
15 Possible solutions
1. Re-parameterization using time series properties
2. Use different penalties for different parameters
The first approach is easier. For this particular time series, define Δx_t = (1 − B)x_t and Δ²x_t = (1 − B)²x_t. Then
x_t = 1.9x_{t−1} − 0.8x_{t−2} − 0.1x_{t−3} + a_t
    = x_{t−1} + Δx_{t−1} − 0.1Δ²x_{t−1} + a_t
    = double unit root + single unit root + stationary + a_t.
The coefficients of x_{t−1}, Δx_{t−1}, and Δ²x_{t−1} are 1, 1, and −0.1, respectively.
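The re-parameterization is a purely algebraic identity, easy to confirm numerically on an arbitrary series; the factorization 1 − 1.9B + 0.8B² + 0.1B³ = (1 − B)²(1 + 0.1B) exhibits the two unit roots explicitly:

```python
import numpy as np
from numpy.polynomial import polynomial as P

rng = np.random.default_rng(0)
x = rng.standard_normal(200)       # any series works: the identity is algebraic

# Original AR(3) form, for t = 4, ..., n
lhs = 1.9*x[2:-1] - 0.8*x[1:-2] - 0.1*x[:-3]

dx = np.diff(x)                    # (1 - B) x_t
d2x = np.diff(x, n=2)              # (1 - B)^2 x_t

# x_{t-1} + dx_{t-1} - 0.1 * d2x_{t-1}
rhs = x[2:-1] + dx[1:-1] - 0.1*d2x[:-1]

# (1 - B)^2 (1 + 0.1 B) expands to 1 - 1.9B + 0.8B^2 + 0.1B^3
ar_poly = P.polymul(P.polymul([1, -1], [1, -1]), [1, 0.1])
```

Both checks confirm the decomposition into a double-unit-root term, a single-unit-root term, and a stationary term.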
16 Different frameworks for LASSO
The X-matrix of conventional LASSO consists of (x_{t−1}, x_{t−2}, ..., x_{t−10}, z_{1t}, ..., z_{10,t}), where the z_{it} are iid N(0, 1).
Under the re-parameterization, the X-matrix becomes (x_{t−1}, Δx_{t−1}, Δ²x_{t−1}, ..., Δ²x_{t−8}, z_{1t}, ..., z_{10,t}).
These two X-matrices provide theoretically the same information. However, the first one has high multicollinearity while the second does not, especially after standardization.
17 Figure: Comparison of the β-estimates of the lars results.
18 Theoretical justification
Focus on the particular series x_t used. Some properties of the series are:
1. T⁻⁴ Σ_{t=1}^T x_{t−1}² ⇒ ∫₀¹ W*(s)² ds, where W*(s) = ∫₀^s W(u) du with W(s) the standard Brownian motion
2. T⁻⁵ᐟ² Σ_{t=1}^T x_{t−1} ⇒ ∫₀¹ W*(s) ds
3. T⁻³ Σ_{t=1}^T x_{t−1} Δx_{t−1} ⇒ ∫₀¹ W*(s) W(s) ds
4. T⁻² Σ_{t=1}^T (Δx_t)² ⇒ ∫₀¹ W(s)² ds
Standardization may wash out the Δx_{t−1} and Δ²x_{t−1} parts.
19 Examples of big dependent data
1. Daily returns of U.S. stocks
2. Demand for electricity at 30-minute intervals
3. Daily spreads of CDS (credit default swaps) of selected companies
4. Monthly unemployment rates of the 50 U.S. states
5. Interest rates of an economy
6. Air pollution measurements at multiple locations and health risk; complex spatio-temporal data in general
20 Figure: Sample sizes of U.S. daily stock returns in 2012 and 2013: mean 6681, range (6593, 6774).
21 Figure: Densities of daily log returns of U.S. stocks in 2012 and 2013.
22 Figure: Empirical densities of electricity demand at 30-minute intervals, by day of the week (Monday through Sunday), Adelaide, Australia, from July 6, 1997 to March 31.
23 Figure: Time plots of monthly state unemployment rates of the U.S.
24 Some statistical methods
Goal: extract useful information, including pooling.
1. Classification and cluster analysis
   - K-means
   - Tree-based classification
   - Model-based classification
2. Factor models & extensions
   - Orthogonal factor model
   - Approximate factor model
   - Dynamic factor model
   - Constrained factor models (column, row constraints): X_t = R f_t C′ + e_t
3. Generalizations of Lasso methods to dependent data, e.g. LASSO for nowcasting vs MIDAS
25 Constrained factor models
Column (variable) constraint only: Tsai & Tsay (2010). Let z_t be a k-dimensional time series,
z_t = H ω f_t + ε_t, t = 1, ..., T,
where H is a known k × r matrix, f_t is an m-dimensional common factor, and ω is an r × m matrix of unknown loading parameters. For the observed data in matrix form,
Z = F ω′ H′ + ε.
26 A simple illustration
Monthly log returns of 10 stocks from 2001 to …:
1. Semiconductor: TXN, MU, INTC, TSM
2. Pharmaceutical: PFE, MRK, LLY
3. Investment bank: JPM, MS, GS
The constraints are H = [h_1, h_2, h_3], where
h_1 = (1, 1, 1, 1, 0, 0, 0, 0, 0, 0)′
h_2 = (0, 0, 0, 0, 1, 1, 1, 0, 0, 0)′
h_3 = (0, 0, 0, 0, 0, 0, 0, 1, 1, 1)′
27 Table: Estimation Results of Constrained and Orthogonal Factor Models.
For each stock tick (TXN, MU, INTC, TSM, PFE, MRK, LLY, JPM, MS, GS), the table reports the loadings L_1, L_2, L_3 and the residual variance Σ_{ε,i} under the constrained model (L = Hω) and under the orthogonal model (PCA), together with the eigenvalues (e.v.).
Variability explained: 70.6% (constrained) vs 72.4% (orthogonal).
28 Both row and column constraints: Tsai et al. (2016)
T observations and k variables. Data in matrix form:
Z = F_1 ω_1′ H′ + G F_2 ω_2′ + G F_3 ω_3′ H′ + E,
where G denotes a known T × m row-constraint matrix.
30 Figure: Time plots of monthly housing starts (in logarithms) of 9 U.S. divisions: New England, Middle Atlantic, East North Central, West North Central, South Atlantic, East South Central, West South Central, Mountain, Pacific.
31 Figure: Time series plots of the common factors F_1, F_2, and F_3 for a DCF model of order (r, p, q) = (2, 2, 2), via maximum likelihood estimation.
32 Figure: Time series plots of GF_2ω_2′ for a fitted DCF model of order (2, 2, 2). Maximum likelihood estimation is used.
33 Figure: Time series plots of F_1ω_1′H′ for a fitted DCF model of order (2, 2, 2). Maximum likelihood estimation is used.
34 Figure: Time series plots of GF_3ω_3′H′ for a fitted DCF model of order (2, 2, 2). Maximum likelihood estimation is used.
35 Matrix-valued variables
Consider simultaneously n macroeconomic variables in k countries:
        U.S.      Italy     Spain     ...  Canada
GDP     X_{11,t}  X_{12,t}  X_{13,t}  ...  X_{1k,t}
Unem    X_{21,t}  X_{22,t}  X_{23,t}  ...  X_{2k,t}
CPI     X_{31,t}  X_{32,t}  X_{33,t}  ...  X_{3k,t}
...
M1      X_{n1,t}  X_{n2,t}  X_{n3,t}  ...  X_{nk,t}
On-going work: only preliminary results are available. See Chen et al. (2016).
36 Classification
A possible approach: use a two-step procedure.
1. Transform dependent big data into functions, e.g. probability densities
2. Apply classification methods to the functional data
The density functions of daily log returns of U.S. stocks serve as an example. We can then classify the density functions to make statistical inference.
37 Illustration of classification
Cluster analysis of density functions. Consider the time series of density functions {f_t(x)}. For simplicity, assume the densities are evaluated at equally-spaced grid points {x_1 < x_2 < ... < x_N} ⊂ D with increment Δx. The data we have become {f_t(x_i) : t = 1, ..., T; i = 1, ..., N}. Using the Hellinger distance (HD), we consider two methods:
1. K-means
2. Tree-based classification
38 Hellinger distance of two density functions
Let f(x) and g(x) be two density functions on a common domain D ⊂ R. Assume both density functions are absolutely continuous w.r.t. the Lebesgue measure. The Hellinger distance (HD) between f(x) and g(x) is defined by
H(f, g)² = (1/2) ∫_D ( √f(x) − √g(x) )² dx = 1 − ∫_D √( f(x) g(x) ) dx.
Basic properties:
1. H(f, g) ≥ 0
2. H(f, g) = 0 if and only if f(x) = g(x) almost surely.
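On a discretized grid the two forms of H(f, g)² can be computed directly. A small numpy sketch, with two normal densities as a made-up example (renormalizing on the grid makes the two formulas agree exactly):

```python
import numpy as np

x = np.linspace(-8, 8, 2001)
dx = x[1] - x[0]
npdf = lambda mu: np.exp(-0.5 * (x - mu)**2) / np.sqrt(2 * np.pi)

def hellinger2(f, g, dx):
    """Squared Hellinger distance, first form: 0.5 * int (sqrt f - sqrt g)^2 dx."""
    return 0.5 * np.sum((np.sqrt(f) - np.sqrt(g))**2) * dx

f, g = npdf(0.0), npdf(1.0)
# Renormalize so both discrete densities integrate to 1 exactly on the grid;
# the identity 0.5*int(sqrt f - sqrt g)^2 = 1 - int sqrt(f g) then holds exactly.
f /= f.sum() * dx
g /= g.sum() * dx

h2_a = hellinger2(f, g, dx)                 # first form
h2_b = 1.0 - np.sum(np.sqrt(f * g)) * dx    # Bhattacharyya form
```

As a sanity check, for two normals with equal variance σ² and means μ₁, μ₂ the known closed form is H² = 1 − exp(−(μ₁ − μ₂)²/(8σ²)), which the grid computation reproduces to high accuracy.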
39 K-means method
For a given K, the K-means method seeks a partition of the densities, say C_1, ..., C_K, such that
1. ∪_{k=1}^K C_k = {f_t(x)}
2. C_i ∩ C_j = ∅ for i ≠ j
3. The sum of within-cluster variations V = Σ_{k=1}^K V(C_k) is minimized, where the within-cluster variation is
V(C_k) = Σ_{t_1, t_2 ∈ C_k} H(f_{t_1}, f_{t_2})².
It turns out this can easily be done by applying the K-means method with squared Euclidean distance to the square-root densities {√f_t(x)}.
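A minimal sketch of this equivalence, assuming the densities are stored as rows on a common grid: since ||√f − √g||² · Δx/2 = H(f, g)², ordinary Lloyd's K-means on the square-root densities minimizes the Hellinger-based within-cluster variation up to a constant factor. (The deterministic farthest-point initialization below is a simplification for reproducibility, not part of the slide's method.)

```python
import numpy as np

def kmeans_sqrt_densities(F, K, n_iter=50):
    """Lloyd's K-means on square-root densities (rows of F share one grid)."""
    S = np.sqrt(F)
    # Deterministic farthest-point initialization.
    idx = [0]
    for _ in range(K - 1):
        d = np.min([((S - S[i])**2).sum(axis=1) for i in idx], axis=0)
        idx.append(int(np.argmax(d)))
    C = S[idx].copy()
    for _ in range(n_iter):
        lab = np.argmin(((S[:, None, :] - C[None, :, :])**2).sum(axis=2), axis=1)
        for k in range(K):
            if np.any(lab == k):
                C[k] = S[lab == k].mean(axis=0)
    return lab

# Toy data: three densities centered near -2 and three near +2.
x = np.linspace(-8, 8, 400)
npdf = lambda mu: np.exp(-0.5 * (x - mu)**2) / np.sqrt(2 * np.pi)
F = np.array([npdf(mu) for mu in (-2.2, -2.0, -1.8, 1.8, 2.0, 2.2)])
lab = kmeans_sqrt_densities(F, K=2)
```

The two groups of densities are recovered as the two clusters.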
40 Example of K-means
Consider the 48 density functions of half-hour electricity demand on Mondays in Adelaide, Australia. With K = 4 clusters, we have:
k = 1: time indices 17 to 44 (8:00 AM to 10:00 PM)
k = 2: 15, 16, 45 to 48, 1, 2, 3 (7:00-8:00 AM; 10:00 PM-1:30 AM)
k = 3: 4, 5, 13, 14 (1:30-2:30 AM; 6:00-7:00 AM)
k = 4: 6 to 12 (2:30-6:00 AM)
Result: the clusters capture daily activities, namely (1) an active period, (2) a transition period, (3) a light-sleeping period, and (4) a sound-sleeping period.
41 Figure: Density functions of half-hour electricity demand on Mondays in Adelaide, Australia. The sample period is from July 6, 1997 to March 31.
42 Figure: Results of the K-means cluster analysis based on squared Hellinger distance for electricity demand on Mondays. Different colors denote different clusters.
43 Tree-based classification
Let Z_t = (z_{1t}, ..., z_{pt})′ denote p covariates. We use an iterative procedure to build a binary tree, starting with the root C_0 = {f_t(x)}.
1. For each covariate z_{it}, let z_{i(j)} be the j-th order statistic.
   a. Divide C_0 into two sub-clusters: C_{i,j,1} = {f_t(x) : z_{it} ≤ z_{i(j)}} and C_{i,j,2} = {f_t(x) : z_{it} > z_{i(j)}}.
   b. Compute the sum of within-cluster variations H(i, j) = V(C_{i,j,1}) + V(C_{i,j,2}).
   c. Find the smallest j, say v_i, such that H(i, v_i) = min_j H(i, j).
2. Select i ∈ {1, ..., p}, say I, such that H(I, v_I) = min_i H(i, v_i).
3. Use covariate z_{It} with threshold v_I to grow two new leaves, i.e. C_{1,1} = C_{I,v_I,1} and C_{1,2} = C_{I,v_I,2}.
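Step 1 of the procedure, the scan over the order statistics of a single covariate, can be sketched as follows. The toy data are six densities whose shape switches with the sign of z; the helper names are illustrative, not from the slides:

```python
import numpy as np

def hell2(f, g, dx):
    """Squared Hellinger distance of two densities on a common grid."""
    return 0.5 * np.sum((np.sqrt(f) - np.sqrt(g))**2) * dx

def within_var(F, dx):
    """V(C): sum of pairwise squared Hellinger distances within a cluster."""
    n = len(F)
    return sum(hell2(F[i], F[j], dx) for i in range(n) for j in range(i + 1, n))

def best_split(F, z, dx):
    """Scan the order statistics of covariate z; return the threshold that
    minimizes the sum of within-cluster variations of the two sub-clusters."""
    zs = np.sort(z)
    best_h, best_thr = np.inf, None
    for j in range(len(z) - 1):             # split after the j-th order statistic
        h = within_var(F[z <= zs[j]], dx) + within_var(F[z > zs[j]], dx)
        if h < best_h:
            best_h, best_thr = h, zs[j]
    return best_thr

# Toy check: densities are N(-3,1) when z < 0 and N(3,1) when z > 0,
# so the best split should land at the largest negative z value, -0.5.
x = np.linspace(-10, 10, 400)
dxg = x[1] - x[0]
npdf = lambda mu: np.exp(-0.5 * (x - mu)**2) / np.sqrt(2 * np.pi)
z = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
F = np.array([npdf(-3.0 if zz < 0 else 3.0) for zz in z])
thr = best_split(F, z, dxg)
```

Repeating the same scan within each new leaf, with the other leaf held fixed, gives the growth procedure of the next slide.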
44 Tree-based procedure continued
Next, consider C_{1,1} and C_{1,2} as the roots of branches and apply the same procedure with their associated covariates to find candidates for growth. The only modification is as follows: when considering C_{1,1}, we treat C_{1,2} as a leaf in computing the sum of within-cluster variations. Similarly, when considering C_{1,2} for further division, we treat C_{1,1} as a leaf. This growth procedure is iterated until the desired number of clusters K is reached.
45 Example of tree-based classification
Consider the density functions of U.S. daily log stock returns in 2012 and 2013. Using the first-differenced VIX index z_t as the explanatory variable and K = 4, we obtain 4 clusters:
(−∞, −0.73], (−0.73, 0.39], (0.39, 1.19], (1.19, ∞).
The cluster sizes are 104, 259, 86, and 53, respectively. Note that a positive z_t signifies an increase in market volatility (uncertainty).
46 What drove the U.S. financial market? The fear factor.
Figure: Time plots of the market fear factor (VIX index) and its change series.
47 Figure: Results of the tree-based cluster analysis for the daily densities of log returns of U.S. stocks in 2012 and 2013. The first-differenced series of the VIX index is used as the explanatory variable. The numbers of elements in the clusters are 53, 86, 259, and 104, respectively. The cluster classification is given in the heading of each plot.
48 Model-based classification
Work directly on the observed multiple time series:
1. Postulate a general univariate model for all time series, e.g. an AR(p) model
2. Time series in a cluster follow the same model: pool the data to estimate common parameters
3. Time series in different clusters follow different models
4. May be estimated by Markov chain Monte Carlo methods
5. May employ scale-mixture-of-normals innovations to handle outliers
This approach has been widely studied, e.g. Wang et al. (2013) and Fruehwirth-Schnatter (2011), among others.
49 Application
1. Apply to the monthly unemployment rates of the 50 U.S. states.
2. Use out-of-sample predictions to compare with other methods, including lasso.
3. For 1-step to 5-step ahead predictions, the model-based method works well in comparison. Wang et al. (2013, JoF).
50 Table: Out-of-sample prediction comparison. RMSE (×10⁻⁴) and MAE (×10⁻⁴) for horizons m = 1, ..., 4, for the methods UAR, VAR, Lasso, G-Lasso, LVAR, Pls, Pcr, MBC, and rMBC.
51 Functional PCA: singular value decomposition
1. A tool to study the time evolution of the return distributions.
2. Data set: in this particular instance, each density function is evaluated at 512 points, and we have Y = [Y_{it} = f_t(x_i) : i = 1, ..., N; t = 1, ..., T].
3. Perform the singular value decomposition
Ỹ = √(N − 1) U D V′,
where Ỹ denotes the column-mean adjusted data matrix, U is an N × N unitary matrix, D is an N × T rectangular diagonal matrix, and V is a T × T unitary matrix.
4. This is a simple form of functional PCA. [With large samples, smoothing of the PCs is not needed.]
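A sketch of the decomposition on a synthetic rank-one family of curves. Here Y is laid out T × N with rows as days (an orientation choice for readability), the plain SVD Ỹ = UDV′ is used without any scaling, and the made-up perturbation direction g is recovered as the first PC function:

```python
import numpy as np

# Hypothetical rank-one variation around a mean density: f_t(x) = m(x) + a_t g(x),
# where g is proportional to the derivative of the pdf w.r.t. its mean.
x = np.linspace(-5, 5, 512)
m = np.exp(-0.5 * x**2) / np.sqrt(2 * np.pi)
g = x * m                                   # location-shift direction
a = np.linspace(-0.3, 0.3, 100)             # day-by-day scores
Y = m[None, :] + a[:, None] * g[None, :]    # T x N: row t holds f_t on the grid

Yc = Y - Y.mean(axis=0)                     # subtract the mean density
U, s, Vt = np.linalg.svd(Yc, full_matrices=False)

share = s[0]**2 / (s**2).sum()              # variance explained by the 1st PC
pc1 = Vt[0]                                 # first principal component function
```

Because the centered matrix is exactly rank one, the first singular value carries essentially all the variance and `pc1` is (up to sign) the normalized g.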
52 Figure: Scree plot of the PCA for the daily return densities in 2012 and 2013.
53 Figure: The first 6 PC functions for the daily log return densities in 2012 and 2013.
54 Figure: The 7th to 12th PC functions for the daily log return densities in 2012 and 2013.
55 Meaning of the PC functions? 1st
Figure: Mean density ± the 1st PC. Peak and tails: mean + standardized 1st PC (red).
56 Meaning of the PC functions? 2nd
Figure: Mean density ± the 2nd PC: midrange returns.
57 Meaning of the PC functions? 3rd
Figure: Mean density ± the 3rd PC: curvature.
58 Approximate factor models
f_t(x) = Σ_{i=1}^p λ_{t,i} g_i(x) + ε_t(x),
where g_i(x) denotes the i-th common factor and ε_t(x) is the noise function.
1. A generalization of the orthogonal factor model that allows the error functions to be correlated.
2. Only asymptotically identified under some regularity conditions.
3. FPCA provides a way to estimate approximate factor models.
59 Figure: Scatter plot of the loadings of the first PC function vs changes in the VIX index. The red line denotes a lowess fit.
60 Functional PC via thresholding
1. Zero appears to be a reasonable and natural threshold.
2. Regime 1: dvix ≥ 0, with 244 days [volatile (bad) state].
3. Regime 2: dvix < 0, with 258 days [calm (good) state].
4. Perform PCA of the density functions for each regime.
5. The differences are clearly seen.
6. This leads to different approximate factor models for the density functions.
61 Figure: Scree plots of the PCA for each regime (dvix ≥ 0 and dvix < 0).
62 Figure: The first 6 PC functions of the daily log return densities for each regime; the red line is for the calm state, Regime 2.
63 Approximate factor models
1. Use approximate factor models with the first 12 principal component functions.
2. Compare the overall fits with and without thresholding.
3. For Regime 1 (positive dvix): randomly select day 17.
4. For Regime 2 (negative dvix): randomly select day 420.
5. Check: (a) observed vs fitted densities and (b) residuals with/without thresholding.
6. With 12 components, both approaches fare well, but thresholding provides improvements.
64 Comparison: day 17 (in Regime 1)
Figure: Top plot: the density and its fits for day 17, observed (black), all (red), thresholded (blue). Bottom plot: errors in the approximation, all (black), thresholded (red).
65 Comparison: day 420 (in Regime 2)
Figure: Top plot: the density and its fits for day 420, observed (black), all (red), thresholded (blue). Bottom plot: errors of the approximation, all (black), thresholded (red).
66 Lasso and beyond
1. Need to exploit parsimony, beyond sparsity.
2. Need to take prior knowledge into account. We have accumulated a lot of knowledge in diverse scientific areas. How do we take advantage of this knowledge?
3. Variable selection is not sufficient. More importantly, what are the proper measurements to take? What questions can a given big data set answer?
67 An illustration
Every country has many interest rate series, which
1. have different maturities
2. serve different financial purposes
What is the information embedded in those interest rate series? Consider U.S. weekly constant maturity interest rates:
1. From January 8, 1982 to October 30, 2015
2. Maturities: 3m, 6m, 1y, 2y, 3y, 5y, 7y, 10y, and 30y
68 Figure: Time plots of U.S. weekly interest rates with different maturities: 1/8/1982 to 10/30/2015.
69 Figure: Scree plot of U.S. weekly interest rates.
70 Figure: Time plots of the first four principal components of U.S. weekly interest rates.
71 Implication?
In a lasso-type analysis:
1. Should we use the interest rate series directly, even with group lasso? This leads to sparsity.
2. Should we apply PCA first, then use the PCs? This leads to parsimony.
3. Should we develop other possibilities? Fused lasso? Factor models?
72 Concluding remarks
1. Big dependent data appear in many applications.
2. Methods developed for independent big data may fail.
3. Statistical methods for big dependent data are relatively under-developed.
4. New challenges emerge; new opportunities exist.
5. Simple modifications of the traditional methods might work well.
6. Both theory and methods require further research.
More informationR = µ + Bf Arbitrage Pricing Model, APM
4.2 Arbitrage Pricing Model, APM Empirical evidence indicates that the CAPM beta does not completely explain the cross section of expected asset returns. This suggests that additional factors may be required.
More informationLinear Regression Models. Based on Chapter 3 of Hastie, Tibshirani and Friedman
Linear Regression Models Based on Chapter 3 of Hastie, ibshirani and Friedman Linear Regression Models Here the X s might be: p f ( X = " + " 0 j= 1 X j Raw predictor variables (continuous or coded-categorical
More informationDoubly Constrained Factor Models with Applications. Summary
Doubly Constrained Factor Models with Applications Henghsiu Tsai 1 Institute of Statistical Science, Academia Sinica, Taiwan, R.O.C. Ruey S. Tsay Booth School of Business, University of Chicago, Illinois,
More informationA Modern Look at Classical Multivariate Techniques
A Modern Look at Classical Multivariate Techniques Yoonkyung Lee Department of Statistics The Ohio State University March 16-20, 2015 The 13th School of Probability and Statistics CIMAT, Guanajuato, Mexico
More informationLeast Squares Regression
E0 70 Machine Learning Lecture 4 Jan 7, 03) Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in the lecture. They are not a substitute
More informationFast Regularization Paths via Coordinate Descent
August 2008 Trevor Hastie, Stanford Statistics 1 Fast Regularization Paths via Coordinate Descent Trevor Hastie Stanford University joint work with Jerry Friedman and Rob Tibshirani. August 2008 Trevor
More informationProperties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation
Properties of optimizations used in penalized Gaussian likelihood inverse covariance matrix estimation Adam J. Rothman School of Statistics University of Minnesota October 8, 2014, joint work with Liliana
More informationMultivariate Time Series: VAR(p) Processes and Models
Multivariate Time Series: VAR(p) Processes and Models A VAR(p) model, for p > 0 is X t = φ 0 + Φ 1 X t 1 + + Φ p X t p + A t, where X t, φ 0, and X t i are k-vectors, Φ 1,..., Φ p are k k matrices, with
More informationA New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables
A New Combined Approach for Inference in High-Dimensional Regression Models with Correlated Variables Niharika Gauraha and Swapan Parui Indian Statistical Institute Abstract. We consider the problem of
More informationRegularization and Variable Selection via the Elastic Net
p. 1/1 Regularization and Variable Selection via the Elastic Net Hui Zou and Trevor Hastie Journal of Royal Statistical Society, B, 2005 Presenter: Minhua Chen, Nov. 07, 2008 p. 2/1 Agenda Introduction
More informationLeast Squares Regression
CIS 50: Machine Learning Spring 08: Lecture 4 Least Squares Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may or may not cover all the
More informationVector Auto-Regressive Models
Vector Auto-Regressive Models Laurent Ferrara 1 1 University of Paris Nanterre M2 Oct. 2018 Overview of the presentation 1. Vector Auto-Regressions Definition Estimation Testing 2. Impulse responses functions
More informationNonconcave Penalized Likelihood with A Diverging Number of Parameters
Nonconcave Penalized Likelihood with A Diverging Number of Parameters Jianqing Fan and Heng Peng Presenter: Jiale Xu March 12, 2010 Jianqing Fan and Heng Peng Presenter: JialeNonconcave Xu () Penalized
More informationMachine Learning for Economists: Part 4 Shrinkage and Sparsity
Machine Learning for Economists: Part 4 Shrinkage and Sparsity Michal Andrle International Monetary Fund Washington, D.C., October, 2018 Disclaimer #1: The views expressed herein are those of the authors
More informationMachine Learning for OR & FE
Machine Learning for OR & FE Regression II: Regularization and Shrinkage Methods Martin Haugh Department of Industrial Engineering and Operations Research Columbia University Email: martin.b.haugh@gmail.com
More informationLearning Multiple Tasks with a Sparse Matrix-Normal Penalty
Learning Multiple Tasks with a Sparse Matrix-Normal Penalty Yi Zhang and Jeff Schneider NIPS 2010 Presented by Esther Salazar Duke University March 25, 2011 E. Salazar (Reading group) March 25, 2011 1
More informationStochastic Processes
Stochastic Processes Stochastic Process Non Formal Definition: Non formal: A stochastic process (random process) is the opposite of a deterministic process such as one defined by a differential equation.
More informationZhaoxing Gao and Ruey S Tsay Booth School of Business, University of Chicago. August 23, 2018
Supplementary Material for Structural-Factor Modeling of High-Dimensional Time Series: Another Look at Approximate Factor Models with Diverging Eigenvalues Zhaoxing Gao and Ruey S Tsay Booth School of
More informationECE521 week 3: 23/26 January 2017
ECE521 week 3: 23/26 January 2017 Outline Probabilistic interpretation of linear regression - Maximum likelihood estimation (MLE) - Maximum a posteriori (MAP) estimation Bias-variance trade-off Linear
More informationPermutation-invariant regularization of large covariance matrices. Liza Levina
Liza Levina Permutation-invariant covariance regularization 1/42 Permutation-invariant regularization of large covariance matrices Liza Levina Department of Statistics University of Michigan Joint work
More informationOr How to select variables Using Bayesian LASSO
Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO x 1 x 2 x 3 x 4 Or How to select variables Using Bayesian LASSO On Bayesian Variable Selection
More informationECONOMICS 7200 MODERN TIME SERIES ANALYSIS Econometric Theory and Applications
ECONOMICS 7200 MODERN TIME SERIES ANALYSIS Econometric Theory and Applications Yongmiao Hong Department of Economics & Department of Statistical Sciences Cornell University Spring 2019 Time and uncertainty
More informationL11: Pattern recognition principles
L11: Pattern recognition principles Bayesian decision theory Statistical classifiers Dimensionality reduction Clustering This lecture is partly based on [Huang, Acero and Hon, 2001, ch. 4] Introduction
More informationBayesian Grouped Horseshoe Regression with Application to Additive Models
Bayesian Grouped Horseshoe Regression with Application to Additive Models Zemei Xu, Daniel F. Schmidt, Enes Makalic, Guoqi Qian, and John L. Hopper Centre for Epidemiology and Biostatistics, Melbourne
More informationA short introduction to INLA and R-INLA
A short introduction to INLA and R-INLA Integrated Nested Laplace Approximation Thomas Opitz, BioSP, INRA Avignon Workshop: Theory and practice of INLA and SPDE November 7, 2018 2/21 Plan for this talk
More informationBi-level feature selection with applications to genetic association
Bi-level feature selection with applications to genetic association studies October 15, 2008 Motivation In many applications, biological features possess a grouping structure Categorical variables may
More informationUNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013
UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2013 Exam policy: This exam allows two one-page, two-sided cheat sheets; No other materials. Time: 2 hours. Be sure to write your name and
More informationFunctional time series
Rob J Hyndman Functional time series with applications in demography 4. Connections, extensions and applications Outline 1 Yield curves 2 Electricity prices 3 Dynamic updating with partially observed functions
More informationLecture 3: More on regularization. Bayesian vs maximum likelihood learning
Lecture 3: More on regularization. Bayesian vs maximum likelihood learning L2 and L1 regularization for linear estimators A Bayesian interpretation of regularization Bayesian vs maximum likelihood fitting
More information25 : Graphical induced structured input/output models
10-708: Probabilistic Graphical Models 10-708, Spring 2013 25 : Graphical induced structured input/output models Lecturer: Eric P. Xing Scribes: Meghana Kshirsagar (mkshirsa), Yiwen Chen (yiwenche) 1 Graph
More informationA Survey of L 1. Regression. Céline Cunen, 20/10/2014. Vidaurre, Bielza and Larranaga (2013)
A Survey of L 1 Regression Vidaurre, Bielza and Larranaga (2013) Céline Cunen, 20/10/2014 Outline of article 1.Introduction 2.The Lasso for Linear Regression a) Notation and Main Concepts b) Statistical
More informationFaMIDAS: A Mixed Frequency Factor Model with MIDAS structure
FaMIDAS: A Mixed Frequency Factor Model with MIDAS structure Frale C., Monteforte L. Computational and Financial Econometrics Limassol, October 2009 Introduction After the recent financial and economic
More informationParameterized Expectations Algorithm
Parameterized Expectations Algorithm Wouter J. Den Haan London School of Economics c by Wouter J. Den Haan Overview Two PEA algorithms Explaining stochastic simulations PEA Advantages and disadvantages
More informationPaper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001)
Paper Review: Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties by Jianqing Fan and Runze Li (2001) Presented by Yang Zhao March 5, 2010 1 / 36 Outlines 2 / 36 Motivation
More informationMSA220/MVE440 Statistical Learning for Big Data
MSA220/MVE440 Statistical Learning for Big Data Lecture 7/8 - High-dimensional modeling part 1 Rebecka Jörnsten Mathematical Sciences University of Gothenburg and Chalmers University of Technology Classification
More informationChris Fraley and Daniel Percival. August 22, 2008, revised May 14, 2010
Model-Averaged l 1 Regularization using Markov Chain Monte Carlo Model Composition Technical Report No. 541 Department of Statistics, University of Washington Chris Fraley and Daniel Percival August 22,
More informationAn economic application of machine learning: Nowcasting Thai exports using global financial market data and time-lag lasso
An economic application of machine learning: Nowcasting Thai exports using global financial market data and time-lag lasso PIER Exchange Nov. 17, 2016 Thammarak Moenjak What is machine learning? Wikipedia
More informationConsistent high-dimensional Bayesian variable selection via penalized credible regions
Consistent high-dimensional Bayesian variable selection via penalized credible regions Howard Bondell bondell@stat.ncsu.edu Joint work with Brian Reich Howard Bondell p. 1 Outline High-Dimensional Variable
More informationLinear Regression (9/11/13)
STA561: Probabilistic machine learning Linear Regression (9/11/13) Lecturer: Barbara Engelhardt Scribes: Zachary Abzug, Mike Gloudemans, Zhuosheng Gu, Zhao Song 1 Why use linear regression? Figure 1: Scatter
More informationRoss (1976) introduced the Arbitrage Pricing Theory (APT) as an alternative to the CAPM.
4.2 Arbitrage Pricing Model, APM Empirical evidence indicates that the CAPM beta does not completely explain the cross section of expected asset returns. This suggests that additional factors may be required.
More informationDynamic Matrix-Variate Graphical Models A Synopsis 1
Proc. Valencia / ISBA 8th World Meeting on Bayesian Statistics Benidorm (Alicante, Spain), June 1st 6th, 2006 Dynamic Matrix-Variate Graphical Models A Synopsis 1 Carlos M. Carvalho & Mike West ISDS, Duke
More informationLecture 2 Part 1 Optimization
Lecture 2 Part 1 Optimization (January 16, 2015) Mu Zhu University of Waterloo Need for Optimization E(y x), P(y x) want to go after them first, model some examples last week then, estimate didn t discuss
More informationVector autoregressions, VAR
1 / 45 Vector autoregressions, VAR Chapter 2 Financial Econometrics Michael Hauser WS17/18 2 / 45 Content Cross-correlations VAR model in standard/reduced form Properties of VAR(1), VAR(p) Structural VAR,
More informationThe Econometric Analysis of Mixed Frequency Data with Macro/Finance Applications
The Econometric Analysis of Mixed Frequency Data with Macro/Finance Applications Instructor: Eric Ghysels Structure of Course It is easy to collect and store large data sets, particularly of financial
More informationLecture 14: Variable Selection - Beyond LASSO
Fall, 2017 Extension of LASSO To achieve oracle properties, L q penalty with 0 < q < 1, SCAD penalty (Fan and Li 2001; Zhang et al. 2007). Adaptive LASSO (Zou 2006; Zhang and Lu 2007; Wang et al. 2007)
More informationTime Series Models for Measuring Market Risk
Time Series Models for Measuring Market Risk José Miguel Hernández Lobato Universidad Autónoma de Madrid, Computer Science Department June 28, 2007 1/ 32 Outline 1 Introduction 2 Competitive and collaborative
More informationForecast comparison of principal component regression and principal covariate regression
Forecast comparison of principal component regression and principal covariate regression Christiaan Heij, Patrick J.F. Groenen, Dick J. van Dijk Econometric Institute, Erasmus University Rotterdam Econometric
More informationMacroeconomic nowcasting with big data through the lens of a sparse factor model 1
Macroeconomic nowcasting with big data through the lens of a sparse factor model 1 Laurent Ferrara (Banque de France) Anna Simoni (CREST, CNRS, ENSAE, École Polytechnique ) ECB Forecasting Conference June
More informationComputer Vision Group Prof. Daniel Cremers. 10a. Markov Chain Monte Carlo
Group Prof. Daniel Cremers 10a. Markov Chain Monte Carlo Markov Chain Monte Carlo In high-dimensional spaces, rejection sampling and importance sampling are very inefficient An alternative is Markov Chain
More informationTHE UNIVERSITY OF CHICAGO Booth School of Business Business 41914, Spring Quarter 2015, Mr. Ruey S. Tsay
THE UNIVERSITY OF CHICAGO Booth School of Business Business 41914, Spring Quarter 2015, Mr. Ruey S. Tsay Lecture 8: Seasonal Model, Principal Component Analysis and Factor Models Reference: Chapter 6 of
More informationRegularization: Ridge Regression and the LASSO
Agenda Wednesday, November 29, 2006 Agenda Agenda 1 The Bias-Variance Tradeoff 2 Ridge Regression Solution to the l 2 problem Data Augmentation Approach Bayesian Interpretation The SVD and Ridge Regression
More informationMACHINE LEARNING. Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA
1 MACHINE LEARNING Methods for feature extraction and reduction of dimensionality: Probabilistic PCA and kernel PCA 2 Practicals Next Week Next Week, Practical Session on Computer Takes Place in Room GR
More informationInference in VARs with Conditional Heteroskedasticity of Unknown Form
Inference in VARs with Conditional Heteroskedasticity of Unknown Form Ralf Brüggemann a Carsten Jentsch b Carsten Trenkler c University of Konstanz University of Mannheim University of Mannheim IAB Nuremberg
More informationPh.D. Qualifying Exam Monday Tuesday, January 4 5, 2016
Ph.D. Qualifying Exam Monday Tuesday, January 4 5, 2016 Put your solution to each problem on a separate sheet of paper. Problem 1. (5106) Find the maximum likelihood estimate of θ where θ is a parameter
More informationRegression Shrinkage and Selection via the Lasso
Regression Shrinkage and Selection via the Lasso ROBERT TIBSHIRANI, 1996 Presenter: Guiyun Feng April 27 () 1 / 20 Motivation Estimation in Linear Models: y = β T x + ɛ. data (x i, y i ), i = 1, 2,...,
More informationResearch Division Federal Reserve Bank of St. Louis Working Paper Series
Research Division Federal Reserve Bank of St Louis Working Paper Series Kalman Filtering with Truncated Normal State Variables for Bayesian Estimation of Macroeconomic Models Michael Dueker Working Paper
More informationCOMS 4771 Introduction to Machine Learning. James McInerney Adapted from slides by Nakul Verma
COMS 4771 Introduction to Machine Learning James McInerney Adapted from slides by Nakul Verma Announcements HW1: Please submit as a group Watch out for zero variance features (Q5) HW2 will be released
More informationA Bayesian Perspective on Residential Demand Response Using Smart Meter Data
A Bayesian Perspective on Residential Demand Response Using Smart Meter Data Datong-Paul Zhou, Maximilian Balandat, and Claire Tomlin University of California, Berkeley [datong.zhou, balandat, tomlin]@eecs.berkeley.edu
More informationPathwise coordinate optimization
Stanford University 1 Pathwise coordinate optimization Jerome Friedman, Trevor Hastie, Holger Hoefling, Robert Tibshirani Stanford University Acknowledgements: Thanks to Stephen Boyd, Michael Saunders,
More informationProbabilistic Graphical Models
School of Computer Science Probabilistic Graphical Models Gaussian graphical models and Ising models: modeling networks Eric Xing Lecture 0, February 7, 04 Reading: See class website Eric Xing @ CMU, 005-04
More informationShort T Panels - Review
Short T Panels - Review We have looked at methods for estimating parameters on time-varying explanatory variables consistently in panels with many cross-section observation units but a small number of
More informationCross-Validation with Confidence
Cross-Validation with Confidence Jing Lei Department of Statistics, Carnegie Mellon University UMN Statistics Seminar, Mar 30, 2017 Overview Parameter est. Model selection Point est. MLE, M-est.,... Cross-validation
More informationLinear Models for Regression CS534
Linear Models for Regression CS534 Example Regression Problems Predict housing price based on House size, lot size, Location, # of rooms Predict stock price based on Price history of the past month Predict
More informationPredicting bond returns using the output gap in expansions and recessions
Erasmus university Rotterdam Erasmus school of economics Bachelor Thesis Quantitative finance Predicting bond returns using the output gap in expansions and recessions Author: Martijn Eertman Studentnumber:
More informationMA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2
MA 575 Linear Models: Cedric E. Ginestet, Boston University Regularization: Ridge Regression and Lasso Week 14, Lecture 2 1 Ridge Regression Ridge regression and the Lasso are two forms of regularized
More information