Mining Big Data Using Parsimonious Factor and Shrinkage Methods


Hyun Hak Kim (Bank of Korea) and Norman Swanson (Rutgers University)
ECB Workshop on Using Big Data for Forecasting and Statistics, April 7th, 2014

Outline
1. Introduction
2. Factor Construction Methods
3. Forecasting Framework
4. Forecasting Strategy
5. Main Findings
6. Conclusion

Figure: Original; 50 components; 25 components; 2 components

Introduction: Statistical algorithms for large-scale data
Classification - decision trees, support vector machines
Clustering - K-means clustering, hierarchical clustering
Regression - neural networks, principal component analysis
Sequence labeling - hidden Markov models, Kalman filters
Ensemble learning - bagging, boosting
Overall, these algorithms find a useful set of variables; that is, they reduce dimension.

Introduction: Forecasting applications
Macroeconomic forecasting - diffusion index model
Microeconomic forecasting - consumer behavior, credit scoring
Non-economic forecasting - server traffic analysis (high-frequency data possible), social network analysis, gene dependence analysis

Introduction: Forecasting Framework
Our generic forecasting equation is:

Y_{t+h} = W_t β_W + F_t β_F + ε_{t+h},   (1)

where h is the forecast horizon, W_t is a 1 × s vector (possibly including lags of Y), and F_t is a 1 × r vector of factors, extracted from F. The parameters β_W and β_F are defined conformably, and ε_{t+h} is a disturbance term.

Introduction: Diffusion Index Model
Let X_{tj} be the observed datum for the j-th cross-sectional unit at time t, for t = 1, ..., T and j = 1, ..., N. Recall that we shall consider the following model:

X_{tj} = Λ_j′ F_t + e_{tj},   (2)

where F_t is an r × 1 vector of common factors, Λ_j is an r × 1 vector of factor loadings associated with F_t, and e_{tj} is the idiosyncratic component of X_{tj}. The product Λ_j′ F_t is called the common component of X_{tj}.
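To fix ideas about the dimensions in (2), a minimal simulation sketch in Python follows; the factor count, noise scale, and sample length are illustrative assumptions, not values from the paper:

```python
# Simulate the diffusion index model X_tj = Lambda_j' F_t + e_tj.
import numpy as np

rng = np.random.default_rng(0)
T, N, r = 200, 144, 5                    # periods, series (144 as in the data set), factors
F = rng.standard_normal((T, r))          # common factors F_t, one r-vector per period
Lam = rng.standard_normal((N, r))        # loadings Lambda_j, one r-vector per series
e = 0.5 * rng.standard_normal((T, N))    # idiosyncratic components e_tj
X = F @ Lam.T + e                        # X[t, j] = Lambda_j' F_t + e_tj
```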

Introduction: Motivation
Shortcoming of Principal Component Analysis (PCA): since PCA factors are linear combinations of all variables, the factors are not easy to interpret.
New approaches: parsimonious factor construction methods
- Independent Component Analysis
- Sparse Principal Component Analysis
These new factor construction methods yield parsimonious models that improve predictive accuracy and produce interpretable factors.

Factor Construction: Data Description
The data set we examine in this paper consists of 144 macroeconomic and financial monthly time series from 1960:01 to 2009:05. The series:
- are those used by various papers, including Stock and Watson (2002), to investigate the usefulness of factor analysis in the context of forecasting
- are used to construct factors
- are fixed no matter what we forecast
- are transformed accordingly to induce stationarity
The series are categorized into 7 groups: production, employment, housing, interest rates, prices, and the miscellaneous.

Factor Construction Methods
Three types of factor construction methods are considered:
1. Principal Component Analysis
2. Independent Component Analysis
3. Sparse Principal Component Analysis

Factor Construction Method I - PCA
Revisiting Principal Component Analysis: consider the linear combinations

P_i = a_i′ X = a_{i1} X_1 + a_{i2} X_2 + … + a_{iN} X_N;   (3)

then we obtain

Var(P_i) = a_i′ Σ a_i,   i = 1, 2, ..., N,   (4)
Cov(P_i, P_k) = a_i′ Σ a_k,   i, k = 1, 2, ..., N,   (5)

where Σ is the covariance matrix associated with the random vector X = [X_1, X_2, ..., X_N]. The principal components are those uncorrelated linear combinations P_1, P_2, ..., P_N whose variances in (4) are as large as possible.
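A minimal sketch of this construction, assuming standardized data and using the eigendecomposition of the sample covariance matrix (one standard way to compute the P_i; the slides do not prescribe an implementation):

```python
# Principal components as the uncorrelated combinations P_i = a_i'X with
# maximal variance: the a_i are eigenvectors of the covariance matrix Sigma.
import numpy as np

def principal_components(X):
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)   # standardize each series
    Sigma = np.cov(Xs, rowvar=False)            # sample covariance matrix
    eigval, eigvec = np.linalg.eigh(Sigma)      # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1]            # re-sort by variance explained
    A = eigvec[:, order]                        # columns are the vectors a_i
    P = Xs @ A                                  # P_i = a_i'X; Var(P_i) = i-th eigenvalue
    return P, A, eigval[order]
```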

Factor Construction Method II - ICA
Introduction to Independent Component Analysis (ICA):
The starting point for ICA is the very simple assumption that the components, F, are statistically independent; the key is the measurement of this independence between components.
ICA begins with statistically independent source data, S, which are mixed according to Ω; the observed X is a mixture of S weighted by Ω.
Figure: Schematic representation of ICA
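A short sketch of this mixing model using scikit-learn's FastICA; the two-source setup and mixing matrix Ω below are illustrative assumptions:

```python
# Observed X is a mixture of independent non-Gaussian sources S via Omega;
# FastICA tries to recover the independent components from X alone.
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
T = 1000
S = np.column_stack([np.sign(rng.standard_normal(T)),   # binary (non-Gaussian) source
                     rng.laplace(size=T)])               # heavy-tailed source
Omega = np.array([[1.0, 0.5],
                  [0.3, 1.0]])                           # mixing weights
X = S @ Omega                                            # observed mixtures

F = FastICA(n_components=2, random_state=0).fit_transform(X)  # estimated components
```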

Independent Component Analysis: Comparison to PCA
For simplicity, consider two observables, X = (X_1, X_2). PCA finds uncorrelated components F = (F_1, F_2), which have a joint probability density function p_F(F) with

E(F_1 F_2) = E(F_1) E(F_2).   (6)

On the other hand, ICA finds independent components F = (F_1, F_2), which have a joint pdf p_F(F) with

E(F_1^p F_2^q) = E(F_1^p) E(F_2^q)   (7)

for every positive integer value of p and q. That is, independence constrains all moments, not just the second.
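A tiny numeric illustration of the gap between (6) and (7): the pair below is uncorrelated, so it satisfies (6), yet a higher-moment condition from (7) fails; the construction is ours, chosen only to make this point:

```python
# F1 and F2 are uncorrelated but dependent: (6) holds, (7) fails at p=2, q=1.
import numpy as np

Z = np.random.default_rng(2).standard_normal(1_000_000)
F1, F2 = Z, Z**2 - 1
print(np.mean(F1 * F2))                # ~0: E(F1 F2) = E(Z^3) = 0, so (6) holds
print(np.mean(F1**2 * F2))             # ~2: E(Z^4) - E(Z^2) = 3 - 1 = 2
print(np.mean(F1**2) * np.mean(F2))    # ~0: so (7) fails -- not independent
```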

Independent Component Analysis: Estimation of ICA
One often uses a modified version of entropy, the so-called negentropy, N, where:

N(F) = H(F_gauss) − H(F),   (8)

where F_gauss is a Gaussian random variable with the same covariance matrix as F. Negentropy, N(·), as a measure of non-Gaussianity, is zero for a Gaussian variable and always nonnegative.

Independent Component Analysis: Estimation of ICA (cont.)
A simple version of this approximation uses only one nonquadratic function, G, leading to:

N(F) ≈ [E{G(F)} − E{G(ν)}]^2,   (9)

where ν is a standard Gaussian variable. If we pick a G that does not grow too fast, we may obtain more robust estimators. [Hyvärinen and Oja, 2000] suggest two choices of G, and show that these functions yield good approximations:

G_1(y) = (1/a_1) log cosh(a_1 y)   (10)

and

G_2(y) = −exp(−y^2/2),   (11)

where 1 ≤ a_1 ≤ 2 is some suitable constant.
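A minimal sketch of approximation (9) with G_1 from (10), where the Gaussian expectation E{G_1(ν)} is approximated by Monte Carlo; the sample size and a_1 value are illustrative:

```python
# Negentropy approximation N(F) ~ [E{G1(F)} - E{G1(nu)}]^2: near zero for
# Gaussian inputs, positive for non-Gaussian ones.
import numpy as np

def negentropy_approx(f, a1=1.0, n_mc=100_000, seed=0):
    G1 = lambda y: np.log(np.cosh(a1 * y)) / a1
    f = (f - f.mean()) / f.std()                       # standardize, like nu
    nu = np.random.default_rng(seed).standard_normal(n_mc)
    return (G1(f).mean() - G1(nu).mean()) ** 2
```

For example, feeding it a standardized uniform sample yields a visibly positive value, while a Gaussian sample yields a value near zero.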

Factor Construction Method III - SPCA
Introduction to Sparse Principal Component Analysis (SPCA):
Principal components are linear combinations of variables ordered by their variance contributions, and one selects a small number of components that maximize the variance explained. However, the factor loading coefficients are all typically nonzero, making interpretation of the estimated components difficult. SPCA aids in the interpretation of principal components by placing (zero) restrictions on various factor loading coefficients.

Sparse Principal Component Analysis: Introduction
[Zou et al., 2006] develop a regression optimization framework. Namely, they treat X as the dependent variable, F as the explanatory variables, and the loadings as coefficients. They then use the lasso (and elastic net) to derive a sparse loading matrix. Suppose we derive principal components (PCs), F, via ordinary PCA. Then let the estimated j-th principal component, F_j, be the dependent variable and X the independent variables.
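A one-shot sketch of this regression view, assuming scikit-learn's PCA and Lasso; note that Zou et al.'s actual algorithm alternates an elastic-net step with an SVD step, so this conveys only the intuition, not their estimator:

```python
# Lasso-regress the j-th ordinary PC on X to obtain a sparse loading vector:
# many coefficients are driven exactly to zero, easing interpretation.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso

def sparse_loading(X, j=0, alpha=0.1):
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    Fj = PCA(n_components=j + 1).fit_transform(Xs)[:, j]   # j-th principal component
    return Lasso(alpha=alpha).fit(Xs, Fj).coef_            # sparse loadings on X
```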

Sparse Principal Component Analysis: Number of Nonzero Loadings in First 6 Sparse Components

Sparse Principal Component Analysis: Example
Figure: Number of Selections of Variables in First Sparse Principal Component

Sparse Principal Component Analysis: Time-Varying Loadings of SPC

Forecasting Methodologies: Revisiting the Forecasting Framework
Our generic forecasting equation is:

Y_{t+h} = W_t β_W + F_t β_F + ε_{t+h},   (12)

where h is the forecast horizon, W_t is a 1 × s vector (possibly including lags of Y), and F_t is a 1 × r vector of factors, extracted from F. The parameters β_W and β_F are defined conformably, and ε_{t+h} is a disturbance term.

Forecasting Methodologies: Factor Augmented Autoregression (FAAR)
Forecasts of Y_{t+h} involve a two-step process:
1. The data X_t are used to estimate the factors, F̂_t.
2. The estimators β̂_F and β̂_W are obtained by regressing Y_{t+h} on F̂_t and W_t.
In the first step, F̂_t is estimated using PCA, ICA, and SPCA. In the second step, β̂_F is estimated with various robust estimation techniques, including bagging, boosting, ridge regression, least angle regression, the elastic net, and the non-negative garrote. In addition, various benchmark models are considered, including the autoregressive model, CADL, and BMA.
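A minimal two-step FAAR sketch, using PCA for step one and plain OLS for step two (the paper's second step swaps in the shrinkage estimators listed above); here W_t is just an intercept and one lag of Y, and h, r are illustrative:

```python
# Step 1: extract factors F_hat from the large panel X; step 2: regress
# Y_{t+h} on W_t = (1, Y_t) and F_hat_t by least squares.
import numpy as np
from sklearn.decomposition import PCA

def faar_fit(Y, X, h=1, r=5):
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    F_hat = PCA(n_components=r).fit_transform(Xs)        # step 1: factors
    Z = np.column_stack([np.ones(len(Y) - h),            # intercept
                         Y[:-h],                         # W_t: lagged Y
                         F_hat[:-h]])                    # factor regressors
    beta, *_ = np.linalg.lstsq(Z, Y[h:], rcond=None)     # step 2: OLS
    return beta, F_hat
```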

Experimental Setup

Experimental Setup: Specifications
Specification Type 1: Various types of factor components are first constructed using the large data set; prediction models are then formed using the shrinkage methods to select functions of, and weights for, the factors (see the sketch after this list).
Specification Type 2: Various types of factor components are first constructed using subsets of variables from the large-scale data set that are pre-selected via application of the robust shrinkage methods discussed.
Specification Type 3: Prediction models are constructed using only the shrinkage methods, without use of factor analysis at any stage.
Specification Type 4: Prediction models are constructed using only shrinkage methods, and only with variables which have nonzero coefficients, as specified via pre-selection using SPCA.
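A sketch of Specification Type 1 under simple assumptions: factors first (PCA here), then a cross-validated lasso to select and weight them; the factor count and the use of LassoCV are our illustrative choices, not the paper's exact estimators:

```python
# Spec Type 1: build many candidate factors, then let shrinkage pick weights.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LassoCV

def spec_type1_forecast(Y, X, h=1, max_factors=20):
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    F_hat = PCA(n_components=max_factors).fit_transform(Xs)  # candidate factors
    model = LassoCV(cv=5).fit(F_hat[:-h], Y[h:])             # shrinkage selects/weights
    return model.predict(F_hat[-1:])[0]                      # h-step-ahead forecast
```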

Empirical Results: Reported Results

Empirical Results
Our various benchmarks do not dominate more complicated nonlinear methods, and using a combination of factor and other shrinkage methods often yields superior predictions.

MSFE-best Specification Type / Window / PC / Model combination for h = 1:

Target   UR    PI    TB    CPI   PPI   NPE   HS    IPX   M2    SNP   GDP
Spec     SP1   SP1   SP1   SP4   SP1   SP1   SP1   SP1   SP1L  SP1   SP2
Window   REC   REC   REC   ROL   REC   REC   REC   REC   ROL   REC   REC
PC       PCA   SPC   SPC   N/A   ICA   SPC   SPC   SPC   SPC   SPC   ICA
Model    FAAR  PCR   PCR   BMA2  FAAR  Mean  FAAR  FAAR  Mean  Boost Boost

Empirical Results
Our benchmark econometric models are never found to be MSFE-best, regardless of the target variable being forecast and the forecast horizon. Additionally, pure shrinkage-type prediction models and standard (linear) regression models do not MSFE-dominate models based on the use of factors constructed with principal component analysis, independent component analysis, or sparse principal component analysis. This result provides strong new evidence of the usefulness of factor-based forecasting.
Recursive estimation-window strategies only dominate rolling strategies at the 1-step-ahead forecast horizon.
Including lags in factor model approaches does not generally yield improved predictions.

Concluding Remarks
In this paper, we find that the simplest principal component type models win around 40% of the time. Interestingly, ICA and SPCA type models also win around 40% of the time. Hybrid methods, including factor approaches coupled with shrinkage, win around 1/3 of the time, while simple linear autoregressive type models never win in our experiments. We take these results as evidence of the usefulness of new methods in factor modelling and shrinkage when the objective is prediction of macroeconomic time series variables.

References
Hyvärinen, A. and Oja, E. (2000). Independent component analysis: algorithms and applications. Neural Networks, 13(4-5):411-430.
Zou, H., Hastie, T., and Tibshirani, R. (2006). Sparse principal component analysis. Journal of Computational and Graphical Statistics, 15(2):265-286.