Efficient Bayesian Multivariate Surface Regression

Efficient Bayesian Multivariate Surface Regression
Feng Li (feng.li@cufe.edu.cn)
School of Statistics and Mathematics, Central University of Finance and Economics

Outline of the talk

1. Introduction to flexible regression models
2. The multivariate surface model
3. Application to firm leverage data
4. Extensions and future work

Flexible regression models: Introduction

Flexible modeling of the regression function $E(y \mid x)$ has been an active research field for decades. Attention has shifted from kernel regression methods to spline-based models. Splines are regression models with flexible mean functions.

Example: a simple spline regression with a single explanatory variable and truncated linear basis functions takes the form

$$y = \alpha_0 + \alpha_1 x + \beta_1 (x - \xi_1)_+ + \cdots + \beta_q (x - \xi_q)_+ + \varepsilon,$$

where the $(x - \xi_i)_+$ are called the basis functions and the $\xi_i$ are called knots (the locations of the basis functions).
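
To make the truncated-linear construction concrete, here is a minimal sketch (my own illustration, not code from the talk; the knot locations and the simulated data are arbitrary):

```python
import numpy as np

def truncated_linear_basis(x, knots):
    """Design matrix [1, x, (x - xi_1)_+, ..., (x - xi_q)_+]."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x]
    cols += [np.maximum(x - xi, 0.0) for xi in knots]  # (x - xi)_+ terms
    return np.column_stack(cols)

# Least squares fit on simulated data with three hand-placed knots
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, 200)
y = np.sin(x) + 0.1 * rng.standard_normal(200)
X = truncated_linear_basis(x, knots=[2.5, 5.0, 7.5])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
```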

Flexible regression models: Spline example (single covariate with thin-plate bases)

[Figure: fitted spline curve for a single covariate using thin-plate basis functions.]

Flexible regression models: Spline regression with multiple covariates

Additive spline model. Each knot $\xi_j$ (a scalar) is connected with only one covariate:

$$y = \alpha_0 + \alpha_1 x_1 + \cdots + \alpha_q x_q + \sum_{j_1=1}^{m_1} \beta_{j_1} f(x_1, \xi_{j_1}) + \cdots + \sum_{j_q=1}^{m_q} \beta_{j_q} f(x_q, \xi_{j_q}) + \varepsilon.$$

Good and simple if you know a priori that there are no interactions in the data.

Surface spline model. Each knot $\xi_j$ (a vector) is connected with more than one covariate:

$$y = \alpha_0 + \alpha_1 x_1 + \cdots + \alpha_q x_q + \sum_{j=1}^{m} \beta_j\, g(x_1, \ldots, x_q, \xi_j) + \varepsilon.$$

A popular choice of $g(x_1, \ldots, x_q, \xi_j)$ is the multi-dimensional thin-plate spline

$$g(x_1, \ldots, x_q, \xi_j) = \|x - \xi_j\|^2 \ln \|x - \xi_j\|.$$

This can handle interactions, but the model complexity increases dramatically with the interactive knots. (A sketch of the thin-plate basis follows.)
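
Here is a small sketch of the thin-plate surface basis $g(x, \xi_j) = \|x - \xi_j\|^2 \ln \|x - \xi_j\|$ evaluated at a set of multivariate knots (again my own illustration; the small `eps` guard against $\ln 0$ is an implementation choice, not from the talk):

```python
import numpy as np

def thinplate_basis(X, knots, eps=1e-12):
    """Surface basis g(x, xi_j) = ||x - xi_j||^2 ln||x - xi_j|| for every knot."""
    X = np.atleast_2d(X)            # (n, q) covariates
    knots = np.atleast_2d(knots)    # (m, q) multivariate knots
    d = np.linalg.norm(X[:, None, :] - knots[None, :, :], axis=2)  # (n, m) distances
    return d**2 * np.log(d + eps)   # eps avoids log(0) when x hits a knot exactly

# Example: two covariates, three interactive knots
Xs = thinplate_basis(np.random.default_rng(0).standard_normal((100, 2)),
                     knots=[[0.0, 0.0], [1.0, -1.0], [-1.0, 1.0]])
```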

The challenges

How many knots are needed? Too few knots lead to a bad approximation; too many knots yield overfitting.

Where to place those knots? Equal spacing works for the additive model, but is clearly inefficient for the surface model.

Common approaches to these two problems:
- Place many knots and use variable selection to pick out the useful ones: not truly flexible.
- Use reversible jump MCMC to move among model spaces with different numbers of knots: very sensitive to the prior and not computationally efficient.
- Cluster the covariates to select knots: does not use the information from the responses.

How to choose between the additive spline and the surface spline? There is no established approach.

The multivariate surface model: The model

The multivariate surface model consists of three components, linear, surface and additive:

$$Y = X_o B_o + X_s(\xi_s) B_s + X_a(\xi_a) B_a + E.$$

We treat the knots $\xi_i$ as unknown parameters and let them move freely. A model with a minimal number of free knots outperforms a model with many fixed knots.

For notational convenience, we sometimes write the model in the compact form

$$Y = XB + E, \quad \text{where } X = [X_o, X_s, X_a],\ B = [B_o', B_s', B_a']' \text{ and } E \sim N_p(0, \Sigma).$$
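
As a way to pin down dimensions, a small simulation sketch of the compact form $Y = XB + E$ under assumptions of my own (two covariates, three responses, four surface knots, additive component omitted):

```python
import numpy as np

rng = np.random.default_rng(1)
n, q, p, m = 300, 2, 3, 4                      # observations, covariates, responses, surface knots

x = rng.standard_normal((n, q))
Xo = np.column_stack([np.ones(n), x])          # linear component [1, x_1, ..., x_q]

xi_s = rng.standard_normal((m, q))             # surface knots, treated as free parameters
d = np.linalg.norm(x[:, None, :] - xi_s[None, :, :], axis=2)
Xs = d**2 * np.log(d + 1e-12)                  # thin-plate surface basis X_s(xi_s)

X = np.hstack([Xo, Xs])                        # X = [X_o, X_s] (additive part omitted)
B = rng.standard_normal((X.shape[1], p))       # stacked coefficient matrix
Sigma = 0.5 * np.eye(p)
E = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
Y = X @ B + E                                  # the model Y = XB + E
```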

The multivariate surface model: The prior

Conditional on the knots, the priors for $B$ and $\Sigma$ are set as

$$\mathrm{vec}\, B_i \mid \Sigma, \lambda_i \sim N\!\left(\mu_i,\; \Lambda_i^{1/2} \Sigma \Lambda_i^{1/2} \otimes P_i^{-1}\right),\quad i \in \{o, s, a\}, \qquad \Sigma \sim IW(n_0 S_0, n_0).$$

The $\Lambda_i = \mathrm{diag}(\lambda_i)$ are called the shrinkage parameters and are used to counteract overfitting through the prior:
- If $P_i = I$, the prior prevents singularity problems, like the ridge regression estimate.
- If $P_i = X_i' X_i$, the prior uses the covariate information and acts as a compressed version of the least squares estimate when $\lambda_i$ is large.

The shrinkage parameters are estimated within the MCMC:
- A small $\lambda_i$ shrinks the variance of the conditional posterior for $B_i$.
- This is another approach to selecting important variables (knots) and components.

We allow the two types of priors ($P_i = I$ and $P_i = X_i' X_i$) to be mixed across components in order to take advantage of both.
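
A sketch of one prior draw of $\mathrm{vec}\, B_i$, assuming $\lambda_i$ is a length-$p$ vector so that $\Lambda_i^{1/2} \Sigma \Lambda_i^{1/2}$ is $p \times p$ (the slide does not spell out the dimensions; all names here are illustrative):

```python
import numpy as np

def draw_vecB_prior(mu, Sigma, lam, P, rng):
    """One draw from vec(B_i) | Sigma, lambda_i ~ N(mu_i, Lam^{1/2} Sigma Lam^{1/2} (x) P_i^{-1})."""
    Lam_half = np.diag(np.sqrt(lam))              # Lambda_i = diag(lambda_i), assumed p x p
    cov = np.kron(Lam_half @ Sigma @ Lam_half,    # p x p response-scale factor
                  np.linalg.inv(P))               # q_i x q_i covariate-scale factor
    return rng.multivariate_normal(mu, cov)

rng = np.random.default_rng(2)
p, qi = 3, 5
Sigma = np.eye(p)
# P_i = I gives the ridge-type variant; a small lambda_i shrinks harder
vecB_ridge = draw_vecB_prior(np.zeros(p * qi), Sigma, np.full(p, 0.1), np.eye(qi), rng)
```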

The multivariate surface model: The Bayesian posterior

The posterior distribution is conveniently decomposed as

$$p(B, \Sigma, \xi, \lambda \mid Y, X) = p(B \mid \Sigma, \xi, \lambda, Y, X)\, p(\Sigma \mid \xi, \lambda, Y, X)\, p(\xi, \lambda \mid Y, X).$$

Here $p(B \mid \Sigma, \xi, \lambda, Y, X)$ follows the multivariate normal distribution, by conjugacy. When $p = 1$, $p(\Sigma \mid \xi, \lambda, Y, X)$ follows the inverse Wishart distribution

$$IW\left(n + n_0,\; n_0 S_0 + n \tilde{S} + \sum_{i \in \{o,s,a\}} \Lambda_i^{-1/2} (\hat{B}_i - M_i)'\, P_i\, (\hat{B}_i - M_i)\, \Lambda_i^{-1/2}\right).$$

When $p \geq 2$ there is no closed form for $p(\Sigma \mid \xi, \lambda, Y, X)$, but the expression above is a very accurate approximation. The marginal posterior of $\Sigma$, $\xi$ and $\lambda$ is then

$$p(\Sigma, \xi, \lambda \mid Y, X) = c \times p(\xi, \lambda) \times |\Sigma_{\hat\beta}|^{-1/2}\, |\Sigma|^{-(n + n_0 + p + 1)/2} \exp\left\{ -\frac{1}{2} \left[ \mathrm{tr}\, \Sigma^{-1} \left( n_0 S_0 + n \tilde{S} \right) + (\hat\beta - \mu_\beta)'\, \Sigma_{\hat\beta}^{-1}\, (\hat\beta - \mu_\beta) \right] \right\}.$$

The MCMC algorithm: Metropolis-Hastings within Gibbs

The coefficients $B$ are sampled directly from the normal distribution. We update the covariance ($\Sigma$), all knots ($\xi$) and shrinkages ($\lambda$) jointly using Metropolis-Hastings within Gibbs.

The proposal density for $\Sigma$ is the inverse Wishart density on the previous slide. The proposal density for $\xi$ and $\lambda$ is a multivariate t-density with $\nu > 2$ degrees of freedom,

$$\theta_p \mid \theta_c \sim \mathrm{MVT}\left( \hat\theta,\; -\left( \frac{\partial^2 \ln p(\theta \mid Y)}{\partial \theta\, \partial \theta'} \right)^{-1} \Bigg|_{\theta = \hat\theta},\; \nu \right),$$

where $\hat\theta$ is obtained by $R$ ($R \leq 3$) Newton iterations during the proposal, with analytical gradients for the matrices. The analytical gradients are very complicated, and we have implemented them in an efficient way (the key!).
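
The mechanics of one such update can be sketched as below. This is not the talk's implementation: I substitute slow finite-difference derivatives for the analytical matrix gradients that make the real algorithm efficient, and the toy target is a standard normal:

```python
import numpy as np
from scipy import stats

def num_grad(f, x, h=1e-5):
    """Central-difference gradient; stands in for the talk's analytical gradients."""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def num_hess(f, x, h=1e-4):
    """Central-difference Hessian, symmetrized for numerical stability."""
    k = x.size
    H = np.zeros((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k); ei[i] = h
            ej = np.zeros(k); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return 0.5 * (H + H.T)

def t_proposal_params(log_post, theta, steps=3):
    """R <= 3 Newton iterations toward the posterior mode; returns the
    t-proposal location theta_hat and scale -(Hessian at theta_hat)^{-1}."""
    theta_hat = theta.copy()
    for _ in range(steps):
        H = num_hess(log_post, theta_hat)
        theta_hat = theta_hat - np.linalg.solve(H, num_grad(log_post, theta_hat))
    return theta_hat, np.linalg.inv(-num_hess(log_post, theta_hat))

def mh_step(log_post, theta_c, rng, nu=5):
    """One Metropolis-Hastings update of theta = (xi, lambda) given the rest."""
    loc_c, S_c = t_proposal_params(log_post, theta_c)
    theta_p = stats.multivariate_t(loc_c, S_c, df=nu).rvs(random_state=rng)
    loc_p, S_p = t_proposal_params(log_post, theta_p)   # parameters of the reverse move
    log_a = (log_post(theta_p) - log_post(theta_c)
             + stats.multivariate_t(loc_p, S_p, df=nu).logpdf(theta_c)
             - stats.multivariate_t(loc_c, S_c, df=nu).logpdf(theta_p))
    return theta_p if np.log(rng.uniform()) < log_a else theta_c

# Sanity check on a toy Gaussian target
log_post = lambda th: -0.5 * th @ th
rng = np.random.default_rng(3)
theta = np.array([2.0, -1.0])
for _ in range(200):
    theta = mh_step(log_post, theta, rng)
```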

Application to firm leverage data: The data

[Figure: pairwise scatter plots of the leverage Y, its logit transform log[Y/(1 − Y)], and the covariates Tang, Market2Book, LogSale and Profit.]

- leverage (Y): total debt / (total debt + book value of equity); 445 observations
- tang: tangible assets / book value of total assets
- market2book: (book value of total assets − book value of equity + market value of equity) / book value of total assets
- logsales: logarithm of sales
- profit: (earnings before interest, taxes, depreciation, and amortization) / book value of total assets

Application to firm leverage data: Model comparison

[Figure: LPDS against the number of surface knots, for models with only surface or additive components (fixed vs. free knots) and for surface models combined with 2, 4 or 8 fixed or free additive knots; free surface knots are compared with fixed surface knots throughout.]

LPDS is the log predictive density score, defined as

$$\mathrm{LPDS} = \frac{1}{D} \sum_{d=1}^{D} \ln p(\tilde{Y}_d \mid \tilde{Y}_{-d}, X), \qquad p(\tilde{Y}_d \mid \tilde{Y}_{-d}, X) = \int \prod_{i \in \tau_d} p(y_i \mid \theta, x_i)\, p(\theta \mid \tilde{Y}_{-d})\, d\theta,$$

with $D = 5$ in the cross-validation.
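
A generic sketch of how LPDS could be computed from posterior draws; the `fit` and `loglik` callables here are hypothetical placeholders standing in for the surface-model machinery:

```python
import numpy as np
from scipy import stats
from scipy.special import logsumexp

def lpds(y, X, fit, loglik, D=5, seed=0):
    """D-fold cross-validated log predictive density score,
    LPDS = (1/D) * sum_d ln p(Y_d | Y_{-d}, X),
    with each term estimated by Monte Carlo over posterior draws
    obtained from the training folds."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    score = 0.0
    for tau in np.array_split(idx, D):           # held-out fold tau_d
        train = np.setdiff1d(idx, tau)
        draws = fit(y[train], X[train])          # posterior draws theta_m | Y_{-d}
        ll = np.array([loglik(y[tau], X[tau], th) for th in draws])
        score += logsumexp(ll) - np.log(len(draws))  # MC estimate of ln p(Y_d | Y_{-d})
    return score / D

# Toy usage: i.i.d. normal model with posterior draws of the mean
fit = lambda y, X: np.random.default_rng(1).normal(y.mean(), 1 / np.sqrt(len(y)), 200)
loglik = lambda y, X, mu: stats.norm.logpdf(y, mu, 1.0).sum()
y = np.random.default_rng(2).normal(size=100)
print(lpds(y, np.zeros((100, 1)), fit, loglik))
```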

Application to firm leverage data: Posterior locations of knots

[Figure: posterior locations of the knots in the Market2Book vs. Profit plane, together with the prior means of the surface knots and of the additive knots.]

Application to firm leverage data: Posterior mean surface (left) and standard deviation (right)

[Figure: posterior mean and posterior standard deviation of the fitted surface over Market2Book and Profit.]

Extensions and future work

The model and the methods we used are very general:
- It is easy to generalize the model to the GLM framework.
- Variable selection is possible for the knots.
- A Dirichlet process prior can be plugged into the model when heteroscedasticity is a problem.
- And the copula...