Efficient Bayesian Multivariate Surface Regression

Feng Li (feng.li@cufe.edu.cn)
School of Statistics and Mathematics
Central University of Finance and Economics
Outline of the talk

1. Introduction to flexible regression models
2. The multivariate surface model
3. Application to firm leverage data
4. Extensions and future work

Feng Li (StatMath, CUFE) Multivariate Surface Regression 2 / 16
Flexible regression models · Introduction

Flexible models of the regression function E(y|x) have been an active research field for decades. Attention has shifted from kernel regression methods to spline-based models. Splines are regression models with flexible mean functions.

Example: a simple spline regression with a single explanatory variable and truncated linear basis functions looks like

y = α_0 + α_1 x + β_1 (x − ξ_1)_+ + ... + β_q (x − ξ_q)_+ + ε,

where (x − ξ_i)_+ = max(0, x − ξ_i) are called the basis functions, and the ξ_i are called knots (the locations of the basis functions).
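The truncated linear basis above is easy to build directly. A minimal NumPy sketch (the data, knot locations and least-squares fit are illustrative, not from the talk):

```python
import numpy as np

def truncated_linear_basis(x, knots):
    """Design matrix [1, x, (x - xi_1)_+, ..., (x - xi_q)_+]."""
    cols = [np.ones_like(x), x]
    cols += [np.maximum(x - xi, 0.0) for xi in knots]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 10.0, 200))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)

knots = np.linspace(1.0, 9.0, 5)              # five fixed, equally spaced knots
X = truncated_linear_basis(x, knots)
coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
fitted = X @ coef
```

Each knot adds one column to the design matrix, so fitting the spline is just linear regression on an enlarged covariate set.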
Flexible regression models · Spline example (single covariate with thinplate bases)
Flexible regression models · Spline regression with multiple covariates

Additive spline model. Each knot ξ_{jk} (a scalar) is connected with only one covariate:

y = α_0 + α_1 x_1 + ... + α_q x_q + Σ_{j_1=1}^{m_1} β_{j_1} f(x_1, ξ_{j_1}) + ... + Σ_{j_q=1}^{m_q} β_{j_q} f(x_q, ξ_{j_q}) + ε.

Good and simple if you know a priori that there are no interactions in the data.

Surface spline model. Each knot ξ_j (a vector) is connected with more than one covariate:

y = α_0 + α_1 x_1 + ... + α_q x_q + Σ_{j=1}^{m} β_j g(x_1, ..., x_q, ξ_j) + ε.

A popular choice of g(x_1, ..., x_q, ξ_j) is the multidimensional thinplate spline g(x_1, ..., x_q, ξ_j) = ‖x − ξ_j‖² ln ‖x − ξ_j‖. The surface model can handle interactions, but its complexity increases dramatically with the number of interactive knots.
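The thinplate basis g(x, ξ_j) = ‖x − ξ_j‖² ln ‖x − ξ_j‖ can be evaluated for all data points and knots at once; a minimal NumPy sketch (function name and example points are illustrative):

```python
import numpy as np

def thinplate_basis(X, knots):
    """g(x, xi) = ||x - xi||^2 ln||x - xi||, one column per knot xi."""
    # pairwise distances between the n data points and the m knots -> (n, m)
    d = np.linalg.norm(X[:, None, :] - knots[None, :, :], axis=2)
    with np.errstate(divide="ignore", invalid="ignore"):
        g = d ** 2 * np.log(d)
    return np.where(d > 0.0, g, 0.0)  # the limit of g as d -> 0 is 0

X = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 0.0]])
knots = np.array([[0.0, 0.0]])
B = thinplate_basis(X, knots)  # one basis column for the single knot
```

Note that each vector knot contributes a single basis column that depends on all covariates jointly, which is exactly why the surface model captures interactions.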
The challenges

How many knots are needed? Too few knots lead to a bad approximation; too many knots yield overfitting.

Where to place those knots? Equal spacing works for the additive model, but is obviously not efficient for the surface model.

Common approaches to the two problems:
- Place many knots and use variable selection to pick out the useful ones: not truly flexible.
- Use reversible jump MCMC to move among model spaces with different numbers of knots: very sensitive to the prior and not computationally efficient.
- Cluster the covariates to select knots: does not use the information from the responses.

How to choose between the additive spline and the surface spline? No established approach.
The multivariate surface model · The model

The multivariate surface model consists of three components: linear, surface and additive,

Y = X_o B_o + X_s(ξ_s) B_s + X_a(ξ_a) B_a + E.

We treat the knots ξ_i as unknown parameters and let them move freely. A model with a minimal number of free knots can outperform a model with lots of fixed knots.

For notational convenience, we sometimes write the model in compact form as

Y = XB + E, where X = [X_o, X_s, X_a], B = [B_o′, B_s′, B_a′]′, and the rows of E are N_p(0, Σ).
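The compact design X = [X_o, X_s, X_a] can be assembled from the three component bases; a sketch assuming thinplate surface bases and truncated-linear additive bases, with hypothetical knot choices (not the talk's data):

```python
import numpy as np

def build_design(X_raw, surface_knots, additive_knots):
    """X = [X_o, X_s, X_a]: linear part (intercept + covariates), thinplate
    surface bases at vector knots (rows of surface_knots), and truncated-linear
    additive bases at (covariate index, scalar knot) pairs."""
    n = X_raw.shape[0]
    X_o = np.column_stack([np.ones(n), X_raw])
    d = np.linalg.norm(X_raw[:, None, :] - surface_knots[None, :, :], axis=2)
    with np.errstate(divide="ignore", invalid="ignore"):
        X_s = np.where(d > 0.0, d ** 2 * np.log(d), 0.0)
    X_a = np.column_stack([np.maximum(X_raw[:, k] - xi, 0.0)
                           for k, xi in additive_knots])
    return np.hstack([X_o, X_s, X_a])

X_raw = np.random.default_rng(1).normal(size=(50, 2))
X = build_design(X_raw,
                 surface_knots=np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]]),
                 additive_knots=[(0, -1.0), (0, 1.0), (1, 0.0)])
```

Because the knots enter X nonlinearly, moving a knot changes the design matrix; this is why the knots must be sampled rather than solved for in closed form.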
The multivariate surface model · The prior

Conditional on the knots, the priors for B and Σ are set as

vec B_i | Σ, λ_i ~ N(μ_i, Λ_i^{1/2} Σ Λ_i^{1/2} ⊗ P_i^{−1}), i ∈ {o, s, a},    Σ ~ IW(n_0 S_0, n_0).

Λ_i = diag(λ_i) are called the shrinkage parameters, used to counteract overfitting through the prior.
- If P_i = I: prevents singularity problems, like the ridge regression estimate.
- If P_i = X_i′X_i: uses the covariate information; also a compressed version of the least squares estimate when λ_i is large.

The shrinkage parameters are estimated in the MCMC.
- A small λ_i shrinks the variance of the conditional posterior for B_i.
- This is another approach to selecting important variables (knots) and components.

We allow the two prior types (P_i = I and P_i = X_i′X_i) to be mixed across components, in order to take advantage of both.
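The shrinkage role of λ can be seen in the simplest special case (single response p = 1, unit error variance, P = I), where the conditional posterior mean of the coefficients reduces to ridge-style algebra. A hedged sketch of that special case only, not the paper's full matric-variate computation:

```python
import numpy as np

def posterior_mean_B(X, y, lam, mu=None):
    """Conditional posterior mean under the P = I prior with p = 1, sigma^2 = 1:
    the prior variance is proportional to lam, so a small lam pulls the estimate
    toward the prior mean mu -- the same algebra as ridge with penalty 1/lam."""
    q = X.shape[1]
    mu = np.zeros(q) if mu is None else mu
    return np.linalg.solve(X.T @ X + np.eye(q) / lam, X.T @ y + mu / lam)

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 4))
y = X @ np.array([1.0, -2.0, 0.0, 0.5]) + rng.normal(size=100)

b_loose = posterior_mean_B(X, y, lam=1e6)   # weak shrinkage: ~ least squares
b_tight = posterior_mean_B(X, y, lam=1e-6)  # strong shrinkage: ~ prior mean (0)
```

Letting the MCMC estimate lam then amounts to letting the data decide how strongly each component (and hence each set of knots) is shrunk away.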
The multivariate surface model · The Bayesian posterior

The posterior distribution is conveniently decomposed as

p(B, Σ, ξ, λ | Y, X) = p(B | Σ, ξ, λ, Y, X) p(Σ | ξ, λ, Y, X) p(ξ, λ | Y, X).

By conjugacy, p(B | Σ, ξ, λ, Y, X) is multivariate normal. When p = 1, p(Σ | ξ, λ, Y, X) is inverse Wishart,

IW( n_0 S_0 + n S̃ + Σ_{i∈{o,s,a}} Λ_i^{1/2} (B̃_i − M_i)′ P_i (B̃_i − M_i) Λ_i^{1/2},  n + n_0 ).

When p ≥ 2, p(Σ | ξ, λ, Y, X) has no closed form, but the expression above is a very accurate approximation.

The marginal posterior of Σ, ξ and λ is then

p(Σ, ξ, λ | Y, X) = c · p(ξ, λ) · |Σ|^{−(n+n_0+p+1)/2} |Σ_β|^{−1/2}
  · exp{ −(1/2) [ tr Σ^{−1} (n_0 S_0 + n S̃) + (β̃ − μ_β)′ Σ_β^{−1} (β̃ − μ_β) ] }.
The MCMC algorithm · Metropolis-Hastings within Gibbs

The coefficients B are sampled directly from their normal conditional. The covariance Σ, all knots ξ and the shrinkages λ are updated jointly using Metropolis-Hastings within Gibbs.
- The proposal density for Σ is the inverse Wishart density on the previous slide.
- The proposal density for ξ and λ is a multivariate t-density with ν > 2 degrees of freedom,

θ_p | θ_c ~ MVT( θ̂, −( ∂² ln p(θ | Y) / ∂θ ∂θ′ )^{−1} |_{θ=θ̂}, ν ),

where θ̂ is obtained by R (R ≤ 3) Newton iterations within the proposal step, using analytical gradients for the matrices. The analytical gradients are very complicated, and we have implemented them in an efficient way (the key!).
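The proposal mechanics can be sketched on a toy target. Here a hand-coded gradient and Hessian of a standard-normal log posterior stand in for the paper's analytical matrix gradients, and `propose_mvt` is a hypothetical helper, so this is only the skeleton of the step:

```python
import numpy as np

def newton_mode(grad, hess, theta, steps=3):
    """R <= 3 Newton iterations toward the mode of ln p(theta | Y)."""
    for _ in range(steps):
        theta = theta - np.linalg.solve(hess(theta), grad(theta))
    return theta

def propose_mvt(rng, mean, scale, nu):
    """Multivariate-t draw: location `mean`, scale matrix `scale`, nu > 2 df."""
    z = rng.multivariate_normal(np.zeros(len(mean)), scale)
    w = rng.chisquare(nu) / nu
    return mean + z / np.sqrt(w)

# toy target: ln p(theta | Y) = -0.5 theta' theta (a standard normal)
grad = lambda t: -t
hess = lambda t: -np.eye(len(t))

rng = np.random.default_rng(3)
theta_hat = newton_mode(grad, hess, np.array([5.0, -3.0]))  # lands on the mode
scale = -np.linalg.inv(hess(theta_hat))                     # negative inverse Hessian
draw = propose_mvt(rng, theta_hat, scale, nu=5.0)           # one proposal draw
```

A full sampler would accept or reject `draw` with the usual Metropolis-Hastings ratio, evaluating the multivariate-t proposal density at both the current and the proposed point.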
Application to firm leverage data · The data

[Figure: scatterplot matrix of Y, log[Y/(1 − Y)], Tang, Market2Book, LogSale and Profit]

- leverage (Y): total debt / (total debt + book value of equity); 4,405 observations
- tang: tangible assets / book value of total assets
- market2book: (book value of total assets − book value of equity + market value of equity) / book value of total assets
- logsales: logarithm of sales
- profit: (earnings before interest, taxes, depreciation, and amortization) / book value of total assets
Model comparison by LPDS

[Figure: LPDS against the number of knots. Top: models with only a surface or only an additive component, with free vs. fixed knots. Bottom: models with both additive and surface components (free or fixed surface knots combined with 2, 4 or 8 fixed or free additive knots).]

The log predictive density score (LPDS) is defined as

LPDS = (1/D) Σ_{d=1}^{D} ln p(Ỹ_d | Ỹ_{−d}, X),   p(Ỹ_d | Ỹ_{−d}, X) = ∫ Π_{i∈τ_d} p(y_i | θ, x_i) p(θ | Ỹ_{−d}) dθ,

with D = 5 folds in the cross-validation.
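A minimal illustration of the LPDS computation, for a toy N(μ, 1) model with five folds (the model, prior and helper are illustrative, not the surface model from the talk):

```python
import numpy as np

def lpds(y, D=5, n_draws=1000, seed=0):
    """Five-fold LPDS for a toy N(mu, 1) model with a flat prior on mu:
    for each fold, average the held-out likelihood over posterior draws of mu,
    then average the resulting log predictive densities over folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(y.size), D)
    scores = []
    for d in range(D):
        test = y[folds[d]]
        train = np.delete(y, folds[d])
        # posterior of mu given the training fold: N(mean(train), 1/len(train))
        mus = rng.normal(train.mean(), 1.0 / np.sqrt(train.size), n_draws)
        # ln p(test | mu_m) summed over test points, for each posterior draw
        ll = np.array([-0.5 * np.sum((test - m) ** 2)
                       - 0.5 * test.size * np.log(2 * np.pi) for m in mus])
        a = ll.max()                              # log-mean-exp over the draws
        scores.append(a + np.log(np.mean(np.exp(ll - a))))
    return np.mean(scores)

score = lpds(np.random.default_rng(42).normal(size=100))
```

The log-mean-exp step is the Monte Carlo estimate of the integral ∫ Π p(y_i | θ, x_i) p(θ | Ỹ_{−d}) dθ in the LPDS definition.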
Posterior locations of knots

[Figure: posterior knot locations in the (Market2Book, Profit) plane, with the prior means of the surface and additive knots marked.]
Application to firm leverage data · Posterior mean surface (left) and standard deviation (right)

[Figure: contour plots of the posterior mean and posterior standard deviation of the fitted surface over Market2Book and Profit.]
Extensions and future work

The model and the methods we used are very general.
- It is easy to generalize the model to the GLM framework.
- Variable selection is possible for the knots.
- A Dirichlet process prior can be plugged into the model when heteroscedasticity is the problem.
- And the copula...