Bayesian Variable Selection of Risk Factors in the APT model: Estimation and Inference

Size: px

Start display at page:

Download "Bayesian Variable Selection of Risk Factors in the APT model: Estimation and Inference"

May Cobb
5 years ago
Views:

1 Bayesian Variable Selection of Risk Factors in the APT model: Estimation and Inference Robert Kohn University of New South Wales Sydney NSW, 2052 Rachida Ouysse University of New South Wales Sydney NSW, 2052 March 9, 2006 Abstract PRELIMINARY AND INCOMPLETE In this paper we use a probabilistic approach to risk factor selection in the arbitrage pricing theory model. The methodology uses a Bayesian framework to simultaneously select the pervasive risk factors and estimate the model. This will enable correct inference and testing of the implications of the APT model. Furthermore, we are able to make inference on any function of the parameters, in particular the pricing errors. We investigate the macroeconomic risk factors of Chen, Roll, and Ross (1986) and the firm characteristic factors of Fama and French (1992,1993). We also augment the model to allow for both measured economic variables and latent factors. Using monthly portfolio returns grouped by size and book to market, we find that the economic variables have zero risk premium although some appear to have non zero posterior probability. The "Market" factor is not priced. An APT model with factors mimicking size (SMB), book to market equity (HML), value-weighted portfolio and Standard and Poor, is supported by a conditionally independent prior and offers a significant decrease in the pricing error over a two-factor APT with SMB and HML. The posterior probability and cumulative distributions functions of the average risk premium and the pricing errors are compared to the normal distribution. The results show that under certain conditions the distortions are very small.

2 1 Introduction In this paper, we address model selection in the context of a linear factor model with potentially measured and latent factors. The study proposes an exact statistical framework for the estimation and inference in a factor model. We use a Bayesian framework to implicitly incorporate model uncertainty into the estimation of the parameters and model inference. Since the inception of the arbitrage pricing theory (APT) by Ross (1976, 1977), there is an increased interest in the use of linear factor models in the study of Asset pricing. There is a growing evidence that high returns are driven by a multifactor model rather than the one factor capital pricing model (CAPM). The APT has the attractive feature of minimal assumptions about the nature of the economy. However, this tractability comes at the cost of certain ambiguities such as an approximate pricing relation and an unknown set of pervasive factors. In order to test the implications of the APT, one must specify the number and the identity of the factors. There are two main streams in the literature of factor selection in the APT. The first view toward model determination uses latent (unobservable) factors as sources of common variations. These common factors are estimated from sample covariance matrices using statistical techniques like factor analysis and principal components. Bai and Ng (2000) develops an econometric approach to consistently determine the dimension of the model for large panels. Bai (2001) addresses the asymptotic properties of the estimated model. This literature addresses the asymptotic properties of the distributions and therefore is based on the model selection being consistent and therefore treated as deterministic. The second alternative view suggests the use of observed economic variables as factors. There is no doubt that asset prices are intimately linked to macroeconomic activity and that the influences go in both directions. However, little is done to formalize the search for the set of significant influential variables. Chen, Roll and Ross (1986) (CRR) asserts A rather embarrassing gap exists between the theoretically exclusive importance of systematic state variables and our complete ignorance of their identity. CRR attempts to explore this identification terrain by combining a set of likely macroeconomic and financial candidates for pervasive risk in asset returns. The selection procedure consists of a series of t-test for the significance of average risk premia corresponding to each of the variables allowed into the regression. The authors identify five common risk factors that are significantly priced in the stock market. Using a different approach, based on firm characteristics as a proxy for the firms sensitivity to systematic risk in the economy, Fama and French (1993) shows that the variation in returns on stocks and bonds can be explained by five size and book-to-market based factors. However, these studies raise two fundamental critiques. First, the number of factors is often assumed and arbitrarily prespecified. Second, the set of potential pervasive factors is subjectively reduced to a few number of candidates and only a few specifications are tested. Hence, no statistical justification is provided to justify the selected set of variables. Ouysse (2005) proposes a formal econometric procedure to consistently select the set of pervasive factors in panels with large dimensions. However, the study does not address the post-selection properties of the model estimates. For example, Fama and French (1996) indicates that market anomalies largely disappear in a three-factor model. Ferson and Korajczyk (1995) finds that a five factors model capture a large fraction of asset returns predictability.

3 The APT implies nonlinear restrictions on the model parameters, which make it very difficult and complex to derive the asymptotic distribution of the restricted estimates, let alone, the distribution of post-selction restricted estimates. The Bayesian approach enables exact inference by making it possible to implicitly incorporate model uncertainty and to derive post-selction distributions of any functions of the parameters. Geweke and Zhou (1996) uses a Bayesian framework to analyze the APT. The authors propose the use of the pricing error to test the implication the APT that the expected returns are approximately linear function of the risk premium on systematic factors. The authors use latent factors and do not perform a selection of the appropriate number of factors. They borrow the results from the asymptotic principal component analysis of Connor and Korajczyk (1986,1993) and propose the use of 1 to 4 factors. The present study extends Geweke and Zhou (1996) to allow for both latent and measured factors. In particular, this method makes it possible to derive the exact post-selection posterior distributions for the measures of the pricing error and for the measure of the systematic risk and risk premia. 2 Methodology 2.1 Asset Pricing model In the finance literature, the debate on what drives excess returns continues. A large number of studies use factor models to identify the common sources of systematic risk in expected returns. The factors used are classified into latent factors estimated through statistical methods as principal components and factor analysis and observable factors based on the sensitivity of stocks returns to economic and financial news. The use of observable variables to explain excess returns is particularly appealing. The estimated factor loadings have a meaningful interpretation. The estimated pricing relationship can be used to stimulate the financial markets through the pervasive economic and financial variables. The ability to predict excess returns with tangible factors can be useful for portfolio management and stock market investment decisions. Unlike the Capital Asset Pricing model, the AP T allows for multiple risk factors to enter the return generating process for asset returns. In a rational asset pricing model with multiple beta, expected returns of securities are related to their sensitivities to changes in the state of the economy. Let Y it be the return on asset i at time t. Assume that asset returns follow an approximate k 0 factor model, Y it = α i + Pk0 b ij x tj + ε ti (1) j=2 The intercept α i istheexpectedreturnonasseti, α i = E[Y it ]. The risk factors are common across the assets. Therefore, the asset risk can be divided into a common diversifiable risk due to the exposure to the k 0 common risk factors x t, and an idiosyncratic risk due to the idiosyncratic factor ε it. The betas b ij, or An approximate factor structure, first introduced by Chamberlain and Rothschild (1983), relaxes the static factor model assumption of digonal idiosyncratic covariance matrix to allow for a limited amount of cross-section dependence. The APT is based on the pricing relation for a countably infinite vector of returns to a countably infinite set of traded assets.

4 factor loadings of the j th factor for asset i, represent the amount of exposure to each risk factor. There are T time periods and N assets. Reverting to the more general setup, for each asset i, y i = y i = Xβ i + i E( i X) = 0; E( i 0 i )=σ 2 i I T where X =(X 1.., X k,..,x K ), β i =(α i,b i1,..,b ik ) 0 and i =(ε i1,...,ε it ) 0 with E εit 2 = Σii. We will consider two additional representation of the regression model in??. The time series regression canbeexpressedinthefollowing y t = (I N x t ) β + ε t E(ε t x t ) = 0, E(x t )=0, E(ε t ε 0 t )=Σ where y t is an N-vector of returns; x t [X 1t,..,X Kt ], is a K row-vector; and ε t =(ε 1t,...,ε Nt ) 0. The slope coefficient, β =(β 0 1,..,β 0 i,...β 0 N) 0 is an N K-column vector. Σ is a N N positive definite matrix representing the structure of the cross-section dependence with elements σ ij = E(ε it ε jt ). In our analysis, it is convenient to work with the matrix form of the model: y NT 1 =(I N X) NT Nk β Nk 1 + ε NT 1 where y = (y1,..y 0 i 0,..,y0 N )0 = (y 0 1,..y0 i,..,y0 N )0, X = (x 0 t,..,x 0 t,..,x 0 T )0 and ε = ( 0 1,.., 0 i,.., 0 N )0 = ( 0 1,.., 0 i,.., 0 N )0. The idiosyncratic factors are assumed to be uncorrelated with the factors, X. They are also assumed to have a normal distribution with mean zero and covariance matrix Σ I k T. We can also compactly write the model in its multivariate form (2) Y = XB + v (3) V (Y X, B, Σ) = B 0 X 0 XB + Σ (4) where Y =(y 1,..,y i,..,y N ), B =(β 1,..,β i,...β N ), and v =( T N K N 1,.., i,.., N )=( 1,.., i,.., N ). T N Therefore, y = vec(y ) and β = vec(b). The absence of riskless arbitrage opportunities implies well-known restrictions on (1), namely α δ 0 + B 0 δ (5) δ = (δ 1,...,δ K ) 0 where λ 1 is the riskless return, provided at least one risk-averse investor holds a portfolio without any residual risk and λ j is the risk premium on j th factor. Equation (5) represents an approximate linear X 1 here is a vector of ones representing the intercept. k If there is serial correlation, the covariance matrix will be of the more general form, E(εε 0 )=Σ Γ. The riskless rate will be measured by the 30-day Treasury-bill rate that is known at the beginning of each period month. Connor (1984) replaces the approximation with equality under the assumption of competitive equilibrium.

5 relationship between the expected asset returns and their risk exposures. It implies that the risk premium on an asset, β i1 δ β ik0 δ k0, by its factor loadings. Exact arbitrage pricing obtains when (5) holds with equality. Geweke and Zhou (1996) proposes a measure of pricing error given by the average of squared deviations from the restriction across assets. The pricing error is measured by N Q 2 N = X αi δ 0 β 0 i δ 2 /N i=1 Q N 0 as N The authors argue that for Connor s equilibrium AP T, Q is equal to zero. For Ross s asymptotic APT, the pricing error will converge to zero as the number of assets approaches infinity. Although for fixed N, the pricing error is not necessarily small, the authors suggest the use of the pricing error to examine some of the testable implications of the APT. Another measure of the pricing error which takes into account the cross section correlation structure in the idiosyncratic term is given by, ³ α e 0 ³ B 0 δ Ω α e B 0 δ eq 2 N = N eb = (ı, B); δ =(δ 0,...,δ k0 ) 0 eq N 0 as N Stated differently, the pricing theory imposes a testable cross-equation restriction on the parameters of a multivariate regression of asset excess returns on the factors It implies zero intercepts in a regression of asset excess returns on the factors. A test of mispricing is a test for non-zero intercept. 2.2 Review of the Classical Inference There are mainly two approaches to estimating and testing the APT. First, traditional factor analysis uses a likelihood ratio to test the restrictions implied by the APT. This involves computing the maximum likelihood estimates under the nonlinear restrictions in (5.) This is a very difficult task in practice and the model inference is non standard. Indeed,? shows that the asymptotic distributions of the model parameters The equilibrium version of the APT implies that E(y t )=r Ft e + ebλ t (6) where r Ft represents the return on a riskless asset, e is a vector of ones, λ t is a k 0 vector of factor risk premiums. Combining equations (??) and (6) gives, ey = B e X e + v where the N T matrix of excess returns is given by ey = Y I N rf 0 and ex is the T k 0 realizations of (x t + λ t). The pricing theory imposes a testable cross-equation restriction on the parameters of a multivariate regression of asset excess returns on the factors. Let µ be the vector of intercepts in a regression of asset excess returns on the factors. The pricing theory implies that µ should be identically zero.

6 estimates are very complex in factor analysis, and the constrained estimates should be even more complex. This complexity makes it difficult to derive the asymptotic distribution of the likelihood ratio tests. The second approaches surmount the complexity of the estimation under nonlinear restrictions by a twopass approach. In the first pass, either the factor loading or the factors are estimated. In order to estimate the factors betas (factor loading) of assets, excess returns are regressed against the common factors using thetimeseriesfromt 60 to t 1 to get the conditional betas β b n o k=1,..,k ik,t 1.The estimates bβik are then i=1,..,n used in the second pass. Treating these estimates as the true variables, the APT restrictions in (5) become linear constraints on the coefficients of the multivariate regression. In fact, these restrictions imply zero intercepts and can be tested using the standard methods. To estimate the risk premia, a cross sectional regression model is utilized for each time point to get the time series of each risk premia. For each month t of the next 12 months, perform cross section regressions: Y it = δ 0t + P K k=1 β ik,t 1δ kt + ε it with i =1,..,N and get an estimate of the sum of risk premium b δ kt for month t associated to variable k, t =1,..,12. The two-pass steps are repeated for each time period in the sample. Estimation technique: (i) Regress excess returns on the economic variables using the time series from t 60 to t 1 to get the conditional betas β b ik,t 1. (ii) For each month t of the next 12 months, perform cross section regressions: R it = δ 0t + P K k=1 β ik,t 1δ kt + ε it with i =1,..,N and get an estimate of the sum of risk premium b δ kt for month t associated to variable k, t =1,..,12. (iii) Steps (i) and (ii) are repeated for eachyearinthesampleperiod. Thetimeseriesmeansoftheseriesofriskpremiumestimatesassociated to each variable are then tested for their significance. A factor with statistically significant risk premia is priced by investors in the market. This method is based on the estimates from the first pass being consistent. However, in small samples, this procedure suffers from errors in variables problem. The uncertainty about the first pass estimates can lead to misleading inference. Variable selection adds an extra source of uncertainty to the model and an extra dimension to the complexity of the first method and to the unreliability of the inference in the second method. Indeed, the empirical literature so far used ad hoc methods to select the variables to enter the APT model and failed to address the issue of post variable selection inference. 3 Bayesian Inference The factors in (??) are unknown but are assumed to be elements of a finite set of potential variables. Let K be the total number of potential variables represented by the columns of the matrix X. Further,letX 0 be the set of pervasive factors in the true data generating process. Define the Bernoulli random variable γ j 1 if X j X 0 as: γ j = 0 otherwise. Therefore, γ = ª K γ, is a selector vector over the column of X. Let q j j=1 γ

7 be the "Binomial "random variable representing the number of variables in the selected model, therefore q γ = X i=1,..,k The objective of this paper is to perform a factor selection and test the adequacy of the APT as a pricing model for the assets. The Bayesian approach makes it possible to implicitly incorporate the uncertainty about the risk factors and to estimate simultaneously in one step the betas and the risk premia which circumvents the shortcomings of the two-pass procedure. Furthermore, we are able to make inference on any function of the parameters, in particular the pricing errors. We can also carry out tests of efficiency of the APT using the posterior odds ratio and Bayesian confidence intervals. We use a full Bayesian specification to evaluate the posterior distributions of the parameters of the model. We will consider both diffuse and informative priors and we will use a Markov Chain Monte Carlo (MCMC) with a Gibbs sampler for the case of measured economic factors and a reversible a reversible Jump MCMC for the case of measured and latent risk factors. We will use a Gibbs sampler to draw from the posterior distributions of γ, β, λ and Σ. γ i 3.1 LinearFactormodelwithmeasuredfactors Based on the idea that asset prices react sensitively to economic news,? used economic forces to proxy for the systematic influences in stock returns. Using intuition and empirical investigation, the authors combined macroeconomic variables and financial markets variables to capture the systematic risk in asset returns and suggests a five factor AP T model. To assess whether the risk associated to a given variable is rewarded in the market, the authors test the significance of the estimated risk premia using a t statistic using 20 equally weighted portfolios constructed on the basis of firm size as dependent variables. Their results show evidence of five factors. CRR concludes that the spread between long and short interest rate (UTS), expected (EI)and unexpected inflation (UEI), monthly industrial production (MP) and the spread between high- and low-grade bonds (URP) are significantly priced. However, neither the market portfolios (EWNY, VWNY) nor consumption (CG) are priced separately. Fama and French? argued that size and book to market equity are related to economic fundamentals. They suggested the use of firm characteristics, such as size (ME) and book to market equity (BE/ME), to construct factors portfolios proxy for sensitivity to common risk factors in returns. The authors used slopes and R 2 values to test whether these mimicking portfolios capture shared variation in stock and bond returns. Their results show that the portfolios constructed to mimic risk factors related to ME and BE/ME capture strong variations in stock returns. Using 25 stock portfolios as dependent variables, their results show evidence that a three factor model, using Market, SMB and HML as risk factors, captures the common variations in the cross section of stock returns Uninformative Prior The standard diffuse prior for the multivariate regression model (??) is the following prior density on the parameters β and Σ : p(β,σ) Σ (N+1) 2 (7)

8 where Σ is the determinant of the covariance matrix Σ. With this prior, the posterior density for β, conditional on r and Σ, is ³ β Σ,y,γ,X N β b γ, Σ Xγ 0 X 1 γ where β b i γ = hi N (X 0 X) 1 X 0 y, and Ω = Σ 1. The posterior density for the inverse covariance matrix Ω, conditional on the data is a Wishart with shape parameter δ = T q γ + N 1 and scale matrix Sγ 1,where S γ = Y 0 (I X γ (XγX 0 γ ) 1 Xγ)Y, 0 Ω y, γ,x W N (δ,sγ 1 ) The variable selection process is performed by sampling from the posterior density of the indicator variable γ. Assuming a uniform prior, p(γ i =1)=0.5, the posterior density is given by p(γ y) = p(y, γ) p(y) = p(y γ)p(γ) p(γ) p(y γ) p(γ y, X) X0 γ X γ N 2 S γ δ 2 (8) The general Normal-Wishart prior The diffuse prior was first introduced into Bayesian multivariate analysis by Geisser and Cornfield (1963). It is a prior of minimum prior information. However, if one has prior information on β, an informative prior should be used. Indeed, Monte Carlo integration makes it possible to work with many choices of priors. In this section we will derive the full conditionals under a general form of Normal-Wishart prior. The amount of prior information and its importance relative to the sample information will be determined by the covariance matrix of the prior density of β. Proposition 1 Lemma 1 Consider the Normal-Wishart prior for β and Σ, β Σ N β 0, Σ H β and Σ 1 W N m, Φ 1 a wishart distribution with location Φ and scale parameter m>n+ 1.Under uniform priors on γ, the full conditionals are given by, ³ 1 ³ 1- β y, Ω,γ N eβγ, Σ D where D γ = XγX 0 γ + H 1 β and ³ eβ γ = I N Dγ 1 H 1 β β 0 + I N Dγ 1 X0 γ X γ β b GLS bβ GLS = hi N XγX 0 1 i γ X 0 γ y 2- Ω y,γ W N (m + T,(S(γ)+Φ) 1 ) where S(γ) = (Y B 0 W ) 0 I T X γ D 1 γ X0 γ (Y B0 W )+B 0 0 LB 0 W γ = I T XD 1 X 0 1 XD 1 γ H 1 β L = Σ 1 V γ V γ = H 1 β β 0 = vec(b 0 ) Given this prior, the firstmomentofσ is E(Σ) = H 1 β D 1 H 1 β H 1 β D 1 γ X 0 W γ Φ and therefore the scale parameter m>n+1. m N 1

9 3. p (y γ,x) H β N 2 X 0 X + H 1 N 2 m Φ 2 Φ + S β (T +m) 2 In order to implement the selection process, the hyperparameters β 0 and H β.determining the on β need to be specified. Assuming that no subjective information about these parameters is available, their values will be set in order to minimize their influence. Two values of the prior mean are considered in this application. The default prior β 0 =0which reflects indifference between positive and negative values and, a somewhat more informative prior which centers the prior distribution around the generalized least squares estimator β 0 = β b GLS. The covariance matrix H β determines the amount of information in the prior and will influence the likelihood covariance structure. In the literature, the specification is simplified to H β = cv β. The preset form V β, can be chosen to either replicate the correlation structure of the likelihood by setting V β =(X 0 X) 1, this is also the g-prior recommended by Zellner (1986); or to weaken the covariance in the likelihood by setting, V β = I K, which implies that the components of β are conditionally independent. The scalar c is a tuning parameter controlling the amount of prior information. The larger the value of c, the more diffuse (more flat) is the prior over the region of plausible values of β. The value of c should be large enough to reduce the prior influence. However, excessively large values can generate a form of the Bartlett-Lindley paradox by putting increasing probability on the null model as c. In the literature, different values of c were recommended depending on the application at hand. In this paper we will consider three values of c {4,T,K 2 }, which represent the most used values in the literature. As pointed out in Chipman, George and McCulloch (2001), there is an asymptotic correspondence between these choices of c and the classical ³ information criteria AIC, BIC and RIC respectively when V β =(X 0 X) 1. Given that D = X 0 X + H 1 β, one can see that under V β = I K the posterior correlations will be less than those of the design correlation. The posterior correlations are however identical to those of the design matrix under the prior V β =(X 0 X) 1. The conditionally independent prior for β, ³ β Σ N bβgls, Σ ci K (9) is equivalent to the prior B N (Σ,cI K ), a matrix variate representation used by Brown et al. (1998), where ci K is the covariance matrix of B Σ. Therefore, the columns of B in 3 are independent under this prior. Given this prior, the posterior densities for the parameters and the covariance matrix are given by Ã µ! 1 1 β Σ,y,γ,X N bβ γ, Ω c I q γ + Xγ 0 X γ (10) where bβ γ = h I N Xγ 0 X 1 i γ X 0 γ y (11) The posterior density of the inverse covariance matrix, conditional on the sample information, Ω r, γ, X W N ³ m + T,(S(γ)+Φ) 1 (12) In a simulation study of the effect of the choice of c on the posterior probability of the true model, Fernandez, Ley and Steel (2001) found that the effect depends on the true model and noise level and they recommend the use of c =max{t,k 2 }.

10 where the location matrix depends on the sample data through the sum of squares residuals S(γ) =Y 0 (I X γ (XγX 0 γ ) 1 Xγ)Y. 0 Given the same uniform prior on the indicator variable, the posterior density is adjusted in light of the informative prior in (9). The posterior density for γ, conditional on the data, is p(γ y,x) c (Nq γ/2) X0 γx γ + 1 c I q γ Now lets consider a different informative prior for the parameters of the model, N 2 Φ m 2 S(γ)+Φ δ 2 (13) β Σ N 0, Σ c(x 0 X) 1 (14) The parameter c measures the amount of information in the prior relative to the sample. Setting c =50 gives the prior the same importance as 2% of the sample. Given this new prior and given the Wishart prior in (??), the full conditional densities for the unknown parameters of the model are given by, µ c β Σ,y,γ,X N eβ γ, 1+c Σ (X0 X) 1 (15) where β f γ = c 1+c (X 0 γ X γ ) 1 Xγ 0 I N y. The full posterior density for the covariance matrix is a Wishart, Ω y,γ,x W N (m + T,³ S(γ)+Φ 1) e (16) where S(γ) e h i =Y 0 I c 1+c X γ(xγx 0 γ ) 1 Xγ 0 Y. Finally, given the same uniform prior on the indicator variable,the posterior density for γ, conditional on the data is as follows p(γ y,x) (1 + c) Nqγ 2 Φ m 2 Φ + es(γ) (T +m) 2 (17) Random draw from the posterior density γ y, X In the Gibbs sampler each element, γ k in the indicator variable is generated at a fixed or random order from the full conditional distributions. Each γ k is a Bernoulli random variable with where p k = P (γ k =1 γ /k,y,x)= γ /k = γ 1,..γ k 1,γ k+1,..,γ K ª L 1+L L = p γ =(γ 1,..γ k 1, 1,γ k+1,..,γ K ) y, X p γ =(γ 1,..γ k 1, 0,γ k+1,..,γ K ) y, X In the case of the standard diffuse prior in(7),the odds ratio which defines the

11 L = Xγ(1) 0 X γ(1) Xγ(0) 0 X γ(0) N 2 S γ(1) δ 2 N 2 S γ(0) δ 2 log L = δ S γ(1) 2 log N S γ(0) 2 log δ = T + N q γ 1 Xγ(1) 0 X γ(1) Xγ(0) 0 X γ(0) (18) Likewise, for the informative g-prior case, the random mechanism general informative prior (See Appendix), L = p [γ(1) y] p [γ(0) y] p(y γ(1)) p(y γ(0))! N ÃH β (1) 2 Xγ(1) 0 X γ(1) + H β (1) 1 H β (0) logl = N H 2 log β (1) H β (0) N 2 log Xγ(0) 0 X γ(0) + H β (0) 1 N 2 X 0 γ(1) X γ(1) + H β (1) 1 X 0 γ(0) X γ(0) + H β (0) 1 µ (T+m) Φ + S(1) 2 Φ + S(0) (T + m) 2 log Φ + S(1) Φ + S(0) this term represents the ratio of the marginal likelihood of the data conditional on the model. informative diffuse prior case 1: X L c N γ(1) 0 2 X γ(1) + 1 c I N 2 q γ(1) Xγ(0) 0 X γ(0) + 1 c I q γ(0) log L = N 2 log c N X 0 2 log γ(1) X γ(1) + 1 c I q γ(1) Xγ(0) 0 X γ(0) + 1 c I q γ(0) µ (T +m) Φ + S(1) 2 Φ + S(0) (T + m) 2 log S γ(1) + Φ Sγ(0) + Φ (19) In the (19) (20) In the data dependent g-prior, (T +m) 2 Φ + es γ(1) L =(1+c) N 2 Φ + S e γ(0) log L e = N (T + m) es γ(1) + Φ log(1 + c) log 2 2 es γ(0) + Φ (21) lim L c = 0 lim L X 0 X

12 3.2 APT with Observed and Latent Factors y (NT 1) r (NT 1) = (I N X) β (NT Nk) (Nk 1) = (I N Z) θ (NT Nq) (Nq 1) +(I N F) λ + ε (NT Nr) + ε; q = k + r; Z =[X, F ] There are only N(N +1)/2 distinct elements in Ψ = V (y t ). The number of unknown parameters in the factor model is: Nq = N(k + r) plus the N(N +1)/2 elements of the errors covariance matrix.σ Obviously we need to put some restrictions on the parameters to be able to estimate the model. Lets assume that the covariance matrix of the error terms is diagonal. That is there are only N elements to estimate. In this case, the total number of parameters is: N(k + r) +N. Is is necessary to have: N(N +1)/2 N(k + r) N º 0. We need to have k + r ¹ N 1 2 Prior Density for the Latent factors:the latent factors and the dependent variables are jointly normally distributed. f t N (0, I r ) y N (I N X) β, tr(λλ 0 )+Σ I T Issue of Identification:Because the latent factors are identified only up to an orthogonal rotation. We need to put some restrictions on the loading matrix Λ. Lets assume that Λ has full rank r and that the matrix composed by the first r rows is nonsingular: Λ (r). There exists a unique orthogonal matrix P such that Λ (r) P 0 is lower triangular with positive diagonal elements. Therefore, in order to identify the factors, we will assume that: λ 11 λ λ r1 h i Λ = Λ (r) Λ (N r) where, Λ (r) 0 λ λ r2 = λ ii.. λ ri λ rr where λ ii > 0,i =1,..,r. In terms of λ = vec(λ).these restrictions decrease the number of free parameters to estimate in the model to N(N +1)/2 N(k + r) N + r(r 1)/2 0. That is the upper bound for the number of parameters in the model now is higher. In terms of the elemets of λ, these restrictions become: Proposition 2 Consider the Normal-Wishart prior for θ and Σ, θ Σ N (θ 0, Σ Ψ) and Σ 1 W N m, Φ 1 a Wishart distribution with location Φ and scale parameter m>n+1. The covariance matrix of the prior distribution on the unobservable parameters is block diagonal Ψ = H β 0 Under uniform priors on γ, the full conditionals are given by, 0 H λ The prior distribution assumes that the loadings on the latent factors are uncorrelated to the loading of the observed economic variables.

13 1. θ y,ω,f,γ N ³ eθ,σ 1 H θ where H θ = Z 0 Z + Ψ 1 and e θ = ³I 1 N H 1 θ Ψ θ 0 + i b θgls = hi N (Z 0 Z) 1 Z 0 y ³ ³ I N H 1 θ Z 0 Z bθgls = eβ 0 0 0, e λ (23) ³ 2. β y,ω,f,γ N eβ, 1 Σ H β the Nk first elements of e θ in [23]. ³ 3. λ y, Ω,F,γ N eλ, 1 Σ H λ the Nr last elements of e θ in [23]. 4. Identifying restrictions: where H β = X 0 X +H 1 β where H λ = F 0 F +H 1 λ F 0 X X0 F F 0 F + H 1 λ F 0 X and β e represents ³ X 0 X + H 1 β (a) For i =1,..,r; λ (i 1)N +i > 0 and i =2:r, λ (i 1)N+1:(i 1)(N+1) =0 X 0 F and e λ represents (b) Let λ i = ª λ (i 1)r+1:(i 1)r+i for i=1:r ; and F [i] is the matrix of latent factors including the first i columns of F. The posterior density for λ i : ³ λ i y, Ω,F,γ N fλ i, Σ H 1 λ 1(λ (i 1)r+i > 0 ³ where H λ = F [i]0 F [i] + H [i] 1 ³ λ F [i]0 X X 0 X + H 1 β X 0 F [i] H [i] λ = H λ{1:i,1:i] and λ f n o i = eλ(i 1)r+1:(i 1)r+i (c) Let λ i+r = ª λ (i 1)r+1:(i 1)r+r for i = r +1:N, ³ λ r+1 y, Ω,F,γ N eλ i+r, Σ H 1 λ ³ where H λ = F0F +(H λ ) 1 F 0 X H [i] λ = H λ{1:i,1:i] and e λ i+r = n eλ(i 1)r+1:(i 1)r+r o 5. Ω y, F W N (m + T,(S F (γ)+φ) 1 ) where ³ Z0 S = (Y Θ 0 W ) 0 I T ZH 1 θ (Y Θ 0 W )+Θ 0 0 LΘ 0 h W = I T ZH 1 θ i 1 Z 0 1 XH θ Ψ 1 V = Ψ 1 Ψ 1 H 1 θ Ψ 1 Ψ 1 H 1 θ Z 0 W L = Σ 1 V θ 0 = vec(θ 0 ) X 0 X + H 1 β X 0 F 6. p (y F, γ) Ψ N 2 Z 0 Z + Ψ 1 N 2 Φ m 2 (T +m) 2 Φ + S F (γ) 4 REVERSIBLE JUMP MCMC Reversible jump MCMC is a generalization of the Metropolis Hastings algorithm which introduces moves between parameter spaces of different dimensionality while retaining the detailed balance property, which

14 is required for convergence within each type of move. Basically, there are within model moves (the usual MCMC to update parameters of the model given a dimension) and between models moves which update the model dimension. Let r be the number of latent factors in the model: r r max. Assume a uniform prior over the dimension space, that is: p(r) = 1 r max The acceptance probabilities (from x to y) in the M-H algorithm are defined as follows: π(y) q(x y) α (x y) =min 1, π(x) q(y x) {z } {z } where π(.) is the target density and q(y, x) is the proposal density for a move from y to x. The RJMCMC proposes an acceptance probability for interdimension moves as follows: α ((λ r,r) (λ r 0,r 0 )) = min 1, p (λ r 0,r0 y) q(λ r λ r 0,r,y) J(r 0 r) p (λ r,r y) q(λ r 0 λ r,r 0,y) J(r r {z } 0 ) {z } p (λ r 0,r 0 y) = p (λ r 0 y, r 0 ) p (r 0 y) =p (y λ r 0,r 0 ) p (λ r 0 r 0 ) = min 1, p (y λ r 0,r0 ) p (λ r 0 r 0 ) q(λ r λ r 0,r,y) J(r 0 r) p (y λ r,r) p (λ r r) q(λ r 0 λ r,r 0,y) J(r r {z } 0 ) {z } Transition probability from state k to state k 0 We can start by assuming that the proposed move does not depend on the state information other than its dimension: q(r 0 λ r,r,y)=q(r 0 r) =J(r r 0 ) If the current dimension is r, the choice of model move is determined by a transition density J(r r 0 ); the latter is a conditional probability of moving to state r 0 given current state is r. Lets assume a discretised Laplacian density J(r r 0 ) p(r 0 r) exp ( η r 0 r ) Proposal densities Following Lopes and West (2004), we will use a preliminary set of parallel MCMC that are run over a set of prespecified values r of the number of factors. These chains will produce within model posterior samples for (λ r,f r ) that approximate the distributions (λ r,f r r, y). From these samples we compute the posterior means and use these to guide the choice of analytically specified distributions to be used to generate candidate parameter values. We use the posterior means and posterior variance from these preliminary MCMC as parameters of the proposal densities.

15 4.1 RJMCMC Algorithm Lets assume the number of potential observed variables is known K. There is uncertainty on the potential number of latent factors.r = {0, 1, 2,..,r max } 1. Estimate the proposal densities using a preliminary set of parallel MCMC Initialize the model. Start value for r (0)ª Set the current values of λ (0) r to a draw from from the posterior (proposal density) q r (λ r )= p(λ r r (0),y). This step will also produce a set of values for the latent factors F r (0) 2. iteration j =1:M (a) i. Draw a candidate value of r 0 from the proposal distribution J(r (j 1) r 0 ) ii. Draw a set of parameters values from q r 0(η r 0) = p(η r 0 r 0,y),.the proposal density, η r 0 = (λ, σ 2 i ) r 0 iii. sample u U(0, 1) iv. if u<α λ r (j 1),r (j 1) λ r (j),r (j) = =min 1, p (y η r 0,r0 ) p (η r 0 r 0 ) q r (η r 0)J(r 0 r) p (y η r 0,r) p (η r 0 r) q r 0(η {z } r 0)J(r r 0 ) where q r(η r 0)=p(η r 0 r, y). {z } then accept the move to state r (j) = r 0 v. else remain in current state r (j) = r (j 1). end if vi. given the state r (j) run an MCMC step to update and generate iterates for the model parameters and the latent factors. A. Update F : f t y t,η r (j),β (j 1) B. Update Ω r(j) and λ r (j):draw the covariance iterate Ω r(j) from Ω y, F r(j),r (j) C. Update λ r (j) : p ³λ r(j) i y, Ω r(j),f D. Draw the iterates β (j) (coefficient for the observed factors) using β F r(j),r (j), Ω r(j),y Use a Gibbs sampler: γ = {γ l } l=1:k FOR l = 1 : K Draw u U(0, 1) Compute the posterior odds of γ l = {0, 1} Let γ (1) = γ γ l =1,γ l = γ (j 1) l Now, in our model, we only have the density of y conditional on the latent factors.

16 p (y F τ,γ) c Nq 2 X 0 X N N/2 2 M Zγ Πγ X k δ = δ i and r τ = X i=1,..,k where δ i = 1 if X i in the model and τ i = 1 if F i in the model i=1,..,r τ i (2N+T +2) 2 P ³ γ l =1 y,f r(j),γ (j) 1:l 1,γ(j 1) l+1:k ³ p y F = F r(j),γ (j) 1:l 1,γ(j 1) l+1:k,γ l =1 c Nq 2 X 0 X N N/2 2 M Π (1) 2 Zγ γ (2N+T +2) P ³ γ l =0 y,f r(j),γ (j) 1:l 1,γ(j 1) l+1:k ³ p y F = F r(j), (γ l =0),γ (j) 1:l 1,γ(j 1) l+1:k c Nq 2 X 0 X N N/2 2 M Π (0) 2 Zγ γ (2N+T +2) log(p log(p ³γ r(j) (Nq γ ) (1) l =1 y,f ) log(c) 2 N 2 log M Zγ(1) ³γ r(j) (Nq γ ) (0) l =0 y,f ) log(c) 2 N 2 log M Zγ(0) (2N + T +2) 2 (2N + T +2) 2 log log Π (1) γ Π (0) γ log L = log(p ³γ r(j) l =1 y,f ) log(p ³γ r(j) l =0 y,f ) N 2 log(c) N 2 log M Zγ(1) log M Zγ(0) (2N + T +2) 2 ³ log Π (1) γ log 1. (a) i. If u<p then γ (j) l =1;γ (j) = γ (1) and S (j) = S (1) else γ (j) l =0; γ (j) = γ (0) and S (j) = S (0) end else Sample β (j) F r(j),r (j), Ω r(j),y from N( β e δ,v) Π (0) γ A. β e γ (j)= Ω (j) X 0 1+c γ (j) M F r (j) + c I 1 T Xγ (j) Ω (j) X 0 γ (j) M F r y (j) V = Ω (j) X 0 1+c γ (j) M F r (j) + c I ³ T Xγ (j) M F (j 1) = I T F r(j) F r(j)0 F r(j) + 1 c I (j) 1 r F r (j) 0 l = l +1 end FOR 1:K

17 (b) j = j +1 Back to step (2) Output of the RJMCMC: {r (1),..r (j),..r (M) } {Ω (1),..Ω (j),..ω (M) ; {λ (1),..λ (j),..λ (M) };{β (1),..β (j),..β (M) } 5 Empirical Results 5.1 Measured Economic and Firm Characteristics Variables Based on the idea that asset prices react sensitively to economic news, Chen, Roll and Ross (1986) (CRR) uses economic forces to proxy for the systematic influences in stock returns. Using intuition and empirical investigation, CRR combines macroeconomic variables and financial markets variables to capture the systematic risk in asset returns and suggests a five factor AP T model. Usingasetof20 equally weighted portfolios constructed on the basis of firm size as dependent variables, the authors apply Fama and MacBeth (1973) two-step estimation procedure to estimate the average risk premia associated to the variables included in each regression. To assess whether the risk associated to a given variable is rewarded in the market, they test the significance of the estimated risk premia using a t statistic. Results show evidence of five factors: CRR concludes that the spread between long and short interest rate (UTS), expected (EI)and unexpected inflation (UEI), monthly industrial production (MP) and the spread between high- and low-grade bonds (URP) are significantly priced. However, neither the market portfolio (EWNY, VWNY) nor consumption (CG) are priced separately. Furthermore, they found no evidence that oil prices (OG) are rewarded separately. Fama and French (1992) documents that size and book to market equity are related to economic fundamentals. This motivates the use of these firm characteristics to construct factors portfolios. Fama and French (1993) suggests that variables that are related to average returns, such as size (ME) andbooktomarket equity (BE/ME) must proxy for sensitivity to common risk factors in returns. The authors use slopes and R 2 values to test whether these mimicking portfolios capture shared variation in stock and bond returns. Their results show that the portfolios constructed to mimic risk factors related to ME and BE/ME capture strong variations in stock returns. Using 25 stock portfolios as dependent variables, their results show evidence that a three factor model, using Market, SMB and HML as risk factors, captures the common variations in the cross section of stock returns. In this application we consider the monthly value weighted returns for the intersections of 10 ME portfolios and 10 BE/ME portfolios from Fama and French. The portfolios are constructed at the end of June. ME is market cap at the end of June. BE/ME is book equity at the last fiscal year end of the prior calendar year divided by ME at the end of December of the prior year. The sample period considered in thevariableselectionis1960 : : 12. We also report results on the subperiod 1980 : : 12. The authors use portfolios instead of individual stocks in order to control for the errors-in-variables problems that arises from the use of the two steps estimation technique.

18 Measured Variables Measured Variables Consumption CG Private Savings SAVE Term Structure UTS Money Growth GB Risk Premium URP US/Canada Ex. rate CAN Inflation I US/UK Ex. rate UK Unexpected Inflation UEI US/Japan Ex.rate JAP Change in EI DEI Value-Weighted return VWRET Prod Growth (M) MP Standard and Poor SP500 Prod Growth (A) YP Market portfolio MARKET Oil Prices OG Small Minus Big SMB Unemployment UNEP High Minus Low HML These two periods are used estimation. In a later section, we will assess the performance of the most promising models over the period 2000 : : 12 to explain the cross section of expected returns.. Given the Normal-Wishart priors described in section 3.1.2, we use a Gibbs sampler to draw an ergodic Markov chain sequence for γ, β and Ω. These parameters are drawn from their full conditional distributions as described in Lemma 1. The posterior distribution of the average pricing errors are then straightforward obtained from these samples of the model parameters. The following are the quantities of interest computed from the MCMC sequences, - The posterior mean of the indicator variable γ, denotedγ y. The elements in which γ y will represent the posterior probability of each variable being in the true data generating process. - Iterates for the risk premia, δ i for each variable i with nonzero posterior probability. In addition of reporting their posterior mean,δ y, and standard deviation, Std(δ y), we plot their histogram and a kernel density estimator of the posterior distribution. To assess significance of the risk premia, we also compute the Bayesian confidence intervals for 90, 95 percent confidence level. - Since variable selection will depend on the loss function considered, we report the model estimates for a selection based on the highest posterior model probability p(y γ) as well as the results for a selection based on minimizing the pricing errors. As a notation, we use δ max p(y γ) and δ min Q respectively. - Iterates for the pricing errors, Q 2 and Q e2, their histogram and kernel density estimate of their posterior distribution as well as the Bayesian confidence intervals. - It is of interest to examine the expected returns, the systematic risks, and the unsystematic risks in the APT model. We provide Bayesian point estimates of the expected asset returns computed MX as the posterior means of the α i,i.e.α pm i α [j] i. The posterior mean, Σ pm, of the Σ iterates = 1 M j=1

19 represents the Bayesian estimate for the idiosyncratic risks. The Bayesian estimate for the total risk is given as the sum of the estimate for the systematic risk BpmX 0 γx 0 γ B pm and the estimate for the nondiversifiable Σ pm. We report the proportion of systematic risk to the total risk denoted D1 and given the ratio of the diagonal elements of the systematic risk to the diagonal elements of the total diag(bpm risk, D1 = 0 X0 XB pm). We also report the proportion of Bayesian total risk to the sample diag(bpm 0 X0 XB pm+σ pm) covariance of observed returns, D2 = diag(b0 pm X 0 XB pm+σ pm) diag(v (Y )),whereb pm is posterior mean of the iterates of B, which is a Bayesian estimate for the factors betas B. - To assess the convergence of the Gibbs sampler, we plot the autocorrelation function for the risk premia and the pricing errors iterates. - Finally, we use kernel density to estimate the posterior probability distribution of the risk premia and pricing errors. These densities along with the cumulative distributions are then plotted against the standard normal distribution. We set the warming period to iterations and the sampling period to The initial values for the indicator variable are one for the constant term and zero for all other variables. The results were unchanged with different starting values. The initial value for the idiosyncratic covariance matrix is 0.2 I N. The location matrix for the Wishart distribution is Φ = I N and the scale parameter m = N +2. The following points summarize the main results: 1. The most favoured model using the g-prior H β = c (X 0 X) 1 with c = max{t,k 2 } is the APT with the two factors {SMB,HML} Table 1 shows that the two factors have a posterior probability of one to be in the DGP. The point estimate for the risk premium for SMB is 0.01% while the risk premium associated to the factor HML is about 1.9%. None of the measured economic variables nor the financial variables were pervasive. Table 2 represents the results for the conditionally independent prior H β = ci K. The most favoured factors are {VWRET,SP500,SMB,HML} all with 100% posterior probability to be in the DGP. The money growth, GB has only a 0.05% posterior odds to be part of the pervasive set of factors, while the monthly production growth, MP, appears to have a 0.05% probability. Both GB and MP have zero risk premia. The posterior means of the risk premia for SMB and HML are negative and are equal to and respectively. The VWRET has anegativeriskpremiaof 0.006, which is slightly smaller than point estimate for the risk premia on the SP500 which is equal to In order to compare the two favoured models using the two different priors we firstlookattheorypricing errors. Table 3 is a summary of the posterior means, standard deviation and confidence intervals for the pricing errors for the two types of priors. With the conditionally independent prior, the mean pricing error for the most favoured model with the factors {VWRET,SP500,SMB,HML} is % for Q e2 (resp % for Q 2 ) compared to % (resp % for Q 2 ) for the most favoured model under the g-prior with factors {SMB,HML}. The4-factor APT shrinks the pricing error by about 21% (resp. 36% for Q 2 ). To assess the economic importance of the pricing errors, we will follow Geweke

20 & Zhou (1996) argument and compare the magnitude of the mean of Q with the monthly expected returns. For the sample period , the means range from to percent per month. One can regard the above means of the pricing errors as economically negligible. To further assess the pricing error we provide the 90% (95%) Bayesian confidence interval, which state that there is 90% (95%) probability that the pricing errors in the interval. The smaller the confidence interval, the more heavily concentrated the posterior density of the average pricing error near its mean and the more informative are the data on the pricing error. Table 3. Pricing Errors for the two types of g-prior. Case of combined set of measured factors, K = 22,N = 45 and c = T. ci K c(x 0 X) 1 PanelA:SamplePeriod1960 : : 12, T = 480 Q Q e Q Q e E(Q y) Std % CI [0.0324; ] [0.0049; ] [0.0516; ] [ ; ] 95% CI [0.0317; ] [0.0047; ] [0.0505; ] [0.0060; ] PanelB:SamplePeriod1980 : : 12, T = 240 Q eq Q eq E(Q y) Std % CI [0.0693; ] [0.0123; ] [0.0267; ] [ ; ] 95% CI [0.0672; ] [0.0117; ] [0.0260; ] [0.0079; ] 3. The average proportion D1 for the 4-factor model is 99.96% andanaverage of93.34% for the proportion D2. The two-factor model has an average proportion of 99.74% for D1 and 49.02% for D2. Figure 1 and Figure 2 show that the 4-factor has both a higher D1 and D2 compared to the two-factor model for all asset returns in the sample. The higher the proportion D1, the smaller is the idiosyncratic risk relative to the systematic risk. Both models have proportions above 99.5%. However, the gap between the sample variances of the asset returns and the estimated total risk is far greater in the two-factor compared to the 4-factor AP T. 4. In terms of matching the average returns, Figure 1 shows that the design matrix prior appears to produce estimates that are very similar to the classical approach. The independent prior. produces estimates for the expected returns that are quantitatively very low compared to the data dependent prior and the sample means.

21 Figure 1: Plot of the alphas for both priors against the average returns. Sample period with c=t. Table 1. Risk Premium for the combined set of measured factors, T = 480, K=22, N = 43 and c = T. Panel A: H β = ci K γ y E(. y) Std δ max p(y) δ min Q 90% CI 95% CI δ [1.6891; ] [1.6576; ] δ GB NA NA δ VWRET [ ; ] [ ; ] δ SP [ ; ] [ ; ] δ MP NA NA δ HML [ ; ] [ ; ] δ SMB [ ; ] [ ; ] Panel B: H β = c(x 0 X) 1 γ y E(. y) Std δ max p(y) δ min Q 90% CI 95% CI δ [1.2441; ] [1.2390; ] δ HML [ ; ] [ ; ] δ SMB [ ; ] [ ; ]

22 Table 2. Risk Premium for the combined set of measured factors, K = 22, T = 240, N = 45 and c = K 2 Panel A: H β = ci K γ y E(δ i y) Std δ max p(y) δ min Q 90% CI 95% CI δ [1.8925; ] [1.8608; ] δ GB [ ; ] [ ; ] δ VWRET [ ; ] [ ; ] δ SP [ ; ] [ ; ] δ MP [ ; ] [ ; ] δ I [ ; ] [ ; ] δ UEI [ ; ] [ ; ] δ HML [ ; ] [ ; ] δ SMB [ ; ] [ ; ] Panel B: H β = c(x 0 X) 1 γ y E(δ i y) Std δ max p(y) δ min Q 90% CI 95% CI δ [1.3537; ] [1.3469; ] δ VWRET [ ; ] [ ; ] δ HML [ ; ] [ ; ] δ SMB [ ; ] [ ; ] 6 Conclusion In this article, we propose a fully Bayesian framework for selecting the risk factors and examining their risk premia and the pricing restrictions implied by the APT. This a one step approach which integrates the uncertainty behind model selection and the estimation of the different functions of the parameters. In contrast to existing studies, we do not fix apriorithe number of measured variables allowed to enter the pricing relationship. The number of measured variables and statistical factors is endogenously determined. This process is performed simultaneously with the estimation of the factor betas and their risk premia. Hence, this method avoids the errors in variables problem encountered in the main stream two-pass approach of Fama-MacBeth. Because, the Bayesian approach evaluates the exact posterior distribution of the estimated parameters and any other function of the parameters, we are able to produce Bayesian confidence intervals for the risk premia to gage if the market does price a certain factor. Inference is also done on the average pricing errors in order to evaluate the extent to which the APT restrictions deviate from the data. In an APT with only measured economic variables are allowed along with Fama and French three factors, the choice of the prior on the factor betas influences the posterior distribution of the promising factors. In the case of Zellner g-prior where the prior covariance matrix is a replica of the design matrix, the pervasive

23 factors are Fama and French size and book to market risk factors SMB and HML. However, using the conditionally dependent prior, in addition to SMB and HML,some economic variables appear to be priced by the market. More specifically, inflation, unexpected inflation, return on value weighted portfolio and return on the standard and poor portfolio. References [1] Anderson, T. W. and Amemiya, Y., (1988), The Asymptotic Normal Distribution of Estimators in Factor Analysis under General Conditions, Annals of Statistics, Vol. 16, [2] Bartlett, M. (1957), A comment on D. V. Lindley s Statistical Paradox. Biometrika Vol. 44, pp [3] Berger, J. O., Ghosh, J. K. and Mukhopadhyay, N. (2003), Approximations and Consistency of Bayes FactorsasModelDimensionGrows.Journal of Statistical Planning and Inference, Vol. 112, pp [4] Brown, P. J., Vannucci, M. and Fearn, T. (1998), Multivariate Bayesian Variable Selection and Prediction, Journal of the Royal Statistical Society, series B, Vol. 60, Part 3, pp [5] Brown, P. J., Vannucci, M. and Fearn, T. (1999), The Choice of Variables in Multivariate Regression: A Non-conjugate Bayesian Decision Theory Approach. Biometrika Vol. 86, pp [6] Burmeister, E. and McElroy, B. M.,(1988), Joint Estimation of Factor Sensitivities and Risk Premia for the Arbitrage Pricing Theory. Journal of Finance, Vol. XLIII, No. 3, pp [7] Carlin, B. P. and Chib, S. (1995), Bayesian Model Choice Via Markov Chain Monte Carlo Methods. Journal of the Royal Statistical Society, Series B, Vol. 57, pp [8] Chamberlin, G. and Rothschild, M. (1983), Arbitrage, Factor Structure and Mean-Variance Analysis in Large Asset Markets, Econometrica 51, [9] Chen, N. F., Roll, R. and Ross, S., A. (1986), Economic Forces and the Stock Market, Journal of Business Vol. 59, pp [10] Chipman H., George E. I. and McCulloch R. E. (2001), The Practical Implementation of Bayesian Model Selection, ims Lecture Notes- Monograph Series, Vol. 38, pp [11] Donoho, D. L. and Johnstone, I.M. (1994), Ideal Spatial Adaptation by Wavelet Shrinkage, Biometrika, Vol. 81, pp / [12] Fama, E. F., and French, K. R. (1993), Common Risk Factors in the Returns on Stocks and Bonds, Journal of Financial Economics Vol. 33, No. 1, pp [13] Ferson, E. W., and Harvey, R. C. (1991), The Variation of Economic Risk Premiums, Journal of Political Economy, Vol. 99, No. 2, pp

24 [14] George, E. and McCulloch, R. E. (1993), Variable Selection Via Gibbs Sampling. Journal of the American Statistical Association, Vol. 88, pp [15] Geweke, J. and Zhou, G., (1996), Measuring the Pricing Error of the Arbitrage Pricing Theory, The Review of Financial Studies, Vol.9, N0. 2, [16] Ouysse, R., (2005), Consistent Variable Selection In Large Panels When Factors are Observable, Journal of Multivariate Analysis, Forthcoming [17] Smith, M. and Kohn, R. (1996), Nonparametric Regression using Bayesian Variable Selection. Journal of Econometrics, Vol. 75, pp [18] Smith, M. and Kohn, R. (1997), A Bayesian Approach to Nonparametric Bivariate Regression. Journal of the American Statistical Association, Vol. 92, pp [19] Smith, M. and Kohn, R. (2002), Parsimonious Covariance Matrix Estimation for Longitudinal Data. Journal of the American Statistical Association, Vol. 97, pp

25 Figure 2: Risk Premia Iterates and Histogram for the combined set of measured factors for the sample period The g-prior used is c(x 0 X) 1. N =43,T =480,K max =19.γ y = {γ cons,γ SMB,γ HML }

26 Figure 3: Empirical pdf and cdf for Risk Premia Iterates for the economic variables. Case of combined set of measured factors with g-priors c(x 0 X) 1,c= T =480,N=43and K max =19. Sample period

27 Figure 4: Autocorrelation Function for Risk Premia Iterates and the pricing error. Case of combined set of measured factors with g-priors c(x0x) 1,c= T = 480, N=43and K max =19,lags=300. Sample period

Figure 5: Risk Premia Iterates for FF factors SMB and HML for the priod 1960-2000.

28 Figure 5: Risk Premia Iterates for FF factors SMB and HML for the priod Case of combined set of measured factors with g-priors ci Kmax,c= T = 480, N=43and K max =19.

29 Figure 6: Risk Premia Iterates for Economic factors for the period Case of combined set of measured factors with g-priors ci Kmax,c= T = 480, N=43and K max =19.

30 Figure 7: Empirical pdf and cdf for Risk Premia Iterates for FF factors {SMB,HML}. Case of combined set of measured factors with g-priors ci Kmax,c= T = 480, N=43and K max =19. Sample period

31 Figure 8: Empirical pdf and cdf for Risk Premia Iterates for the economic variables. Case of combined set of measured factors with g-priors ci Kmax,c= T = 480, N=43and K max =19. Sample period

32 Figure 9: Autocorrelation Function for Risk Premia Iterates and the pricing error. Case of combined set of measured factors with g-priors ci K,c= T = 480, N=43and K max =19,lags= 300

33 Figure 10: Risk Premia Iterates and Histogram for the combined set of measured factors for the sample period The g-prior used is c(x 0 X) 1. N = 43,T = 240,K max = 19. γ y = {γ cons,γ VWRET,γ SMB,γ HML }

34 Figure 11: Empirical pdf and cdf for Risk Premia Iterates for the economic variables. Case of combined set of measured factors with g-priors c(x 0 X) 1,c= T =240,N=43and K max =19. Sample period

35 Figure 12: Autocorrelation Function for Risk Premia Iterates and the pricing error. Case of combined set of measured factors with g-priors c(x0x) 1,c= T = 240, N=43and K max =19,lags=300. Sample period

36 Figure 13: Risk Premia Iterates for FF factors SMB HML and VWRET for the priod Case of combined set of measured factors with g-priors ci Kmax,c= T =240,N=43and K max =19.

37 Figure 14: Risk Premia Iterates for the economic factors for the priod Case of combined set of measured factors with g-priors ci Kmax,c= T = 240, N=43and K max =19.

38 Figure 15: Empirical pdf and cdf for Risk Premia Iterates. Case of combined set of measured factors with g-priors ci Kmax,c= T =240,N=43and K max =19. Sample period

39 Figure 16: Empirical pdf and cdf for Risk Premia Iterates. Case of combined set of measured factors with g-priors ci Kmax,c= T =240,N=43and K max =19. Sample period

Figure 17: Risk Premia Iterates for FF factors SMB and HML for the priod 1980-2000.

40 Figure 17: Risk Premia Iterates for FF factors SMB and HML for the priod Case of combined set of measured factors with g-priors c (X 0 X) 1,c= T =480,N=43and K max =19.

Figure 18: Risk Premia Iterates for the priod 1980-2000.

41 Figure 18: Risk Premia Iterates for the priod Case of combined set of measured factors with g-priors ci Kmax,c= T =240,N=43and K max =19.

Figure 19: Risk Premia Iterates 1980-2000.

42 Figure 19: Risk Premia Iterates Case of combined set of measured factors with g-priors ci Kmax, c = T =240,N=43and K max =19.

R = µ + Bf Arbitrage Pricing Model, APM

R = µ + Bf Arbitrage Pricing Model, APM 4.2 Arbitrage Pricing Model, APM Empirical evidence indicates that the CAPM beta does not completely explain the cross section of expected asset returns. This suggests that additional factors may be required.