Factor Investing using Penalized Principal Components

Size: px

Start display at page:

Download "Factor Investing using Penalized Principal Components"

Edith Flowers
5 years ago
Views:

1 Factor Investing using Penalized Principal Components Markus Pelger Martin Lettau 2 Stanford University 2 UC Berkeley February 8th 28 AI in Fintech Forum 28

2 Motivation Motivation: Asset Pricing with Risk Factors The Challenge of Asset Pricing Most important question in finance: Why are prices different for different assets? Fundamental insight: Arbitrage Pricing Theory: Prices of financial assets should be explained by systematic risk factors. Problem: Chaos in asset pricing factors: Over 3 potential asset pricing factors published! Fundamental question: Which factors are really important in explaining expected returns? Which are subsumed by others? Goals of this paper: Bring order into factor chaos Summarize the pricing information of a large number of assets with a small number of factors

3 Motivation Why is it important? Importance of factors for investing Optimal portfolio construction Only factors are compensated for systematic risk Optimal portfolio with highest Sharpe-ratio must be based on factor portfolios (Sharpe-ratio=expected excess return/standard deviation) Smart beta investments = exposure to risk factors 2 Arbitrage opportunities Find underpriced assets and earn alpha 3 Risk management Factors explain risk-return trade-off Factors allow to manage systematic risk exposure 2

4 Motivation Contribution of this paper Contribution This Paper: Estimation approach for finding risk factors and optimal factor investing Key elements of estimator: Statistical factors instead of pre-specified (and potentially miss-specified) factors 2 Uses information from large panel data sets: Many assets with many time observations 3 Searches for factors explaining asset prices (explain differences in expected returns) not only co-movement in the data 4 Allows time-variation in factor structure 3

5 Motivation Contribution of this paper Results Asymptotic distribution theory for weak and strong factors No blackbox approach Estimator discovers weak factors with high Sharpe-ratios high Sharpe-ratio factors important for asset pricing and investment Estimator strongly dominates conventional approach (Principal Component Analysis (PCA)) PCA does not find all high Sharpe-ratio factors Empirical results: Investment: 3 times higher Sharpe-ratio then benchmark factors (PCA) Asset Pricing: Smaller pricing errors in- and out-of sample than benchmark (PCA, 5 Fama-French factors, etc.) 4

6 Model The Model Approximate Factor Model Observe excess returns of N assets over T time periods: X t,i = F t K }{{} factors Λ i K }{{} loadings + e t,i }{{} idiosyncratic i =,..., N t =,..., T Matrix notation X }{{} T N = }{{} F }{{} Λ T K K N + e }{{} T N N assets (large) T time-series observation (large) K systematic factors (fixed) F, Λ and e are unknown 5

7 Model The Model Approximate Factor Model Systematic and non-systematic risk (F and e uncorrelated): Var(X ) = ΛVar(F )Λ + }{{} Var(e) }{{} systematic non systematic Systematic factors should explain a large portion of the variance Idiosyncratic risk can be weakly correlated Arbitrage-Pricing Theory (APT): The expected excess return is explained by the risk-premium of the factors: E[X i ] = E[F ]Λ i Systematic factors should explain the cross-section of expected returns 6

8 Model The Model: Estimation of Latent Factors Conventional approach: PCA (Principal component analysis) Apply PCA to the sample covariance matrix: T X X X X with X = sample mean of asset excess returns Eigenvectors of largest eigenvalues estimate loadings ˆΛ. Much better approach: Risk-Premium PCA (RP-PCA) Apply PCA to a covariance matrix with overweighted mean T X X + γ X X γ = risk-premium weight Eigenvectors of largest eigenvalues estimate loadings ˆΛ. ˆF estimator for factors: ˆF = N X ˆΛ = X (ˆΛ ˆΛ) ˆΛ. 7

9 Model The Model: Objective Function Conventional PCA: Objective Function Minimize the unexplained variance: min Λ,F NT N T (X ti F t Λ i ) 2 i= t= RP-PCA (Risk-Premium PCA): Objective Function Minimize jointly the unexplained variance and pricing error min Λ,F N T (X ti F t Λ i ) 2 +γ N ( X i F Λ ) 2 i NT N i= t= i= }{{}}{{} unexplained variation pricing error with X i = T T t= X t,i and F = T T t= F t and risk-premium weight γ 8

10 Model The Model: Interpretation Interpretation of Risk-Premium-PCA (RP-PCA): Combines variation and pricing error criterion functions: Select factors with small cross-sectional pricing errors (alpha s). Protects against spurious factor with vanishing loadings as it requires the time-series errors to be small as well. 2 Penalized PCA: Search for factors explaining the time-series but penalizes low Sharpe-ratios. 3 Information interpretation: PCA of a covariance matrix uses only the second moment but ignores first moment Using more information leads to more efficient estimates. RP-PCA combines first and second moments efficiently. 9

11 Model The Model: Interpretation Interpretation of Risk-Premium-PCA (RP-PCA): continued 4 Signal-strengthening: Intuitively the matrix T X X + γ X X converges to Λ ( Σ F + ( + γ)µ F µ ) F Λ + Var(e) with Σ F = Var(F ) and µ F = E[F ]. The signal of weak factors with a small variance can be pushed up by their mean with the right γ.

12 Illustration Illustration (Size and accrual) Illustration: Anomaly-sorted portfolios (Size and accrual) Factors PCA: Estimation based on PCA of correlation matrix, K = 3 2 RP-PCA: K = 3 and γ = 3 FF-long/short: market, size and accrual (based on extreme quantiles, same construction as Fama-French factors) Data Double-sorted portfolios according to size and accrual (from Kenneth French s website) Monthly return data from 7/963 to 5/27 (T = 647) for N = 25 portfolios Out-of-sample: Rolling window of 2 years (T=24) Optimal factor investing: maximum Sharpe-ratio portfolio R opt = F Σ F µ F

13 Illustration Illustration (Size and accrual) In-sample Out-of-sample SR RMS α Idio. Var. SR RMS α Idio. Var. RP-PCA PCA FF-long/short Table: Maximal Sharpe-ratios, root-mean-squared pricing errors and unexplained idiosyncratic variation. K = 3 factors and γ =. SR: Maximum Sharpe-ratio of linear combination of factors Cross-sectional pricing errors α: Pricing error α i = E[X i ] E[F ]Λ i N RMS α: Root-mean-squared pricing errors N i= α i 2 N T Idiosyncratic Variation: NT i= t= (X t,i Ft Λ i ) 2 RP-PCA significantly better than PCA and quantile-sorted factors. 2

14 Illustration Loadings for statistical factors (Size and Accrual).5. PCA factor.5 2. PCA factor.5 3. PCA factor Loadings Loadings Loadings Portfolio.5. RP-PCA factor Portfolio.5 2. RP-PCA factor Portfolio.5 3. RP-PCA factor Loadings Loadings Loadings Portfolio Portfolio Portfolio RP-PCA detects accrual factor while 3rd PCA factor is noise. 3

15 Illustration Maximal Sharpe ratio (Size and accrual) SR (In-sample) SR (Out-of-sample).35 =- = =.3 = =5 = factor 2 factors 3 factors factor 2 factors 3 factors Figure: Maximal Sharpe-ratio by adding factors incrementally. st and 2nd PCA and RP-PCA factors the same. RP-PCA detects 3rd factor (accrual) for γ >. 4

16 Weak vs. Strong Factors The Model Strong vs. weak factor models Strong factor model ( N Λ Λ bounded) Interpretation: strong factors affect most assets (proportional to N), e.g. market factor Strong factors lead to exploding eigenvalues RP-PCA always more efficient than PCA optimal γ relatively small Weak factor model (Λ Λ bounded) Interpretation: weak factors affect a smaller fraction of assets Weak factors lead to large but bounded eigenvalues RP-PCA detects weak factors which cannot be detected by PCA optimal γ relatively large In Data: Typically -2 strong factors and several weak factors with high Sharpe-ratios 5

17 Weak Factor Model Weak Factor Model Weak Factor Model Weak factors have small variance or affect small fraction of assets Λ Λ bounded (after normalizing factor variances) Statistical model: Spiked covariance from random matrix theory Eigenvalues of sample covariance matrix separate into two areas: The bulk, majority of eigenvalues The extremes, a few large outliers Bulk spectrum converges to generalized Marchenko-Pastur distrib. Large eigenvalues converge either to A biased value (characterized by the Stieltjes transform) To the bulk of the spectrum if the true eigenvalue is too small Phase transition phenomena: estimated eigenvectors orthogonal to true eigenvectors if eigenvalues too small 6

18 Weak Factor Model Weak Factor Model Intuition: Weak Factor Model Signal matrix for PCA of covariance matrix: ΛΣ F Λ K largest eigenvalues θ PCA,..., θk PCA measure strength of signal Signal matrix for RP-PCA: Λ ( Σ F + ( + γ)µ F µ ) F Λ K largest eigenvalues θ RP-PCA,..., θk RP-PCA measure strength of signal If µ F and γ > then RP-PCA signal always larger than PCA signal: θ RP-PCA i > θ PCA i 7

19 Weak Factor Model Weak Factor Model Theorem : Risk-Premium PCA under weak factor model The correlation of the estimated with the true factors is ρ Ĉorr(F, ˆF ) p }{{} Ũ ρ }{{} Ṽ rotation rotation ρ K with ρ 2 i { p +θ i B(θ i if θ )) i > θ crit otherwise Critical value θ crit and function B(.) depend only on the noise distribution and are known in closed-form Based on closed-form expression choose optimal RP-weight γ For θ i > θ crit ρ 2 i is strictly increasing in θ i. RP-PCA strictly dominates PCA 8

20 Strong Factor Model Strong Factor Model Strong Factor Model Strong factors affect most assets: e.g. market factor N Λ Λ bounded (after normalizing factor variances) Factors and loadings can be estimated consistently and are asymptotically normal distributed Assumptions more general than in weak factor model Asymptotic Efficiency Choose RP-weight γ to obtain smallest asymptotic variance of estimators RP-PCA (i.e. γ > ) always more efficient than PCA In most cases optimal γ = (smaller than in weak factor model) RP-PCA and PCA are both consistent 9

21 Time-varying loadings Time-varying loadings Model with time-varying loadings Observe panel of excess returns and L covariates Z i,t,l : X t,i = F t K g K (Z i,t,,..., Z i,t,l ) + e t,i Loadings are function of L covariates Z i,t,l with l =,..., L e.g. characteristics like size, book-to-market ratio, past returns,... Factors and loading function are latent Idea: Approximate non-linear function g by basis functions Project return data on appropriate basis functions Equivalent to apply RP-PCA to portfolio data Special case: Characteristics sorted portfolios corresponds to kernel basis functions General case: Obtain arbitrary interactions and break curse of dimensionality by conditional tree sorting projection 2

22 Empirical Results Single-sorted portfolios Portfolio Data Monthly return data from 7/963 to 2/26 (T = 638) for N = 37 portfolios Kozak, Nagel and Santosh (27) data: 37 decile portfolios sorted according to 37 anomalies Factors: RP-PCA: K = 6 and γ =. 2 PCA: K = 6 3 Fama-French 5: The five factor model of Fama-French (market, size, value, investment and operating profitability, all from Kenneth French s website). 4 Proxy factors: RP-PCA and PCA factors approximated with 5% of largest position. 2

23 Empirical Results Single-sorted portfolios In-sample Out-of-sample SR RMS α Idio. Var. SR RMS α Idio. Var. RP-PCA PCA Fama-French Table: Deciles of 37 single-sorted portfolios from 7/963 to 2/26 (N = 37 and T = 638): Maximal Sharpe-ratios, root-mean-squared pricing errors and unexplained idiosyncratic variation. K = 6 statistical factors. RP-PCA strongly dominates PCA and Fama-French 5 factors Results hold out-of-sample. 22

24 Empirical Results Single-sorted portfolios: Maximal Sharpe-ratio SR (In-sample) SR (Out-of-sample) factors 4 factors 5 factors 6 factors 7 factors RP-PCA RP-PCA Proxy PCA PCA Proxy RP-PCA RP-PCA Proxy PCA PCA Proxy Figure: Maximal Sharpe-ratios. Spike in Sharpe-ratio for 6 factors 23

25 Empirical Results Single-sorted portfolios: Pricing error.25 RMS (In-sample).25 RMS (Out-of-sample) factors 4 factors 5 factors 6 factors 7 factors..5 RP-PCA RP-PCA Proxy PCA PCA Proxy RP-PCA RP-PCA Proxy PCA PCA Proxy Figure: Root-mean-squared pricing errors. RP-PCA has smaller out-of-sample pricing errors 24

26 Empirical Results Single-sorted portfolios: Idiosyncratic Variation 5 Idiosyncratic Variation (In-sample) 5 Idiosyncratic Variation (Out-of-sample) factors 4 factors 5 factors 6 factors 7 factors 2 RP-PCA RP-PCA Proxy PCA PCA Proxy RP-PCA RP-PCA Proxy PCA PCA Proxy Figure: Unexplained idiosyncratic variation. Unexplained variation similar for RP-PCA and PCA 25

27 Empirical Results Single-sorted portfolios: Maximal Sharpe-ratio SR (In-sample) SR (Out-of-sample).7 =- = =.6 = =5 = factor 2 factors 3 factors 4 factors 5 factors 6 factors 7 factors.2. factor 2 factors 3 factors 4 factors 5 factors 6 factors 7 factors Figure: Maximal Sharpe-ratios for different RP-weights γ and number of factors K Strong increase in Sharpe-ratios for γ. 26

28 Empirical Results Optimal Portfolio with RP-PCA Composition of Stochast Discount Factor (RP-PCA) Industry Mom. Rev. Ind. Rel. Rev. (L.V.) Value-Mom-Prof. Industry Rel. Reversals High Decile Low Decile Sales Growth Value-Profitability Seasonality Value (M) Gross Profitability Momentum (2m) Idiosyncratic Volatility Long Run Reversals Asset Turnover Momentum-Reversals Sales/Price Return on Assets (A) Investment/Assets Value (A) Return on Book Equity (A) Gross Margins Composite Issuance Earnings/Price Size Momentum (6m) Accrual Cash Flows/Price Short-Term Reversals Value-Momentum Investment Growth Industry Momentum Asset Growth Dividend/Price Share Volume Net Operating Assets Price Investment/Capital Leverage Figure: Portfolio composition of highest Sharpe ratio portfolio (Stochastic Discount Factor) based on 6 RP-PCA factors. 27

29 Empirical Results Optimal Portfolio with RP-PCA (largest positions) Composition of Stochast Discount Factor (RP-PCA).3 High Decile Low Decile Industry Mom. Rev. Ind. Rel. Rev. (L.V.) Value-Mom-Prof. Industry Rel. Reversals Value-Profitability Seasonality Value (M) Gross Profitability Momentum (2m) Idiosyncratic Volatility Long Run Reversals Asset Turnover Momentum-Reversals Sales/Price Return on Assets (A) Figure: Largest portfolios in highest Sharpe ratio portfolio (Stochastic Discount Factor) based on 6 RP-PCA factors. 28

30 Empirical Results Optimal Portfolio with PCA Composition of Stochast Discount Factor (PCA) High Decile Low Decile Value-Profitability Asset Turnover Sales/Price Leverage Value-Mom-Prof. Gross Profitability Industry Mom. Rev. Earnings/Price Size Idiosyncratic Volatility Long Run Reversals Ind. Rel. Rev. (L.V.) Return on Book Equity (A) Dividend/Price Cash Flows/Price Value-Momentum Value (A) Return on Assets (A) Industry Momentum Momentum (6m) Momentum (2m) Composite Issuance Momentum-Reversals Investment/Assets Share Volume Short-Term Reversals Asset Growth Price Seasonality Gross Margins Investment/Capital Investment Growth Value (M) Industry Rel. Reversals Sales Growth Accrual Net Operating Assets Figure: Portfolio composition of highest Sharpe ratio portfolio (Stochastic Discount Factor) based on 6 PCA factors. 29

31 Empirical Results Optimal Portfolio with PCA (largest positions) Composition of Stochast Discount Factor (PCA).3 High Decile Low Decile Value-Mom-Prof. Value-Profitability Asset Turnover Sales/Price Leverage Gross Profitability Industry Mom. Rev. Earnings/Price Size Long Run Reversals Idiosyncratic Volatility Ind. Rel. Rev. (L.V.) Return on Book Equity (A) Dividend/Price Cash Flows/Price Figure: Largest portfolios in highest Sharpe ratio portfolio (Stochastic Discount Factor) based on 6 PCA factors. 3

32 Empirical Results Interpreting factors: Generalized correlations with proxies RP-PCA PCA. Gen. Corr Gen. Corr Gen. Corr Gen. Corr Gen. Corr Gen. Corr Table: Generalized correlations of statistical factors with proxy factors (portfolios of 5% of assets). Problem in interpreting factors: Factors only identified up to invertible linear transformations. Generalized correlations close to measure of how many factors two sets have in common. Proxy factors approximate statistical factors. 3

33 Empirical Results Interpreting factors: 6th proxy factor 6. Proxy RP-PCA Weights 6. Proxy PCA Weights Momentum (6m).28 Leverage.33 Momentum (6m) 2.25 Asset Turnover.25 Value (M).25 Value-Profitability.25 Value-Momentum.23 Profitability.22 Industry Momentum.2 Asset Turnover 9.22 Industry Reversals 9.9 Sales/Price.2 Industry Momentum 2.9 Sales/Price 9.8 Momentum (6m) 3.8 Size.7 Idiosyncratic Volatility Value-Momentum-Profitability -.9 Industry Mom. Reversals -.8 Profitability Value-Momentum Value-Profitability -.2 Momentum (6m) -.2 Profitability Value-Momentum Value-Profitability Value-Momentum -.23 Profitability -.23 Short-Term Reversals -.24 Idiosyncratic Volatility -.24 Industry-Momentum -.24 Profitability Industry Rel. Reversals -.28 Asset Turnover Idiosyncratic Volatility -.38 Asset Turnover

34 Empirical Results Time-stability of loadings Figure: Time-varying rotated loadings for the first six factors. Loadings are estimated on a rolling window with 24 months. 33

35 Empirical Results Time-stability of loadings of individual stocks SR (In-sample) SR (Out-of-sample).6.4 factor 2 factors 3 factors 4 factors 5 factors 6 factors RP-PCA PCA RP-PCA PCA Figure: Stock RMS price (In-sample) data (N = 27 and T = 5): RMS Maximal (Out-of-sample) Sharpe-ratios for different number of factors. RP-weight γ = Stock price data from /972 to 2/26.3 (N = 27 and T = 5).2. Out-of-sample performance collapses Constant loading model inappropriate.2. 34

36 Empirical Results Time-stability of loadings of individual stocks Figure: Stock price data: Generalized correlations between loadings estimated on the whole time horizon and a rolling window 35

37 Conclusion Conclusion Methodology Estimator for estimating priced latent factors from large data sets Combines variation and pricing criterion function Asymptotic theory under weak and strong factor model assumption Detects weak factors with high Sharpe-ratio More efficient than conventional PCA Empirical Results Strongly dominates PCA of the covariance matrix. Potential to provide benchmark factors for horse races. RP-PCA factor investing outperforms benchmark models 36

38 Generalized Correlation Generalized Correlation Time-stability: Generalized correlations RP-PCA (total vs. time-varying) st GC 2nd GC 3rd GC 4th GC 5th GC 6th GC Year PCA (total vs. time-varying) st GC 2nd GC 3rd GC 4th GC 5th GC 6th GC Year Figure: Generalized correlations between loadings estimated on the whole time horizon T = 638 and a rolling window with 24 months for K = 6 factors. A

39 Time-stability of loadings of individual stocks Generalized Correlation Generalized Correlation RP-PCA (total vs. time-varying) st GC 2nd GC.5 3rd GC 4th GC 5th GC 6th GC Time PCA (total vs. time-varying) Time Figure: Stock price data (N = 27 and T = 5): Generalized correlations between loadings estimated on the whole time horizon and a rolling window with 24 months. A 2

40 Effect of Risk-Premium Weight γ.4 SR (In-sample).4 SR (Out-of-sample) factor 2 factors SR.2 3 factors SR RMS (In-sample) RMS (Out-of-sample) Idiosyncratic Variation (In-sample) Idiosyncratic Variation (Out-of-sample) 4 4 Variation 2 Variation Figure: Maximal Sharpe-ratios, root-mean-squared pricing errors and unexplained idiosyncratic variation. RP-PCA detects 3rd factor (accrual) for γ >. A 3

41 Literature (partial list) Large-dimensional factor models with strong factors Bai (23): Distribution theory Fan et al. (26): Projected PCA for time-varying loadings Kelly et al. (27): Instrumented PCA for time-varying loadings Pelger (26), Aït-Sahalia and Xiu (25): High-frequency Large-dimensional factor models with weak factors (based on random matrix theory) Onatski (22): Phase transition phenomena Benauch-Georges and Nadakuditi (2): Perturbation of large random matrices Asset-pricing factors Harvey and Liu (25): Lucky factors Clarke (25): Level, slope and curvature for stocks Kozak, Nagel and Santosh (25): PCA based factors Bryzgalova (26): Spurious factors A 4

42 Extreme deciles of single-sorted portfolios Portfolio Data Monthly return data from 7/963 to 2/26 (T = 638) for N = 64 portfolios Kozak, Nagel and Santosh (27) data: 37 decile portfolios sorted according to 37 anomalies Here we take only the lowest and highest decile portfolio for each anomaly (N = 64). Factors: RP-PCA: K = 6 and γ =. 2 PCA: K = 6 3 Fama-French 5: The five factor model of Fama-French (market, size, value, investment and operating profitability, all from Kenneth French s website). 4 Proxy factors: RP-PCA and PCA factors approximated with 8 largest positions. A 5

43 Extreme Deciles Anomaly Mean SD Sharpe-ratio Anomaly Mean SD Sharpe-ratio Accruals - accrual Momentum (2m) - mom Asset Turnover - aturnover Momentum-Reversals - momrev Cash Flows/Price - cfp Net Operating Assets - noa Composite Issuance - ciss Price - price Dividend/Price - divp Gross Profitability - prof Earnings/Price - ep Return on Assets (A) - roaa Gross Margins - gmargins Return on Book Equity (A) - roea Asset Growth - growth Seasonality - season Investment Growth - igrowth Sales Growth - sgrowth Industry Momentum - indmom Share Volume - shvol. 6.. Industry Mom. Reversals - indmomrev Size - size Industry Rel. Reversals - indrrev Sales/Price sp Industry Rel. Rev. (L.V.) - indrrevlv Short-Term Reversals - strev Investment/Assets - inv Value-Momentum - valmom Investment/Capital - invcap Value-Momentum-Prof. - valmomprof Idiosyncratic Volatility - ivol Value-Profitability - valprof Leverage - lev Value (A) - value Long Run Reversals - lrrev Value (M) - valuem Momentum (6m) - mom Table: Long-Short Portfolios of extreme deciles of 37 single-sorted portfolios from 7/963 to 2/26: Mean, standard deviation and Sharpe-ratio. A 6

44 Extreme Deciles In-sample Out-of-sample SR RMS α Idio. Var. SR RMS α Idio. Var. RP-PCA PCA RP-PCA Proxy PCA Proxy Fama-French Table: First and last decile of 37 single-sorted portfolios from 7/963 to 2/26 (N = 64 and T = 638): Maximal Sharpe-ratios, root-mean-squared pricing errors and unexplained idiosyncratic variation. K = 6 statistical factors. RP-PCA strongly dominates PCA and Fama-French 5 factors Results hold out-of-sample. A 7

45 Extreme Deciles: Maximal Sharpe-ratio SR (In-sample) SR (Out-of-sample) factors 4 factors 5 factors 6 factors 7 factors RP-PCA RP-PCA Proxy PCA PCA Proxy RP-PCA RP-PCA Proxy PCA PCA Proxy Figure: Maximal Sharpe-ratios. Spike in Sharpe-ratio for 6 factors A 8

46 Extreme Deciles: Pricing error.35 RMS (In-sample).35 RMS (Out-of-sample) factors 4 factors 5 factors 6 factors 7 factors.5..5 RP-PCA RP-PCA Proxy PCA PCA Proxy RP-PCA RP-PCA Proxy PCA PCA Proxy Figure: Root-mean-squared pricing errors. RP-PCA has smaller out-of-sample pricing errors A 9

47 Extreme Deciles: Idiosyncratic Variation 7 Idiosyncratic Variation (In-sample) 7 Idiosyncratic Variation (Out-of-sample) factors 4 factors 5 factors 6 factors 7 factors 3 2 RP-PCA RP-PCA Proxy PCA PCA Proxy RP-PCA RP-PCA Proxy PCA PCA Proxy Figure: Unexplained idiosyncratic variation. Unexplained variation similar for RP-PCA and PCA A

48 Extreme Deciles: Maximal Sharpe-ratio SR (In-sample) =- = = = =5 = SR (Out-of-sample) factor 2 factors 3 factors 4 factors 5 factors 6 factors 7 factors.2. factor 2 factors 3 factors 4 factors 5 factors 6 factors 7 factors Figure: First and last decile of 37 single-sorted portfolios (N = 64 and T = 638): Maximal Sharpe-ratios for different RP-weights γ and number of factors K. A

49 Interpreting factors: Generalized correlations with proxies RP-PCA PCA. Gen. Corr Gen. Corr Gen. Corr Gen. Corr Gen. Corr Gen. Corr Table: Generalized correlations of statistical factors with proxy factors (portfolios of 8 assets). Problem in interpreting factors: Factors only identified up to invertible linear transformations. Generalized correlations close to measure of how many factors two sets have in common. Proxy factors approximate statistical factors. A 2

50 Interpreting factors: Cumulative absolute proxy weights RP-PCA Proxy PCA Proxy Momentum (2m).7 Idiosyncratic Volatility.28 Idiosyncratic Volatility.65 Momentum (2m).4 Industry Rel. Rev. (L.V.).2 Momentum (6m).4 Momentum (6m).2 Price.95 Price.2 Asset Turnover.83 Industry Mom. Reversals. Industry Rel. Rev. (L.V.).79 Value (M).3 Value (M).75 Industry Rel. Reversals.84 Industry Momentum.73 Share Volume.55 Industry Mom. Reversals.73 Net Operating Assets.47 Dividend/Price.67 Value-Momentum-Prof..46 Gross Profitability.64 Size.42 Sales/Price.58 Return on Book Equity (A).32 Value-Profitability.58 Industry Momentum.29 Net Operating Assets.37 Value-Momentum.27 Value (A).36 Long Run Reversals.24 Value-Momentum-Prof..33 Earnings/Price.22 Cash Flows/Price.3 A 3

51 Interpreting factors: Composition of proxies 2. Proxy (RP-PCA) 3. Proxy (RP-PCA) 4. Proxy (RP-PCA) 5. Proxy (RP-PCA) 6. Proxy (RP-PCA) indrrevlv.54 valmomprof.7 mom2.28 mom price.38 indmomrev.52 indmomrev -.2 mom.26 mom -.28 mom.36 ivol.24 ivol -.2 valuem.25 valmomprof -.29 valuem.34 Accrual -.2 mom lrrev -.24 roea -.32 indrrev.32 shvol -.22 indrrevlv -.26 mom -.3 shvol -.33 indrrev -.26 ep -.22 indmomrev -.4 valuem -.44 price -.37 valmom -.27 indrrev -.25 indrrevlv -.4 price -.45 size -.42 indmom -.29 mom ivol -.67 mom noa -.47 ivol Proxy (PCA) 3. Proxy (PCA) 4. Proxy (PCA) 5. Proxy (PCA) 6. Proxy (PCA) ivol.59 valuem.46 mom.36 divp.3 valprof.33 indrrevlv.43 price.38 indmom.35 roea -.25 Aturnover.32 indmomrev.37 divp.37 mom2.34 shvol -.27 sp.27 indrrevlv.36 value.36 valmomprof.33 size -.28 prof.24 indmomrev.36 sp.32 valmom -.3 mom -.29 valprof -.25 ivol.27 lrrev.3 indmom -.35 noa -.37 prof -.4 mom2.3 cfp.3 mom mom ivol -.42 indmom.3 valuem -.29 mom -.39 price -.57 Aturnover -.5 Table: Portfolio-composition of proxy factors for first and last decile of 37 single-sorted portfolios: First proxy factors is an equally-weighted portfolio. A 4

52 Extreme Deciles: Time-Stability Figure: Time-varying rotated loadings for the first six factors. Loadings are estimated on a rolling window with 24 months. A 5

53 Generalized Correlation Generalized Correlation Generalized Correlation Generalized Correlation Extreme Deciles: Time-Stability RP-PCA (total vs. time-varying, K=4) RP-PCA (total vs. time-varying, K=6) st GC 2nd GC st GC 3rd GC.2 2nd GC 3rd GC.2 4th GC 5th GC 4th GC 6th GC Time Time PCA (total vs. time-varying, K=4) PCA (total vs. time-varying, K=6) st GC 2nd GC st GC 3rd GC.2 2nd GC 3rd GC.2 4th GC 5th GC 4th GC 6th GC Time Time Figure: Generalized correlations between loadings estimated on the whole time horizon T = 638 and a rolling window. A 6

54 Double-sorted portfolios Double-sorted portfolios Data Factors Monthly return data from 7/963 to 5/27 (T = 647) 3 double sorted portfolios (consisting of 25 portfolios) from Kenneth French s website PCA: K = 3 2 RP-PCA: K = 3 and γ = 3 FF-Long/Short factors: market + two specific anomaly long-short factors A 7

55 Sharpe-ratios and pricing errors (in-sample) Sharpe-Ratio α RPCA PCA FF-L/S RPCA PCA FF-L/S Size and BM BM and Investment BM and Profits Size and Accrual Size and Beta Size and Investment Size and Profits Size and Momentum Size and ST-Reversal Size and Idio. Vol Size and Total Vol Profits and Invest Size and LT-Reversal A 8

56 Sharpe-ratios and pricing errors (out-of-sample) Sharpe-Ratio α RPCA PCA FF-L/S RPCA PCA FF-L/S Size and BM BM and Investment BM and Profits Size and Accrual Size and Beta Size and Investment Size and Profits Size and Momentum Size and ST-Reversal Size and Idio. Vol Size and Total Vol Profits and Invest Size and LT-Reversal A 9

57 The Model: Objective function Time-series objective function: Minimize the unexplained variance: min Λ,F NT N i= t= T (X ti F t Λ i ) 2 = min Λ NT trace ( (XM Λ ) (XM Λ ) ) s.t. F = X (Λ Λ) Λ Projection matrix M Λ = I N Λ(Λ Λ) Λ Error (non-systematic risk): e = X F Λ = XM Λ Λ proportional to eigenvectors of the first K largest eigenvalues of NT X X minimizes time-series objective function Motivation for PCA A 2

58 The Model: Objective function Cross-sectional objective function: Minimize cross-sectional expected pricing error: N = N N (Ê[Xi ] Ê[F ]Λ i i= ) 2 N ( T X i T F Λ i i= ) 2 = N trace ( ( T XM Λ ) ( T XM Λ ) ) s.t. F = X (Λ Λ) Λ is vector T of s and thus F T estimates factor mean Why not estimate factors with cross-sectional objective function? Factors not identified Spurious factor detection (Bryzgalova (26)) A 2

59 The Model: Objective function Combined objective function: Risk-Premium-PCA min Λ,F ( (( )) ( NT trace (XM Λ ) (XM Λ + γ ) ( ) ) N trace T XM Λ T XM Λ ( ( = min Λ NT trace M Λ X I + γ ) T ) XM Λ s.t. F = X (Λ Λ) Λ The objective function is minimized by the eigenvectors of the largest eigenvalues of NT X ( I T + γ T ) X. ˆΛ estimator for loadings: proportional to eigenvectors of the first K eigenvalues of NT X ( I T + γ T ) X ˆF estimator for factors: N X ˆΛ = X (ˆΛ ˆΛ) ˆΛ. Estimator for the common component C = F Λ is Ĉ = ˆF ˆΛ A 22

60 The Model: Objective function Weighted Combined objective function: Straightforward extension to weighted objective function: min Λ,F NT trace(q (X F Λ ) (X F Λ )Q) + γ ) ( N trace (X F Λ )QQ (X F Λ ) = min trace (M ( Λ Q X I + γ ) Λ T ) XQM Λ s.t. F = X (Λ Λ) Λ Cross-sectional weighting matrix Q Factors and loadings can be estimated by applying PCA to Q X ( I + γ T ) XQ. Today: Only Q equal to inverse of a diagonal matrix of standard deviations. For γ = corresponds to PCA of a correlation matrix. Optimal choice of Q: GLS type argument A 23

61 Simulation Simulation parameters N = 25 and T = 35. Factors: K = 4. Factor represent the market with N(.2, 9): Sharpe-ratio of.4 2. Factor represents an industry factors following N(., ): Sharpe-ratio of.. 3. Factor follows N(.4, ): Sharpe-ratio of Factor has a small variance but high Sharpe-ratio. It follows N(.4,.6): Sharpe-ratio of. Loadings normalized such that N Λ Λ = I K. Λ i, = and Λ i,k N(, ) for k = 2, 3, 4. Errors: Cross-sectional and time-series correlation and heteroskedasticity in the residuals. /3 of variation due to systematic risk. A 24

62 Simulation parameters Errors Residuals are modeled as e = σ e D T A T ɛa N D N : ɛ is a T N matrix and follows a multivariate standard normal distribution Time-series correlation in errors: A T creates an AR() model with parameter ρ =. Cross-sectional correlation in errors: A N is a Toeplitz-matrix with (β, β, β) on the right three off-diagonals with β =.7 Cross-sectional heteroskedasticity: D N is a diagonal matrix with independent elements following N(,.2) Time-series heteroskedasticity: D T is a diagonal matrix with independent elements following N(,.2) Signal-to-noise ratio: σ 2 e = 9 Parameters produce eigenvalues that are consistent with the data. A 25

63 Simulation 6. Factor 2 2. Factor Factor 3 4. Factor True factor RP-PCA = RP-PCA = RP-PCA = RP-PCA = PCA Figure: Sample paths of the cumulative returns of the first four factors and the estimated factor processes. N = 25 and T = 35. A 26

64 Simulation True Factor γ = γ = γ = γ = 5 γ = In-sample SR α Idio. Var Ex. Var Out-of-sample SR α Idio. Var Ex. Var Table: (i) Maximal Sharpe-ratio (SR), (ii) root-mean-squared pricing error (α), unexplained variation and proportion of variation explained by K = 4 factors. N = 25 and T = 35. A 27

65 Simulation γ = γ = γ = γ = 5 γ = In-sample. Factor Factor Factor Factor Out-of-sample. Factor Factor Factor Factor Table: Correlation between estimated and true factors. The risk-premium weight γ = corresponds to PCA. N = 25 and T = 35. A 28

66 Simulation Simulation parameters Consider only K = factor Sharpe-ratio =, i.e. µ F = σ F Different signal strength σf 2 Λ i N(, ), i.e. normalized variance σf 2 N in weak factor model. Same residuals as before A 29

67 Simulation: Correlation, N=5, T=5 Correlation with true factor (In-sample) Correlation with true factor (Out-of-sample) =- = = =5 = True factor =.3 2 =.5 2 =.8 2 =. 2 =.3 2 =.5 2 =.8 2 =. Figure: Correlation of estimated with true factor. For signal σ 2 <.8 a value of γ is optimal. A 3

68 Simulation: Sharpe-ratio, N=5, T=5.2 SR (In-sample).2 SR (Out-of-sample) =- = = =5 = True factor =.3 2 =.5 2 =.8 2 =. 2 =.3 2 =.5 2 =.8 2 =. Figure: Maximal Sharpe-ratios. For signal σ 2 <.8 a value of γ is optimal. A 3

69 Simulation: Pricing Error, N=5, T=5 RMS (In-sample) RMS (Out-of-sample) =- = = =5 = True factor =.3 2 =.5 2 =.8 2 =. 2 =.3 2 =.5 2 =.8 2 =. Figure: Root-mean-squared pricing errors. For signal σ 2 <.8 a value of γ is optimal. A 32

70 Simulation: Idiosyncratic Variation, N=5, T=5 25 Idiosyncratic Variation (In-sample) 25 Idiosyncratic Variation (Out-of-sample) =- = = =5 = True factor 5 2 =.3 2 =.5 2 =.8 2 =. 2 =.3 2 =.5 2 =.8 2 =. Figure: Unexplained idiosyncratic variation. Unexplained variation similar for RP-PCA and PCA. A 33

71 Simulation: Correlation, N=25, T=25 Correlation with true factor (In-sample) Correlation with true factor (Out-of-sample) =- = = =5 = True factor =.3 2 =.5 2 =.8 2 =. 2 =.3 2 =.5 2 =.8 2 =. Figure: Correlation of estimated with true factor. For signal σ 2 <.8 a value of γ is optimal. A 34

72 Simulation: Correlation, N=5, T=25 Correlation with true factor (In-sample) Correlation with true factor (Out-of-sample) =- = = =5 = True factor =.3 2 =.5 2 =.8 2 =. 2 =.3 2 =.5 2 =.8 2 =. Figure: Correlation of estimated with true factor. For signal σ 2 <.8 a value of γ is optimal. A 35

73 Simulation: Correlation, N=, T=25 Correlation with true factor (In-sample) Correlation with true factor (Out-of-sample) =- = = =5 = True factor =.3 2 =.5 2 =.8 2 =. 2 =.3 2 =.5 2 =.8 2 =. Figure: Correlation of estimated with true factor. For signal σ 2 <.8 a value of γ is optimal. A 36

74 Simulation: Correlation, N=25, T=5 Correlation with true factor (In-sample) Correlation with true factor (Out-of-sample) =- = = =5 = True factor =.3 2 =.5 2 =.8 2 =. 2 =.3 2 =.5 2 =.8 2 =. Figure: Correlation of estimated with true factor. For signal σ 2 <.8 a value of γ is optimal. A 37

75 Simulation: Weak factor model prediction Statistical Model.5 PCA RP-PCA ( =) RP-PCA ( =) RP-PCA ( =5) Monte-Carlo Simulation.5 PCA RP-PCA ( =) RP-PCA ( =) RP-PCA ( =5) Correlations between estimated and true factor based on the weak factor model prediction and Monte-Carlo simulations for different variances of the factor. The residuals have cross-sectional and time-series correlation and A 38 heteroskedasticity. The Sharpe-ratio of the factor is. N = 25, T = 35.

76 Simulation: Weak factor model prediction Statistical Model.5 PCA RP-PCA ( =) RP-PCA ( =) RP-PCA ( =5) Monte-Carlo Simulation.5 PCA RP-PCA ( =) RP-PCA ( =) RP-PCA ( =5) Correlations between estimated and true factor based on the weak factor model prediction and Monte-Carlo simulations for different variances of the factor. The residuals have only cross-sectional correlation. The Sharpe-ratio of the A 39 factor is. N = 25, T = 35.

77 Weak Factor Model: Dependent residuals dependent residuals i.i.d residuals Figure: Values of ρ 2 i ( if θ +θ i B(ˆθ i )) i > σcrit 2 and otherwise) for different signals θ i. The average noise level is normalized in both cases to σe 2 = and c =. For the correlated residuals we assume that Σ /2 is a Toeplitz matrix with β, β, β on the right three off-diagonals with β =.7. N = 25, T = 35. A 4

78 Weak Factor Model Weak Factor Model Weak factors either have a small variance or affect a smaller fraction of assets: Λ Λ bounded (after normalizing factor variances) Statistical model: Spiked covariance models from random matrix theory Eigenvalues of sample covariance matrix separate into two areas: The bulk, majority of eigenvalues The extremes, a few large outliers Bulk spectrum converges to generalized Marchenko-Pastur distribution (under certain conditions) A 4

79 Weak Factor Model Weak Factor Model Large eigenvalues converge either to A biased value characterized by the Stieltjes transform of the bulk spectrum To the bulk of the spectrum if the true eigenvalue is below some critical threshold Phase transition phenomena: estimated eigenvectors orthogonal to true eigenvectors if eigenvalues too small Onatski (22): Weak factor model with phase transition phenomena Problem: All models in the literature assume that random processes have mean zero RP-PCA implicitly uses non-zero means of random variables New tools necessary! A 42

80 Weak Factor Model Assumption : Weak Factor Model Rate: Assume that N T c with < c <. 2 Factors: F are uncorrelated among each other and are independent of e and Λ and have bounded first two moments. ˆµ F := T p F t µf ˆΣF := σ 2 F T T FtF t p Σ F = t= σf 2 K 3 Loadings: The column vectors of the loadings Λ are orthogonally invariant and independent of ɛ and F (e.g. Λ i,k N(, N ) and Λ Λ = I K 4 Residuals: e = ɛσ with ɛ t,i N(, ). The empirical eigenvalue distribution function of Σ converges to a non-random spectral distribution function with compact support and supremum of support b. Largest eigenvalues of Σ converge to b. A 43

81 Weak Factor Model Intuition: Weak Factor Model Signal matrix for PCA of covariance matrix: ΛΣ F Λ K largest eigenvalues θ PCA,..., θk PCA measure strength of signal Signal matrix for RP-PCA: Λ ( Σ F + ( + γ)µ F µ ) F Λ K largest eigenvalues θ RP-PCA,..., θk RP-PCA measure strength of signal If µ F and γ > then RP-PCA signal always larger than PCA signal: θ RP-PCA i > θ PCA i A 44

82 Weak Factor Model Theorem : Risk-Premium PCA under weak factor model Under Assumption the correlation of the estimated with the true factors is ρ Ĉorr(F, ˆF ) p ρ 2 }{{} Ũ }{{} Ṽ rotation rotation ρ K with ρ 2 i { p +θ i B(θ i if θ )) i > θ crit otherwise Critical value θ crit and function B(.) depend only on the noise distribution and are known in closed-form Based on closed-form expression choose optimal RP-weight γ For θ i > θ crit ρ 2 i is strictly increasing in θ i. RP-PCA strictly dominates PCA A 45

83 Strong Factor Model Strong Factor Model Strong factors affect most assets: e.g. market factor N Λ Λ bounded (after normalizing factor variances) Statistical model: Bai and Ng (22) and Bai (23) framework Factors and loadings can be estimated consistently and are asymptotically normal distributed RP-PCA provides a more efficient estimator of the loadings Assumptions essentially identical to Bai (23) A 46

84 Strong Factor Model Asymptotic Distribution (up to rotation) PCA under assumptions of Bai (23): (up to rotation) Asymptotically ˆΛ behaves like OLS regression of F on X. Asymptotically ˆF behaves like OLS regression of Λ on X. RP-PCA under slightly stronger assumptions as in Bai (23): Asymptotically ( ˆΛ behaves ) like OLS regression of FW on XW with W 2 = I T + γ T and is a T vector of s. Asymptotically ˆF behaves like OLS regression of Λ on X. Asymptotic Efficiency Choose RP-weight γ to obtain smallest asymptotic variance of estimators RP-PCA (i.e. γ > ) always more efficient than PCA Optimal γ typically smaller than optimal value from weak factor model RP-PCA and PCA are both consistent A 47

85 Simplified Strong Factor Model Assumption 2: Simplified Strong Factor Model Rate: Same as in Assumption 2 Factors: Same as in Assumption 3 Loadings: Λ Λ/N p I K and all loadings are bounded. 4 Residuals: e = ɛσ with ɛ t,i N(, ). All elements and all row sums of Σ are bounded. A 48

86 Simplified Strong Factor Model Proposition: Simplified Strong Factor Model Assumption 2 holds. Then: The factors and loadings can be estimated consistently. 2 The asymptotic distribution of the factors is not affected by γ. 3 The asymptotic distribution of the loadings is given by ) D T (ˆΛi Λ i N(, Ωi ) Ω i =σe 2 ( i ΣF + ( + γ)µ F µ ) ( F ΣF + ( + γ) 2 µ F µ ) F ( ΣF + ( + γ)µ F µ ) F, E[e 2 t,i ] = σe 2 i 4 γ = is optimal choice for smallest asymptotic variance. γ =, i.e. the covariance matrix, is not efficient. A 49

87 Time-varying loadings Model with time-varying loadings Observe panel of excess returns and L covariates Z i,t,l : X t,i = F t K g K (Z i,t,,..., Z i,t,l ) + e t,i Loadings are function of L covariates Z i,t,l with l =,..., L e.g. characteristics like size, book-to-market ratio, past returns,... Factors and loading function are latent Idea: Similar to Projected PCA (Fan, Liao and Wang (26)) and Instrumented PCA (Kelly, Pruitt, Su (27)), but we include the pricing error penalty allow for general interactions between covariates A 5

88 Time-varying loadings Projected RP-PCA (work in progress) Approximate nonlinear function g k (.) by basis functions φ m (.): g k (Z i,t ) = M m= b m,k φ m (Z i,t ) g(z t ) = }{{}}{{} B Φ(Z t ) }{{} K N K M M N Apply RP-PCA to projected data X t = X t Φ(Z t ) X t = F t B Φ(Z t )Φ(Z t ) + e t Φ(Z t ) = F t B + ẽt Special case: φ m = {Zt I m} X characteristics sorted portfolios Obtain arbitrary interactions and break curse of dimensionality by conditional tree sorting projection Intuition: Projection creates M portfolios sorted on any functional form and interaction of covariates Z t. A 5

89 Weak Factor Model Assumption : Weak Factor Model Residual matrix can be represented as e = ɛσ with ɛ t,i N(, ). The empirical eigenvalue distribution function of Σ converges to a non-random spectral distribution function with compact support. The supremum of the support is b. 2 The factors F are uncorrelated among each other and are independent of e and Λ and have bounded first two moments. ˆµ F := T p F t µf ˆΣ F := σ 2 F T T FtF t p Σ F = t= σf 2 K 3 The column vectors of the loadings Λ are orthogonally invariant and independent of ɛ and F (e.g. Λ i,k N(, N ) and Λ Λ = I K 4 Assume that N T c with < c <. A 52

90 Weak Factor Model Definition: Weak Factor Model Average idiosyncratic noise σ 2 e := trace(σ)/n Denote by λ λ 2... λ N the ordered eigenvalues of T e e. The Cauchy transform (also called Stieltjes transform) of the eigenvalues is the almost sure limit: G(z) := a.s. lim T N B-function N i= c B(z) :=a.s. lim T N z λ i N i= c =a.s. lim T N trace ( = a.s. lim T N trace (zi N ) T e e) λ i (z λ i ) 2 ( ( (zi N T e e) ) 2 ( ) ) T e e A 53

91 Weak Factor Model Estimator Risk-premium ( PCA (RP-PCA): ) Apply PCA estimation to S γ := X I T T + γ X T PCA : Apply PCA ( to estimated ) covariance matrix S := X I T T X, i.e. γ =. PCA special case of RP-PCA Signal Matrix for Covariance PCA T σf 2 + cσ 2 e M Var = Σ F + cσe 2 I K = σf 2 K + cσe 2 Intuition: Largest K true eigenvalues of S. A 54

92 Weak Factor Model Lemma: Covariance PCA Assumption holds. Define the critical value σcrit 2 = lim z b. The first K G(z) largest eigenvalues ˆλ i of S satisfy for i =,..., K ( ) p G ˆλ if σ i σ F 2 +cσ 2 F 2 i e i + cσe 2 > σcrit 2 b otherwise The correlation between the estimated and true factors converges to ϱ Ĉorr(F, ˆF ) p ϱ K with ϱ 2 i p { +(σ 2 F i +cσ 2 e )B(ˆλ i )) if σ 2 F i + cσ 2 e > σ 2 crit otherwise A 55

93 Weak Factor Model Corollary: Covariance PCA for i.i.d. errors Assumption holds, c and e t,i i.i.d. N(, σ 2 e ). The largest K eigenvalues of S have the following limiting values: ˆλ i p { σ 2 Fi + σ2 e σ 2 F i (c + + σ 2 e ) σ 2 e ( + c) 2 if σ 2 F i + cσ 2 e > σ 2 crit σ 2 F > cσ 2 e otherwise The correlation between the estimated and true factors converges to ϱ Ĉorr(F, ˆF ) p ϱ K with ϱ 2 i p cσ4 e σ 4 F i + cσ2 e σ 2 F i + σ4 e σ 4 F i (c 2 c) if σ 2 F i + cσ 2 e > σ 2 crit otherwise A 56

94 Weak Factor Model Signal Matrix for RP-PCA Signal Matrix for RP-PCA ( ) Σ M RP = F + cσe 2 Σ /2 F µ F ( + γ) µ F Σ /2 F ( + γ) ( + γ)(µ F µ F + cσ2) 2 Define γ = γ + and note that ( + γ) 2 = + γ. Projection on K demeaned factors and on mean operator. Denote by θ... θ K+ the eigenvalues of the signal matrix M RP and by Ũ the corresponding orthonormal eigenvectors : θ Ũ M RP Ũ = θ K+ Intuition: θ,..., θ K+ largest K + true eigenvalues of S γ. A 57

95 Weak Factor Model Theorem : Risk-Premium PCA under weak factor model Assumption holds. The first K largest eigenvalues ˆθ i i =,..., K of S γ satisfy { ( ) p G ˆθ i θ i if θ i > σcrit 2 = lim z b G(z) b otherwise The correlation of the estimated with the true factors converges to ρ Ĉorr(F, ˆF ) p ( I K ) ρ 2 Ũ. }{{}..... D /2 /2 K ˆΣ ˆF rotation ρ K }{{} rotation with ρ 2 i p { +θ i B( ˆθ i )) if θ i > σcrit 2 otherwise A 58

96 Weak Factor Model Theorem : continued ˆΣ ˆF =D/2 K ρ ( ρ K ρ 2 ) ρ 2 K D K =diag ((ˆθ )) ˆθK ( ) Ũ IK Ũ D /2 K ρ ρ K A 59

97 Weak Factor Model Lemma: Detection of weak factors If γ > and µ F, then the first K eigenvalues of M RP are strictly larger than the first K eigenvalues of M Var, i.e. For θ i > σ 2 crit it holds that ˆθ i θ i > θ i > σ 2 F i + cσ 2 e Thus, if γ > and µ F, then ρ i > ϱ i. ρ i θ i > i =,..., K For µ F RP-PCA always better than PCA. A 6

98 Weak Factor Model Example: One-factor model Assume that there is only one factor, i.e. K =. The signal matrix M RP simplifies to ( ) σ 2 M RP = F + cσe 2 σ F µ( + γ) µσ F ( + γ) (µ 2 + cσe 2 )( + γ) and has the eigenvalues: θ,2 = 2 σ2 F + cσe 2 + (µ 2 + cσe 2 )( + γ) ± (σf cσ2 e + (µ 2 + cσe 2 )( + γ)) 2 4( + γ)cσe 2 (σf 2 + µ2 + cσe 2 ) The eigenvector of first eigenvalue θ has the components Ũ, = Ũ,2 = µσ F ( + γ) (θ (σf 2 + cσ2 e )) 2 + µ 2 σf 2 ( + γ) θ σ 2 F + cσ 2 e (θ (σ 2 F + cσ2 e )) 2 + µ 2 σ 2 F ( + γ) A 6

99 Weak Factor Model Corollary: One-factor model The correlation between the estimated and true factor has the following limit: Ĉorr(F, ˆF ) p ρ ρ 2 + ( ρ2 ) (θ (σ 2 F +cσ2 e ))2 + µ 2 σ 2 F (+γ) A 62

100 Strong Factor Model Strong Factor Model Strong factors affect most assets: e.g. market factor N Λ Λ bounded (after normalizing factor variances) Statistical model: Bai and Ng (22) and Bai (23) framework Factors and loadings can be estimated consistently and are asymptotically normal distributed RP-PCA provides a more efficient estimator of the loadings Assumptions essentially identical to Bai (23) A 63

101 Strong Factor Model Asymptotic Distribution (up to rotation) PCA under assumptions of Bai (23): Asymptotically ˆΛ behaves like OLS regression of F on X. Asymptotically ˆF behaves like OLS regression of Λ on X. RP-PCA under slightly stronger assumptions as in Bai (23): Asymptotically ( ˆΛ behaves ) like OLS regression of FW on XW with W 2 = I T + γ T. Asymptotically ˆF behaves like OLS regression of Λ on X. Asymptotic Expansion Asymptotic expansions (under slightly stronger assumptions as in Bai (23)): ) T (H ˆΛi Λ i = ( F W 2 F ) ( T ) T T F W 2 e i + O p + o N p() ) 2 N (H ˆFt F t = ( N Λ Λ ) ( N ) N Λ et + O p + o T p() with known rotation matrix H. A 64

102 Strong Factor Model Assumption 2: Strong Factor Model Assume the same assumptions as in Bai (23) (Assumption A-G) hold and in addition ( T T t= Fte ) ( ) t,i D Ω, Ω,2 T T t= e N(, Ω) Ω = t,i Ω 2, Ω 2,2 A 65

103 Strong Factor Model Theorem 2: Strong Factor Model Assumption 2 holds and γ [, ). Then: For any choice of γ the factors, loadings and common components can be estimated consistently pointwise. then ( ) T H ˆΛi D Λ i N(, Φ) If T N ( ) ) Φ = Σ F + (γ + )µ F µ F (Ω, + γµ F Ω 2, + γω,2µ F + γ 2 µ F Ω 2,2µ F ( ) Σ F + (γ + )µ F µ F For γ = this simplifies to the conventional case Σ F Ω,Σ F. If N T then the asymptotic distribution of the factors is not affected by the choice of γ. The asymptotic distribution of the common component depends on γ if and only if N T does not go to zero. For T N T (Ĉt,i C t,i ) D N (, F t ΦF t ) A 66

Factors that Fit the Time Series and Cross-Section of Stock Returns

Factors that Fit the Time Series and Cross-Section of Stock Returns Martin Lettau Markus Pelger UC Berkeley Stanford University November 3th 8 NBER Asset Pricing Meeting Motivation Motivation Fundamental