Factor Models for Asset Returns. Prof. Daniel P. Palomar

Similar documents
Time Series Modeling of Financial Data. Prof. Daniel P. Palomar

R = µ + Bf Arbitrage Pricing Model, APM

Ross (1976) introduced the Arbitrage Pricing Theory (APT) as an alternative to the CAPM.

Regression: Ordinary Least Squares

ASSET PRICING MODELS

MFE Financial Econometrics 2018 Final Exam Model Solutions

Econ671 Factor Models: Principal Components

Financial Econometrics Lecture 6: Testing the CAPM model

Multivariate GARCH models.

Lecture Notes in Empirical Finance (PhD): Linear Factor Models

Panel Data Models. Chapter 5. Financial Econometrics. Michael Hauser WS17/18 1 / 63

Identifying Financial Risk Factors

Network Connectivity and Systematic Risk

Quick Review on Linear Multiple Regression

Notes on empirical methods

Factor Model Risk Analysis

9.1 Orthogonal factor model.

Financial Econometrics Short Course Lecture 3 Multifactor Pricing Model

Cross-Sectional Regression after Factor Analysis: Two Applications

17 Factor Models and Principal Components

Financial Econometrics Return Predictability

Principal Component Analysis (PCA) Our starting point consists of T observations from N variables, which will be arranged in an T N matrix R,

Statistics 910, #5 1. Regression Methods

[y i α βx i ] 2 (2) Q = i=1

ECON 4160, Spring term 2015 Lecture 7

The Slow Convergence of OLS Estimators of α, β and Portfolio. β and Portfolio Weights under Long Memory Stochastic Volatility

Empirical Market Microstructure Analysis (EMMA)

Statistics 910, #15 1. Kalman Filter

Prior Information: Shrinkage and Black-Litterman. Prof. Daniel P. Palomar

FE670 Algorithmic Trading Strategies. Stevens Institute of Technology

Factor Analysis and Kalman Filtering (11/2/04)

Multivariate Tests of the CAPM under Normality

Multivariate Time Series

Chapter 4: Factor Analysis

STAT 100C: Linear models

1 Data Arrays and Decompositions

Time-Varying Parameters

Discussion of Bootstrap prediction intervals for linear, nonlinear, and nonparametric autoregressions, by Li Pan and Dimitris Politis

Predicting Mutual Fund Performance

A Non-Parametric Approach of Heteroskedasticity Robust Estimation of Vector-Autoregressive (VAR) Models

APT looking for the factors

Empirical properties of large covariance matrices in finance

Restricted Maximum Likelihood in Linear Regression and Linear Mixed-Effects Model

Specification Test Based on the Hansen-Jagannathan Distance with Good Small Sample Properties

STAT Financial Time Series

Likelihood-based Estimation of Stochastically Singular DSGE Models using Dimensionality Reduction

State-space Model. Eduardo Rossi University of Pavia. November Rossi State-space Model Fin. Econometrics / 53

Lecture 10: Panel Data

In modern portfolio theory, which started with the seminal work of Markowitz (1952),

Maximum Likelihood Estimation

Linear Regression. In this problem sheet, we consider the problem of linear regression with p predictors and one intercept,

A Guide to Modern Econometric:

Applications of Random Matrix Theory to Economics, Finance and Political Science

Inference on Risk Premia in the Presence of Omitted Factors

Econometrics of Panel Data

Specification Errors, Measurement Errors, Confounding

Quantitative Trendspotting. Rex Yuxing Du and Wagner A. Kamakura. Web Appendix A Inferring and Projecting the Latent Dynamic Factors

Network Connectivity, Systematic Risk and Diversification

THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41912, Spring Quarter 2008, Mr. Ruey S. Tsay. Solutions to Final Exam

Multivariate Time Series Analysis and Its Applications [Tsay (2005), chapter 8]

Factor Investing using Penalized Principal Components

Multivariate Distributions

Intro VEC and BEKK Example Factor Models Cond Var and Cor Application Ref 4. MGARCH

B y t = γ 0 + Γ 1 y t + ε t B(L) y t = γ 0 + ε t ε t iid (0, D) D is diagonal

X t = a t + r t, (7.1)

Linear models and their mathematical foundations: Simple linear regression

Financial Econometrics and Quantitative Risk Managenent Return Properties

Linear Models. DS-GA 1013 / MATH-GA 2824 Optimization-based Data Analysis.

GMM and SMM. 1. Hansen, L Large Sample Properties of Generalized Method of Moments Estimators, Econometrica, 50, p

GARCH Models. Eduardo Rossi University of Pavia. December Rossi GARCH Financial Econometrics / 50

Estimating Covariance Using Factorial Hidden Markov Models

Probabilities & Statistics Revision

Least Squares Estimation-Finite-Sample Properties

Eigenvalue spectra of time-lagged covariance matrices: Possibilities for arbitrage?

Problem Set 7 Due March, 22

Econometrics of Panel Data

Linear models. Rasmus Waagepetersen Department of Mathematics Aalborg University Denmark. October 5, 2016

Multivariate Time Series: VAR(p) Processes and Models

For more information about how to cite these materials visit

Econ 424 Time Series Concepts

Multivariate modelling of long memory processes with common components

Vector autoregressions, VAR

3.1. The probabilistic view of the principal component analysis.

Factor Analysis (10/2/13)

1. Shocks. This version: February 15, Nr. 1

Statistical Inference On the High-dimensional Gaussian Covarianc

Econ 423 Lecture Notes: Additional Topics in Time Series 1

Linear Dimensionality Reduction

Introduction to Estimation Methods for Time Series models. Lecture 1

Estimation of large dimensional sparse covariance matrices

Generalized Autoregressive Score Models

GARCH Models Estimation and Inference

ZHAW Zurich University of Applied Sciences. Bachelor s Thesis Estimating Multi-Beta Pricing Models With or Without an Intercept:

The BLP Method of Demand Curve Estimation in Industrial Organization

STAT 100C: Linear models

Factor models. March 13, 2017

Financial Econometrics

Heteroskedasticity; Step Changes; VARMA models; Likelihood ratio test statistic; Cusum statistic.

Introduction to Algorithmic Trading Strategies Lecture 3

Vector Auto-Regressive Models

Transcription:

Factor Models for Asset Returns Prof. Daniel P. Palomar The Hong Kong University of Science and Technology (HKUST) MAFS6010R- Portfolio Optimization with R MSc in Financial Mathematics Fall 2018-19, HKUST, Hong Kong

Outline 1 Introduction 2 Linear Factor Models 3 Macroeconomic Factor Models 4 Fundamental Factor Models 5 Statistical Factor Models

Outline 1 Introduction 2 Linear Factor Models 3 Macroeconomic Factor Models 4 Fundamental Factor Models 5 Statistical Factor Models

Why Factor Models? To decompose risk and return into explainable and unexplainable components. Generate estimates of abnormal returns. Describe the covariance structure of returns without factors: the covariance matrix for N stocks requires N(N + 1)/2 parameters (e.g., 500(500 + 1)/2 = 125, 250) with K factors: the covariance matrix for N stocks requires N (K + 1) parameters with K N (e.g., 500(3 + 1) = 2, 000) Predict returns in specified stress scenarios. Provide a framework for portfolio risk analysis. D. Palomar Factor Models 4 / 45

Types of Factor Models Factor models decompose the asset returns into two parts: low-dimensional factors and idiosynchratic residual noise. 1 Three types: 1 Macroeconomic factor models factors are observable economic and financial time series no systematic approach to choose factors 2 Fundamental factor models factors are created from observable asset characteristics no systematic approach to define the characteristics 3 Statistical factor models factors are unobservable and extracted from asset returns more systematic, but factors have no clear interpretation 1 R. S. Tsay, Analysis of Financial Time Series. John Wiley & Sons, 2005. D. Palomar Factor Models 5 / 45

Outline 1 Introduction 2 Linear Factor Models 3 Macroeconomic Factor Models 4 Fundamental Factor Models 5 Statistical Factor Models

Linear Factor Model Data: N assets/instruments/indexes: i = 1,..., N T time periods: t = 1,..., T N-variate random vector for returns at t: x t = (x 1,t,..., x N,t ) T. Factor model for asset i: x i,t = α i + β 1,i f 1,t + + β K,i f K,t + ɛ i,t, t = 1,..., T. K: the number of factors α i : intercept of asset i f t = (f 1,t,..., f K,t ) T : common factors (same for all assets i) β i = (β 1,i,..., β K,i ) T : factor loading of asset i (independing ot t) ɛ i,t : residual idiosyncratic term for asset i at time t D. Palomar Factor Models 7 / 45

Cross-Sectional Regression Factor model: where α = α 1. α N x t = α + Bf t + ɛ t, t = 1,..., T β T (N 1), B = 1. β T N (N K), ɛ t = (N 1). α and B are independent of t the factors {f t } (K-variate) are stationary with ɛ 1,t. ɛ N,t E [f t ] = µ f Cov [f t ] = E[(f t µ f ) (f t µ f ) T ] = Σ f the residuals {ɛ t } (N-variate) are white noise with E [ɛ t ] = 0 Cov [ɛ t, ɛ s ] = E[ɛ t ɛ T s ] = Ψδ ts, Ψ = diag ( σ1, 2..., σn 2 ) the two processes {f t } and {ɛ t } are uncorrelated D. Palomar Factor Models 8 / 45

Linear Factor Model Summary of parameters: α: (N 1) intercept for N assets B: (N K) factor loading matrix µ f : (K 1) mean vector of K common factors Σ f : (K K) covariance matrix of K common factors Ψ = diag ( σ 2 1,..., σ2 N) : N asset-specific variances Properties of linear factor model: the stochastic process {x t } is a stationary multivariate time series with conditional moments unconditional moments E [x t f t ] = α + Bf t Cov [x t f t ] = Ψ E [x t ] = α + Bµ f Cov [x t ] = BΣ f B T + Ψ. D. Palomar Factor Models 9 / 45

Time-Series Regression Factor model: where x i = (T K), ɛ i = x i,1. x i,t ɛ i,1 x i = 1 T α i + Fβ i + ɛ i, i = 1,..., N 1 (T 1), 1 T = (T 1), F =. ɛ i,t (T 1). ɛ i is the (T -variate) vector of white noise with. 1 f T 1. f T T E [ɛ i ] = 0 Cov [ɛ i ] = σ 2 i I T. D. Palomar Factor Models 10 / 45

Multivariate Regression Factor model (using compact matrix notation): where X = x T 1. X T = α1 T + BF T + E T f T 1. (T N), F = (T K), E = ɛ T 1. (T N). x T T f T T ɛ T T D. Palomar Factor Models 11 / 45

Expected Return (α β) Decomposition Recall the factor model for asset i: Take the expectation: where x i,t = α i + β T i f t + ɛ i,t, t = 1,..., T. E[x i,t ] = α i + β T i µ f β T i µ f is the explained expected return due to systematic risk factors α i = E[x i,t ] β T i µ f is the unexplained exptected return (abnormal return). According to the CAPM model, the alpha should be zero, but is it? D. Palomar Factor Models 12 / 45

Portfolio Analysis Let w = (w 1,..., w N ) T be a vector of portfolio weights (w i is the fraction of wealth in asset i). The portfolio return is r p,t = w T r t = Portfolio factor model: r t = α + Bf t + ɛ t N w i r i,t, t = 1,..., T. i=1 r p,t = w T α + w T Bf t + w T ɛ t = α p + β T p f t + ɛ p,t where α p = w T α β T p = w T B ɛ p,t = w T ɛ t and ( ) var(r p,t ) = w T BΣ f B T + Ψ w. D. Palomar Factor Models 13 / 45

Outline 1 Introduction 2 Linear Factor Models 3 Macroeconomic Factor Models 4 Fundamental Factor Models 5 Statistical Factor Models

Macroeconomic Factor Models Recall the factor model for asset i: x i,t = α i + β T i f t + ɛ i,t, t = 1,..., T. In this model, the factors {f t } are observed economic/financial time series. Econometric problems: choice of factors estimation of mean vector and covariance matrix of factors µ f and Σ f from observed history of factors estimation of factor betas β i s and residual variances σi 2 s using time series regression techniques D. Palomar Factor Models 15 / 45

Macroeconomic Factor Models Single factor model of Sharpe (1964) (aka CAPM): x i,t = α i + β i R M,t + ɛ i,t, t = 1,..., T where R M,t is the return of the market (in excess of the risk-free asset rate): market risk factor (typically a value weighted index like the S&P 500) x i,t is the return of asset i (in excess of the risk-free rate) K = 1 and the single factor is f 1,t = R M,t the unconditional cross-sectional covariance matrix of the assets is where σm 2 = var (R M,t) β = (β 1,..., β N ) T Ψ = diag ( σ1 2,..., ) σ2 N Cov [x t ] = Σ = σ 2 M ββt + Ψ D. Palomar Factor Models 16 / 45

Macroeconomic Factor Models Estimation of single factor model: x i = 1 T ˆα i + ˆβ i r M + ˆɛ i, i = 1,..., N where r M = (R M,1,..., R M,T ) with estimates: ˆβ i = ĉov (x i,t, R M,t ) / var (R M,t ) ˆα i = x i ˆβ i r M ˆσ i 2 = 1 T 2 ˆɛT i ˆɛ i, ˆΨ = diag (ˆσ 1 2,..., ) ˆσ2 N The estimated single factor model covariance matrix is ˆΣ = ˆσ 2 M ˆβ ˆβ T + ˆΨ D. Palomar Factor Models 17 / 45

Macroeconomic Factor Models: Market Neutrality Recall the single factor model: x t = α + βr M,t + ɛ t, t = 1,..., T When designing a portfolio w, it is common to have a market-neutral constraint: β T w = 0. This is to avoid exposure to the market risk. The resulting risk is given by w T Ψw. D. Palomar Factor Models 18 / 45

Capital Asset Pricing Model: AAPL vs SP500 AAPL regressed against the SP500 index (using risk-free rate r f = 2%/252): 0.1 Scatter CAMP model 0.05 AAPL log return 0 0.05 0.1 0.15 0.08 0.06 0.04 0.02 0 0.02 0.04 0.06 Market log return D. Palomar Factor Models 19 / 45

R 2 The R 2 from the time series regression is a measure of the proportion of market risk and 1 R 2 is a measure of asset specific risk (σi 2 is a measure of the typical size of asset specific risk). Given the variance decomposition: R 2 can be estimated as var (x i,t ) = β 2 i var (R M,t ) + var (ɛ i,t ) = β 2 i σ 2 M + σ2 i R 2 = ˆβ 2 i ˆσ2 M var (x i,t ) Extensions: robust regression techniques to estimate β i, σi 2, and σ2 M factor model not constant over time, obtaining time-varying β i,t, σi,t 2, and σm,t 2 β i,t can be estimated via rolling regression or Kalman filter techniques σi,t 2 and σ2 M,t (i.e., conditional heteroskedasticity) can be captured via GARCH models or exponential weights D. Palomar Factor Models 20 / 45

Macroeconomic Factor Models General multifactor model: x i,t = α i + β T i f t + ɛ i,t, t = 1,..., T. where the factors {f t } represent macro-economic variables such as 2 market risk price indices (CPI, PPI, commodities) / inflation industrial production (GDP) money growth interest rates housing starts unemployment In practice, there are many factors and in most cases they are very expensive to obtain. Typically, investment funds have to pay substantial subscription fees to have access to them (not available to small investors). 2 Chen, Ross, Roll (1986). Economic Forces and the Stock Market D. Palomar Factor Models 21 / 45

Macroeconomic Factor Models Estimation of multifactor model (K > 1): x i = 1 T α i + Fβ i + ɛ i, i = 1,..., N = Fγ i + ɛ i where F = [ 1 T F ] and γ i = [ αi Estimates: ˆγ i = ( F T F) 1 F T x i, ˆB = [ ˆβ 1 ˆβ N ] T ˆɛ i = x i Fˆγ i ˆσ 2 i = 1 β i ]. T K 1 ˆɛT i ˆɛ i, ˆΨ = diag (ˆσ 1 2,..., ) ˆσ2 N ˆΣ f = 1 T 1 T t=1 ( ft f ) ( f t f ) T, f = 1 T T t=1 f t The estimated multifactor model covariance matrix is ˆΣ = ˆB ˆΣ f ˆB T + ˆΨ D. Palomar Factor Models 22 / 45

Interlude: LS Regression Consider the LS regression: minimize γ x Fγ 2 We set the gradient to zero 2 F T Fγ 2 F T x = 0 to find the estimator as the optimal solution ˆγ = ( F T F) 1 F T x. Now we can compute the residual of the LS regression as ˆɛ = x Fˆγ and its variance with the sample estimator ˆσ 2 = 1 T ˆɛ 2 = 1 T x Fˆγ 2. D. Palomar Factor Models 23 / 45

Interlude: Maximum Likelihood Estimation The pdf for the residual under a Gaussian distribution is 1 p (ɛ t ) = 1 2πσ 2 e 2σ 2 ɛ2 t and the log-likelihood of the parameters ( γ, σ 2) given the T uncorrelated observations in ɛ is L ( γ, σ 2 ; ɛ ) = T 2 log ( 2πσ 2) 1 2σ 2 x Fγ 2. We can then formulate the MLE as the optimization problem minimize γ,σ 2 T 2 log ( σ 2) + 1 2σ 2 x Fγ 2. Setting the gradient with respect to γ and 1/σ 2, respectively, we get 1 ˆσ F T Fˆγ 2 1ˆσ F T 2 x = 0 and T 2 ˆσ2 + 1 2 x Fˆγ 2 = 0 leading to the estimators ˆγ = ( F T F) 1 F T x and ˆσ 2 = 1 T x Fˆγ 2. D. Palomar Factor Models 24 / 45

Interlude: MLE is Biased The MLE is consistent and asymptotically efficient and unbiased. However, it is biased for finite number of observations: the estimation of γ is unbiased [ E [ˆγ] = E ( F T F) 1 F ] T x = ( F T F) 1 F T E [x] = ( F T F) 1 F T Fγ = γ the estimation of σ 2 is, however, biased. First, rewrite it as ˆσ 2 = 1 T x Fˆγ 2 = 1 T x F( F T F) 1 F T x 2 = 1 T P F x 2 where P F = I F( F T F) 1 F T is the projection onto the subspace orthogonal to the one spanned by F (contains T (K + 1) eigenvalues equal to one and K + 1 zero eigenvalues). Now, the bias is E [ˆσ 2] = 1 [ ] T E x T P F x = 1 ( T Tr P [xx ]) F E T = = 1 T Tr ( P F (σ 2 I + Fγγ T F T )) = σ2 T T (K + 1) ) = σ2. Tr(P F T D. Palomar Factor Models 25 / 45

Interlude: MLE is Biased Since we now know there is a bias in the MLE of the variance, we could consider correcting this bias to obtain an unbiased estimator (Bessel t correction): ˆσ 2 unbiased = T T (K + 1) ˆσ2 = 1 T (K + 1) x Fˆγ 2. The interpretation is that T (K + 1) represent the degrees of freedom. (Note that if γ was known perfectly, then the degrees of freedom would be T.) D. Palomar Factor Models 26 / 45

Outline 1 Introduction 2 Linear Factor Models 3 Macroeconomic Factor Models 4 Fundamental Factor Models 5 Statistical Factor Models

Fundamental Factor Models Fundamental factor models use observable asset specific characteristics (fundamentals) like industry classification, market capitalization, style classification (value, growth), etc., to determine the common risk factors {f t }. factor loading betas are constructed from observable asset characteristics (i.e., B is known) factor realizations {f t } are then estimated/constructed for each t given B note that in macroeconomic factor models the process is the opposite, i.e., the factors {f t } are given and B is estimated in practice, fundamental factor models are estimated in two ways: BARRA approach and Fama-French approach. D. Palomar Factor Models 28 / 45

BARRA Approach The BARRA approach was pioneered by Bar Rosenberg, founder of BARRA Inc. (cf. Grinold and Kahn (2000), Conner et al. (2010), Cariño et al. (2010)). Econometric problems: choice of betas estimation of factor realizations {f t } for each t given B (i.e., by running T cross-sectional regressions) D. Palomar Factor Models 29 / 45

Fama-French Approach This approach was introduced by Eugene Fama and Kenneth French (1992): For a given observed asset specific characteristics, e.g. size, determine factor realizations for each t with the following two steps: 1 sort the cross-section of assets based on that attribute 2 form a hedge portfolio by longing in the top quintile and shorting in the bottom quintile of the sorted assets Define the common factor realizations with the return of K of such hedge portfolios corresponding to the K fundamental asset attributes. Then estimate the factor loadings using time series regressions (like in macroeconomic factor models). D. Palomar Factor Models 30 / 45

BARRA Industry Factor Model Consider a stylized BARRA-type industry factor model with K mutually exclusive industries. Define the K factor loadings as { 1 if asset i is in industry k β i,k = 0 otherwise The industry factor model is (note that α = 0) x t = Bf t + ɛ t, t = 1,..., T LS estimation (inefficient due to heteroskedasticity in Ψ): ˆft = (B T B) 1 B T x t, t = 1,..., T but since B T B = diag (N 1,..., N K ), where N k is the count of assets in industry k ( K k=1 N k = N), then ˆf t is a vector of industry averages!! The residual covariance matrix unbiased estimator is ˆΨ = diag (ˆσ 1 2,..., N) ˆσ2 where ˆɛt = x t Bˆf t and ˆσ i 2 = 1 T (ˆɛi,t T 1 ˆɛ ) ) T i (ˆɛi,t ˆɛ i and ˆɛ i = 1 T ˆɛ i,t. T t=1 t=1 D. Palomar Factor Models 31 / 45

Outline 1 Introduction 2 Linear Factor Models 3 Macroeconomic Factor Models 4 Fundamental Factor Models 5 Statistical Factor Models

Statistical Factor Models: Factor Analysis In statistical factor models, both the common-factors {f t } and the factor loadings B are unknown. The primary methods for estimation of factor structure are Factor Analysis (via EM algorithm) Principal Component Analysis (PCA) Both methods model the covariance matrix Σ by focusing on the sample covariance matrix ˆΣ SCM computed as follows: X T = [ ] x 1 x T (N T ) ( X T = X T I 1 ) T 1 T 1 T T (demeaned by row) ˆΣ SCM = 1 T 1 X T X D. Palomar Factor Models 33 / 45

Factor Analysis Model Linear factor model as cross-sectional regression: with E[f t ] = µ f Cov [f t ] = Σ f. x t = α + Bf t + ɛ t, t = 1,..., T Invariance to linear transformations of f t : The solution we seek for B and {f t } is not unique (this problem was not there when only B or {f t } had to be estimated). For any K K invertible matrix H we can define f t = Hf t and B = BH 1. We can write the factor model as x t = α + Bf t + ɛ t = α + B f t + ɛ t with E[ f t ] = E [Hf t ] = Hµ f Cov[ f t ] = Cov [Hf t ] = HΣ f H T. D. Palomar Factor Models 34 / 45

Factor Analysis Model Standard formulation of factor analysis: Orthogonal factors: Σ f = I K This is achieved by choosing H = Λ 1/2 Γ T, where Σ f = ΓΛΓ T is the spectral/eigen decomposition with orthogonal K K matrix Γ and diagonal matrix Λ = diag (λ 1,..., λ K ) with λ 1 λ 2 λ K. Zero-mean factors: µ f = 0 This is achieved by adjusting α to incorporate the mean contribution from the factors: α = α + Bµ f. Under these assumptions, the unconditional covariance matrix of the observations is Σ = BB T + Ψ. D. Palomar Factor Models 35 / 45

Factor Analysis: Variance Decomposition Recall the covariance matrix for the returns Σ = BB T + Ψ. For a given asset i, the return variance may then be expressed as var (x i,t ) = K βik 2 + σ2 i k=1 variance portion due to common factors, K communality k=1 β2 ik, is called the variance portion due to specific factors, σi 2, is called the uniqueness another quantity of interest is the factor s marginal contribution to active risk (FMCAR). D. Palomar Factor Models 36 / 45

Factor Analysis: MLE Maximum likelihood estimation: Consider the model x t = α + Bf t + ɛ t, t = 1,..., T where α and B are vector/matrix constants all random variables are Gaussian/Normal: f t i.i.d. N (0, I) ɛ t i.i.d. N (0, Ψ) with Ψ = diag ( σ1 2,..., ) σ2 N x t i.i.d. N ( α, Σ = BB T + Ψ ) Probability density function (pdf): T p (x 1,..., x T α, Σ) = p (x t α, Σ) = t=1 T t=1 (2π) N 2 Σ 1 2 exp ( 1 ) 2 (x t α) T Σ 1 (x t α) D. Palomar Factor Models 37 / 45

Factor Analysis: MLE Likelihood of the factor model: The log-likelihood of the parameters (α, Σ) given the T i.i.d. observations is L (α, Σ) = log p (x 1,..., x T α, Σ) = TN 2 log (2π) T 2 log Σ 1 2 Maximum likelihood estimation (MLE): minimize α,σ,b,ψ subject to T (x t α) T Σ 1 (x t α) t=1 T 2 log Σ + 1 2 T t=1 (x t α) T Σ 1 (x t α) Σ = BB T + Ψ Note that without the constraint, the solution would be ˆα = 1 T x t and ˆΣ = 1 T (x t ˆα) (x t ˆα) T. T T t=1 D. Palomar Factor Models 38 / 45 t=1

which is asymptotically chi-square with 1 2 ((N K)2 N K) degrees of freedom. D. Palomar Factor Models 39 / 45 Factor Analysis: MLE MLE solution: minimize α,σ,b,ψ subject to T 2 log Σ + 1 2 T t=1 (x t α) T Σ 1 (x t α) Σ = BB T + Ψ Employ the Expectation-Maximization (EM) algorithm to compute ˆα, ˆB, and ˆΨ. Estimate factor realization {f t } using, for example, the GLS estimator: ˆft = (ˆB T ˆΨ 1 ˆB) 1 ˆB T ˆΨ 1 x t, t = 1,..., T. The number of factors K can be estimated with a variety of methods such as the likelihood ratio (LR) test, Akaike information criterion (AIC), etc. For example: LR(K) = (T 1 1 6 (2N + 5) 2 ) (log 3 K) ˆΣ SCM log ˆBˆB T + ˆΨ

Principal Component Analysis PCA: is a dimension reduction technique used to explain the majority of the information in the covariance matrix. N-variate random variable x with E [x] = α and Cov [x] = Σ. Spectral/eigen decomposition Σ = ΓΛΓ T where Λ = diag (λ 1,..., λ K ) with λ 1 λ 2 λ K Γ orthogonal K K matrix: Γ T Γ = I N Principal component variables: p = Γ T (x α) with [ ] E [p] = E Cov [p] = Cov Γ T (x α) [ Γ T (x α) = Γ T (E [x] α) = 0 ] = Γ T Cov [x] Γ = Γ T ΣΓ = Λ. p is a vector of zero-mean, uncorrelated random variables, in order of importance (i.e., the first components explain the largest portion of the sample covariance matrix) In terms of multifactor model, the K most important principal components are the factor realizations. D. Palomar Factor Models 40 / 45

Factor Models: Principal Component Analysis Factor model from PCA: From PCA, we can write the random vector x as x = α + Γp where E [p] = 0 and Cov [p] = Λ = diag (Λ 1, Λ 2 ). Partition Γ = [ ] Γ 1 Γ 2 where Γ1 corresponds to the K largest eigenvalues of [ Σ. ] p1 Partition p = where p 1 contains the first K elements p 2 Then we can write where x = α + Γ 1 p 1 + Γ 2 p 2 = α + Bf + ɛ B = Γ 1, f = p 1 and ɛ = Γ 2 p 2. This is like a factor model except that Cov [ɛ] = Γ 2 Λ 2 Γ T 2, where Λ 2 is a diagonal matrix of last N K eigenvalues but Cov [ɛ] is not diagonal... D. Palomar Factor Models 41 / 45

Factor Models: Why PCA? But why is PCA the desired solution to the statistical factor model? The idea is to obtain the factors from the returns themselves: or, more compactly, x t = α + Bf t + ɛ t, t = 1,..., T x t = α + B f t = C T x t + d The problem formulation is ( ) C T x t + d + ɛ t, t = 1,..., T minimize α,b,c,d 1 T T t=1 x t α + B ( C T x t + d ) 2 Solution is involved to derive and is given by: f t = Γ T 1 (x t α) or, if normalized factors are desired, f t = Λ 1 2 1 Γ T 1 (x t α). D. Palomar Factor Models 42 / 45

Factor Models via PCA 1 Compute sample estimates ˆα = 1 T x t and ˆΣ = 1 T T t=1 T (x t ˆα) (x t ˆα) T. t=1 2 Compute spectral decomposition: ˆΣ = ˆΓ ˆΛˆΓ T, ˆΓ = [ ˆΓ 1 ˆΓ 2 ] 3 Form the factor loadings, factor realizations, and residuals (so that x t = ˆα + ˆBˆf t + ˆɛ t ): ˆB = ˆΓ 1 ˆΛ 1 2 1, ˆft = ˆΛ 1 2 1 4 Covariance matrices: ˆΣ f = 1 T ˆftˆfT T t = I K, ˆΨ = 1 T t=1 ˆΓ T 1 (x t ˆα), T ˆɛ t ˆɛ T t t=1 ˆɛ t = x t ˆα ˆBˆf t ˆΣ = ˆΓ 1 ˆΛ 1 ˆΓ 1 + ˆΓ 2 ˆΛ 2 ˆΓ 2 = ˆB ˆΣ f ˆB T + ˆΨ. = ˆΓ 2 ˆΛ 2 ˆΓ 2 (not diagonal!) D. Palomar Factor Models 43 / 45

Principal Factor Method Since the previous method does not lead to a diagonal covariance matrix for the residual, let s refine the method with the following iterative approach 1 PCA: sample mean: ˆα = x = 1 T XT 1 T demeaned matrix: X T = X T x1 T T sample covariance matrix: ˆΣ = 1 T 1 X T X eigen-decomposition: ˆΣ = ˆΓ 0 ˆΛ 0 ˆΓ T 0 set index s = 0 2 Estimates: ˆB (s) = ˆΓ (s 1) ˆΛ 1 2 (s 1) ˆΨ (s) = diag( ˆΣ ˆB (s) ˆB T (s) ) ˆΣ (s) = ˆB (s) ˆB T (s) + ˆΨ (s) 3 Update the eigen-decomposition as ˆΣ ˆΨ (s) = ˆΓ (s) ˆΛ (s) ˆΓ T (s) 4 Update s s + 1 and repeat Steps 2-3 generating a sequence of estimates (ˆB (s), ˆΨ (s), ˆΣ (s) ) until convergence. D. Palomar Factor Models 44 / 45

Thanks For more information visit: https://www.danielppalomar.com