Multivariate Asset Return Prediction with Mixture Models

Similar documents
The Instability of Correlations: Measurement and the Implications for Market Risk

Time Series Models for Measuring Market Risk

Financial Econometrics and Quantitative Risk Managenent Return Properties

CHICAGO: A Fast and Accurate Method for Portfolio Risk Calculation

Multivariate Distributions

A simple graphical method to explore tail-dependence in stock-return pairs

Efficient estimation of a semiparametric dynamic copula model

CONTAGION VERSUS FLIGHT TO QUALITY IN FINANCIAL MARKETS

Lecture 6: Univariate Volatility Modelling: ARCH and GARCH Models

Introduction to Algorithmic Trading Strategies Lecture 10

Bayesian Modeling of Conditional Distributions

On Backtesting Risk Measurement Models

Bayesian Semiparametric GARCH Models

Bayesian Semiparametric GARCH Models

GARCH Models Estimation and Inference

Heteroskedasticity in Time Series

Generalized Autoregressive Score Models

Stock index returns density prediction using GARCH models: Frequentist or Bayesian estimation?

Lecture 8: Multivariate GARCH and Conditional Correlation Models

Volatility. Gerald P. Dwyer. February Clemson University

Financial Econometrics and Volatility Models Copulas

MFE Financial Econometrics 2018 Final Exam Model Solutions

Lecture 2: Univariate Time Series

Modeling conditional distributions with mixture models: Applications in finance and financial decision-making

Multivariate Statistics

CHICAGO: A Fast and Accurate Method for Portfolio Risk Calculation

Asymptotic distribution of the sample average value-at-risk

Dependence and VaR Estimation:An Empirical Study of Chinese Stock Markets using Copula. Baoliang Li WISE, XMU Sep. 2009

A simple nonparametric test for structural change in joint tail probabilities SFB 823. Discussion Paper. Walter Krämer, Maarten van Kampen

Marginal Specifications and a Gaussian Copula Estimation

A Goodness-of-fit Test for Copulas

Multivariate Normal-Laplace Distribution and Processes

Multivariate GARCH models.

Gaussian kernel GARCH models

GARCH Models. Eduardo Rossi University of Pavia. December Rossi GARCH Financial Econometrics / 50

SYSM 6303: Quantitative Introduction to Risk and Uncertainty in Business Lecture 4: Fitting Data to Distributions

When is a copula constant? A test for changing relationships

Financial Times Series. Lecture 12

6. The econometrics of Financial Markets: Empirical Analysis of Financial Time Series. MA6622, Ernesto Mordecki, CityU, HK, 2006.

Shape of the return probability density function and extreme value statistics

GARCH Models Estimation and Inference

Some Approximations of the Logistic Distribution with Application to the Covariance Matrix of Logistic Regression

THE UNIVERSITY OF CHICAGO Graduate School of Business Business 41202, Spring Quarter 2003, Mr. Ruey S. Tsay

GARCH Models Estimation and Inference. Eduardo Rossi University of Pavia

Backtesting Marginal Expected Shortfall and Related Systemic Risk Measures

SPECIFICATION TESTS IN PARAMETRIC VALUE-AT-RISK MODELS

STATISTICS/ECONOMETRICS PREP COURSE PROF. MASSIMO GUIDOLIN

Regression Analysis. y t = β 1 x t1 + β 2 x t2 + β k x tk + ϵ t, t = 1,..., T,

Estimating Global Bank Network Connectedness

Stat 5102 Final Exam May 14, 2015

MFM Practitioner Module: Risk & Asset Allocation. John Dodson. February 3, 2010

Asymmetry in Tail Dependence of Equity Portfolios

Dependence Patterns across Financial Markets: a Mixed Copula Approach

Vladimir Spokoiny Foundations and Applications of Modern Nonparametric Statistics

Partially Censored Posterior for Robust and Efficient Risk Evaluation.

Assessing the VaR of a portfolio using D-vine copula based multivariate GARCH models

Modeling conditional distributions with mixture models: Theory and Inference

Warwick Business School Forecasting System. Summary. Ana Galvao, Anthony Garratt and James Mitchell November, 2014

Portfolio Value at Risk Based On Independent Components Analysis

Lecture 9: Markov Switching Models

ECONOMICS 7200 MODERN TIME SERIES ANALYSIS Econometric Theory and Applications

Web Appendix to Multivariate High-Frequency-Based Volatility (HEAVY) Models

MGR-815. Notes for the MGR-815 course. 12 June School of Superior Technology. Professor Zbigniew Dziong

Normal Probability Plot Probability Probability

Risk management with the multivariate generalized hyperbolic distribution, calibrated by the multi-cycle EM algorithm

Two hours. To be supplied by the Examinations Office: Mathematical Formula Tables and Statistical Tables THE UNIVERSITY OF MANCHESTER.

End-Semester Examination MA 373 : Statistical Analysis on Financial Data

Estimation of Operational Risk Capital Charge under Parameter Uncertainty

STA 4273H: Statistical Machine Learning

Motivation Scale Mixutres of Normals Finite Gaussian Mixtures Skew-Normal Models. Mixture Models. Econ 690. Purdue University

Modeling International Financial Returns with a Multivariate Regime-switching Copula

Estimating Expected Shortfall Using a Conditional Autoregressive Model: CARES

Parametric Models. Dr. Shuang LIANG. School of Software Engineering TongJi University Fall, 2012

Three hours. To be supplied by the Examinations Office: Mathematical Formula Tables and Statistical Tables THE UNIVERSITY OF MANCHESTER.

Probabilities & Statistics Revision

Quaderni di Dipartimento. Small Sample Properties of Copula-GARCH Modelling: A Monte Carlo Study. Carluccio Bianchi (Università di Pavia)

L11: Pattern recognition principles

Risk Measures with Generalized Secant Hyperbolic Dependence. Paola Palmitesta. Working Paper n. 76, April 2008

Improving forecasting performance using covariate-dependent copula models

Conditional probabilities for Euro area sovereign default risk

DYNAMIC CONDITIONAL CORRELATIONS FOR ASYMMETRIC PROCESSES

Risk Aggregation and Model Uncertainty

MFM Practitioner Module: Risk & Asset Allocation. John Dodson. January 28, 2015

Bayesian Regression Linear and Logistic Regression

Research Article The Laplace Likelihood Ratio Test for Heteroscedasticity

Modelling Dependence of Interest, Inflation and Stock Market returns. Hans Waszink Waszink Actuarial Advisory Ltd.

Bayesian inference for the mixed conditional heteroskedasticity model

HANDBOOK OF APPLICABLE MATHEMATICS

INVERTED KUMARASWAMY DISTRIBUTION: PROPERTIES AND ESTIMATION

STA 414/2104: Machine Learning

The Realized RSDC model

High-frequency data modelling using Hawkes processes

High-frequency data modelling using Hawkes processes

Non-parametric Test Frameworks

Modeling International Financial Returns with a Multivariate Regime Switching Copula

Model Fitting. Jean Yves Le Boudec

A Semi-Parametric Measure for Systemic Risk

Solutions of the Financial Risk Management Examination

Miloš Kopa. Decision problems with stochastic dominance constraints

Symmetric btw positive & negative prior returns. where c is referred to as risk premium, which is expected to be positive.

Transcription:

Multivariate Asset Return Prediction with Mixture Models Swiss Banking Institute, University of Zürich

Introduction The leptokurtic nature of asset returns has spawned an enormous amount of research into effective ways of modeling them, often involving the use of distributions which have power tails, and the ensuing nonexistence of higher moments..3.25 Distribution of Stock Returns for Bank of America kernel estimate fitted normal.2.15.1.5 8 6 4 2 2 4 6 8

Introduction This issue has long since transcended ivory tower academics and quantitative groups in financial institutions, especially since the statement by Alan Greenspan (1997, p. 54),... the biggest problems we now have with the whole evaluation of risk is the fat-tail problem, which is really creating very large conceptual difficulties. Because as we all know, the assumption of normality enables us to drop off a huge amount of complexity of our equations very much to the right of the equal sign. Because once you start putting in non-normality assumptions, which unfortunately is what characterizes the real world, then the issues become extremely difficult.

Introduction We propose a multivariate model which, besides exhibiting leptokurtic behavior and asymmetry, lends itself to tractable distributions of weighted sums of the marginal random variables (to enable portfolio construction), portfolios possess finite positive integer moments of all orders, is super-fast to estimate, delivers superior (compared to DCC and related models) multivariate density forecasts, has trivial stationarity conditions as a stochastic process. This is accomplished by not using a GARCH structure.

Need for Mixtures A mixture can capture, besides excess kurtosis and asymmetry, the two further stylized facts of the leverage or down market effect the negative correlation between volatility and asset returns, and contagion effects the tendency of the correlation between asset returns to increase during pronounced market downturns, as well as during periods of higher volatility. Other fat-tailed multivariate distributions, such as the multivariate Student s t or the multivariate generalized hyperbolic distribution, cannot achieve this.

Need for Mixtures With a two-component multivariate normal mixture model with component parameters µ 1, Σ 1, µ 2, Σ 2, we would expect to have the primary component capturing the business as usual stock return behavior, with a near-zero mean vector µ 1, and the second component capturing the more volatile, crisis behavior, with (much) higher variances in Σ 2 than in Σ 1, significantly larger correlations, reflecting the contagion effect, and a predominantly negative µ 2, reflecting the down market effect. A distribution with only a single mean vector and covariance (or, more generally, dispersion) matrix cannot capture this behavior, no matter how many additional shape parameters the distribution possesses. This is true even if each marginal were to be endowed with its own set of shape (tail and skew) parameters, as is possible when using a copula.

Evidence for Mixtures We use the 3 stocks comprising the DJIA-3, daily data from June 21 to March 29, for T = 1945 observations, on 3 variables. Let Y t = (Y t,1, Y t,2,..., Y t,d ) i.i.d. Mix k N d (M, Ψ, λ), t = 1,..., T, where Mix k N d denotes the k-component, non-singular d-dimensional multivariate mixed normal distribution, with M = [ µ 1 µ 2 µ k ], µj = (µ 1j, µ 2j,..., µ dj ), Ψ = [ Σ 1 Σ 2 Σ k ], λ = (λ 1,..., λ k ), Σ j > (i.e., positive definite), j = 1,..., k, and k ( ) f Mixk N d (y; M, Ψ, λ) = λ j f N y; µj, Σ j, λj (, 1), j=1 k λ j = 1, j=1 with f N denoting the d-variate multivariate normal distribution.

Evidence for Mixtures For k = 2 components in the Mix k N d distribution, show the d = 3 means for components 1 and 2 separately. 9 8 7 6 5 4 3 2 1 Means from μ 1 9 8 7 6 5 4 3 2 1 Means from μ 2.4.3.2.1.1.2.3.4.4.3.2.1.1.2.3.4

Evidence for Mixtures Same, but for the d = 3 variances for components 1 and 2 separately. 7 6 5 4 3 2 1 Variances in Σ 1 8 7 6 5 4 3 2 1 Variances in Σ 2 1 2 3 4 5 1 2 3 4 5 6

Evidence for Mixtures Same, but for the ( 3 2 ) = 425 covariance for components 1 and 2 separately. Correlations in Σ 1 Correlations in Σ 2 35 3 25 2 15 1 5.1.2.3.4.5.6.7.8.9 4 35 3 25 2 15 1 5.1.2.3.4.5.6.7.8.9

Shrinkage Estimation Shrinkage estimation is crucial in this context for both numeric reasons, and (far) improved estimation. The paper gives full details on how this is done. Our shrinkage prior, as a function of the scalar hyper-parameter ω, is a 1 = 2ω, a 2 = ω/2, c 1 = c 2 = 2ω, m 1 = p, m 2 = (.1)1 p, B 1 = a 1 [ (1.5.6)Ip +.6J p ], B2 = a 2 [ (1 4.6)Ip + 4.6J p ]. (1) The only tuning parameter which remains to be chosen is ω. The effect of different choices of ω is easily and informatively demonstrated with a simulation study, using the Mix 2 N 3 model, with parameters given by the MLE of the 3 return series.

Shrinkage Estimation For assessing the quality of the estimates, we use the log sum of squares as the summary measure, and noting that we have to convert the estimated parameter vector if the component labels are switched. That is, M ( θ, θ) = min { M( θ, θ), M( θ, θ = ) }, (2) where θ = ( µ 1 µ 2 (vech(σ 1)) (vech(σ 2 )) λ 1 ), the vech operator of a matrix forms a column vector consisting of the elements on and below the main diagonal, M( θ, θ) := log( θ θ) ( θ θ), (3) and θ = refers to the parameter vector obtained by switching the labels of the components, i.e., θ = = ( µ 2 µ 1 (vech(σ 2)) (vech(σ 1 )) (1 λ 1 ) ).

Shrinkage Estimation Estimation accuracy, measured as four divisions of M from (2) (λ 1 is ignored), based on simulation with 1, replications and T = 25, of the parameters of the Mix 2 N 3 model, using as true parameters the MLE of the d = 3 return series, as a function of prior strength parameter ω. M* for μ 1 (n=25, d=3) M* for μ 2 (n=25, d=3).5.5 1 1.5 2 2.5 3 3.5 4 ω = ω = 4 ω = 8 ω = 12 ω = 16 ω = 2 ω = 24 ω = 28 ω = 32 ω = 36 ω = 4 3 2 1 1 2 ω = ω = 4 ω = 8 ω = 12 ω = 16 ω = 2 ω = 24 ω = 28 ω = 32 ω = 36 ω =

Shrinkage Estimation M* for (vech of) Σ 1 (n=25, d=3) M* for (vech of) Σ 2 (n=25, d=3) 4 1 9.5 3.5 9 3 8.5 8 2.5 2 1.5 ω = ω = 4 ω = 8 ω = 12 ω = 16 ω = 2 ω = 24 ω = 28 ω = 32 ω = 36 ω = 7.5 7 6.5 6 ω = ω = 4 ω = 8 ω = 12 ω = 16 ω = 2 ω = 24 ω = 28 ω = 32 ω = 36 ω =

Component Separation The H t,j are the posterior probabilities from the EM algorithm that observation Y t came from component j, t = 1,..., T, j = 1, 2, conditional on all the Y t and the estimated components. It is natural to plot these versus time. Final values of H t1 versus t=1...1945 Final values of H t1 for last 25 values 1.9.8.7.6.5.4.3.2.1 1.9.8.7.6.5.4.3.2.1 5 1 15 5 1 15 2 25 One might be tempted to use this finding as further evidence for the claim that there exist two, reasonably distinct, regimes in financial markets, but this is not the case! The same effect would occur if the data had arisen from a fat-tailed single-component multivariate distribution.

Component Separation Now use a multivariate normal distribution (left) and a heavier-tailed multivariate Laplace, both of which are single-component densities! Then fit fit the 2-comp mixture of normals. 1.9.8.7.6.5.4.3.2.1 1.9.8.7.6.5.4.3.2.1 5 1 15 2 5 1 15 2 Thus, the separation apparent for the DJ-3 data is necessary, but clearly not sufficient to declare that the data were generated by a mixture distribution. Stronger evidence for mixtures comes from the earlier graphic: the means of the first and second component differ markedly, with the latter being primarily negative, and the correlations in the second component are on average higher than those associated with component 1.

Further Evidence for Mixtures Example: 2 stocks for which the correlations change the most between component 1 and 2. For Chevron and Walt Disney, the component 1 correlation is.25; for component 2 it is.63. 15 Fitted Two Component Normal Mixture 1 5 Walt Disney 5 1 15 2 15 1 5 5 1 15 2 Chevron

Component Inspection: Is Normality Adequate? The apparent separation is highly advantageous because it allows us to assign each Y t to one of the two components with very high confidence. Once done, we can assess how well each of the two estimated multivariate normal distributions fits the observations assigned to its component. We use the criteria H t,1 >.99, choosing to place those Y t whose corresponding values of H t,1 suggest even a slight influence from component 2, into this more volatile component. This results in 1,373 observations assigned to component 1, or 7.6% of the observations, and 572 to the second component.

Component Inspection: Is Normality Adequate? Based on the split, for each of the k d = 6 series, we fit a flexible, asymmetric, fat-tailed distribution. We use the generalized asymmetric t, or GAt, distribution, with location-zero, scale-one pdf given by (p, ν, θ R > ) ( z θ)p (1 + ν f GAt (z; p, ν, θ) = C p,ν,θ (1 + (z/θ)p ν ) (ν+ 1 p ), if z <, ) (ν+ 1 p ), if z, Parameter p measures the peakedness of the density, with values near one indicative of Laplace-type behavior, while values near two indicate a peak similar to that of the Gaussian and Student s t distributions. Parameter v indicates the tail thickness, and is analogous to the degrees of freedom parameter. Parameter θ, controls the asymmetry, with values less than 1 indicating negative skewness.

Component Inspection: Is Normality Adequate? Remember: Parameter v indicates the tail thickness analogous to the df in the Student t t, except that moments of order v p and higher do not exist. So that, if p = 2, then we would double the value of v to make it comparable to the df in the usual Student s t. 8 Component 1 (EM.99) 8 Component 2 (EM.99) 7 7 6 6 5 5 4 4 3 3 2 2 1 1 1 p v θ μ ĉ Sk 1 p v θ μ ĉ Sk

Worst Case: McDonalds.4 McDonald s Component 1 Fitted Densities.25 McDonald s Component 2 Fitted Densities.35.3.2.25.2 Uncon: p ML =2.4 v ML =2. Const: p ML =1.4 v ML =9.15 Uncon: p ML =2.1 v ML =2.1 Const: p ML =1.2 v ML =9.15.1.1.5.5 6 4 2 2 4 6 McDonald s Component 1 Fitted Densities (7 Outliers Removed).4.35.3 1 5 5 1 McDonald s Component 2 Fitted Densities (1 Outlier Removed).25.2.25.2.15 Uncon: p ML =2. v ML =5. Const: p ML =1.6 v ML =9.15.1 Uncon: p ML =1.9 v ML =2.9 Const: p ML =1.3 v ML =9.1.5.5 6 4 2 2 4 6 1 5 5 1

Component Inspection: Is Normality Adequate? For each of the 3 series, but not separating them into the two components, we fit the GAt, with no parameter restrictions, and with the restriction that 9 < ˆv < 1 (forces normality if p = 2 and θ = 1, or Laplace if p = 1 and θ = 1) Compute the asymptotically valid p-value of the likelihood ratio test. If that value is less than.5, then we remove the largest value (in absolute terms) from the series, and re-compute the estimates and the p-value. This is repeated until the p-value exceeds.5. We report the smallest number of observations required to be removed in order to achieve this. Do this for the actual 3 series, and also the two separated components.

Component Inspection: Is Normality Adequate? Stock number 5 is Bank of America, and shows that 65 most extreme values had to be removed from the series to get the p-value above.5, but no observations from component 1 needed to be removed, and only 3 from component 2. Stock # 1 2 3 4 5 6 7 8 9 1 11 12 13 14 15 All 19 19 5 2 65 21 11 27 6 1 3 19 28 31 8 Comp1 3 1 1 1 Comp2 1 3 1 1 6 8 2 8 1 1 Stock # 16 17 18 19 2 21 22 23 24 25 26 27 28 29 3 All 7 12 8 5 19 7 15 28 3 14 1 11 6 8 1 Comp1 2 2 7 Comp2 1 1 2 2 1 3 2 1

Density Forecasting We forecast the entire multivariate density. The measure of interest is what we will call the (realized) predictive log-likelihood, given by π t (M, v) = log f M t I t 1 (y t ; ψ), (4) where v denotes the size of the rolling window used to determine I t 1 (and the set of observations used for estimation of ψ) for each time point t.

Density Forecasting We suggest to use what we refer to as the normalized sum of the realized predictive log-likelihood, given by S τ,t (M, v) = 1 (T τ ) d where d is the dimension of the data. T t=τ +1 π t (M, v), (5) It is thus the average realized predictive log-likelihood, averaged over the number of time points used and the dimension of the random variable under study. This facilitates comparison over different d, τ and T. In our setting, we use the d = 3 daily return series of the DJ-3, with v = τ = 5, which corresponds to two years of data, and T = 1, 945.

Density Forecasting We compute for set of models indexed by ω, π t (M ω, v) from (4). We take { M ω } to be the Mix2 N 3 model estimated with shrinkage prior (1) for a given value of hyper-parameter ω. We do this using a moving window of size v = 25, starting at observation τ = v = 25, and updating parameter vector θ at each time increment.

Density Forecasting: Finding the optimal ω 6 5 4 ω=25 ω=2 ω=15 ω=1 ω=5 Normalized Cusum C(h,ω) C(h,.1) 1.62 1.64 1.66 S25,1945(Mω,v) 1.68 3 1.7 2 1.72 1 2 4 6 8 1 12 14 16 h τ +1 for τ =25 1.74 rolling window, v=25 rolling window, v=1 1.76 2 4 6 8 1 Prior Strength ω Here, h C(h, ω) = π t (M ω, 25), h = τ + 1,..., T, t=τ +1

Optimal Shrinkage as a Function of d 5 ω * (25) 1.55 S25,1945(M ω (25), 25) 45 1.6 4 1.65 35 1.7 3 1.75 25 1.8 2 15 1 5 5 1 15 2 25 3 1.85 1.9 1.95 2 2.5 5 1 15 2 25 3 1.55 S25,1945(Mω, 25) 1.55 S25,1945(Mω, 25) 1.6 1.6 1.65 1.65 1.7 1.7 1.75 1.75 1.8 1.8 1.85 1.85 1.9 1.9 1.95 1.95 2 2.5 ω = ω (25) (original graphic) ω = ω reg (25) 5 1 15 2 25 3 2 2.5 ω = ω (25) (original graphic) ω =1 5 1 15 2 25 3

The Competition: CCC, DCC, A-DCC, RSDC-GARCH It turns out that the forecasted values yield only a barely visible preference of DCC over CCC, and A-DCC over DCC. In light of what appears to be a general consensus in the literature that DCC is significantly superior to CCC, and the encouraging in-sample results favoring asymmetry, these results were somewhat surprising. The Regime Switching Dynamic Correlation (RSDC-) GARCH model of Pelletier (26) also recognizes different regimes in the data. 1.55 S25,1945({MixN, A-DCC}, 25) 1.55 S25,1945({MixN,Pelletier}, 25) 1.6 1.6 1.65 1.65 1.7 1.7 1.75 1.75 1.8 1.8 1.85 1.85 1.9 1.9 1.95 1.95 2 2.5 MixN (original graphic) A DCC 5 1 15 2 25 3 2 2.5 MixN (original graphic) Pelletier 5 1 15 2 25 3

The Competition: MVN and MVT Use of the MVN in the left panel. Just as a benchmark... Right panel shows: (i) the multivariate Student s t distribution (MVT) with fixed degrees of freedom 4 (ii), the two-component multivariate Student t mixture (using 1 and 4 degrees of freedom, for the two components), and (iii) the Mix 2 Lap d distribution introduced below. 1.55 S25,1945(Norm, 25) 1.55 S25,1945(MixLap, MVT, MixMVT, 25) 1.6 1.6 1.65 1.65 1.7 1.7 1.75 1.75 1.8 1.8 1.85 1.85 1.9 1.9 1.95 2 2.5 5 1 15 2 25 3 1.95 2 2.5 MixLap Stud 4 MixStud 1 4 5 1 15 2 25 3

Why does Mix 2 Lap d beat Mix 2 Stud d From a forecasting point of view, the Laplace mixture resulted in superior performance. We conjecture the reason for this to be that the tails of the t, when used in a mixture context, are too fat. While higher kurtosis is indeed still required for the second component, the Laplace offers this without power tails and the ensuing upper bound on finite absolute positive moments. Moreover, as demonstrated above via use of the GAt distribution and the fitted values of peakedness parameter p, there is evidence that the two components, particularly the second, have a more peaked distribution than the Student s t.

Portfolio Construction Let Y Mix k N d (M, Ψ, λ), with, as before, M = [ µ 1 µ 2 µ k ], Ψ = [ Σ1 Σ 2 Σ k ], and λ = (λ 1,..., λ k ). Interest centers on the distribution of the portfolio P = a Y, a R d. We show in the Appendix that f P (x) = k λ c φ (x; a µ c, a Σ c a), (6) c=1 where φ ( x; µ, σ 2) denotes the normal distribution with mean µ and variance σ 2 evaluated at x. With such a simple analytic result, portfolio optimization is straightforward, and simulation (as is required when copula-based methods are used) is not necessary.

Portfolio Construction With µ c = a µ c and σ 2 c = a Σ c a, c = 1,..., k, µ P = E [P] = k λ c µ c, σp 2 = V (P) = c=1 k ( λ c σ 2 c + µ 2 c) µ 2 P. It is common when working with non-gaussian portfolio distributions to use a measure of downside risk, such as the value-at-risk (VaR), or the expected shortfall (ES). c=1

Portfolio Construction The VaR involves the γ-quantile of P, denoted q P,γ, for < γ < 1, with γ typically.1 or.5, and which can be found numerically by using the cdf of P, easily seen to be F P (x) = k c=1 λ cφ ((x µ c )/σ c ), with Φ the standard normal cdf. The γ-level ES of P is given by ES γ (P) = 1 γ γ Q P (p) dp, where Q P is the quantile function of P. We prove in the Appendix that this integral can be represented analytically as ES γ (P) = k j=1 λ j Φ (c j ) γ { } φ (c j ) µ j σ j, c j = q P,γ µ j, q P,γ = Q P (γ). Φ (c j ) σ j

Multivariate Laplace The d-variate multivariate Laplace distribution is given by f Y (y, µ, Σ, b) = where m = (y µ) Σ 1 (y µ). 1 2 ( m ) b/2 d/4 ( ) Kb d/2 2m, (7) Σ 1/2 (2π) d/2 Γ (b) 2 It generalizes several constructs in the literature, but itself is a special case of the multivariate generalized hyperbolic distribution. We write Y Lap (µ, Σ, b).

Mixtures of Multivariate Laplace We say the d-dimensional random variable Y i follows a k-component mixture of multivariate Laplace distributions, or Mix k Lap d, if its distribution is given by f Mixk Lap d (y; M, Ψ, λ, b) = k ( ) λ j f Lap y; µj, Σ j, b j, λ j (, 1), k j=1 λ j = 1, with f Lap denoting the d-variate multivariate Laplace distribution given above, M, Ψ and λ are as with the Mix k Norm d, and b = (b 1,..., b k ). j=1

Mixtures of Multivariate Laplace We observe d-variate random variables Y t = (Y t,1, Y t,2,..., Y t,d ), i.i.d. t = 1,..., T, with Y t Mix k Lap d (M, Ψ, λ, b), and take the values of b = (b 1,..., b k ) to be known constants. Interest centers on estimation of the remaining parameters. This is conducted via an EM algorithm, the derivation of which is given in the Appendix. In addition, the EM recursions are extended there to support use of a quasi-bayesian paradigm analogous to that used for the Mix k N d distribution.

Estimation of the b i in the Mixture of Multivariate Laplace We use the profile likelihood to each component, applied to the split data via the EM algorithm. The details are in the paper. This is not optimal, but adequate. 5.41 Component 1 Log lik (ω=5) Using EM Split 4.15 Component 2 Log lik (ω=5) Using EM Split 5.42 4.2 log likelihood / 1 5.43 5.44 5.45 log likelihood / 1 4.25 4.3 5.46 5.47 1 3 5 7 9 11 13 15 MVL parameter b 4.35 1 3 5 7 9 11 13 15 MVL parameter b

Portfolio Construction for Mix k Lap d Let L Lap (µ, Σ, b) with density (7). Then, for a R d, P = a L Lap (a µ, a Σa, b), which is a special case of the general result for normal mixture distributions, as shown in, e.g., McNeil, Embrechts and Frey (25, p. 76). Now let Y Mix k Lap d (M, Ψ, λ, b) and P = a Y. Then, analogous to the Mix MVN case, and using the same format of proof, we find that f P (x) = k λ c Lap (x; a µ c, a Σ c a, b c ). c=1

Portfolio Construction for Mix k Lap d Similar to (33), we have, with µ c = a µ c and σ 2 c = a Σ c a, c = 1,..., k, µ P = E [P] = k λ c µ c, σp 2 = V (P) = c=1 k ( λ c bc σc 2 + µ 2 c) µ 2 P. A closed-form expression for the expected shortfall is currently not available. Though given the tractable density function of P, and the exponential (not power) tails of the Laplace distribution, numeric integration to compute the relevant quantile, and the integral associated with the expected shortfall, will be fast and reliable. c=1