Convex Optimization for High-Dimensional Portfolio Construction

R/Finance 2017
R. Michael Weylandt, Rice University
2017-05-20

In the beginning...

The Markowitz Problem

Given $\mu$, $\Sigma$, choose portfolio weights $w$ to

    minimize  $w^T \Sigma w - \lambda \mu^T w$

Sample estimates: $\hat{\Sigma} = R^T R / n$ and $\hat{\mu} = R^T \mathbf{1}_n / n$, where $R$ is the $n \times p$ matrix of returns. Plugging these in:

    minimize  $w^T (R^T R / n) w - \lambda w^T R^T \mathbf{1}_n / n$

Simplify and factor (rescale by $n$ and complete the square; the added term does not depend on $w$):

    $w^T R^T R w - \lambda w^T R^T \mathbf{1}_n = w^T R^T R w - 2 w^T R^T (\mathbf{1}_n \lambda / 2) + (\mathbf{1}_n \lambda / 2)^T (\mathbf{1}_n \lambda / 2) - \mathrm{const} = \| \mathbf{1}_n \lambda / 2 - R w \|_2^2 - \mathrm{const}$

Linear Regression Connection

    $\arg\min_{w \in \mathbb{R}^p} \| \mathbf{1}_n \lambda / 2 - R w \|_2^2$

= Ordinary Least Squares (OLS) with response $y = \mathbf{1}_n \lambda / 2$ and design matrix $X = R$:

    $\arg\min_{\beta \in \mathbb{R}^p} \| y - X \beta \|_2^2$

We know a lot about OLS!
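To make the connection concrete, here is a minimal R sketch, not from the slides, that checks the equivalence on simulated returns (the data and variable names are illustrative assumptions):

    set.seed(1)
    n <- 200; p <- 10; lambda <- 2
    R <- matrix(rnorm(n * p, mean = 0.001, sd = 0.02), n, p)   # simulated return matrix

    # Markowitz with plug-in estimates: w solves 2 * Sigma_hat * w = lambda * mu_hat
    Sigma_hat <- crossprod(R) / n
    mu_hat    <- colMeans(R)
    w_markowitz <- solve(2 * Sigma_hat, lambda * mu_hat)

    # The same weights from OLS of y = 1_n * lambda / 2 on X = R (no intercept)
    y <- rep(lambda / 2, n)
    w_ols <- coef(lm(y ~ R - 1))

    max(abs(w_markowitz - w_ols))   # agrees up to numerical error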

Constrained Portfolio Optimization

Portfolio constraints: concentration, diversification, transaction cost management, etc.

Example: invest in at most $K$ assets:

    $\arg\min_{w \in \mathbb{R}^p \,:\, \|w\|_0 \le K} \| \mathbf{1}_n \lambda / 2 - R w \|_2^2$

This cardinality-constrained problem is NP-hard.

Best Subsets and the Lasso

Statistical perspective:

    $\arg\min_{w \in \mathbb{R}^p \,:\, \|w\|_0 \le K} \| \mathbf{1}_n \lambda / 2 - R w \|_2^2 \qquad \Longleftrightarrow \qquad \arg\min_{\beta \in \mathbb{R}^p \,:\, \|\beta\|_0 \le K} \| y - X \beta \|_2^2$

Portfolio optimization: find the portfolio with the best risk/return trade-off using at most $K$ assets.

Statistics: find the statistical model which fits the data best using at most $K$ variables.

Impossible to solve exactly for reasonable $p$; state-of-the-art mixed integer optimization (MIO) [BKM16] solves $p \approx 1000$ in hours.

In statistics, the convex relaxation of this problem is known as the Lasso [Tib96], and it has been hugely successful.
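To see why the exact problem is intractable, here is an illustrative R sketch, not from the slides, that solves the cardinality-constrained problem by brute force; the helper name and setup are assumptions, and the enumeration is feasible only for toy $p$:

    # Exhaustive best-subset search over all choose(p, K) asset subsets of size K.
    best_subset_portfolio <- function(R, lambda, K) {
      y <- rep(lambda / 2, nrow(R))
      best <- list(rss = Inf, assets = integer(0), w = numeric(0))
      for (S in combn(ncol(R), K, simplify = FALSE)) {
        fit <- lm(y ~ R[, S] - 1)             # OLS restricted to the K assets in S
        rss <- sum(resid(fit)^2)
        if (rss < best$rss) best <- list(rss = rss, assets = S, w = coef(fit))
      }
      best
    }
    # choose(50, 10) is already about 1e10 subsets and choose(1000, 10) about 2.6e23,
    # which is why exact solutions need the MIO machinery of [BKM16].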

Sparsity

Relax the $\ell_0$ constraint to an $\ell_1$ penalty:

    $\arg\min_{w \in \mathbb{R}^p} \Big\| \underbrace{\mathbf{1}_n \lambda / 2}_{y \,\in\, \mathbb{R}^n} - \underbrace{R}_{X \,\in\, \mathbb{R}^{n \times p}} \underbrace{w}_{\beta \,\in\, \mathbb{R}^p} \Big\|_2^2 + \gamma \| w \|_{\ell_1}$

Tune $\lambda$ to change the risk-return trade-off and $\gamma$ to change the sparsity level.
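A minimal R sketch of this relaxation, not from the slides: the penalized problem solved by cyclic coordinate descent in the spirit of [FHT10] (in practice one would use that reference's glmnet package); the function name and simulated data are illustrative assumptions:

    # Soft-thresholding operator, the proximal map of the l1 penalty
    soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

    # Cyclic coordinate descent for: min_w || 1_n * lambda/2 - R w ||_2^2 + gamma * ||w||_1
    sparse_portfolio <- function(R, lambda, gamma, n_iter = 200) {
      y <- rep(lambda / 2, nrow(R))
      w <- numeric(ncol(R))
      col_ss <- colSums(R^2)
      for (it in seq_len(n_iter)) {
        for (j in seq_len(ncol(R))) {
          r_j  <- y - R[, -j, drop = FALSE] %*% w[-j]           # partial residual excluding asset j
          w[j] <- soft_threshold(sum(R[, j] * r_j), gamma / 2) / col_ss[j]
        }
      }
      w
    }

    set.seed(1)
    R <- matrix(rnorm(200 * 50, mean = 0.001, sd = 0.02), 200, 50)   # simulated returns
    w <- sparse_portfolio(R, lambda = 2, gamma = 0.5)
    sum(w != 0)   # larger gamma -> fewer assets held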

Results

Problem first considered by [BDM+09]. (Figure from [BDM+09].)

Intuition

Why does this work?

- Sharp corners of the $\ell_1$ ball give sparsity [CRPW12]
- The Lasso can be prediction consistent without being model selection consistent [GR04, Cha13]
- It controls over-fitting (degrees of freedom) arising from model search [ZHT07, Tib15]

Further Structure

Statisticians have developed many Lasso-like constraints which enforce different structures:

Goal                                                   | Relaxed Constraint               | Portfolio Analogue
Select at most K variables                             | Lasso [Tib96]                    | Invest in at most K assets
Select variables from at most G groups                 | Group Lasso [YL06]               | Invest in at most G asset classes
Select variables from at most B overlapping groups     | Group Lasso with Overlap [JOV09] | Trade against at most B brokers
Select variables with at least one in each of D groups | Exclusive Lasso [ZJH10, CA15]    | Diversify over at least D asset classes

as well as highly efficient methods for solving the corresponding optimization problems (e.g., [FHT10]). See [HTF09], [JWHT13], or [HTW15] for details.
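As one example of how these structured penalties can be solved, here is a sketch, not from the slides, of the Group Lasso analogue ("invest in at most G asset classes") via proximal gradient descent; the group assignment, function names, and data are illustrative assumptions (in practice R packages such as grpreg or gglasso handle this):

    # Groupwise soft-thresholding: proximal map of t * sum_g ||w_g||_2
    group_soft <- function(w, groups, t) {
      for (g in unique(groups)) {
        idx <- which(groups == g)
        nrm <- sqrt(sum(w[idx]^2))
        w[idx] <- if (nrm > t) (1 - t / nrm) * w[idx] else 0
      }
      w
    }

    # Proximal gradient (ISTA) for: min_w || 1_n * lambda/2 - R w ||_2^2 + gamma * sum_g ||w_g||_2
    group_lasso_portfolio <- function(R, lambda, gamma, groups, n_iter = 500) {
      y <- rep(lambda / 2, nrow(R))
      w <- numeric(ncol(R))
      step <- 1 / (2 * max(eigen(crossprod(R), only.values = TRUE)$values))   # 1 / Lipschitz constant
      for (it in seq_len(n_iter)) {
        grad <- as.vector(2 * crossprod(R, R %*% w - y))
        w <- group_soft(w - step * grad, groups, step * gamma)
      }
      w
    }

    set.seed(1)
    R <- matrix(rnorm(200 * 50, mean = 0.001, sd = 0.02), 200, 50)
    asset_class <- rep(1:10, each = 5)                 # 50 assets in 10 hypothetical classes
    w <- group_lasso_portfolio(R, lambda = 2, gamma = 0.5, groups = asset_class)
    tapply(w != 0, asset_class, any)                   # whole classes enter or leave the portfolio together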

Acknowledgments

On arXiv soon.

Funding: NSF/DMS-1547433 (RTG: Cross-Training in Statistics and Computer Science at Rice University); Ken Kennedy Institute for Information Technology (K2I) Computer Science & Engineering Enhancement Fellowship; K2I 2015/16 BP Graduate Fellowship.

References

[BDM+09] Joshua Brodie, Ingrid Daubechies, Christine De Mol, Domenico Giannone, and Ignace Loris. Sparse and stable Markowitz portfolios. Proceedings of the National Academy of Sciences of the United States of America, 106(30):12267-12272, 2009. http://www.pnas.org/content/106/30/12267.full

[BKM16] Dimitris Bertsimas, Angela King, and Rahul Mazumder. Best subset selection via a modern optimization lens. Annals of Statistics, 44(2):813-852, 2016. https://projecteuclid.org/euclid.aos/1458245736

[CA15] Frederick Campbell and Genevera I. Allen. Within group variable selection using the exclusive lasso, 2015. arXiv:1505.07517.

[Cha13] Sourav Chatterjee. Assumptionless consistency of the lasso, 2013. arXiv:1303.5817.

[CRPW12] Venkat Chandrasekaran, Benjamin Recht, Pablo A. Parrilo, and Alan S. Willsky. The convex geometry of linear inverse problems. Foundations of Computational Mathematics, 12(6):805-849, 2012.

[FHT10] Jerome H. Friedman, Trevor Hastie, and Rob Tibshirani. Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software, 33(1):1-22, 2010. http://www.jstatsoft.org/v33/i01/paper; CRAN: http://cran.r-project.org/web/packages/glmnet/index.html

[GR04] Eitan Greenshtein and Ya'acov Ritov. Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli, 10(6):971-988, 2004. https://projecteuclid.org/euclid.bj/1106314846

[HTF09] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, 2nd edition, February 2009. http://statweb.stanford.edu/~tibs/elemstatlearn/

[HTW15] Trevor Hastie, Robert Tibshirani, and Martin Wainwright. Statistical Learning with Sparsity: The Lasso and Generalizations. Monographs on Statistics and Applied Probability. CRC Press, 1st edition, May 2015. http://web.stanford.edu/~hastie/statlearnsparsity/

[JOV09] Laurent Jacob, Guillaume Obozinski, and Jean-Philippe Vert. Group lasso with overlap and graph lasso. In Andrea Danyluk, Leon Bottou, and Michael Littman, editors, ICML '09: Proceedings of the 26th Annual International Conference on Machine Learning. ACM, 2009. http://www.machinelearning.org/archive/icml2009/papers/471.pdf

[JWHT13] Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning. Springer Series in Statistics. Springer, 1st edition, 2013. http://www-bcf.usc.edu/~gareth/isl/

[Tib96] Robert Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society, Series B (Methodological), 58(1):267-288, 1996. http://www.jstor.org/stable/2346178

[Tib15] Ryan J. Tibshirani. Degrees of freedom and model search. Statistica Sinica, 25:1265-1296, 2015. doi:10.5705/ss.2014.147

[YL06] Ming Yuan and Yi Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B (Methodological), 68(1):49-67, 2006. doi:10.1111/j.1467-9868.2005.00532.x

[ZHT07] Hui Zou, Trevor Hastie, and Robert Tibshirani. On the degrees of freedom of the lasso. Annals of Statistics, 35(5):2173-2192, 2007. https://projecteuclid.org/euclid.aos/1194461726

[ZJH10] Yang Zhou, Rong Jin, and Steven C. H. Hoi. Exclusive lasso for multi-task feature selection. In Yee Whye Teh and Mike Titterington, editors, AISTATS 2010: Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics. JMLR, 2010. http://proceedings.mlr.press/v9/zhou10a.html