A Minimax Theorem with Applications to Machine Learning, Signal Processing, and Finance

Seung-Jean Kim, Stephen Boyd, Alessandro Magnani

MIT ORC Seminar, 12/8/05

Outline

- A minimax theorem
- Robust Fisher discriminant analysis
- Robust matched filtering
- An extension
- Worst-case Sharpe ratio maximization

A fractional function

$$f(a, B, x) = \frac{a^T x}{\sqrt{x^T B x}}, \qquad a, x \in \mathbf{R}^n, \quad B \in \mathbf{S}^n_{++}$$

- $f$ is homogeneous and quasiconcave in $x$ provided $a^T x \geq 0$: for $\gamma \geq 0$,
  $$\{x \mid f(a, B, x) \geq \gamma\} = \{x \mid \gamma \sqrt{x^T B x} \leq a^T x\}$$
  is convex (a second-order cone)
- $f$ is quasiconvex in $(a, B)$ provided $a^T x \geq 0$: for $\gamma \geq 0$,
  $$\{(a, B) \mid f(a, B, x) \leq \gamma\} = \{(a, B) \mid \gamma \sqrt{x^T B x} \geq a^T x\}$$
  is convex (since $\sqrt{x^T B x}$ is concave in $B$)
- $f$ is maximized over $x$ by $x^\star = B^{-1} a$; the optimal value is $\sqrt{a^T B^{-1} a}$
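
A quick numerical check of the last claim (a sketch, not from the talk; the random test data $a$, $B$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
a = rng.standard_normal(n)
G = rng.standard_normal((n, n))
B = G @ G.T + np.eye(n)  # random B in S^n_++

def f(a, B, x):
    return (a @ x) / np.sqrt(x @ B @ x)

x_star = np.linalg.solve(B, a)  # claimed maximizer x* = B^{-1} a
opt = np.sqrt(a @ x_star)       # claimed optimal value (a^T B^{-1} a)^{1/2}

assert np.isclose(f(a, B, x_star), opt)
# no random direction does better
assert all(f(a, B, rng.standard_normal(n)) <= opt + 1e-9 for _ in range(1000))
```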

A Rayleigh quotient

$f(a, B, x)^2$ is the Rayleigh quotient of $(a a^T, B)$ evaluated at $x$:
$$f(a, B, x)^2 = \frac{(x^T a)^2}{x^T B x}$$

- convex in $(a, B)$
- not quasiconcave in $x$

Interpretation via Gaussian

suppose $z \sim \mathcal{N}(a, B)$ and $x \in \mathbf{R}^n$

- $z^T x \sim \mathcal{N}(a^T x, x^T B x)$, so
  $$\mathrm{Prob}(z^T x \geq 0) = \Phi\!\left( \frac{a^T x}{\sqrt{x^T B x}} \right)$$
- $x^\star = B^{-1} a$ gives the hyperplane through the origin that maximizes the probability of $z$ being on one side
- the maximum probability is $\Phi\!\left( \sqrt{a^T B^{-1} a} \right)$
- measures how well a linear function can discriminate $z$ from zero
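
The probability formula is easy to confirm by simulation; a minimal sketch (illustrative numbers only):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
a = np.array([1.0, -0.5, 0.3])
G = rng.standard_normal((3, 3))
B = G @ G.T + np.eye(3)
x = rng.standard_normal(3)

# closed form: Prob(z^T x >= 0) = Phi(a^T x / sqrt(x^T B x))
p_exact = norm.cdf((a @ x) / np.sqrt(x @ B @ x))

# Monte Carlo with z ~ N(a, B)
z = rng.multivariate_normal(a, B, size=200_000)
p_mc = np.mean(z @ x >= 0)
print(p_exact, p_mc)  # agree to ~3 decimals
```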

A picture

- expand the confidence ellipsoid until it touches the origin
- the tangent plane is the hyperplane that maximizes $\mathrm{Prob}(z^T x \geq 0)$

[figure: confidence ellipsoid of $z$ tangent to the origin, with the maximizing hyperplane normal $x^\star$]

Interpretation via Chebyshev bound

- suppose $\mathbf{E}\, z = a$ and $\mathbf{E}\,(z - a)(z - a)^T = B$ (otherwise arbitrary)
- $\mathbf{E}\, z^T x = a^T x$ and $\mathbf{E}\,(z^T x - a^T x)^2 = x^T B x$, so by the Chebyshev bound
  $$\mathrm{Prob}(z^T x \geq 0) \geq \Psi\!\left( \frac{a^T x}{\sqrt{x^T B x}} \right), \qquad \Psi(u) = \frac{u_+^2}{1 + u_+^2}$$
- $\Psi$ is increasing (and the bound is sharp)
- $x^\star = B^{-1} a$ gives the hyperplane through the origin that maximizes the Chebyshev lower bound on $\mathrm{Prob}(z^T x \geq 0)$
- the maximum value of the Chebyshev lower bound is $\dfrac{a^T B^{-1} a}{1 + a^T B^{-1} a}$
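
Sharpness can be seen from the standard two-point extremal distribution; a small sketch (this construction is assumed, not from the slides):

```python
import numpy as np

m, s = 1.5, 2.0           # mean and std of the scalar y = z^T x
u = m / s
psi = u**2 / (1 + u**2)   # Chebyshev lower bound

# extremal two-point distribution with mean m and std s:
# y = m + s^2/m with prob m^2/(m^2+s^2), y = 0 with prob s^2/(m^2+s^2)
p_hi = m**2 / (m**2 + s**2)
vals = np.array([m + s**2 / m, 0.0])
probs = np.array([p_hi, 1 - p_hi])

assert np.isclose(vals @ probs, m)              # correct mean
assert np.isclose(probs @ (vals - m)**2, s**2)  # correct variance
print(psi, p_hi)  # Prob(y > 0) attains the bound for this distribution
```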

Worst-case discrimination probability analysis

- uncertain statistics: $(a, B) \in \mathcal{U} \subseteq (\mathbf{R}^n \setminus \{0\}) \times \mathbf{S}^n_{++}$ (convex and compact)
- for fixed $x$, find the worst-case statistics:
  $$\begin{array}{ll} \mbox{minimize} & \mathrm{Prob}(z^T x > 0) \\ \mbox{subject to} & (a, B) \in \mathcal{U} \end{array}$$
  where $z \sim \mathcal{N}(a, B)$

Worst-case discrimination probability analysis

worst-case discrimination probability analysis is equivalent to
$$\begin{array}{ll} \mbox{minimize} & a^T x / \sqrt{x^T B x} \\ \mbox{subject to} & (a, B) \in \mathcal{U} \end{array}$$

- if the optimal value is positive, i.e., $a^T x > 0$ for all $(a, B) \in \mathcal{U}$, this is equivalent to the convex problem
  $$\begin{array}{ll} \mbox{minimize} & (a^T x)^2 / (x^T B x) \\ \mbox{subject to} & (a, B) \in \mathcal{U} \end{array}$$
- if the optimal value is negative, we can solve it via bisection and convex optimization
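
A CVXPY sketch of the convex case, for a hypothetical uncertainty set (a norm ball around a nominal mean and an entrywise box around a nominal covariance; all numbers are illustrative). `quad_over_lin` is the quadratic-over-linear atom, jointly convex in $(a, B)$:

```python
import cvxpy as cp
import numpy as np
from scipy.stats import norm

n = 3
x = np.array([1.0, 0.5, -0.25])    # fixed discriminator
a_bar = np.array([2.0, 1.0, 0.5])  # nominal mean
B_bar = np.array([[1.0, 0.2, 0.1],
                  [0.2, 1.0, 0.3],
                  [0.1, 0.3, 1.0]])  # nominal covariance

a = cp.Variable(n)
B = cp.Variable((n, n), symmetric=True)
U = [cp.norm(a - a_bar) <= 0.3,
     cp.abs(B - B_bar) <= 0.1,
     B >> 1e-6 * np.eye(n)]

# minimize (a^T x)^2 / (x^T B x) over U; valid when a^T x > 0 on all of U
obj = cp.quad_over_lin(a @ x, cp.quad_form(x, B))
cp.Problem(cp.Minimize(obj), U).solve()
print(norm.cdf(np.sqrt(obj.value)))  # worst-case discrimination probability
```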

Worst-case discrimination probability maximization

- find $x$ that maximizes the worst-case discrimination probability
  $$\min_{(a, B) \in \mathcal{U}} \mathrm{Prob}(z^T x \geq 0)$$
- equivalent to finding $x$ that maximizes
  $$\min_{(a, B) \in \mathcal{U}} \frac{a^T x}{\sqrt{x^T B x}}$$
- not concave in $x$; not clear how to solve
- studied in the context of robust signal processing in the 1980s (Poor, Verdú, ...) for some specific choices of $\mathcal{U}$

Weak minimax property

$$\max_{x \neq 0} \min_{(a, B) \in \mathcal{U}} \frac{x^T a}{\sqrt{x^T B x}} \;\leq\; \min_{(a, B) \in \mathcal{U}} \max_{x \neq 0} \frac{x^T a}{\sqrt{x^T B x}}$$

- LHS: the worst-case discrimination probability maximization problem
- RHS can be evaluated via convex optimization:
  $$\min_{(a, B) \in \mathcal{U}} \max_{x \neq 0} \frac{x^T a}{\sqrt{x^T B x}} = \min_{(a, B) \in \mathcal{U}} \sqrt{a^T B^{-1} a} = \left[ \min_{(a, B) \in \mathcal{U}} a^T B^{-1} a \right]^{1/2}$$
  this finds the statistics hardest to discriminate from zero

Strong minimax property

in fact, we have equality:
$$\max_{x \neq 0} \min_{(a, B) \in \mathcal{U}} \frac{x^T a}{\sqrt{x^T B x}} = \min_{(a, B) \in \mathcal{U}} \max_{x \neq 0} \frac{x^T a}{\sqrt{x^T B x}}$$
(and the common value is $> 0$)

equivalently, with $z \sim \mathcal{N}(a, B)$,
$$\max_{x \neq 0} \min_{(a, B) \in \mathcal{U}} \mathrm{Prob}(z^T x \geq 0) = \min_{(a, B) \in \mathcal{U}} \max_{x \neq 0} \mathrm{Prob}(z^T x \geq 0)$$
(and the common value is $> 1/2$)

Computing the saddle point via convex optimization

- find the least favorable statistics $(a^\star, B^\star)$, i.e., minimize $a^T B^{-1} a$ over $(a, B) \in \mathcal{U}$
- $(x^\star, a^\star, B^\star)$ with $x^\star = B^{\star -1} a^\star$ is a saddle point:
  $$\frac{x^T a^\star}{\sqrt{x^T B^\star x}} \;\leq\; \frac{x^{\star T} a^\star}{\sqrt{x^{\star T} B^\star x^\star}} \;\leq\; \frac{x^{\star T} a}{\sqrt{x^{\star T} B x^\star}}, \qquad x \neq 0,\ (a, B) \in \mathcal{U}$$
- $x^\star$ solves the worst-case discrimination probability maximization problem
- $(a^\star, B^\star)$ are the worst-case statistics for $x^\star$
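
Finding the least favorable statistics is a single convex problem; a CVXPY sketch under the same illustrative uncertainty set as above (`matrix_frac(a, B)` is exactly $a^T B^{-1} a$, jointly convex):

```python
import cvxpy as cp
import numpy as np

n = 3
a_bar = np.array([2.0, 1.0, 0.5])
B_bar = np.array([[1.0, 0.2, 0.1],
                  [0.2, 1.0, 0.3],
                  [0.1, 0.3, 1.0]])

a = cp.Variable(n)
B = cp.Variable((n, n), symmetric=True)
U = [cp.norm(a - a_bar) <= 0.3,
     cp.abs(B - B_bar) <= 0.1,
     B >> 1e-6 * np.eye(n)]

# least favorable statistics: minimize a^T B^{-1} a over U
prob = cp.Problem(cp.Minimize(cp.matrix_frac(a, B)), U)
prob.solve()

x_star = np.linalg.solve(B.value, a.value)  # saddle-point discriminator x* = B*^{-1} a*
print(np.sqrt(prob.value))                  # optimal (max-min) value
```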

Interpretation of least favorable statistics

find the statistics with minimum Mahalanobis distance between the mean and zero:
$$\sqrt{a^{\star T} B^{\star -1} a^\star} = \min_{(a, B) \in \mathcal{U}} \sqrt{a^T B^{-1} a}$$

[figure: ellipsoids for the candidate statistics; the least favorable pair comes closest to the origin]

Proof via convex analysis

- let $(a^\star, B^\star)$ be least favorable: $\displaystyle\min_{(a, B) \in \mathcal{U}} a^T B^{-1} a = a^{\star T} B^{\star -1} a^\star$
- by the minimax inequality,
  $$\max_{x \neq 0} \min_{(a, B) \in \mathcal{U}} \frac{x^T a}{\sqrt{x^T B x}} \;\leq\; \min_{(a, B) \in \mathcal{U}} \max_{x \neq 0} \frac{x^T a}{\sqrt{x^T B x}} = \sqrt{a^{\star T} B^{\star -1} a^\star}$$
- $x^\star = B^{\star -1} a^\star$ gives a lower bound:
  $$\min_{(a, B) \in \mathcal{U}} \frac{x^{\star T} a}{\sqrt{x^{\star T} B x^\star}} \;\leq\; \max_{x \neq 0} \min_{(a, B) \in \mathcal{U}} \frac{x^T a}{\sqrt{x^T B x}} \;\leq\; \sqrt{a^{\star T} B^{\star -1} a^\star}$$
- so if $\displaystyle\min_{(a, B) \in \mathcal{U}} \frac{x^{\star T} a}{\sqrt{x^{\star T} B x^\star}} = \frac{x^{\star T} a^\star}{\sqrt{x^{\star T} B^\star x^\star}} = \sqrt{a^{\star T} B^{\star -1} a^\star}$, then minimax equality must hold

- $(a^\star, B^\star)$ is optimal for $\min_{(a, B) \in \mathcal{U}} a^T B^{-1} a$, so
  $$2 x^{\star T} (a - a^\star) - x^{\star T} (B - B^\star) x^\star \geq 0, \qquad (a, B) \in \mathcal{U}$$
  with $x^\star = B^{\star -1} a^\star$ (since $z^\star$ minimizes a convex differentiable function $f$ over a convex set $C$ iff $\nabla f(z^\star)^T (z - z^\star) \geq 0$ for all $z \in C$)
- we conclude that $(a^\star, B^\star)$ is optimal for $\min_{(a, B) \in \mathcal{U}} (x^{\star T} a)^2 / (x^{\star T} B x^\star)$, since its optimality condition is also
  $$2 x^{\star T} (a - a^\star) - x^{\star T} (B - B^\star) x^\star \geq 0, \qquad (a, B) \in \mathcal{U}$$
- hence $(a^\star, B^\star)$ must be optimal for $\min_{(a, B) \in \mathcal{U}} x^{\star T} a / (x^{\star T} B x^\star)^{1/2}$, since $\min_{(a, B) \in \mathcal{U}} x^{\star T} a / (x^{\star T} B x^\star)^{1/2} > 0$

Fisher linear discriminant analysis

- $x \sim \mathcal{N}(\mu_x, \Sigma_x)$, $y \sim \mathcal{N}(\mu_y, \Sigma_y)$ (independent)
- find $w$ that maximizes $\mathrm{Prob}(w^T x > w^T y)$
- $x - y \sim \mathcal{N}(\mu_x - \mu_y, \Sigma_x + \Sigma_y)$, so
  $$\mathrm{Prob}(w^T x > w^T y) = \Phi\!\left( \frac{w^T (\mu_x - \mu_y)}{\sqrt{w^T (\Sigma_x + \Sigma_y) w}} \right)$$
- equivalent to maximizing
  $$\frac{w^T (\mu_x - \mu_y)}{\sqrt{w^T (\Sigma_x + \Sigma_y) w}} = f(\mu_x - \mu_y,\ \Sigma_x + \Sigma_y,\ w)$$
- proposed by Fisher in the 1930s
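
With known statistics the whole procedure is two lines of linear algebra; a sketch with made-up class statistics:

```python
import numpy as np
from scipy.stats import norm

mu_x, mu_y = np.array([1.0, 2.0]), np.array([0.0, 0.5])  # illustrative
Sigma_x = np.array([[1.0, 0.3], [0.3, 1.0]])
Sigma_y = np.array([[0.8, -0.2], [-0.2, 1.2]])

a, B = mu_x - mu_y, Sigma_x + Sigma_y
w = np.linalg.solve(B, a)                   # Fisher discriminant w = B^{-1} a
p = norm.cdf((a @ w) / np.sqrt(w @ B @ w))  # Prob(w^T x > w^T y)
print(w, p)
```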

Worst-case Fisher linear discrimination

- uncertain statistics: $(\mu_x, \mu_y, \Sigma_x, \Sigma_y) \in \mathcal{V} \subseteq \mathbf{R}^n \times \mathbf{R}^n \times \mathbf{S}^n_{++} \times \mathbf{S}^n_{++}$, convex and compact, with $\mu_x \neq \mu_y$ for each $(\mu_x, \mu_y, \Sigma_x, \Sigma_y) \in \mathcal{V}$
- worst-case discrimination probability (for fixed $w$):
  $$\min_{(\mu_x, \mu_y, \Sigma_x, \Sigma_y) \in \mathcal{V}} \mathrm{Prob}(w^T x > w^T y)$$
- can be computed via convex optimization

Robust Fisher linear discriminant analysis

- worst-case discrimination probability maximization: find $w$ that maximizes the worst-case discrimination probability
  $$\min_{(\mu_x, \mu_y, \Sigma_x, \Sigma_y) \in \mathcal{V}} \mathrm{Prob}(w^T x > w^T y)$$
- equivalent to maximizing (over $w$)
  $$\min_{(\mu_x, \mu_y, \Sigma_x, \Sigma_y) \in \mathcal{V}} \frac{w^T (\mu_x - \mu_y)}{\sqrt{w^T (\Sigma_x + \Sigma_y) w}}$$
- this is our max-min problem, with $a = \mu_x - \mu_y$, $B = \Sigma_x + \Sigma_y$, and
  $$\mathcal{U} = \{ (\mu_x - \mu_y,\ \Sigma_x + \Sigma_y) \mid (\mu_x, \mu_y, \Sigma_x, \Sigma_y) \in \mathcal{V} \}$$

Example

$x, y \in \mathbf{R}^2$; only $\mu_y$ is uncertain, $\mu_y \in \mathcal{E}$

[figure: uncertainty ellipsoid $\mathcal{E}$ for $\mu_y$, with the nominal-optimal and robust-optimal discriminating hyperplanes]

Results

                   P_nom   P_wc
nominal optimal     0.99   0.87
robust optimal      0.98   0.91

P_nom: nominal discrimination probability
P_wc: worst-case discrimination probability

Matched filtering

- $y(t) = s(t) a + v(t) \in \mathbf{R}^n$
- $s(t) \in \mathbf{R}$ is the desired signal; $y(t) \in \mathbf{R}^n$ is the received signal
- $v(t) \sim \mathcal{N}(0, \Sigma)$ is noise
- filtered output with weight vector $w \in \mathbf{R}^n$:
  $$z(t) = w^T y(t) = s(t)\, w^T a + w^T v(t)$$
- standard matched filtering: choose $w$ to maximize the (amplitude) signal-to-noise ratio (SNR)
  $$\frac{w^T a}{\sqrt{w^T \Sigma w}}$$
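
For known $(a, \Sigma)$ the SNR-optimal filter is $w = \Sigma^{-1} a$, by the fractional-function result above; a minimal sketch with illustrative data:

```python
import numpy as np

a = np.array([2.0, 3.0, 2.0, 2.0])               # signal direction (illustrative)
Sigma = 0.5 * np.eye(4) + 0.5 * np.ones((4, 4))  # noise covariance (illustrative)

w = np.linalg.solve(Sigma, a)           # matched filter w = Sigma^{-1} a
snr = (w @ a) / np.sqrt(w @ Sigma @ w)  # equals sqrt(a^T Sigma^{-1} a)
print(w, snr)
```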

Robust matched filtering

- uncertain data: $(a, \Sigma) \in \mathcal{U} \subseteq (\mathbf{R}^n \setminus \{0\}) \times \mathbf{S}^n_{++}$ (convex and compact)
- worst-case SNR for fixed $w$:
  $$\min_{(a, \Sigma) \in \mathcal{U}} \frac{w^T a}{\sqrt{w^T \Sigma w}}$$
- robust matched filtering: find $w$ that maximizes the worst-case SNR
- can be solved via convex optimization and the minimax theorem for $x^T a / \sqrt{x^T B x}$
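
For a fixed filter $w$ with $a$ known, the worst case only needs the covariance that maximizes the output noise power, which is an SDP; a CVXPY sketch with a hypothetical entrywise box around a nominal $\Sigma$ (all data illustrative):

```python
import cvxpy as cp
import numpy as np

a = np.array([2.0, 3.0, 2.0, 2.0])
Sigma_bar = 0.5 * np.eye(4) + 0.5 * np.ones((4, 4))
w = np.linalg.solve(Sigma_bar, a)  # nominal matched filter

Sigma = cp.Variable((4, 4), symmetric=True)
U = [cp.abs(Sigma - Sigma_bar) <= 0.2, Sigma >> 1e-6 * np.eye(4)]

# with a fixed, the worst case maximizes the denominator w^T Sigma w over U
prob = cp.Problem(cp.Maximize(cp.quad_form(w, Sigma)), U)
prob.solve()
print((w @ a) / np.sqrt(prob.value))  # worst-case SNR of w
```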

Example

- $a = (2, 3, 2, 2)$ is fixed (no uncertainty)
- the uncertain $\Sigma$ is $4 \times 4$, symmetric, with unit diagonal; each off-diagonal entry is constrained according to a sign pattern: $+$ means $\Sigma_{ij} \in [0, 1]$, $-$ means $\Sigma_{ij} \in [-1, 0]$, $?$ means $\Sigma_{ij} \in [-1, 1]$ (and of course $\Sigma \succeq 0$)
- we take the nominal noise covariance as
  $$\bar\Sigma = \begin{bmatrix} 1 & .5 & .5 & .5 \\ .5 & 1 & 0 & .5 \\ .5 & 0 & 1 & 0 \\ .5 & .5 & 0 & 1 \end{bmatrix}$$

Results

                  nominal SNR   worst-case SNR
nominal optimal       5.5            3.0
robust optimal        4.9            3.6

least favorable covariance:
$$\Sigma^\star = \begin{bmatrix} 1 & 0 & .38 & .12 \\ 0 & 1 & .41 & .74 \\ .38 & .41 & 1 & .23 \\ .12 & .74 & .23 & 1 \end{bmatrix}$$

Extension of the minimax theorem

- $\mathcal{U}$ convex and compact; $X$ a convex cone:
  $$\max_{x \in X} \min_{(a, B) \in \mathcal{U}} \frac{a^T x}{\sqrt{x^T B x}} = \min_{(a, B) \in \mathcal{U}} \max_{x \in X} \frac{a^T x}{\sqrt{x^T B x}}$$
- provided there exists $x \in X$ s.t. $a^T x > 0$ for all $a$ with $(a, B) \in \mathcal{U}$, i.e., LHS $> 0$

Computing the saddle point via convex optimization

- convexity of the min-max problem ($X^*$ is the dual cone of $X$):
  $$\min_{(a, B) \in \mathcal{U}} \max_{x \in X} \frac{a^T x}{\sqrt{x^T B x}} = \min_{(a, B) \in \mathcal{U}} \min_{\lambda \in X^*} \left[ (a + \lambda)^T B^{-1} (a + \lambda) \right]^{1/2} = \left[ \min_{(a, B) \in \mathcal{U},\ \lambda \in X^*} (a + \lambda)^T B^{-1} (a + \lambda) \right]^{1/2}$$
- if $(a^\star, B^\star, \lambda^\star)$ is optimal for the min-max problem, then $(x^\star, a^\star, B^\star)$ with $x^\star = B^{\star -1} (a^\star + \lambda^\star)$ is a saddle point:
  $$\frac{x^T a^\star}{\sqrt{x^T B^\star x}} \;\leq\; \frac{x^{\star T} a^\star}{\sqrt{x^{\star T} B^\star x^\star}} \;\leq\; \frac{x^{\star T} a}{\sqrt{x^{\star T} B x^\star}}, \qquad x \in X,\ (a, B) \in \mathcal{U}$$

A key lemma

$$\max_{x \in X} \frac{(a^T x)^2}{x^T B x} = \min_{\lambda \in X^*} (a + \lambda)^T B^{-1} (a + \lambda)$$

- using homogeneity of $(a^T x)^2 / (x^T B x)$, write the LHS as
  $$\max_{x \in X} \frac{(a^T x)^2}{x^T B x} = \left[ \min_{x \in X,\ a^T x = 1} x^T B x \right]^{-1}$$
- by convex duality,
  $$\min_{x \in X,\ a^T x = 1} x^T B x = \max_{\lambda \in X^*,\ \mu \in \mathbf{R}} \left[ -(1/4) (\lambda + \mu a)^T B^{-1} (\lambda + \mu a) + \mu \right]$$
- defining $\tilde\lambda = \lambda / \mu$ and optimizing over $\mu$, the RHS becomes
  $$\min_{x \in X,\ a^T x = 1} x^T B x = \left[ \min_{\tilde\lambda \in X^*} (a + \tilde\lambda)^T B^{-1} (a + \tilde\lambda) \right]^{-1}$$
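
The lemma is easy to test numerically for $X = \mathbf{R}^n_+$ (which is self-dual), solving the LHS via the homogeneity step and the RHS directly; a sketch with random data:

```python
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
n = 4
a = rng.standard_normal(n)
G = rng.standard_normal((n, n))
B = G @ G.T + np.eye(n)
Binv = np.linalg.inv(B)
Binv = 0.5 * (Binv + Binv.T)  # symmetrize for quad_form

# LHS via homogeneity: [min x^T B x s.t. a^T x = 1, x in X]^{-1}
x = cp.Variable(n)
lhs = cp.Problem(cp.Minimize(cp.quad_form(x, B)), [a @ x == 1, x >= 0])
lhs.solve()

# RHS: min (a + lam)^T B^{-1} (a + lam) over lam in X* = R^n_+
lam = cp.Variable(n)
rhs = cp.Problem(cp.Minimize(cp.quad_form(a + lam, Binv)), [lam >= 0])
rhs.solve()
print(1.0 / lhs.value, rhs.value)  # the two values agree
```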

Asset allocation

- $n$ risky assets with (single-period) returns $a \sim \mathcal{N}(\mu, \Sigma)$
- portfolio $w \in \mathcal{W}$ (convex); $\mathbf{1}^T w = 1$ for all $w \in \mathcal{W}$
- $w^T a \sim \mathcal{N}(w^T \mu, w^T \Sigma w)$, so the probability of beating the risk-free asset with return $\mu_{\mathrm{rf}}$ is
  $$\mathrm{Prob}(a^T w > \mu_{\mathrm{rf}}) = \Phi\!\left( \frac{\mu^T w - \mu_{\mathrm{rf}}}{\sqrt{w^T \Sigma w}} \right)$$
- maximized by the $w \in \mathcal{W}$ that maximizes the Sharpe ratio
  $$\frac{\mu^T w - \mu_{\mathrm{rf}}}{\sqrt{w^T \Sigma w}}$$
  (called the tangency portfolio $w_{\mathrm{tp}}$)
- we are only interested in the case when the maximum Sharpe ratio is positive
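
When $\mathcal{W}$ is just the budget constraint $\{w \mid \mathbf{1}^T w = 1\}$, the tangency portfolio has the familiar closed form $w \propto \Sigma^{-1} (\mu - \mu_{\mathrm{rf}} \mathbf{1})$; a sketch with illustrative data (assumes the normalizing sum is positive):

```python
import numpy as np

mu = np.array([0.08, 0.10, 0.06])  # illustrative expected returns
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.03]])
mu_rf = 0.03

z = np.linalg.solve(Sigma, mu - mu_rf)  # direction Sigma^{-1} (mu - mu_rf 1)
w_tp = z / z.sum()                      # tangency portfolio (1^T w = 1)
sharpe = (mu @ w_tp - mu_rf) / np.sqrt(w_tp @ Sigma @ w_tp)
print(w_tp, sharpe)
```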

Interpretation via Chebyshev bound

- suppose $\mathbf{E}\, a = \mu$ and $\mathbf{E}\,(a - \mu)(a - \mu)^T = \Sigma$ (otherwise arbitrary)
- $\mathbf{E}\, a^T w = \mu^T w$ and $\mathbf{E}\,(a^T w - \mu^T w)^2 = w^T \Sigma w$, so by the Chebyshev bound
  $$\mathrm{Prob}(a^T w \geq \mu_{\mathrm{rf}}) \geq \Psi\!\left( \frac{\mu^T w - \mu_{\mathrm{rf}}}{\sqrt{w^T \Sigma w}} \right), \qquad \Psi(u) = \frac{u_+^2}{1 + u_+^2}$$
- $\Psi$ is increasing (and the bound is tight)
- the bound is maximized by the tangency portfolio

[figure: risk-return plane showing the efficient frontier (EF) and the capital market line (CML); the CML starts at $(0, \mu_{\mathrm{rf}})$, has slope equal to the maximum Sharpe ratio, and touches the EF at the tangency portfolio]

EF: efficient frontier; CML: capital market line

Asset allocation with a risk-free asset

- affine combination of the risk-free asset and a risky portfolio $w \in \mathcal{W}$:
  $$x = \begin{bmatrix} (1 - \theta) w \\ \theta \end{bmatrix}$$
  where $\theta$ is the amount of the risk-free asset
- Tobin's two-fund separation theorem: the risk $s$ and return $r$ of any $x = ((1 - \theta) w, \theta)$ with $w \in \mathcal{W}$ cannot lie above the capital market line (CML)
  $$r = \mu_{\mathrm{rf}} + S s, \qquad S = \max_{w \in \mathcal{W}} \frac{\mu^T w - \mu_{\mathrm{rf}}}{\sqrt{w^T \Sigma w}}$$
- the CML is formed by affine combinations of two portfolios: the tangency portfolio $(w_{\mathrm{tp}}, 0)$ and the risk-free portfolio $(0, 1)$

Worst-case Sharpe ratio

- uncertain statistics: $(\mu, \Sigma) \in \mathcal{V} \subseteq \mathbf{R}^n \times \mathbf{S}^n_{++}$ (convex and compact)
- for fixed $w \in \mathcal{W}$, find the worst-case statistics by computing the worst-case SR
  $$\min_{(\mu, \Sigma) \in \mathcal{V}} \frac{\mu^T w - \mu_{\mathrm{rf}}}{\sqrt{w^T \Sigma w}} = \min_{(\mu, \Sigma) \in \mathcal{V}} \frac{(\mu - \mu_{\mathrm{rf}} \mathbf{1})^T w}{\sqrt{w^T \Sigma w}}$$
  which can be computed via convex optimization
- related to the worst-case probability of beating the risk-free asset:
  $$\min_{(\mu, \Sigma) \in \mathcal{V}} \mathrm{Prob}(a^T w > \mu_{\mathrm{rf}}) = \Phi\!\left( \min_{(\mu, \Sigma) \in \mathcal{V}} \frac{\mu^T w - \mu_{\mathrm{rf}}}{\sqrt{w^T \Sigma w}} \right)$$

Worst-case Sharpe ratio maximization

- find the portfolio that maximizes the worst-case SR:
  $$\begin{array}{ll} \mbox{maximize} & \displaystyle\min_{(\mu, \Sigma) \in \mathcal{V}} \frac{\mu^T w - \mu_{\mathrm{rf}}}{\sqrt{w^T \Sigma w}} \\ \mbox{subject to} & w \in \mathcal{W} \end{array}$$
- this portfolio (called the robust tangency portfolio $w_{\mathrm{rtp}}$) maximizes
  - the worst-case probability of beating the risk-free asset
    $$\min_{(\mu, \Sigma) \in \mathcal{V}} \mathrm{Prob}(w^T a > \mu_{\mathrm{rf}}), \qquad a \sim \mathcal{N}(\mu, \Sigma)$$
  - the worst-case Chebyshev lower bound on $\mathrm{Prob}(a^T w \geq \mu_{\mathrm{rf}})$:
    $$\min_{(\mu, \Sigma) \in \mathcal{V}} \inf \{ \mathrm{Prob}(a^T w \geq \mu_{\mathrm{rf}}) \mid \mathbf{E}\, a = \mu,\ \mathbf{E}\,(a - \mu)(a - \mu)^T = \Sigma \}$$
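
For the special case where only the mean is uncertain (an entrywise box, fixed $\Sigma$) and $\mathcal{W}$ is the budget set, the max-min problem collapses to a single convex program via homogeneity; a CVXPY sketch (this simplification is mine; the talk's $\mathcal{V}$ is more general):

```python
import cvxpy as cp
import numpy as np

mu_bar = np.array([0.08, 0.10, 0.06])  # nominal means (illustrative)
delta = np.array([0.01, 0.02, 0.01])   # |mu_i - mu_bar_i| <= delta_i
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.03]])
mu_rf = 0.03

w = cp.Variable(3)
# worst-case excess return over the box: (mu_bar - mu_rf 1)^T w - delta^T |w|;
# the worst-case SR is scale-invariant, so maximize it at unit risk, then rescale
wc_excess = (mu_bar - mu_rf) @ w - delta @ cp.abs(w)
prob = cp.Problem(cp.Maximize(wc_excess), [cp.quad_form(w, Sigma) <= 1])
prob.solve()

w_rtp = w.value / np.sum(w.value)  # robust tangency portfolio (assumes 1^T w* > 0)
print(w_rtp, prob.value)           # prob.value is the worst-case Sharpe ratio
```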

Solution via minimax

- $\mathbf{1}^T w = 1$ for all $w \in \mathcal{W}$, so
  $$\frac{\mu^T w - \mu_{\mathrm{rf}}}{\sqrt{w^T \Sigma w}} = \frac{(\mu - \mu_{\mathrm{rf}} \mathbf{1})^T w}{\sqrt{w^T \Sigma w}}, \qquad w \in \mathcal{W}$$
- can use the minimax theorem for $a^T x / \sqrt{x^T B x}$, with $a = \mu - \mu_{\mathrm{rf}} \mathbf{1}$, $B = \Sigma$, $X = \mathbf{R}_+ \mathcal{W}$ (the cone generated by $\mathcal{W}$), and
  $$\mathcal{U} = \{ (\mu - \mu_{\mathrm{rf}} \mathbf{1},\ \Sigma) \mid (\mu, \Sigma) \in \mathcal{V} \}$$

Minimax result for the Sharpe ratio

- suppose there exists a portfolio $w$ s.t. $\mu^T w > \mu_{\mathrm{rf}}$ for all $(\mu, \Sigma) \in \mathcal{V}$
- strong minimax property:
  $$0 < \max_{w \in \mathcal{W}} \min_{(\mu, \Sigma) \in \mathcal{V}} \frac{\mu^T w - \mu_{\mathrm{rf}}}{\sqrt{w^T \Sigma w}} = \min_{(\mu, \Sigma) \in \mathcal{V}} \max_{w \in \mathcal{W}} \frac{\mu^T w - \mu_{\mathrm{rf}}}{\sqrt{w^T \Sigma w}}$$
- a saddle point $(w^\star, \mu^\star, \Sigma^\star)$ can be computed via convex optimization; it satisfies
  $$\frac{\mu^{\star T} w - \mu_{\mathrm{rf}}}{\sqrt{w^T \Sigma^\star w}} \;\leq\; \frac{\mu^{\star T} w^\star - \mu_{\mathrm{rf}}}{\sqrt{w^{\star T} \Sigma^\star w^\star}} \;\leq\; \frac{\mu^T w^\star - \mu_{\mathrm{rf}}}{\sqrt{w^{\star T} \Sigma w^\star}}, \qquad w \in \mathcal{W},\ (\mu, \Sigma) \in \mathcal{V}$$

Saddle-point property

[figure: efficient frontier for the least favorable statistics $(\mu^\star, \Sigma^\star)$, with the risk-return set $\mathcal{P}(w^\star)$ touching the capital market line at $(\sqrt{w^{\star T} \Sigma^\star w^\star},\ \mu^{\star T} w^\star)$]

$\mathcal{P}(w^\star) = \{ (\mu^T w^\star,\ w^{\star T} \Sigma w^\star) \mid (\mu, \Sigma) \in \mathcal{V} \}$ is the risk-return set of $w^\star$

Example

7 risky assets with nominal returns $\bar\mu_i$, variances $\bar\sigma_i^2$, and correlation matrix $\Omega$; risk-free return $\mu_{\mathrm{rf}} = 3\%$

[figure: bar plots of the nominal returns $\bar\mu_i$ (up to 10%) and standard deviations $\bar\sigma_i$ (up to 30%) for assets 1-7]

$$\Omega = \begin{bmatrix}
1.00 & .07 & .12 & .43 & .11 & .24 & .25 \\
.07 & 1.00 & .73 & .14 & .09 & .28 & .10 \\
.12 & .73 & 1.00 & .14 & .5 & .52 & .13 \\
.43 & .14 & .14 & 1.00 & .04 & .35 & .38 \\
.11 & .09 & .5 & .04 & 1.00 & .7 & .04 \\
.24 & .28 & .52 & .35 & .7 & 1.00 & .09 \\
.25 & .10 & .13 & .38 & .04 & .09 & 1.00
\end{bmatrix}$$

Example

- total short position is limited to 30% of the total long position
- mean uncertainty model:
  $$|\mathbf{1}^T \mu - \mathbf{1}^T \bar\mu| \leq 0.1\, |\mathbf{1}^T \bar\mu|, \qquad |\mu_i - \bar\mu_i| \leq 0.2\, |\bar\mu_i|, \quad i = 1, \ldots, 6$$
- covariance uncertainty model:
  $$\|\Sigma - \bar\Sigma\|_F \leq 0.1\, \|\bar\Sigma\|_F, \qquad |\Sigma_{ij} - \bar\Sigma_{ij}| \leq 0.2\, |\bar\Sigma_{ij}|, \quad i, j = 1, \ldots, 6$$
  ($\bar\Sigma$ is the nominal covariance)

Results

              nominal SR   worst-case SR
nominal MP       1.56          0.56
robust MP        1.23          0.77

              P_nom   P_wc
nominal MP     0.92   0.71
robust MP      0.89   0.77

P_nom: probability of beating the risk-free asset under the nominal statistics $(\bar\mu, \bar\Sigma)$
P_wc: probability of beating the risk-free asset under the worst-case statistics

Two-fund separation under model uncertainty

- robust two-fund separation theorem: the risk-return set of any $x = ((1 - \theta) w, \theta)$ with $w \in \mathcal{W}$ cannot lie above the robust capital market line (RCML)
  $$r = \mu_{\mathrm{rf}} + S s$$
  with slope
  $$S = \max_{w \in \mathcal{W}} \min_{(\mu, \Sigma) \in \mathcal{V}} \frac{\mu^T w - \mu_{\mathrm{rf}}}{\sqrt{w^T \Sigma w}} = \min_{(\mu, \Sigma) \in \mathcal{V}} \max_{w \in \mathcal{W}} \frac{\mu^T w - \mu_{\mathrm{rf}}}{\sqrt{w^T \Sigma w}}$$
- the RCML is formed by affine combinations of two portfolios: the robust tangency portfolio $(w_{\mathrm{rtp}}, 0)$ and the risk-free portfolio $(0, 1)$
- for $\mathcal{V} = \{(\mu, \Sigma)\}$ (no uncertainty), this reduces to Tobin's two-fund separation theorem

Two-fund separation under model uncertainty

[figure: risk-return sets $\mathcal{P}((1 - \theta) w^\star, \theta)$ for several $\theta$, e.g., $\mathcal{P}(1.5 w^\star, -0.5)$, $\mathcal{P}(w^\star, 0)$, $\mathcal{P}(0.5 w^\star, 0.5)$, and $\mathcal{P}(0, 1) = \{(0, \mu_{\mathrm{rf}})\}$, all lying below the RCML of slope $S$]

$\mathcal{P}((1 - \theta) w, \theta)$ is the risk-return set of $((1 - \theta) w, \theta)$

Summary & comments

- the function $f(a, B, x) = \dfrac{a^T x}{\sqrt{x^T B x}}$ comes up a lot
- what was known: a simple minimax result, with no constraints on $x$ and a product form for $\mathcal{U}$
- what is new: a general minimax theorem, and a convex optimization method for computing the saddle point
- a minimax theorem for $\dfrac{\mathbf{Tr}\, A X}{\mathbf{Tr}\, B X}$ holds (with conditions)
- a minimax theorem for $\dfrac{x^T A x}{x^T B x}$ is generally false

References

- S.-J. Kim and S. Boyd, "Two-fund separation under model uncertainty," in preparation.
- S.-J. Kim, A. Magnani, and S. Boyd, "Robust Fisher discriminant analysis," available at www.stanford.edu/~boyd/research.html
- S. Verdú and H. V. Poor, "On minimax robustness: A general approach and applications," IEEE Transactions on Information Theory, 30(2):328-340, 1984.
- M. Sion, "On general minimax theorems," Pacific Journal of Mathematics, 8(1):171-176, 1958.