Surrogate Risk Consistency: the Classification Case
Chapter 11

I. The setting: supervised prediction problem
   (a) Have data coming in pairs $(X, Y)$ and a loss $L : \mathbb{R} \times \mathcal{Y} \to \mathbb{R}$ (more general losses are possible)
   (b) Often it is hard to minimize $L$ directly (for example, if $L$ is non-convex), so we use a surrogate $\varphi$
   (c) We would like to compare the risks of functions $f : \mathcal{X} \to \mathbb{R}$:
         $R_\varphi(f) := \mathbb{E}[\varphi(f(X), Y)]$ and $R(f) := \mathbb{E}[L(f(X), Y)]$.
       In particular, when does minimizing the surrogate risk also minimize the true risk?
   (d) Toward this goal, we define the Bayes risks $R_\varphi^* := \inf_f R_\varphi(f)$ and $R^* := \inf_f R(f)$

   Definition 11.1 (Fisher consistency). We say the loss $\varphi$ is Fisher consistent if for any sequence of functions $f_n$,
         $R_\varphi(f_n) \to R_\varphi^*$ implies $R(f_n) \to R^*$.

II. Classification case
   (a) We focus on the binary classification case, so that $Y \in \{-1, 1\}$
      1. Margin-based losses: we wish to predict the sign correctly, so for $\alpha \in \mathbb{R}$,
            $L(\alpha, y) = 1\{\alpha y \le 0\}$ and $\varphi(\alpha, y) = \varphi(y\alpha)$.
      2. Consider conditional versions of the risks. Let $\eta(x) = P(Y = 1 \mid X = x)$ be the conditional probability; then
            $R(f) = \mathbb{E}[1\{f(X) Y \le 0\}] = P(\operatorname{sign}(f(X)) \ne Y) = \mathbb{E}[\eta(X) 1\{f(X) \le 0\} + (1 - \eta(X)) 1\{f(X) \ge 0\}] = \mathbb{E}[\ell(f(X), \eta(X))]$
         and
            $R_\varphi(f) = \mathbb{E}[\varphi(Y f(X))] = \mathbb{E}[\eta(X) \varphi(f(X)) + (1 - \eta(X)) \varphi(-f(X))] = \mathbb{E}[\ell_\varphi(f(X), \eta(X))]$,
         where we have defined the conditional risks
            $\ell(\alpha, \eta) = \eta 1\{\alpha \le 0\} + (1 - \eta) 1\{\alpha \ge 0\}$ and $\ell_\varphi(\alpha, \eta) = \eta \varphi(\alpha) + (1 - \eta) \varphi(-\alpha)$.
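Since the conditional risks are functions of a scalar margin $\alpha$, they are easy to explore numerically. Below is a small sketch (our own code, not from the notes; names like `cond_01_risk` are ours) that minimizes the conditional 0-1 risk over a grid and confirms that the optimal margin has the sign of $\eta - 1/2$:

```python
def cond_01_risk(alpha, eta):
    # l(alpha, eta) = eta * 1{alpha <= 0} + (1 - eta) * 1{alpha >= 0}
    return eta * (alpha <= 0) + (1 - eta) * (alpha >= 0)

def cond_surrogate_risk(alpha, eta, phi):
    # l_phi(alpha, eta) = eta * phi(alpha) + (1 - eta) * phi(-alpha)
    return eta * phi(alpha) + (1 - eta) * phi(-alpha)

eta = 0.8                                  # P(Y = 1 | X = x)
grid = [i / 100 - 5 for i in range(1001)]  # margins alpha in [-5, 5]

best_alpha = min(grid, key=lambda a: cond_01_risk(a, eta))
bayes_risk = min(cond_01_risk(a, eta) for a in grid)

# alpha*(eta) = sign(eta - 1/2): any positive margin is optimal here,
# and the minimal conditional 0-1 risk is min{eta, 1 - eta}
assert best_alpha > 0
assert abs(bayes_risk - min(eta, 1 - eta)) < 1e-12
```

Note that the 0-1 conditional risk is piecewise constant in $\alpha$, so the grid minimization is exact here; only the sign of the minimizer matters.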
      3. Note the minimizer of $\ell$: we have $\alpha^*(\eta) = \operatorname{sign}(\eta - 1/2)$, and $f^*(x) = \operatorname{sign}(\eta(x) - 1/2)$ minimizes the risk $R(f)$ over all $f$
      4. Minimizing over $f$ can be achieved pointwise, and we have
            $R^* = \mathbb{E}[\inf_\alpha \ell(\alpha, \eta(X))]$ and $R_\varphi^* = \mathbb{E}[\inf_\alpha \ell_\varphi(\alpha, \eta(X))]$.
   (b) Example 11.1 (Exponential loss): Consider the exponential loss, used in AdaBoost (among other settings), which sets $\varphi(\alpha) = e^{-\alpha}$. In this case, we have
            $\operatorname{argmin}_\alpha \ell_\varphi(\alpha, \eta) = \frac{1}{2} \log \frac{\eta}{1 - \eta}$
      because $\frac{\partial}{\partial \alpha} \ell_\varphi(\alpha, \eta) = -\eta e^{-\alpha} + (1 - \eta) e^{\alpha}$. Thus $f^*(x) = \frac{1}{2} \log \frac{\eta(x)}{1 - \eta(x)}$, and this loss is Fisher consistent.
   (c) Classification calibration
      1. Consider pointwise versions of the risk (all that is necessary, it turns out)
      2. Define the infimal conditional $\varphi$-risks as
            $\ell_\varphi^*(\eta) := \inf_\alpha \ell_\varphi(\alpha, \eta)$ and $\ell_\varphi^{\mathrm{wrong}}(\eta) := \inf_{\alpha(\eta - 1/2) \le 0} \ell_\varphi(\alpha, \eta)$.
      3. Intuition: if we always have $\ell_\varphi^*(\eta) < \ell_\varphi^{\mathrm{wrong}}(\eta)$ for all $\eta \ne 1/2$, we should do fine
      4. Define the sub-optimality function $H : [0, 1] \to \mathbb{R}$ by
            $H(\delta) := \ell_\varphi^{\mathrm{wrong}}\Big(\frac{1+\delta}{2}\Big) - \ell_\varphi^*\Big(\frac{1+\delta}{2}\Big)$.

      Definition. The margin-based loss $\varphi$ is classification calibrated if $H(\delta) > 0$ for all $\delta > 0$. Equivalently, for any $\eta \ne \frac{1}{2}$, we have $\ell_\varphi^*(\eta) < \ell_\varphi^{\mathrm{wrong}}(\eta)$.

      5. Example (Example 11.1 continued): For the exponential loss, we have
            $\ell_\varphi^{\mathrm{wrong}}(\eta) = \inf_{\alpha(2\eta - 1) \le 0} \{\eta e^{-\alpha} + (1 - \eta) e^{\alpha}\} = e^0 = 1$,
         while the unconstrained minimal conditional risk is
            $\ell_\varphi^*(\eta) = \eta \sqrt{\frac{1 - \eta}{\eta}} + (1 - \eta) \sqrt{\frac{\eta}{1 - \eta}} = 2\sqrt{\eta(1 - \eta)}$,
         so that $H(\delta) = 1 - \sqrt{1 - \delta^2}$.

      Example 11.2 (Hinge loss): We can also consider the hinge loss, which is defined as $\varphi(\alpha) = [1 - \alpha]_+$. We first compute the minimizers of the conditional risk; we have $\ell_\varphi(\alpha, \eta) = \eta [1 - \alpha]_+ + (1 - \eta)[1 + \alpha]_+$, whose unique minimizer (for $\eta \notin \{0, \frac{1}{2}, 1\}$) is $\alpha(\eta) = \operatorname{sign}(2\eta - 1)$. We thus have $\ell_\varphi^*(\eta) = 2\min\{\eta, 1 - \eta\}$ and $\ell_\varphi^{\mathrm{wrong}}(\eta) = \eta + (1 - \eta) = 1$. We obtain
            $H(\delta) = 1 - \min\{1 + \delta, 1 - \delta\} = \delta$.
      Comparing to the sub-optimality function for the exponential loss, that of the hinge loss is larger, so the resulting bound is tighter.
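The closed forms for $H$ above can be sanity-checked by brute-force minimization over a grid of margins. A small numerical sketch (our own code, using nothing beyond the definitions above; for $\eta \ge 1/2$ the wrong-sign margins are $\alpha \le 0$):

```python
import math

def cond_risk(alpha, eta, phi):
    # l_phi(alpha, eta) = eta * phi(alpha) + (1 - eta) * phi(-alpha)
    return eta * phi(alpha) + (1 - eta) * phi(-alpha)

def H(delta, phi, grid):
    # H(delta) = l_wrong((1 + delta)/2) - l_star((1 + delta)/2)
    eta = (1 + delta) / 2
    l_wrong = min(cond_risk(a, eta, phi) for a in grid if a <= 0)
    l_star = min(cond_risk(a, eta, phi) for a in grid)
    return l_wrong - l_star

grid = [i / 1000 - 5 for i in range(10001)]  # alpha in [-5, 5], step 0.001
exp_loss = lambda a: math.exp(-a)            # phi(alpha) = e^{-alpha}
hinge = lambda a: max(1.0 - a, 0.0)          # phi(alpha) = [1 - alpha]_+

delta = 0.6
# exponential loss: H(delta) = 1 - sqrt(1 - delta^2); hinge: H(delta) = delta
assert abs(H(delta, exp_loss, grid) - (1 - math.sqrt(1 - delta**2))) < 1e-4
assert abs(H(delta, hinge, grid) - delta) < 1e-4
```

The tolerance absorbs the grid discretization error; at $\delta = 0.6$ the two values are $0.2$ (exponential) and $0.6$ (hinge), consistent with the hinge loss having the larger sub-optimality function.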
      6. Pictures: use the exponential loss, with $\eta$ and without.
   (d) Our goal: using classification calibration, find some function $\psi$ such that
         $\psi(R(f) - R^*) \le R_\varphi(f) - R_\varphi^*$,
      where $\psi(\delta) > 0$ for all $\delta > 0$. Can we get a convex version of $H$, then maybe use Jensen's inequality to get the result? It turns out we will be able to do this.

III. Some necessary asides on convex analysis
   (a) Epigraphs and closures
      1. For a function $f$, the epigraph $\operatorname{epi} f$ is the set of points $(x, t)$ such that $f(x) \le t$
      2. A function $f$ is said to be closed if its epigraph is closed, which for convex $f$ occurs if and only if $f$ is lower semicontinuous (meaning $\liminf_{x \to x_0} f(x) \ge f(x_0)$)
      3. Note: a one-dimensional closed convex function is continuous

      Lemma 11.3. Let $f : \mathbb{R} \to \mathbb{R}$ be convex. Then $f$ is continuous on the interior of its domain.

      (Proof in notes; just give a picture)

      Lemma 11.4. Let $f : \mathbb{R} \to \mathbb{R}$ be closed convex. Then $f$ is continuous on its domain.

      4. The closure of a function $f$ is the function $\operatorname{cl} f$ whose epigraph is the closed convex hull of $\operatorname{epi} f$ (picture)
   (b) Conjugate functions (Fenchel–Legendre transform)
      1. Let $f : \mathbb{R}^d \to \mathbb{R}$ be an (arbitrary) function. Its conjugate (or Fenchel–Legendre conjugate) is defined to be
            $f^*(s) := \sup_t \{\langle t, s \rangle - f(t)\}$.
         (Picture here.) Note that we always have $f^*(s) + f(t) \ge \langle s, t \rangle$, or $f(t) \ge \langle s, t \rangle - f^*(s)$
      2. The Fenchel biconjugate is defined to be $f^{**}(t) = \sup_s \{\langle t, s \rangle - f^*(s)\}$
         (Picture here, noting that $\nabla f(t) = s$ implies $f^*(s) = \langle t, s \rangle - f(t)$)
      3. In fact, the biconjugate is the largest closed convex function smaller than $f$:

      Lemma 11.5. We have
            $f^{**}(x) = \sup_{a \in \mathbb{R}^d, b \in \mathbb{R}} \{\langle a, x \rangle - b : \langle a, t \rangle - b \le f(t) \text{ for all } t\}$.

      Proof   Let $A \subset \mathbb{R}^d \times \mathbb{R}$ denote all the pairs $(a, b)$ minorizing $f$, that is, those pairs such that $f(t) \ge \langle a, t \rangle - b$ for all $t$. Then we have
            $(a, b) \in A \iff f(t) \ge \langle a, t \rangle - b$ for all $t$ $\iff b \ge \langle a, t \rangle - f(t)$ for all $t$ $\iff b \ge f^*(a)$ and $a \in \operatorname{dom} f^*$.
      Thus we obtain the following sequence of equalities:
            $\sup_{(a, b) \in A} \{\langle a, x \rangle - b\} = \sup\{\langle a, x \rangle - b : a \in \operatorname{dom} f^*,\ b \ge f^*(a)\} = \sup_a \{\langle a, x \rangle - f^*(a)\} = f^{**}(x)$.
      So we have taken the supremum over all the supporting hyperplanes to the graph of $f$, as desired.
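Lemma 11.5's characterization of $f^{**}$ as the greatest closed convex minorant can be illustrated with a crude grid computation (our own sketch; the grids and the double-well example are arbitrary choices, and truncating the grids only approximates the true conjugates):

```python
ts = [i / 100 - 2 for i in range(401)]  # primal grid, t in [-2, 2]
ss = list(ts)                           # dual grid, s in [-2, 2]

def conj(vals, grid, s):
    # f*(s) = sup_t { s*t - f(t) }, restricted to the grid
    return max(s * t - v for t, v in zip(grid, vals))

# a non-convex "double well": f(t) = min((t+1)^2, (t-1)^2)
f = [min((t + 1) ** 2, (t - 1) ** 2) for t in ts]
fstar = [conj(f, ts, s) for s in ss]
fss = [conj(fstar, ss, t) for t in ts]  # biconjugate = convex envelope

# the envelope fills in the region between the wells: f(0) = 1 but f**(0) = 0
i0 = ts.index(0.0)
assert f[i0] == 1.0
assert abs(fss[i0]) < 1e-9
```

Between the wells at $t = \pm 1$, the largest convex minorant is identically zero, which is exactly what the discrete biconjugate recovers.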
      4. One other interesting lemma:

      Lemma 11.6. Let $h$ be either (i) continuous on $[0, 1]$ or (ii) non-decreasing on $[0, 1]$. (And set $h(1 + \delta) = +\infty$ for $\delta > 0$.) If $h$ satisfies $h(t) > 0$ for $t > 0$ and $h(0) = 0$, then $f = h^{**}$ satisfies $f(t) > 0$ for any $t > 0$.

      (Proof by picture)

IV. Classification calibration results
   (a) Getting quantitative bounds on risk: define the $\psi$-transform via
         $\psi(\delta) := H^{**}(\delta)$.   (11.0.1)
   (b) Main theorem for today:

      Theorem 11.7. Let $\varphi$ be a margin-based loss function and $\psi$ the associated $\psi$-transform. Then for any $f : \mathcal{X} \to \mathbb{R}$,
            $\psi(R(f) - R^*) \le R_\varphi(f) - R_\varphi^*$.   (11.0.2)
      Moreover, the following three statements are equivalent:
      1. The loss $\varphi$ is classification-calibrated
      2. For any sequence $\delta_n \in [0, 1]$, $\psi(\delta_n) \to 0$ implies $\delta_n \to 0$
      3. For any sequence of measurable functions $f_n : \mathcal{X} \to \mathbb{R}$, $R_\varphi(f_n) \to R_\varphi^*$ implies $R(f_n) \to R^*$.

      1. Some insights from the theorem. Recall Examples 11.1 and 11.2. For both of these, we have $\psi(\delta) = H(\delta)$, as $H$ is convex. For the hinge loss, $\varphi(\alpha) = [1 - \alpha]_+$, we obtain for any $f$ that
            $P(Yf(X) \le 0) - \inf_f P(Yf(X) \le 0) \le \mathbb{E}[[1 - Yf(X)]_+] - \inf_f \mathbb{E}[[1 - Yf(X)]_+]$.
      On the other hand, for the exponential loss, using that $\psi(\delta) = 1 - \sqrt{1 - \delta^2} \ge \delta^2 / 2$, we have
            $\frac{1}{2}\Big(P(Yf(X) \le 0) - \inf_f P(Yf(X) \le 0)\Big)^2 \le \mathbb{E}[\exp(-Yf(X))] - \inf_f \mathbb{E}[\exp(-Yf(X))]$.
      The hinge-loss bound is sharper.
      2. Example 11.8 (Regression for classification): What about the surrogate loss $\frac{1}{2}(f(x) - y)^2$? In the homework, you will show which margin-based loss this corresponds to, and moreover that $H(\delta) = \frac{1}{2}\delta^2$. So regressing on the labels is consistent.
   (c) Proof of Theorem 11.7
      The proof of the theorem proceeds in several parts.
      1. We first state a lemma, which follows from the results on convex functions we have already proved. The lemma is useful for several different parts of our proof.

      Lemma 11.9. We have the following.
      a. The functions $H$ and $\psi$ are continuous.
      b. We have $H \ge 0$ and $H(0) = 0$.
      c. If $H(\delta) > 0$ for all $\delta > 0$, then $\psi(\delta) > 0$ for all $\delta > 0$.

      To see part b, note that at $\eta = 1/2$ the constraint $\alpha(2\eta - 1) \le 0$ is vacuous, so
            $\ell_\varphi^{\mathrm{wrong}}(1/2) := \inf_{\alpha(2 \cdot \frac{1}{2} - 1) \le 0} \ell_\varphi(\alpha, 1/2) = \inf_\alpha \ell_\varphi(\alpha, 1/2) = \ell_\varphi^*(1/2)$,
      whence $H(0) = \ell_\varphi^*(1/2) - \ell_\varphi^*(1/2) = 0$. (It is clear that the sub-optimality gap satisfies $H \ge 0$ by construction.)
      2. We begin with the first statement of the theorem, inequality (11.0.2). Consider first the gap (for a fixed margin $\alpha$) in conditional 0-1 risk,
            $\ell(\alpha, \eta) - \inf_{\alpha'} \ell(\alpha', \eta) = \eta 1\{\alpha \le 0\} + (1 - \eta) 1\{\alpha \ge 0\} - \eta 1\{\eta \le 1/2\} - (1 - \eta) 1\{\eta \ge 1/2\}$,
      which equals $0$ if $\operatorname{sign}(\alpha) = \operatorname{sign}(\eta - \frac{1}{2})$, and equals $\max\{\eta, 1 - \eta\} - \min\{\eta, 1 - \eta\} = |2\eta - 1|$ if $\operatorname{sign}(\alpha) \ne \operatorname{sign}(\eta - \frac{1}{2})$. In particular, we obtain that the gap in risks is
            $R(f) - R^* = \mathbb{E}[1\{\operatorname{sign}(f(X)) \ne \operatorname{sign}(2\eta(X) - 1)\} |2\eta(X) - 1|]$.   (11.0.3)
      Now we use expression (11.0.3) to get an upper bound on $R(f) - R^*$ via the $\varphi$-risk. Indeed, consider the $\psi$-transform (11.0.1). By Jensen's inequality (as $\psi$ is convex), we have that
            $\psi(R(f) - R^*) \le \mathbb{E}[\psi(1\{\operatorname{sign}(f(X)) \ne \operatorname{sign}(2\eta(X) - 1)\} |2\eta(X) - 1|)]$.
      Now we recall from Lemma 11.9 that $\psi(0) = 0$. Thus we have
            $\psi(R(f) - R^*) \le \mathbb{E}[1\{\operatorname{sign}(f(X)) \ne \operatorname{sign}(2\eta(X) - 1)\} \psi(|2\eta(X) - 1|)]$.   (11.0.4)
      Now we use the special structure of the sub-optimality function we have constructed. Note that $\psi \le H$, and moreover, because $(1 + |2\eta - 1|)/2 = \max\{\eta, 1 - \eta\}$, we have for any $\alpha \in \mathbb{R}$ that
            $1\{\operatorname{sign}(\alpha) \ne \operatorname{sign}(2\eta - 1)\} H(|2\eta - 1|) = 1\{\operatorname{sign}(\alpha) \ne \operatorname{sign}(2\eta - 1)\} \Big[\inf_{\alpha'(2\eta - 1) \le 0} \ell_\varphi(\alpha', \eta) - \ell_\varphi^*(\eta)\Big] \le \ell_\varphi(\alpha, \eta) - \ell_\varphi^*(\eta)$.   (11.0.5)
      Combining inequalities (11.0.4) and (11.0.5), we see that
            $\psi(R(f) - R^*) \le \mathbb{E}[1\{\operatorname{sign}(f(X)) \ne \operatorname{sign}(2\eta(X) - 1)\} H(|2\eta(X) - 1|)] \le \mathbb{E}[\ell_\varphi(f(X), \eta(X)) - \ell_\varphi^*(\eta(X))] = R_\varphi(f) - R_\varphi^*$,
      which is our desired result.
      3. Having proved the quantitative bound (11.0.2), we now turn to proving the second part of Theorem 11.7. Using Lemma 11.9, we can prove the equivalence of all three items. We begin by showing that IV(b)1 implies IV(b)2. If $\varphi$ is classification calibrated, we have $H(\delta) > 0$ for all $\delta > 0$. Because $\psi$ is continuous and $\psi(0) = 0$, if $\delta \to 0$, then $\psi(\delta) \to 0$. It remains to show that $\psi(\delta_n) \to 0$ implies $\delta_n \to 0$. But this is clear: we know that $\psi(0) = 0$ and $\psi(\delta) > 0$ whenever $\delta > 0$, and the convexity of $\psi$ then implies that $\psi$ is increasing, so $\psi(\delta_n) \to 0$ forces $\delta_n \to 0$.
      To obtain IV(b)3 from IV(b)2, note that by inequality (11.0.2), we have
            $\psi(R(f_n) - R^*) \le R_\varphi(f_n) - R_\varphi^* \to 0$,
      so we must have that $\delta_n = R(f_n) - R^* \to 0$.
      Finally, we show that IV(b)1 follows from IV(b)3. Assume for the sake of contradiction that IV(b)3 holds but IV(b)1 fails, that is, $\varphi$ is not classification calibrated. Then there must exist some $\eta < 1/2$ and a sequence $\alpha_n \ge 0$ (i.e. a sequence of predictions with incorrect sign) satisfying $\ell_\varphi(\alpha_n, \eta) \to \ell_\varphi^*(\eta)$. Construct the classification problem with a singleton $\mathcal{X} = \{x\}$, and set $P(Y = 1) = \eta$. Then the sequence $f_n(x) = \alpha_n$ satisfies $R_\varphi(f_n) \to R_\varphi^*$, but the true 0-1 risk $R(f_n) \not\to R^*$.

V. Classification calibration in the convex case
   (a) Suppose that $\varphi$ is convex, which we often use for computational reasons
   (b) Theorem (Bartlett, Jordan, McAuliffe [1]). If $\varphi$ is convex, then $\varphi$ is classification calibrated if and only if $\varphi'(0)$ exists and $\varphi'(0) < 0$.

      Proof   First, suppose that $\varphi$ is differentiable at $0$ and $\varphi'(0) < 0$. Then $\ell_\varphi(\alpha, \eta) = \eta \varphi(\alpha) + (1 - \eta)\varphi(-\alpha)$ satisfies $\frac{\partial}{\partial \alpha} \ell_\varphi(0, \eta) = (2\eta - 1)\varphi'(0)$, and if $\varphi'(0) < 0$, this quantity is negative for $\eta > 1/2$. Thus the minimizing $\alpha(\eta) \in (0, \infty]$. (Proof by picture, but formalized in the full notes.)
      For the other direction, assume that $\varphi$ is classification calibrated. Recall that a subgradient $g_\alpha$ of the function $\varphi$ at $\alpha \in \mathbb{R}$ is any $g_\alpha$ such that $\varphi(t) \ge \varphi(\alpha) + g_\alpha (t - \alpha)$ for all $t \in \mathbb{R}$. (Picture.) Let $g_1, g_2$ be subgradients of $\varphi$ at $0$, so that $\varphi(\alpha) \ge \varphi(0) + g_1 \alpha$ and $\varphi(\alpha) \ge \varphi(0) + g_2 \alpha$ for all $\alpha$; these exist by convexity. We show that both $g_1, g_2 < 0$ and $g_1 = g_2$. By convexity we have
            $\ell_\varphi(\alpha, \eta) \ge \eta(\varphi(0) + g_1 \alpha) + (1 - \eta)(\varphi(0) - g_2 \alpha) = [\eta g_1 - (1 - \eta) g_2]\alpha + \varphi(0)$.   (11.0.6)
      We first show that $g_1 = g_2$, meaning that $\varphi$ is differentiable at $0$. Without loss of generality, assume $g_1 > g_2$.
Then at $\eta = 1/2$ the bracketed term in (11.0.6) is $(g_1 - g_2)/2 > 0$, so for $\eta > 1/2$ sufficiently close to $1/2$ we have $\eta g_1 - (1 - \eta) g_2 > 0$. For such $\eta$, inequality (11.0.6) would imply that for all $\alpha \ge 0$,
      $\ell_\varphi(\alpha, \eta) \ge \varphi(0) \ge \inf_{\alpha' \le 0} \{\eta \varphi(\alpha') + (1 - \eta) \varphi(-\alpha')\} = \ell_\varphi^{\mathrm{wrong}}(\eta)$,
where the second inequality follows by taking $\alpha' = 0$. By our assumption of classification calibration, for $\eta > 1/2$ we know that
      $\inf_\alpha \ell_\varphi(\alpha, \eta) < \inf_{\alpha \le 0} \ell_\varphi(\alpha, \eta) = \ell_\varphi^{\mathrm{wrong}}(\eta)$, so $\ell_\varphi^*(\eta) = \inf_{\alpha \ge 0} \ell_\varphi(\alpha, \eta)$,
and under the assumption that $g_1 > g_2$ we obtain $\ell_\varphi^*(\eta) = \inf_{\alpha \ge 0} \ell_\varphi(\alpha, \eta) \ge \ell_\varphi^{\mathrm{wrong}}(\eta)$, which is a contradiction to classification calibration. We thus obtain $g_1 = g_2$, so that the function $\varphi$ has a unique subderivative at $\alpha = 0$ and is thus differentiable there.
      Now that we know $\varphi$ is differentiable at $0$, consider
            $\eta \varphi(\alpha) + (1 - \eta) \varphi(-\alpha) \ge (2\eta - 1) \varphi'(0) \alpha + \varphi(0)$.
      If $\varphi'(0) \ge 0$, then for $\alpha \ge 0$ and $\eta > 1/2$ the right-hand side is at least $\varphi(0)$, which contradicts classification calibration, because we know that $\ell_\varphi^*(\eta) < \ell_\varphi^{\mathrm{wrong}}(\eta)$, exactly as in the preceding argument.

Proofs of convex analytic results

Proof of Lemma 11.3   First, let $(a, b) \subset \operatorname{dom} f$ and fix $x_0 \in (a, b)$. Consider $x \uparrow x_0$, which is no loss of generality, and we may also assume $x \in (a, b)$. Then we have
      $x = \alpha a + (1 - \alpha) x_0$ and $x_0 = \beta b + (1 - \beta) x$
for some $\alpha, \beta \in [0, 1]$. By convexity,
      $f(x) \le \alpha f(a) + (1 - \alpha) f(x_0) = f(x_0) + \alpha (f(a) - f(x_0))$
and
      $f(x_0) \le \beta f(b) + (1 - \beta) f(x)$, or $f(x) \ge \frac{1}{1 - \beta} f(x_0) - \frac{\beta}{1 - \beta} f(b)$.
Taking $\alpha, \beta \to 0$ as $x \to x_0$, we obtain
      $\liminf_{x \to x_0} f(x) \ge f(x_0)$ and $\limsup_{x \to x_0} f(x) \le f(x_0)$,
as desired.

Proof of Lemma 11.4   We need only consider the endpoints of the domain, by Lemma 11.3, and since closedness gives lower semicontinuity, we only need to show that $\limsup_{x \to x_0} f(x) \le f(x_0)$. But this is obvious by convexity: let $x = t y + (1 - t) x_0$ for any $y \in \operatorname{dom} f$; taking $t \to 0$, we have $f(x) \le t f(y) + (1 - t) f(x_0) \to f(x_0)$.

Proof of Lemma 11.6   We begin with case (i). Define the function $h_{\mathrm{low}}(t) := \inf_{s \ge t} h(s)$. Then because $h$ is continuous, we know that over any compact set it attains its infimum, and thus (by assumption on $h$) $h_{\mathrm{low}}(t) > 0$ for all $t > 0$. Moreover, $h_{\mathrm{low}}$ is non-decreasing. Now define $f_{\mathrm{low}} = h_{\mathrm{low}}^{**}$ to be the biconjugate of $h_{\mathrm{low}}$; it is clear that $f \ge f_{\mathrm{low}}$ as $h \ge h_{\mathrm{low}}$. Thus case (ii) applied to $h_{\mathrm{low}}$ implies case (i), so we turn to the more general case (ii) to see that $f_{\mathrm{low}}(t) > 0$ for all $t > 0$.
For the result in case (ii), assume for the sake of contradiction that there is some $z \in (0, 1)$ satisfying $h^{**}(z) = 0$. It is clear that $h^{**}(0) = 0$ and $h^{**} \ge 0$, so by convexity we must have $h^{**}(z/2) = 0$. Now, by assumption we have $h(z/2) = b > 0$, whence (as $h$ is non-decreasing) $h(t) \ge b > 0$ for all $t \ge z/2$. In particular, the piecewise linear function defined by
      $g(t) = 0$ if $t \le z/2$, and $g(t) = \frac{b}{1 - z/2}(t - z/2)$ if $t > z/2$,
is closed, convex, and satisfies $g \le h$. But $g(z) > 0 = h^{**}(z)$, a contradiction to the fact that $h^{**}$ is the largest (closed) convex function below $h$.
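The derivative criterion from the theorem in Section V is easy to apply numerically. A small sketch checking several standard convex margin losses (our own code; each loss below happens to be differentiable at $0$, so a symmetric difference quotient suffices):

```python
import math

losses = {
    "exponential": lambda a: math.exp(-a),          # e^{-a}
    "hinge":       lambda a: max(1.0 - a, 0.0),     # [1 - a]_+
    "logistic":    lambda a: math.log1p(math.exp(-a)),
    "squared":     lambda a: (1.0 - a) ** 2,        # margin form of (f - y)^2
}

def deriv_at_zero(phi, h=1e-6):
    # symmetric difference quotient; valid here since each phi above is
    # differentiable at 0 (the hinge is non-differentiable only at a = 1)
    return (phi(h) - phi(-h)) / (2 * h)

# each loss has phi'(0) < 0, hence is classification calibrated
assert all(deriv_at_zero(phi) < 0 for phi in losses.values())
```

The computed slopes at zero are approximately $-1$ (exponential and hinge), $-1/2$ (logistic), and $-2$ (squared), all negative, consistent with these convex losses being classification calibrated.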
Bibliography

[1] P. L. Bartlett, M. I. Jordan, and J. McAuliffe. Convexity, classification, and risk bounds. Journal of the American Statistical Association, 101(473):138–156, 2006.
More informationCourse 212: Academic Year Section 1: Metric Spaces
Course 212: Academic Year 1991-2 Section 1: Metric Spaces D. R. Wilkins Contents 1 Metric Spaces 3 1.1 Distance Functions and Metric Spaces............. 3 1.2 Convergence and Continuity in Metric Spaces.........
More informationNotes on uniform convergence
Notes on uniform convergence Erik Wahlén erik.wahlen@math.lu.se January 17, 2012 1 Numerical sequences We begin by recalling some properties of numerical sequences. By a numerical sequence we simply mean
More informationFUNCTIONAL COMPRESSION-EXPANSION FIXED POINT THEOREM
Electronic Journal of Differential Equations, Vol. 28(28), No. 22, pp. 1 12. ISSN: 172-6691. URL: http://ejde.math.txstate.edu or http://ejde.math.unt.edu ftp ejde.math.txstate.edu (login: ftp) FUNCTIONAL
More informationAdvanced Calculus I Chapter 2 & 3 Homework Solutions October 30, Prove that f has a limit at 2 and x + 2 find it. f(x) = 2x2 + 3x 2 x + 2
Advanced Calculus I Chapter 2 & 3 Homework Solutions October 30, 2009 2. Define f : ( 2, 0) R by f(x) = 2x2 + 3x 2. Prove that f has a limit at 2 and x + 2 find it. Note that when x 2 we have f(x) = 2x2
More informationOn surrogate loss functions and f-divergences
On surrogate loss functions and f-divergences XuanLong Nguyen, Martin J. Wainwright, xuanlong.nguyen@stat.duke.edu wainwrig@stat.berkeley.edu Michael I. Jordan, jordan@stat.berkeley.edu Department of Statistical
More information6.1 Variational representation of f-divergences
ECE598: Information-theoretic methods in high-dimensional statistics Spring 2016 Lecture 6: Variational representation, HCR and CR lower bounds Lecturer: Yihong Wu Scribe: Georgios Rovatsos, Feb 11, 2016
More informationMath 273a: Optimization Subgradients of convex functions
Math 273a: Optimization Subgradients of convex functions Made by: Damek Davis Edited by Wotao Yin Department of Mathematics, UCLA Fall 2015 online discussions on piazza.com 1 / 42 Subgradients Assumptions
More informationAnalysis Finite and Infinite Sets The Real Numbers The Cantor Set
Analysis Finite and Infinite Sets Definition. An initial segment is {n N n n 0 }. Definition. A finite set can be put into one-to-one correspondence with an initial segment. The empty set is also considered
More informationThe proximal mapping
The proximal mapping http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes Outline 2/37 1 closed function 2 Conjugate function
More informationFenchel Duality between Strong Convexity and Lipschitz Continuous Gradient
Fenchel Duality between Strong Convexity and Lipschitz Continuous Gradient Xingyu Zhou The Ohio State University zhou.2055@osu.edu December 5, 2017 Xingyu Zhou (OSU) Fenchel Duality December 5, 2017 1
More informationIntegral Jensen inequality
Integral Jensen inequality Let us consider a convex set R d, and a convex function f : (, + ]. For any x,..., x n and λ,..., λ n with n λ i =, we have () f( n λ ix i ) n λ if(x i ). For a R d, let δ a
More informationStanford Statistics 311/Electrical Engineering 377
I. Bayes risk in classification problems a. Recall definition (1.2.3) of f-divergence between two distributions P and Q as ( ) p(x) D f (P Q) : q(x)f dx, q(x) where f : R + R is a convex function satisfying
More informationTHE UNIQUE MINIMAL DUAL REPRESENTATION OF A CONVEX FUNCTION
THE UNIQUE MINIMAL DUAL REPRESENTATION OF A CONVEX FUNCTION HALUK ERGIN AND TODD SARVER Abstract. Suppose (i) X is a separable Banach space, (ii) C is a convex subset of X that is a Baire space (when endowed
More informationIntroduction to Convex Analysis Microeconomics II - Tutoring Class
Introduction to Convex Analysis Microeconomics II - Tutoring Class Professor: V. Filipe Martins-da-Rocha TA: Cinthia Konichi April 2010 1 Basic Concepts and Results This is a first glance on basic convex
More informationExtended Monotropic Programming and Duality 1
March 2006 (Revised February 2010) Report LIDS - 2692 Extended Monotropic Programming and Duality 1 by Dimitri P. Bertsekas 2 Abstract We consider the problem minimize f i (x i ) subject to x S, where
More informationConvex Optimization Theory. Chapter 5 Exercises and Solutions: Extended Version
Convex Optimization Theory Chapter 5 Exercises and Solutions: Extended Version Dimitri P. Bertsekas Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com
More informationLECTURE SLIDES ON BASED ON CLASS LECTURES AT THE CAMBRIDGE, MASS FALL 2007 BY DIMITRI P. BERTSEKAS.
LECTURE SLIDES ON CONVEX ANALYSIS AND OPTIMIZATION BASED ON 6.253 CLASS LECTURES AT THE MASSACHUSETTS INSTITUTE OF TECHNOLOGY CAMBRIDGE, MASS FALL 2007 BY DIMITRI P. BERTSEKAS http://web.mit.edu/dimitrib/www/home.html
More informationLecture 1: January 12
10-725/36-725: Convex Optimization Fall 2015 Lecturer: Ryan Tibshirani Lecture 1: January 12 Scribes: Seo-Jin Bang, Prabhat KC, Josue Orellana 1.1 Review We begin by going through some examples and key
More informationA function(al) f is convex if dom f is a convex set, and. f(θx + (1 θ)y) < θf(x) + (1 θ)f(y) f(x) = x 3
Convex functions The domain dom f of a functional f : R N R is the subset of R N where f is well-defined. A function(al) f is convex if dom f is a convex set, and f(θx + (1 θ)y) θf(x) + (1 θ)f(y) for all
More informationExtreme Abridgment of Boyd and Vandenberghe s Convex Optimization
Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization Compiled by David Rosenberg Abstract Boyd and Vandenberghe s Convex Optimization book is very well-written and a pleasure to read. The
More informationConvex Analysis and Optimization Chapter 4 Solutions
Convex Analysis and Optimization Chapter 4 Solutions Dimitri P. Bertsekas with Angelia Nedić and Asuman E. Ozdaglar Massachusetts Institute of Technology Athena Scientific, Belmont, Massachusetts http://www.athenasc.com
More informationCharacterizations of the solution set for non-essentially quasiconvex programming
Optimization Letters manuscript No. (will be inserted by the editor) Characterizations of the solution set for non-essentially quasiconvex programming Satoshi Suzuki Daishi Kuroiwa Received: date / Accepted:
More informationEC9A0: Pre-sessional Advanced Mathematics Course. Lecture Notes: Unconstrained Optimisation By Pablo F. Beker 1
EC9A0: Pre-sessional Advanced Mathematics Course Lecture Notes: Unconstrained Optimisation By Pablo F. Beker 1 1 Infimum and Supremum Definition 1. Fix a set Y R. A number α R is an upper bound of Y if
More informationConvex Optimization Theory
Convex Optimization Theory A SUMMARY BY DIMITRI P. BERTSEKAS We provide a summary of theoretical concepts and results relating to convex analysis, convex optimization, and duality theory. In particular,
More informationLecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016
Lecture 1: Entropy, convexity, and matrix scaling CSE 599S: Entropy optimality, Winter 2016 Instructor: James R. Lee Last updated: January 24, 2016 1 Entropy Since this course is about entropy maximization,
More information