Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014

Similar documents
CS-E4830 Kernel Methods in Machine Learning

ICS-E4030 Kernel Methods in Machine Learning

Constrained Optimization and Lagrangian Duality

5. Duality. Lagrangian

Lecture: Duality of LP, SOCP and SDP

14. Duality. ˆ Upper and lower bounds. ˆ General duality. ˆ Constraint qualifications. ˆ Counterexample. ˆ Complementary slackness.

Convex Optimization Boyd & Vandenberghe. 5. Duality

Lecture 18: Optimization Programming

Extreme Abridgment of Boyd and Vandenberghe s Convex Optimization

Introduction to Machine Learning Lecture 7. Mehryar Mohri Courant Institute and Google Research

Karush-Kuhn-Tucker Conditions. Lecturer: Ryan Tibshirani Convex Optimization /36-725

The Lagrangian L : R d R m R r R is an (easier to optimize) lower bound on the original problem:

Optimization for Machine Learning

Lecture: Duality.

Machine Learning. Support Vector Machines. Manfred Huber

Support vector machines

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Subgradient. Acknowledgement: this slides is based on Prof. Lieven Vandenberghes lecture notes. definition. subgradient calculus

Convex Optimization M2

Convex Optimization & Lagrange Duality

Shiqian Ma, MAT-258A: Numerical Optimization 1. Chapter 4. Subgradient

Duality. Lagrange dual problem weak and strong duality optimality conditions perturbation and sensitivity analysis generalized inequalities

On the Method of Lagrange Multipliers

Machine Learning. Lecture 6: Support Vector Machine. Feng Li.

Lecture 6: Conic Optimization September 8

Optimality Conditions for Constrained Optimization

Primal/Dual Decomposition Methods

A Brief Review on Convex Optimization

Support Vector Machines and Kernel Methods

CSCI : Optimization and Control of Networks. Review on Convex Optimization

Lagrange duality. The Lagrangian. We consider an optimization program of the form

Support Vector Machines: Maximum Margin Classifiers

EE/AA 578, Univ of Washington, Fall Duality

Lecture 3. Optimization Problems and Iterative Algorithms

Support Vector Machines

Lecture 3: Lagrangian duality and algorithms for the Lagrangian dual problem

Convex Optimization and SVM

Introduction to Machine Learning Spring 2018 Note Duality. 1.1 Primal and Dual Problem

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem:

Lecture Notes on Support Vector Machine

Optimization for Communications and Networks. Poompat Saengudomlert. Session 4 Duality and Lagrange Multipliers

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

10701 Recitation 5 Duality and SVM. Ahmed Hefny

I.3. LMI DUALITY. Didier HENRION EECI Graduate School on Control Supélec - Spring 2010

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)

Support Vector Machine

Convex Optimization Overview (cnt d)

Lagrangian Duality and Convex Optimization

Gaps in Support Vector Optimization

Lecture Note 5: Semidefinite Programming for Stability Analysis

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Support Vector Machine (continued)

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)

Support Vector Machines

Lecture 2: Linear SVM in the Dual

SMO Algorithms for Support Vector Machines without Bias Term

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods

Lectures 9 and 10: Constrained optimization problems and their optimality conditions

Lecture Support Vector Machine (SVM) Classifiers

Statistical Machine Learning from Data

Linear classifiers selecting hyperplane maximizing separation margin between classes (large margin classifiers)

Constrained Optimization

Convex Optimization and Modeling

Lecture 7: Convex Optimizations

Math 273a: Optimization Subgradients of convex functions

minimize x subject to (x 2)(x 4) u,

ISM206 Lecture Optimization of Nonlinear Objective with Linear Constraints

Solving the SVM Optimization Problem

Tutorial on Convex Optimization for Engineers Part II

Support Vector Machine I

Duality. Geoff Gordon & Ryan Tibshirani Optimization /

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Support Vector Machine (SVM) and Kernel Methods

CSC 411 Lecture 17: Support Vector Machine

Tutorial on Convex Optimization: Part II

SVMs, Duality and the Kernel Trick

Support Vector Machines

Machine Learning A Geometric Approach

Lagrangian Duality. Richard Lusby. Department of Management Engineering Technical University of Denmark

Interior Point Algorithms for Constrained Convex Optimization

Support Vector Machines

Nonlinear Optimization: What s important?

Duality Uses and Correspondences. Ryan Tibshirani Convex Optimization

Introduction to Support Vector Machines

Quiz Discussion. IE417: Nonlinear Programming: Lecture 12. Motivation. Why do we care? Jeff Linderoth. 16th March 2006

Convex Optimization M2

Convex Optimization. Newton s method. ENSAE: Optimisation 1/44

Lecture 10: A brief introduction to Support Vector Machine

Applied inductive learning - Lecture 7

Max Margin-Classifier

Support Vector Machines for Regression

subject to (x 2)(x 4) u,

WHY DUALITY? Gradient descent Newton s method Quasi-newton Conjugate gradients. No constraints. Non-differentiable ???? Constrained problems? ????

Machine Learning Support Vector Machines. Prof. Matteo Matteucci

12. Interior-point methods

Dual Decomposition.

CONVEX OPTIMIZATION, DUALITY, AND THEIR APPLICATION TO SUPPORT VECTOR MACHINES. Contents 1. Introduction 1 2. Convex Sets

Convex Optimization Overview (cnt d)

Course Notes for EE227C (Spring 2018): Convex Optimization and Approximation

Transcription:

Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 1 / 26

Key Concepts in Convex Analysis: Convex Sets Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 2 / 26

Key Concepts in Convex Analysis: Convex Functions Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 3 / 26

Key Concepts in Convex Analysis: Minimizers Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 4 / 26

Key Concepts in Convex Analysis: Strong Convexity Recall the definition of convex function: λ [0, 1], f (λx + (1 λ)x ) λf (x) + (1 λ)f (x ) convexity Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 5 / 26

Key Concepts in Convex Analysis: Strong Convexity Recall the definition of convex function: λ [0, 1], f (λx + (1 λ)x ) λf (x) + (1 λ)f (x ) A β strongly convex function satisfies a stronger condition: λ [0, 1] f (λx + (1 λ)x ) λf (x) + (1 λ)f (x ) β 2 λ(1 λ) x x 2 2 convexity strong convexity Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 5 / 26

Key Concepts in Convex Analysis: Strong Convexity Recall the definition of convex function: λ [0, 1], f (λx + (1 λ)x ) λf (x) + (1 λ)f (x ) A β strongly convex function satisfies a stronger condition: λ [0, 1] f (λx + (1 λ)x ) λf (x) + (1 λ)f (x ) β 2 λ(1 λ) x x 2 2 convexity Strong convexity strict convexity. strong convexity Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 5 / 26

Key Concepts in Convex Analysis: Subgradients Convexity continuity; convexity differentiability (e.g., f (w) = w 1 ). Subgradients generalize gradients for (maybe non-diff.) convex functions: Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 6 / 26

Key Concepts in Convex Analysis: Subgradients Convexity continuity; convexity differentiability (e.g., f (w) = w 1 ). Subgradients generalize gradients for (maybe non-diff.) convex functions: v is a subgradient of f at x if f (x ) f (x) + v (x x) Subdifferential: f (x) = {v : v is a subgradient of f at x} linear lower bound Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 6 / 26

Key Concepts in Convex Analysis: Subgradients Convexity continuity; convexity differentiability (e.g., f (w) = w 1 ). Subgradients generalize gradients for (maybe non-diff.) convex functions: v is a subgradient of f at x if f (x ) f (x) + v (x x) Subdifferential: f (x) = {v : v is a subgradient of f at x} If f is differentiable, f (x) = { f (x)} linear lower bound Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 6 / 26

Key Concepts in Convex Analysis: Subgradients Convexity continuity; convexity differentiability (e.g., f (w) = w 1 ). Subgradients generalize gradients for (maybe non-diff.) convex functions: v is a subgradient of f at x if f (x ) f (x) + v (x x) Subdifferential: f (x) = {v : v is a subgradient of f at x} If f is differentiable, f (x) = { f (x)} linear lower bound non-differentiable case Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 6 / 26

Key Concepts in Convex Analysis: Subgradients Convexity continuity; convexity differentiability (e.g., f (w) = w 1 ). Subgradients generalize gradients for (maybe non-diff.) convex functions: v is a subgradient of f at x if f (x ) f (x) + v (x x) Subdifferential: f (x) = {v : v is a subgradient of f at x} If f is differentiable, f (x) = { f (x)} linear lower bound non-differentiable case Notation: f (x) is a subgradient of f at x Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 6 / 26

Establishing convexity How to check if f (x) is a convex function? Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 7 / 26

Establishing convexity How to check if f (x) is a convex function? Verify definition of a convex function. Check if 2 f (x) greater than or equal to 0 (for twice differentiable 2 x function). Show that it is constructed from simple convex functions with operations that preserver convexity. nonnegative weighted sum composition with affine function pointwise maximum and supremum composition minimization perspective Reference: Boyd and Vandenberghe (2004) Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 7 / 26

Unconstrained Optimization Algorithms: First order methods (gradient descent, FISTA, etc.) Higher order methods (Newton s method, ellipsoid, etc.)... Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 8 / 26

Gradient descent Problem: min f (x) x Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 9 / 26

Gradient descent Problem: min f (x) x Algorithm: g t = f (xt) x. x t = x t 1 ηg t. Repeat until convergence. Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 9 / 26

Newton s method Problem: Assume f is twice differentiable. min f (x) x Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 10 / 26

Newton s method Problem: Assume f is twice differentiable. Algorithm: g t = f (xt) x. min f (x) x s t = H 1 g t, where H is the Hessian. x t = x t 1 ηs t. Repeat until convergence. Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 10 / 26

Newton s method Problem: Assume f is twice differentiable. Algorithm: g t = f (xt) x. min f (x) x s t = H 1 g t, where H is the Hessian. x t = x t 1 ηs t. Repeat until convergence. Newton s method is a special case of steepest descent using Hessian norm. Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 10 / 26

Primal problem: Duality min x f (x) subject to g i (x) 0 i = 1,..., m h i (x) = 0 i = 1,..., p for x X. Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 11 / 26

Primal problem: Duality min x f (x) subject to g i (x) 0 i = 1,..., m h i (x) = 0 i = 1,..., p for x X. Lagrangian: m L(x, λ, ν) = f (x) + λ i g i (x) + i=1 λ i and ν i are Lagrange multipliers. p ν i h i (x) i=1 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 11 / 26

Primal problem: Duality min x f (x) subject to g i (x) 0 i = 1,..., m h i (x) = 0 i = 1,..., p for x X. Lagrangian: m L(x, λ, ν) = f (x) + λ i g i (x) + i=1 λ i and ν i are Lagrange multipliers. p ν i h i (x) Suppose x is feasible and λ 0, then we get the lower bound: f (x) L(x, λ, ν) x X, λ R m + i=1 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 11 / 26

Duality Primal optimal: p = min x max L(x, λ, ν) λ 0,ν Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 12 / 26

Duality Primal optimal: p = min x max L(x, λ, ν) λ 0,ν Lagrange dual function: min x L(x, λ, ν) This is a concave function, regardless of whether f (x) convex or not. Can be for some λ and ν. Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 12 / 26

Duality Primal optimal: p = min x max L(x, λ, ν) λ 0,ν Lagrange dual function: min x L(x, λ, ν) This is a concave function, regardless of whether f (x) convex or not. Can be for some λ and ν. Lagrange dual problem: max L(x, λ, ν) subject to λ 0 λ,ν Dual feasible: if λ 0 and λ, ν dom L(x, λ, ν). Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 12 / 26

Duality Primal optimal: p = min x max L(x, λ, ν) λ 0,ν Lagrange dual function: min x L(x, λ, ν) This is a concave function, regardless of whether f (x) convex or not. Can be for some λ and ν. Lagrange dual problem: max L(x, λ, ν) subject to λ 0 λ,ν Dual feasible: if λ 0 and λ, ν dom L(x, λ, ν). Dual optimal: d = max min λ 0,ν x L(x, λ, ν) Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 12 / 26

Duality Weak duality p d always holds for convex and nonconvex problems Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 13 / 26

Duality Weak duality p d always holds for convex and nonconvex problems Strong duality p = d does not hold in general, but usually holds for convex problems. Strong duality is guaranteed by Slater s constraint qualification. Strong duality holds if the problem is strictly feasible, i.e. x int D s.t. g i (x) < 0, i = 1,..., m, h i (x) = 0, i = 1,..., p Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 13 / 26

Duality Weak duality p d always holds for convex and nonconvex problems Strong duality p = d does not hold in general, but usually holds for convex problems. Strong duality is guaranteed by Slater s constraint qualification. Strong duality holds if the problem is strictly feasible, i.e. x int D s.t. g i (x) < 0, i = 1,..., m, h i (x) = 0, i = 1,..., p Assume strong duality holds and p and d are attained. p = f (x ) = d = min L(x, λ, ν ) L(x, λ, ν ) f (x ) = p x Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 13 / 26

Duality Weak duality p d always holds for convex and nonconvex problems Strong duality p = d does not hold in general, but usually holds for convex problems. Strong duality is guaranteed by Slater s constraint qualification. Strong duality holds if the problem is strictly feasible, i.e. x int D s.t. g i (x) < 0, i = 1,..., m, h i (x) = 0, i = 1,..., p Assume strong duality holds and p and d are attained. p = f (x ) = d = min L(x, λ, ν ) L(x, λ, ν ) f (x ) = p x We have: x arg min x L(x, λ, ν ). λ i g i(x ) = 0 for i = 1,..., m (complementary slackness). Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 13 / 26

Karush-Kuhn-Tucker condition For a differentiable g(x) and h(x), the KKT conditions are: g i (x ) 0, h i (x ) = 0, λ i 0, λ i g i (x ) = 0, primal feasibility dual feasibility complementary slackness L(x, λ, ν ) x=x = 0 Lagrangian stationarity x Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 14 / 26

Karush-Kuhn-Tucker condition For a differentiable g(x) and h(x), the KKT conditions are: g i (x ) 0, h i (x ) = 0, λ i 0, λ i g i (x ) = 0, primal feasibility dual feasibility complementary slackness L(x, λ, ν ) x=x = 0 Lagrangian stationarity x If ˆx, ˆλ, ˆν satify the KKT for a convex problem, they are optimal. Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 14 / 26

Support Vector Machines Primal problem (hard constraint): 1 min w 2 w 2 2 subject to y i x i, w 1, i = 1,..., n Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 15 / 26

Support Vector Machines Primal problem (hard constraint): 1 min w 2 w 2 2 subject to y i x i, w 1, i = 1,..., n Lagrangian: L(w, λ) = 1 n 2 w 2 2 λ i (y i x i, w 1) i=1 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 15 / 26

Support Vector Machines Primal problem (hard constraint): 1 min w 2 w 2 2 subject to y i x i, w 1, i = 1,..., n Lagrangian: L(w, λ) = 1 n 2 w 2 2 λ i (y i x i, w 1) i=1 Minimizing with respect to w, we have: L(w,λ) w = 0 w n i=1 λ iy i x i = 0 n w = λ i y i x i i=1 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 15 / 26

Plug this back into the Lagrangian: Support Vector Machines L(λ) = n λ i 1 2 1=1 n n y i y j λ i λ j xi i=1 j=1 x j Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 16 / 26

Plug this back into the Lagrangian: Support Vector Machines L(λ) = Lagrange dual problem is: n λ i 1 2 1=1 n n y i y j λ i λ j xi i=1 j=1 x j max λ subject to n λ i 1 2 1=1 n n y i y j λ i λ j xi i=1 j=1 λ i 0, i = 1,..., n n λ i y i = 0 i=1 x j Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 16 / 26

Plug this back into the Lagrangian: Support Vector Machines L(λ) = Lagrange dual problem is: n λ i 1 2 1=1 n n y i y j λ i λ j xi i=1 j=1 x j max λ subject to n λ i 1 2 1=1 n n y i y j λ i λ j xi i=1 j=1 λ i 0, i = 1,..., n n λ i y i = 0 i=1 x j Since this problem only depends on xi x j, we can use kernels and learn in high dimensional space without having to explicitly represent φ(x). Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 16 / 26

Support Vector Machines Primal problem (soft constraint): min w 1 2 w 2 2 + C subject to n i=1 y i x i, w 1 ξ i, i = 1,..., n ξ i ξ i 0, i = 1..., n Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 17 / 26

Support Vector Machines Lagrange dual problem for the soft constraint: max λ n λ i 1 n n y i y j λ i λ j xi x j (1) 2 1=1 i=1 j=1 subject to 0 λ i C, i = 1,..., n (2) n λ i y i = 0 (3) i=1 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 18 / 26

Support Vector Machines Lagrange dual problem for the soft constraint: max λ n λ i 1 n n y i y j λ i λ j xi x j (1) 2 1=1 i=1 j=1 subject to 0 λ i C, i = 1,..., n (2) n λ i y i = 0 (3) i=1 KKT conditions, for all i: λ i = 0 y i x i, w 1 (4) 0 < λ i < C y i x i, w = 1 (5) λ i = C y i x i, w 1 (6) Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 18 / 26

Sequential Minimal Optimization (Platt, 1998) An efficient way to solve SVM dual problem. Break a large QP program into a series of smallest possible QP problems. Solve these small subproblems analytically. Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 19 / 26

Sequential Minimal Optimization (Platt, 1998) An efficient way to solve SVM dual problem. Break a large QP program into a series of smallest possible QP problems. Solve these small subproblems analytically. In a nutshell Choose two Lagrange multipliers λ i and λ j. Optimize the dual problem with respect to these two Lagrange multipliers, holding others fixed. Repeat until convergence. Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 19 / 26

Sequential Minimal Optimization (Platt, 1998) An efficient way to solve SVM dual problem. Break a large QP program into a series of smallest possible QP problems. Solve these small subproblems analytically. In a nutshell Choose two Lagrange multipliers λ i and λ j. Optimize the dual problem with respect to these two Lagrange multipliers, holding others fixed. Repeat until convergence. There are heuristics to choose Lagrange multipliers that maximizes the step size towards the global maximum. The first one is chosen from examples that violate the KKT condition. The second one is chosen using approximation based on absolute difference in error values (see Platt (1998)). Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 19 / 26

Sequential Minimal Optimization (Platt, 1998) Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 20 / 26

Sequential Minimal Optimization (Platt, 1998) For any two Lagrange multipliers, the constraints are:: 0 < λ i, λ j < C (7) y i λ i + y j λ j = k i,j y kλ k = γ (8) Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 21 / 26

Sequential Minimal Optimization (Platt, 1998) For any two Lagrange multipliers, the constraints are:: 0 < λ i, λ j < C (7) y i λ i + y j λ j = k i,j y kλ k = γ (8) Express λ i in terms of λ j λ i = γ λ jy j y i Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 21 / 26

Sequential Minimal Optimization (Platt, 1998) For any two Lagrange multipliers, the constraints are:: 0 < λ i, λ j < C (7) y i λ i + y j λ j = k i,j y kλ k = γ (8) Express λ i in terms of λ j λ i = γ λ jy j y i Plug this back into our original function. We are then left with a very simple quadratic problem with one variable λ j Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 21 / 26

Sequential Minimal Optimization (Platt, 1998) Solve for the second Lagrange multiplier λ j. Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 22 / 26

Sequential Minimal Optimization (Platt, 1998) Solve for the second Lagrange multiplier λ j. If y i y j, the following bounds apply to λ j : L H = max(0, λ t 1 j = min(c, C + λ t 1 j λ y 1 i ) (9) λ y 1 i ) (10) If y i = y j, the following bounds apply to λ j : L H = max(0, λ t 1 j = min(c, λ t 1 j + λ y 1 i C) (11) + λ y 1 i ) (12) Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 22 / 26

Sequential Minimal Optimization (Platt, 1998) Solve for the second Lagrange multiplier λ j. If y i y j, the following bounds apply to λ j : L H = max(0, λ t 1 j = min(c, C + λ t 1 j λ y 1 i ) (9) λ y 1 i ) (10) If y i = y j, the following bounds apply to λ j : L H = max(0, λ t 1 j = min(c, λ t 1 j + λ y 1 i C) (11) + λ y 1 i ) (12) The solution is: H λ j = λ j L ifλ j > H ifl λ j H ifλ j < L Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 22 / 26

Fenchel duality If a convex conjugate of f (x) is known, the dual function can be easily derived. The convex conjugate of a function f is: f (y) = max y, x f (x) (13) x Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 23 / 26

Fenchel duality If a convex conjugate of f (x) is known, the dual function can be easily derived. The convex conjugate of a function f is: For a generic problem f (y) = max y, x f (x) (13) x min x subject to f (x) Ax b Cx = d The dual function is: f ( A λ C ν) b λ d ν Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 23 / 26

Fenchel duality If a convex conjugate of f (x) is known, the dual function can be easily derived. The convex conjugate of a function f is: For a generic problem f (y) = max y, x f (x) (13) x min x subject to f (x) Ax b Cx = d The dual function is: f ( A λ C ν) b λ d ν There are many functions whose conjugate are easy to compute: Exponential Logistic Quadratic form Log determinant... Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 23 / 26

Parting notes Dual formulation is useful. Give new insights into our problem, Allow us to develop better optimization methods and use kernel tricks. Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 24 / 26

Thank you! Questions? Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 25 / 26

References I Some slides are from an upcoming EACL 2014 tutorial with Andre F. T. Martins, Noah A. Smith, and Mario F. T. Figueiredo Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press. Platt, J. (1998). Fast training of support vector machines using sequential minimal optimization. In Scholkopf, B., Burges, C., and Smola, A., editors, Advances in Kernel Methods - Support Vector Learning. MIT Press. Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12, 2014 26 / 26