CS711008Z Algorithm Design and Analysis


1. CS711008Z Algorithm Design and Analysis. Lecture 9: Lagrangian duality and SVM. Dongbo Bu, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China.

2. Outline
Classification problem and the maximum margin strategy;
Solving the maximum margin problem using Lagrangian duality;
The SMO technique;
Kernel tricks.

3. Classification problem and maximum margin strategy

4. Classification problem
Given a set of samples with their category labels, denoted as $(x_1, y_1), (x_2, y_2), \dots, (x_n, y_n)$, $y_i \in \{-1, +1\}$, the goal of the classification problem is to find an appropriate function $f(x)$ that describes the dependency between $y_i$ and $x_i$; thus, for a new sample $x$, we can infer its category based on $f(x)$.
A great variety of classification algorithms have been designed, including Fisher's linear discriminant, logistic regression, decision trees, neural networks, and SVM.

5. Linear classifier
Unlike decision trees, SVM adopts a classifier of the following type: if $f(x) > 0$ then $y = +1$; if $f(x) < 0$ then $y = -1$.
Let's first restrict $f(x)$ to be linear, i.e., $f(x) = \omega^T x + b$.
The hyperplane $\omega^T x + b = 0$ is called the separating hyperplane.

6. The objective of the training procedure is to find a setting of $\omega$ and $b$ such that all samples in the training set are correctly labelled by the classifier. We will consider tolerating a few mislabelled samples later.

7. Maximum margin strategy
There are always multiple settings of $\omega$ and $b$ for which the corresponding classifier works perfectly on all samples. Which one should we use? We prefer the one that maximizes the margin between the positive and negative samples: the wider the margin, the better the generalization performance on new samples. Thus, we need to solve the following optimization problem:
$$\max_{\omega, b} \frac{2}{\|\omega\|} \quad \text{s.t.} \quad y_i(\omega^T x_i + b) - 1 \ge 0, \quad i = 1, 2, \dots, n$$
Note: the restriction $f(x) > 0$ for a positive sample $x$ is implemented as $f(x) \ge 1$. The distance from any point $x$ to the hyperplane $\omega^T x + b = 0$ is $\frac{|\omega^T x + b|}{\|\omega\|}$; thus, the margin is $\frac{2}{\|\omega\|}$.

8. An equivalent form with a quadratic objective function
An equivalent form is:
$$\min_{w, b} \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) - 1 \ge 0, \quad i = 1, 2, \dots, n$$
Question: how do we solve this optimization problem subject to inequality constraints? Of course we could solve this problem (called the primal problem hereafter) directly using convex quadratic programming techniques; however, considering its dual problem will bring great benefits. Let's review the conditions for an optimal solution first.
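To make this concrete, the primal problem can be handed to an off-the-shelf QP solver. Below is a minimal sketch assuming cvxpy is available; the four data points are made up for illustration.

```python
import numpy as np
import cvxpy as cp

# Toy separable data (hypothetical): two positive and two negative samples in R^2.
X = np.array([[2.0, 2.0], [2.0, 0.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()
# Primal hard-margin SVM: min (1/2)||w||^2  s.t.  y_i (w^T x_i + b) >= 1
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                  [cp.multiply(y, X @ w + b) >= 1])
prob.solve()
print(w.value, b.value)   # expect roughly w = (1, 0), b = -1: the hyperplane x^(1) = 1
```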

9. Lagrangian multiplier method, KKT conditions, and Lagrangian dual

10. A brief introduction to Lagrangian multipliers and KKT conditions
It is relatively easy to optimize an objective function without any constraint, say:
$$\min f(x)$$
But how do we optimize an objective function with equality constraints?
$$\min f(x) \quad \text{s.t.} \quad g_i(x) = 0, \ i = 1, 2, \dots, m$$
And how do we optimize an objective function with both equality and inequality constraints?
$$\min f(x) \quad \text{s.t.} \quad g_i(x) = 0, \ i = 1, 2, \dots, m; \quad h_i(x) \le 0, \ i = 1, 2, \dots, p$$

11. Lagrangian multiplier: under equality constraints
Consider the following optimization problem:
$$\min f(x, y) \quad \text{s.t.} \quad g(x, y) = 0$$
Intuition: suppose $(x^*, y^*)$ is the optimum point. Then at $(x^*, y^*)$, $f(x, y)$ does not change when we walk along the curve $g(x, y) = 0$; otherwise, we could follow the curve to make $f(x, y)$ smaller, meaning that the starting point $(x^*, y^*)$ is not optimal.
[Figure: the curve $g(x, y) = 0$ crossing the contour $f(x, y) = 1$, with the gradients $\nabla g$ and $\nabla f$ drawn at $x^*$]

12. [Figure: contours $f(x, y) = 1$ and $f(x, y) = 2$ together with the curve $g(x, y) = 0$]
So at $(x^*, y^*)$, the red curve tangentially touches a blue contour, i.e., there exists a $\lambda$ such that:
$$\nabla f(x, y) = \lambda \nabla g(x, y)$$
Lagrange must have cleverly noticed that the equation above looks like the partial derivatives of some larger scalar function:
$$L(x, y, \lambda) = f(x, y) - \lambda g(x, y)$$
Necessary condition for an optimum point: if $(x^*, y^*)$ is a local optimum, then there exists $\lambda$ such that $\nabla L(x^*, y^*, \lambda) = 0$.

13. Understanding the Lagrangian function
Lagrangian function: a combination of the original optimization objective function and the constraints:
$$L(x, y, \lambda) = f(x, y) - \lambda g(x, y)$$
Note: $\frac{\partial L(x, y, \lambda)}{\partial \lambda} = 0$ is essentially the equality constraint $g(x, y) = 0$.
The critical points of the Lagrangian $L(x, y, \lambda)$ occur at saddle points rather than local minima (or maxima). To utilize numerical optimization techniques, we must first transform the problem such that the critical points lie at local minima; this is done by minimizing the magnitude of the gradient of the Lagrangian.
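A quick symbolic check of the multiplier condition (a sketch assuming sympy; the example problem, minimizing $x + y$ on the unit circle, is mine, not from the slides):

```python
import sympy as sp

x, y, lam = sp.symbols('x y lambda', real=True)
f = x + y                        # objective
g = x**2 + y**2 - 1              # equality constraint g(x, y) = 0
L = f - lam * g                  # Lagrangian L = f - lambda * g

# Stationarity in all variables: dL/dx = dL/dy = dL/dlambda = 0
sols = sp.solve([sp.diff(L, v) for v in (x, y, lam)], [x, y, lam], dict=True)
print(sols)   # two critical points (±1/√2, ±1/√2); the minimum is at (-1/√2, -1/√2)
```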

14. KKT conditions: under inequality constraints
Consider the following optimization problem:
$$\min f(x) \quad \text{s.t.} \quad g_i(x) \ge 0, \ i = 1, 2, \dots, m$$
[Figure: two cases. Case 1: the optimum point lies on the curve $g(x) = 0$; thus the Lagrangian condition $\nabla L(x, \lambda) = 0$ applies. Case 2: the optimum point lies strictly inside the region $g(x) \ge 0$; thus we have $\nabla f(x^*) = 0$ at $x^*$.]

15. Understanding complementary slackness
[Figure: the same two cases as on the previous slide]
These two cases are summarized by the following two conditions:
(Stationary point) $\nabla L(x^*, \lambda) = 0$
(Complementary slackness) $\lambda_i g_i(x^*) = 0, \ i = 1, 2, \dots, m$
(Note that in case 2, $g(x^*) > 0 \Rightarrow \lambda = 0 \Rightarrow \nabla f(x^*) = 0$.)

16. KKT conditions
Lagrangian:
$$L(x, \lambda) = f(x) - \sum_{i=1}^m \lambda_i g_i(x)$$
Necessary conditions for an optimum point: if $x^*$ is a local optimum satisfying some regularity conditions, then there exists $\lambda$ such that:
1. (Stationary point) $\nabla L(x^*, \lambda) = 0$
2. (Primal feasibility) $g(x^*) \ge 0$
3. (Dual feasibility) $\lambda_i \ge 0, \ i = 1, 2, \dots, m$
4. (Complementary slackness) $\lambda_i g_i(x^*) = 0, \ i = 1, 2, \dots, m$
Note: the KKT conditions are usually not solved directly in optimization; instead, iterative successive approximation is most often used. However, the final result should satisfy the KKT conditions. (The KKT conditions were named after William Karush, Harold W. Kuhn, and Albert W. Tucker.)
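These conditions can be checked mechanically on a one-dimensional toy problem (my example, assuming sympy): minimize $(x - 3)^2$ subject to $g(x) = 1 - x \ge 0$.

```python
import sympy as sp

x, lam = sp.symbols('x lambda', real=True)
f = (x - 3)**2
g = 1 - x                          # constraint g(x) >= 0, i.e. x <= 1
L = f - lam * g                    # Lagrangian in the slides' form

candidates = sp.solve([sp.Eq(sp.diff(L, x), 0),   # stationarity
                       sp.Eq(lam * g, 0)],        # complementary slackness
                      [x, lam], dict=True)
# Keep points that are also primal feasible (g >= 0) and dual feasible (lambda >= 0).
kkt = [s for s in candidates if s[lam] >= 0 and g.subs(x, s[x]) >= 0]
print(kkt)   # [{x: 1, lambda: 4}] -- the constrained minimum; the constraint is active
```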

17. Three strategies to solve an optimization problem under inequality constraints
Solve the primal problem, maintaining primal feasibility;
Solve the dual problem, maintaining dual feasibility;
Interior-point method: complementary slackness, also called orthogonality by Gomory, is essentially equivalent to strong duality, i.e., the primal and dual problems having identical objective values. A relaxation of this condition, $\lambda_i g_i(x^*) = \mu$, is used in the interior-point method to reflect the duality gap.

18. Applying the Lagrangian dual to the maximum margin problem

19. An example
Primal problem:
$$\min x \quad \text{s.t.} \quad x \ge 2, \ x \ge 0$$
Lagrangian:
$$L(x, y, z) = x - y(x - 2) - zx = 2y + x(1 - y - z)$$
Notice that when $y \ge 0$, $z \ge 0$ and $x \ge 2$, $L(x, y, z)$ is a lower bound of the primal objective function $x$.
Lagrangian dual function:
$$g(y, z) = \inf_x L(x, y, z) = \begin{cases} 2y & \text{if } 1 - y - z = 0 \\ -\infty & \text{otherwise} \end{cases}$$
Dual problem:
$$\max 2y \quad \text{s.t.} \quad y \le 1, \ y \ge 0$$
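Both sides of this tiny example can be verified numerically; a sketch assuming scipy:

```python
from scipy.optimize import linprog

# Primal: min x  s.t.  x >= 2, x >= 0  (written as -x <= -2 for linprog)
primal = linprog(c=[1], A_ub=[[-1]], b_ub=[-2], bounds=[(0, None)])
# Dual:   max 2y  s.t.  y <= 1, y >= 0  (written as min -2y)
dual = linprog(c=[-2], A_ub=[[1]], b_ub=[1], bounds=[(0, None)])
print(primal.fun, -dual.fun)   # both 2.0: no duality gap for this example
```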

20. The Lagrangian connects primal and dual
[Figure: the primal objective $x$, the Lagrangian $L(x, y)$, and the dual objective $g(y) = 2y$]
Observation: primal objective function $x$ $\ge$ Lagrangian $L(x, y)$ $\ge$ dual objective function $g(y) = 2y$ in the feasible region.

21. Lagrangian dual explanation of LP duality
Consider an LP model:
$$\min c^T x \quad \text{s.t.} \quad Ax \ge b, \ x \ge 0$$
Lagrangian:
$$L(x, \lambda, s) = c^T x - \sum_{i=1}^m \lambda_i (a_{i1}x_1 + \dots + a_{in}x_n - b_i) - \sum_{i=1}^n s_i x_i$$
Notice that the Lagrangian is a lower bound of the primal objective function, i.e., $c^T x \ge L(x, \lambda, s)$, when $\lambda \ge 0$, $s \ge 0$ and $x$ is feasible. Furthermore, we have
$$c^T x \ge L(x, \lambda, s) \ge \inf_x L(x, \lambda, s)$$
when $\lambda \ge 0$, $s \ge 0$ and $x$ is feasible. Denote the Lagrangian dual as $g(\lambda, s) = \inf_x L(x, \lambda, s)$. The above inequality can be rewritten as:
$$c^T x \ge L(x, \lambda, s) \ge g(\lambda, s)$$

22. Lagrangian dual explanation of LP duality (cont'd)
What is the Lagrangian dual $g(\lambda, s)$?
$$g(\lambda, s) = \inf_x L(x, \lambda, s) = \inf_x \left( c^T x - \sum_{i=1}^m \lambda_i (a_{i1}x_1 + \dots + a_{in}x_n - b_i) - \sum_{i=1}^n s_i x_i \right)$$
$$= \inf_x \left( \lambda^T b + (c^T - \lambda^T A - s^T)x \right) = \begin{cases} \lambda^T b & \text{if } c^T = \lambda^T A + s^T \\ -\infty & \text{otherwise} \end{cases}$$
Thus $\lambda^T b$ is a lower bound of $c^T x$ when $c^T = \lambda^T A + s^T$ with $\lambda \ge 0$, $s \ge 0$.
Note: $g(\lambda)$ is always concave even if the constraints are not convex.

23. Finding a tight bound
Now let's try to find the best lower bound of $f(x)$. The tightest lower bound $\max g(\lambda)$ can be described as:
$$\max \lambda^T b \quad \text{s.t.} \quad \lambda^T A \le c^T, \ \lambda \ge 0$$
Notes:
1. This is exactly the dual form of the LP if we replace $\lambda$ by $y$; thus we have another explanation of the dual variables $y$: the Lagrangian multipliers.
2. The Lagrangian dual is a lower bound of the primal optimum under some conditions. In addition, the Lagrangian dual is concave; thus, the dual problem is always a convex programming problem even if the primal problem is not.
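The relation can be verified on any small instance by solving both programs; a sketch assuming scipy, with made-up data:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 1.0], [1.0, 3.0]])
b = np.array([2.0, 3.0])
c = np.array([5.0, 4.0])

# Primal: min c^T x  s.t.  Ax >= b, x >= 0   (linprog takes <=, so negate)
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2)
# Dual:   max b^T y  s.t.  y^T A <= c^T, y >= 0
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2)
print(primal.fun, -dual.fun)   # equal optimal values (8.0): strong LP duality
```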

24. Lagrangian dual explanation of the maximum margin problem
Primal problem:
$$\min_{w, b} \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad y_i(w^T x_i + b) \ge 1, \ i \in \{1, \dots, n\}$$
Lagrangian:
$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^n \alpha_i y_i (w^T x_i + b) + \sum_{i=1}^n \alpha_i$$
Notice that the Lagrangian is a lower bound of the primal objective function, i.e., $\frac{1}{2}\|w\|^2 \ge L(w, b, \alpha)$, when $\alpha \ge 0$ and $(w, b)$ is feasible. Furthermore, we have
$$\frac{1}{2}\|w\|^2 \ge L(w, b, \alpha) \ge \inf_{w, b} L(w, b, \alpha)$$
when $\alpha \ge 0$ and $(w, b)$ is feasible. Denote the Lagrangian dual as $g(\alpha) = \inf_{w, b} L(w, b, \alpha)$. The above inequality can be rewritten as:
$$\frac{1}{2}\|w\|^2 \ge L(w, b, \alpha) \ge g(\alpha)$$

25. Lagrangian dual explanation of the maximum margin problem (cont'd)
What is the Lagrangian dual $g(\alpha)$?
$$g(\alpha) = \inf_{w, b} L(w, b, \alpha)$$
To calculate the infimum of $L(w, b, \alpha)$, we set its derivatives to 0, i.e.,
$$\frac{\partial L(w, b, \alpha)}{\partial w} = w - \sum_{i=1}^n \alpha_i y_i x_i = 0$$
$$\frac{\partial L(w, b, \alpha)}{\partial b} = -\sum_{i=1}^n \alpha_i y_i = 0$$

26. Lagrangian dual function
Substituting back, the Lagrangian dual function is:
$$g(\alpha) = -\frac{1}{2}\sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j (x_i^T x_j) + \sum_{i=1}^n \alpha_i$$
Thus $g(\alpha)$ is a lower bound of $\frac{1}{2}\|w\|^2$ when $\sum_{i=1}^n \alpha_i y_i = 0$ and $\alpha \ge 0$.

27. Lagrangian dual problem
Now let's find the tightest lower bound of $\frac{1}{2}\|w\|^2$, which can be calculated by solving the following Lagrangian dual problem:
$$\max \ -\frac{1}{2}\sum_{i=1}^n \sum_{j=1}^n \alpha_i \alpha_j y_i y_j (x_i^T x_j) + \sum_{i=1}^n \alpha_i \quad \text{s.t.} \quad \sum_{i=1}^n \alpha_i y_i = 0, \ \alpha \ge 0$$
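The dual is a QP in $\alpha$, and small instances can be solved directly. A sketch with scipy's SLSQP (an assumption; dedicated QP or SMO solvers are the practical choice), reusing the toy data from the primal sketch above:

```python
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [2.0, 0.0], [0.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
G = (y[:, None] * X) @ (y[:, None] * X).T      # G_ij = y_i y_j x_i^T x_j

# Maximizing g(alpha) is equivalent to minimizing (1/2) a^T G a - 1^T a.
res = minimize(lambda a: 0.5 * a @ G @ a - a.sum(),
               x0=np.zeros(len(y)),
               bounds=[(0, None)] * len(y),
               constraints=[{'type': 'eq', 'fun': lambda a: a @ y}],
               method='SLSQP')
alpha = res.x
w = (alpha * y) @ X                # stationarity: w = sum_i alpha_i y_i x_i
sv = alpha > 1e-6                  # support vectors have alpha_i > 0
b = np.mean(y[sv] - X[sv] @ w)     # from y_i (w^T x_i + b) = 1 on support vectors
print(w, b)                        # expect w ≈ (1, 0), b ≈ -1, matching the primal
```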

28. Two explanations of the dual variables y: L. Kantorovich vs. T. Koopmans
1. Price interpretation: constrained optimization plays an important role in economics. Dual variables are also called shadow prices (T. Koopmans), i.e., the instantaneous change in the objective function when a constraint is relaxed, or the marginal cost when a constraint is strengthened.
2. Lagrangian multiplier (L. Kantorovich): the effect of the constraints on the objective function. For example, when $b_i$ increases to $b_i + \Delta b_i$, how much does the objective value change? In fact, we have $\frac{\partial L(x, \lambda)}{\partial b_i} = \lambda_i$.

29. Explanation of the dual variables y: using Diet as an example
Optimal solution to the primal problem with $b_1 = 2000$, $b_2 = 55$, $b_3 = 800$: $x = (14.24, 2.70, 0, 0)$, whose objective value $c^T x$ equals the dual objective $y^T b$, where the dual optimum is $y = (0.0269, 0, \dots)$.
Let's make a slight change to $b$ and watch the effect on $\max c^T x$:
1. $b_1 = 2001$: $\max c^T x$ increases by $y_1 = 0.0269$.
2. $b_2 = 56$: $\max c^T x$ is unchanged, since $y_2 = 0$.
3. $b_3 = 801$: $\max c^T x$ increases by $y_3$.
(See the extra slides.)

30. Weak duality, strong duality, and Slater's conditions

31. Weak duality and strong duality
Sometimes the optimal value of the primal problem is not identical to that of the dual problem; the difference between them is called the duality gap. This case is referred to as weak duality, and the opposite case as strong duality. Conditions under which strong duality holds are called constraint qualifications.

32. Slater's condition for strong duality
The best-known sufficient condition for strong duality is:
The primal problem is a convex program, i.e.,
$$\min f(x) \quad \text{s.t.} \quad g_i(x) \le 0, \ i = 1, 2, \dots, m; \quad Ax = b$$
where $f(x)$ and $g_i(x)$ are convex, and
Slater's condition holds, i.e., there exists some strictly feasible point $x \in \operatorname{relint}(D)$ such that $Ax = b$ and $g_i(x) < 0, \ i = 1, 2, \dots, m$.
Here $D$ represents the feasible domain of the primal problem, and $\operatorname{relint}(D)$ denotes the relative interior of $D$.

33. Slater's condition for strong duality (cont'd)
Slater's condition can be weakened when the constraints of the primal problem are affine. Consider the following convex programming problem:
$$\min f(x) \quad \text{s.t.} \quad g_i(x) \le 0, \ i = 1, 2, \dots, m; \quad Ax = b$$
where $f(x)$ and $g_i(x)$ are convex.
Refined Slater's condition: if the $g_i(x)$ are affine, the inequalities are no longer required to be strict, i.e., it suffices that there exists some feasible point $x \in \operatorname{relint}(D)$ such that $Ax = b$ and $g_i(x) \le 0, \ i = 1, 2, \dots, m$.

34. Strong duality in SVM
For the maximum margin problem, the dual problem has an identical optimal objective value: the objective is convex and the constraints are affine, so the refined Slater condition holds.

35. Kernel method to overcome linear inseparability

36. Linear inseparability
Some sample sets are linearly inseparable. What should we do?
Kernel method: if the samples are linearly inseparable in the original input space, we map them into a higher-dimensional space (called the feature space) and make them linearly separable in that space. In other words, nonlinearity is introduced into the original input space through the map.

37. An example: the XOR problem
Let's define a map $\Phi : R^2 \to R^3$ as follows:
$$[x^{(1)}, x^{(2)}] \mapsto \Phi([x^{(1)}, x^{(2)}]) = [x^{(1)}, x^{(2)}, x^{(1)}x^{(2)}]$$
The XOR samples are linearly separable after mapping into $R^3$ using $\Phi$.
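A direct check of this claim (a numpy sketch; the separating w and b below are hand-picked, not computed):

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])                 # XOR labels: inseparable in R^2
Phi = np.c_[X, X[:, 0] * X[:, 1]]            # map to R^3: (x1, x2, x1*x2)

w, b = np.array([1.0, 1.0, -2.0]), -0.5      # hand-picked hyperplane in R^3
print(y * (Phi @ w + b))                     # all margins positive: separable after the map
```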

38. Replacing the inner product in the SVM optimization problem
Remember that in the optimization problem for SVM, the objective function contains the inner products $x_i^T x_j$:
$$\max \ -\frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (x_i^T x_j) + \sum_{i=1}^N \alpha_i \quad \text{s.t.} \quad \sum_{i=1}^N \alpha_i y_i = 0, \ \alpha \ge 0$$
After mapping $x$ into $\Phi(x)$, the inner product $x_i^T x_j$ becomes $\Phi(x_i)^T \Phi(x_j)$, and the optimization problem changes accordingly:
$$\max \ -\frac{1}{2}\sum_{i=1}^N \sum_{j=1}^N \alpha_i \alpha_j y_i y_j (\Phi(x_i)^T \Phi(x_j)) + \sum_{i=1}^N \alpha_i \quad \text{s.t.} \quad \sum_{i=1}^N \alpha_i y_i = 0, \ \alpha \ge 0$$
Note that we simply replace this inner product with $\Phi(x_i)^T \Phi(x_j)$, without even needing to know the map $\Phi$ explicitly.

39. The separating hyperplane
Remember that in the optimization problem for SVM, the separating hyperplane is:
$$f(x) = \omega^T x + b = \sum_i \alpha_i y_i x_i^T x + b = 0$$
After mapping $x$ into $\Phi(x)$, the separating hyperplane becomes:
$$f(x) = \omega^T \Phi(x) + b = \sum_i \alpha_i y_i \Phi(x_i)^T \Phi(x) + b = 0$$
Note that $\Phi(x_i)^T \Phi(x) = k(x_i, x)$.

40. Deriving the kernel function from a map

41. Example 1
Consider the map $\Phi : R^2 \to R^4$:
$$[x^{(1)}, x^{(2)}] \mapsto \Phi([x^{(1)}, x^{(2)}]) = [x^{(1)2}, x^{(2)2}, x^{(1)}x^{(2)}, x^{(1)}x^{(2)}]$$
The corresponding kernel function is:
$$k(x, z) = x^{(1)2}z^{(1)2} + x^{(2)2}z^{(2)2} + 2x^{(1)}x^{(2)}z^{(1)}z^{(2)} = \langle x, z \rangle^2$$
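The identity $\Phi(x)^T \Phi(z) = \langle x, z \rangle^2$ is easy to confirm numerically (numpy sketch):

```python
import numpy as np

def phi(v):
    # the map above: (x1^2, x2^2, x1*x2, x1*x2)
    return np.array([v[0]**2, v[1]**2, v[0]*v[1], v[0]*v[1]])

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(phi(x) @ phi(z), (x @ z)**2)   # both 1.0: the kernel avoids forming phi explicitly
```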

42. Example 2
Consider the map $\Phi : R^3 \to R^{13}$:
$$[x^{(1)}, x^{(2)}, x^{(3)}] \mapsto \Phi([x^{(1)}, x^{(2)}, x^{(3)}]) = [x^{(1)2}, x^{(1)}x^{(2)}, \dots, x^{(3)2}, \sqrt{2c}\,x^{(1)}, \sqrt{2c}\,x^{(2)}, \sqrt{2c}\,x^{(3)}, c]$$
The corresponding kernel function is:
$$k(x, z) = (x^T z + c)^2$$


44. The power of the polynomial kernel
We can map into a high-dimensional feature space while only ever computing inner products in the original space.

45. The power of the polynomial kernel (cont'd)

46. The requirements of kernel functions

47. Definition of a kernel function
A function $k : \mathcal{X} \times \mathcal{X} \to R$ is a kernel function if:
$k$ is symmetric: $k(x, z) = k(z, x)$;
for any $m \in N$ and any $x_1, x_2, \dots, x_m \in \mathcal{X}$, the matrix $K$ defined by $K_{ij} = k(x_i, x_j)$ is positive semidefinite.
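Symmetry and positive semidefiniteness of the matrix K can be spot-checked for a candidate kernel (numpy sketch; the random points and the degree-2 polynomial kernel are my choices):

```python
import numpy as np

k = lambda a, b: (a @ b + 1.0) ** 2           # candidate kernel: polynomial, degree 2
xs = np.random.randn(6, 3)                    # six random points in R^3
K = np.array([[k(a, b) for b in xs] for a in xs])

print(np.allclose(K, K.T))                    # symmetric
print(np.linalg.eigvalsh(K).min() >= -1e-9)   # eigenvalues >= 0 (up to rounding): PSD
```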

48. Question: could we map from the input space into a feature space of infinite dimension?

49. Mapping into an infinite-dimensional feature space

50. Let's consider a feature space whose elements are functions, spanned by the functions $\Phi(x_i) = k(\cdot, x_i)$. Thus, any element can be represented as:
$$f(\cdot) = \sum_{i=1}^m \alpha_i k(\cdot, x_i)$$
For two elements $f = \sum_i \alpha_i k(\cdot, x_i)$ and $g = \sum_j \beta_j k(\cdot, z_j)$, we define the inner product as:
$$\langle f, g \rangle = \sum_i \sum_j \alpha_i \beta_j k(x_i, z_j)$$

51. Reproducing kernel Hilbert space
With this inner product, the map $\Phi(x) = k(\cdot, x)$ satisfies
$$\langle \Phi(x), \Phi(z) \rangle = \langle k(\cdot, x), k(\cdot, z) \rangle = k(x, z)$$
which is the reproducing property.


53. Theorem (Farkas' lemma)
Given vectors $a_1, a_2, \dots, a_m, c \in R^n$, exactly one of the following holds:
1. $c \in C(a_1, a_2, \dots, a_m)$; or
2. there is a vector $y \in R^n$ such that $y^T a_i \ge 0$ for all $i$ but $y^T c < 0$.
[Figure: Case 1: $c \in C(a_1, a_2)$. Case 2: $c \notin C(a_1, a_2)$, with a separating vector $y$.]
Here $C(a_1, \dots, a_m)$ denotes the cone spanned by $a_1, \dots, a_m$, i.e., $C(a_1, \dots, a_m) = \{x \mid x = \sum_{i=1}^m \lambda_i a_i, \ \lambda_i \ge 0\}$.

54. Proof.
Suppose that for any vector $y \in R^n$ with $y^T a_i \ge 0$ ($i = 1, 2, \dots, m$), we always have $y^T c \ge 0$. We will show that $c$ must lie within the cone $C(a_1, a_2, \dots, a_m)$.
Consider the following primal problem:
$$\min c^T y \quad \text{s.t.} \quad a_i^T y \ge 0, \ i = 1, 2, \dots, m$$
The primal problem clearly has a feasible solution $y = 0$, and it is bounded since $c^T y \ge 0$. Thus the dual problem also has a bounded optimal solution:
$$\max 0 \quad \text{s.t.} \quad x^T A^T = c^T, \ x \ge 0$$
In other words, there exists a vector $x$ such that $c = \sum_{i=1}^m x_i a_i$ and $x_i \ge 0$.

55. Variants of Farkas' lemma
Farkas' lemma lies at the core of linear optimization. Using Farkas' lemma, we can prove the Separation theorem and the MiniMax theorem in game theory.
Theorem. Let $A$ be an $m \times n$ matrix and $b \in R^m$. Then exactly one of the following holds:
1. $Ax = b, \ x \ge 0$ has a feasible solution; or
2. there is a vector $y \in R^m$ such that $y^T A \ge 0$ but $y^T b < 0$.
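The "exactly one of two systems" structure can be explored with an LP solver (scipy sketch; A and b are made up, and the box bound on y merely keeps the second LP bounded):

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 0.0], [0.0, 1.0]])
b = np.array([-1.0, 2.0])                 # b lies outside the cone of A's columns

# System 1: Ax = b, x >= 0 -- a pure feasibility problem
sys1 = linprog(c=[0, 0], A_eq=A, b_eq=b, bounds=[(0, None)] * 2)
# System 2: find y with y^T A >= 0 and y^T b < 0, i.e. minimize y^T b s.t. A^T y >= 0
sys2 = linprog(c=b, A_ub=-A.T, b_ub=[0, 0], bounds=[(-1, 1)] * 2)
print(sys1.status, sys2.fun)   # system 1 infeasible (status 2); system 2 reaches -1 < 0
```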

56. Variants of Farkas' lemma (cont'd)
Theorem. Let $A$ be an $m \times n$ matrix and $b \in R^m$. Then exactly one of the following holds:
1. $Ax \le b$ has a feasible solution; or
2. there is a vector $y \in R^m$ such that $y \ge 0$, $y^T A = 0$ but $y^T b < 0$.

57. Carathéodory's theorem
Theorem. Given vectors $a_1, a_2, \dots, a_m \in R^n$, if $x \in C(a_1, a_2, \dots, a_m)$, then there is a linearly independent subset of $a_1, a_2, \dots, a_m$, say $a_{i_1}, a_{i_2}, \dots, a_{i_r}$, such that $x \in C(a_{i_1}, a_{i_2}, \dots, a_{i_r})$.

58. Separation theorem
Theorem. Let $C \subseteq R^n$ be a closed convex set, and let $x \in R^n$. If $x \notin C$, then there exists a hyperplane separating $x$ from $C$.

59. Application 2: von Neumann's MiniMax theorem in game theory

60. Game theory
Game theory studies competitive and cooperative behaviours among intelligent and rational decision-makers.
In 1928, John von Neumann proved the existence of mixed-strategy equilibria in two-person zero-sum games.
In 1950, John Forbes Nash Jr. developed a criterion for the mutual consistency of players' strategies that applies to a wider range of games than the one proposed by J. von Neumann: he proved the existence of a Nash equilibrium in every finite n-player, non-zero-sum, non-cooperative game (not just 2-player, zero-sum games).
Game theory has been widely applied in mathematical economics, in biology (e.g., the analysis of evolution and stability), and in computer science (e.g., the analysis of interactive computation, lower bounds on the complexity of randomized algorithms, and the equivalence between linear programming and two-person zero-sum games).

61. Paper-rock-scissors: an example of a two-player zero-sum game
Paper-rock-scissors is a hand game usually played by two players, denoted as the row player and the column player: each player selects one of the three hand shapes paper, rock, and scissors, and the players show their selections simultaneously. Besides a tie, it has two possible outcomes: one player wins and the other loses. This can be formally described using the following payoff matrix.

            Paper    Rock     Scissors
Paper       0, 0     1, -1    -1, 1
Rock        -1, 1    0, 0     1, -1
Scissors    1, -1    -1, 1    0, 0

Each player attempts to select an appropriate action to maximize his gain.

62. Matching pennies: another example of a two-person zero-sum game
Matching pennies is a game played by two players, namely the row player and the column player. Each player has a penny and secretly turns it to heads or tails; the players then reveal their selections simultaneously. If the pennies match, the row player keeps both pennies; otherwise, the column player keeps both. The payoff matrix is as follows.

        Head     Tail
Head    1, -1    -1, 1
Tail    -1, 1    1, -1

Each player tries to maximize his gain by making an appropriate selection.

63. Simultaneous games vs. sequential games
Simultaneous games are games in which all players move simultaneously; thus, no player has advance information about the others' selections. Sequential games are games in which a later player has some information, though perhaps imperfect, about the previous actions of the other players.
A complete plan of action for every stage of the game, regardless of whether the action actually arises in play, is called a (pure) strategy. Normal form is used to describe simultaneous games, while extensive form is used to describe sequential games.
J. von Neumann proposed an approach to transform strategies in sequential games into actions in simultaneous games. Note that the transformation is one-way, i.e., multiple sequential games might correspond to the same simultaneous game, and it may result in an exponential blowup in the size of the representation.

64. Normal form
A game $\Gamma$ in normal form among $m$ players contains the following items:
Each player $k$ has a finite set of pure strategies $S_k = \{1, 2, \dots, n_k\}$.
Each player $k$ is associated with a payoff function $H_k : S_1 \times S_2 \times \dots \times S_m \to R$.
To play the game, each player selects a strategy without information about the others, and then all selections are revealed simultaneously. The players' gains are calculated using the corresponding payoff functions. Each player attempts to maximize his gain by selecting an appropriate strategy.

65. Two-person zero-sum games in normal form
In a two-person zero-sum game $\Gamma$, a player's gain or loss is exactly balanced by the other player's loss or gain, i.e., $H_1(s_1, s_2) + H_2(s_1, s_2) = 0$. Thus we can define a single function $H(s_1, s_2) = H_1(s_1, s_2) = -H_2(s_1, s_2)$ and represent it using a payoff matrix.

        Head    Tail
Head    1       -1
Tail    -1      1

The row player aims to maximize $H(s_1, s_2)$ by selecting an appropriate strategy $s_1$, while the column player aims to minimize $H(s_1, s_2)$ by selecting an appropriate strategy $s_2$.

66. von Neumann's MiniMax theorem: motivation
When analyzing a two-person zero-sum game $\Gamma$, von Neumann noticed that the difficulty comes from the difference between games and ordinary optimization problems: the row player tries to maximize $H(s_1, s_2)$, but he controls only $s_1$, as he has no information about the other player's selection $s_2$; the same holds for the column player. Thus von Neumann suggested investigating two auxiliary games without this difficulty, denoted $\Gamma_1$ and $\Gamma_2$, before attacking the challenging game $\Gamma$:
1. $\Gamma_1$: the row player selects a strategy $s_1$ first and exposes his selection to the column player before the column player selects a strategy $s_2$.
2. $\Gamma_2$: the column player selects a strategy $s_2$ first and exposes his selection to the row player before the row player selects a strategy $s_1$.
The two auxiliary games are much easier than the original game $\Gamma$, and more importantly, they provide upper and lower bounds for $\Gamma$.

67. Auxiliary game $\Gamma_1$
Let's consider the column player first. As he knows the row player's selection $s_1$, the objective function $H(s_1, s_2)$ becomes an ordinary optimization function over a single variable $s_2$, and the column player can simply select a strategy $s_2$ with the minimum objective value $\min_{s_2} H(s_1, s_2)$.

        Head    Tail    Row minimum
Head    -2      1       -2
Tail    -1      2       -1   (v1 = -1)

Now consider the row player. When he selects a strategy $s_1$, he can definitely predict the selection of the column player. Since $\min_{s_2} H(s_1, s_2)$ is an ordinary function over a single variable $s_1$, it is easy for the row player to select a strategy $s_1$ with the maximum objective value
$$v_1 = \max_{s_1} \min_{s_2} H(s_1, s_2).$$

68. Auxiliary game $\Gamma_2$
Let's consider the row player first. As he knows the column player's selection $s_2$, the objective function $H(s_1, s_2)$ becomes an ordinary optimization function over a single variable $s_1$, and the row player can simply select a strategy $s_1$ with the maximum objective value $\max_{s_1} H(s_1, s_2)$.

                Head          Tail
Head            -2            1
Tail            -1            2
Column maximum  -1 (v2 = -1)  2

Now consider the column player. When he selects a strategy $s_2$, he can definitely predict the selection of the row player. Since $\max_{s_1} H(s_1, s_2)$ is an ordinary function over a single variable $s_2$, it is easy for the column player to select a strategy $s_2$ with the minimum objective value
$$v_2 = \min_{s_2} \max_{s_1} H(s_1, s_2).$$

69. $\Gamma_1$ and $\Gamma_2$ bound $\Gamma$
For the row player, $\Gamma_1$ is clearly disadvantageous, as he must expose his selection $s_1$ to the column player. On the contrary, $\Gamma_2$ is beneficial to the row player, as he knows the column player's selection $s_2$ before making his decision.

                Head          Tail    Row minimum
Head            -2            1       -2
Tail            -1            2       -1   (v1 = -1)
Column maximum  -1 (v2 = -1)  2

Thus the two auxiliary games provide lower and upper bounds:
$$v_1 \le v \le v_2$$
where $v$ denotes the row player's gain in the original game $\Gamma$.

70. Case 1: $v_1 = v_2$
For a game with the following payoff matrix, we have $v_1 = v = v_2$ and call the game strictly determined.

                Head          Tail    Row minimum
Head            -2            1       -2
Tail            -1            2       -1   (v1 = -1)
Column maximum  -1 (v2 = -1)  2

The saddle point of the payoff matrix $H(s_1, s_2)$ represents a pure-strategy equilibrium. In this equilibrium, each player has nothing to gain by changing only his own strategy; in addition, knowing the opponent's selection brings no gain.
von Neumann proved the existence of an optimal strategy in perfect-information two-person zero-sum games, e.g., chess. L. S. Shapley further showed that a finite two-person zero-sum game has a pure-strategy equilibrium if every $2 \times 2$ submatrix of the game has a pure-strategy equilibrium.

71. Case 2: $v_1 < v_2$
In contrast, matching pennies does not have a pure-strategy equilibrium, as there is no saddle point in the payoff matrix. The same holds for the paper-rock-scissors game.

                Head        Tail    Row minimum
Head            1           -1      -1
Tail            -1          1       -1   (v1 = -1)
Column maximum  1 (v2 = 1)  1

This fact implies that knowing the opponent's selection might bring gain; however, it is impossible to know the opponent's selection, as the players reveal their selections simultaneously. In this case, let's play a mixed strategy rather than a pure strategy.

72. From pure strategies to mixed strategies
A mixed strategy is an assignment of probabilities to pure strategies, allowing a player to randomly select a pure strategy. Consider the payoff matrix below. If the row player selects strategy A with probability 1, he is said to play a pure strategy. If he tosses a coin and selects strategy A if the coin lands heads and B otherwise, then he is said to play a mixed strategy.

    A     B
A   1     -1
B   -1    1

73. Two interpretations of mixed strategies
From the player's viewpoint: J. von Neumann described the motivation underlying the introduction of mixed strategies as follows: since it is impossible to know the opponent's selection exactly, a player can protect himself by randomly selecting his own strategy, making it difficult for the opponent to know the player's selection. However, this interpretation came under heavy fire for lacking behavioural support: seldom do people make choices by following a lottery.
From the opponent's viewpoint: Robert Aumann and Adam Brandenburger interpreted the mixed strategy of a player as the opponent's belief about the player's selection. Thus, a Nash equilibrium is an equilibrium of beliefs rather than actions.

74. Existence of mixed-strategy equilibria
Consider a mixed-strategy game: the row player has $m$ strategies available and selects a strategy $s_1$ according to a distribution $u$, while the column player has $n$ strategies available and selects a strategy $s_2$ according to a distribution $v$, i.e.,
$$\Pr(s_1 = i) = u_i, \ i = 1, \dots, m; \qquad \Pr(s_2 = j) = v_j, \ j = 1, \dots, n$$
Here $u$ and $v$ are independent. Thus the expected gain of the row player is:
$$\sum_{i=1}^m \sum_{j=1}^n u_i H_{ij} v_j = u^T H v$$
The row player attempts to maximize $u^T H v$ by selecting an appropriate $u$, while the column player attempts to minimize it by selecting an appropriate $v$.
Now let's consider the two auxiliary games $\Gamma_1$ and $\Gamma_2$ again and answer the following questions: what happens if the row player exposes his mixed strategy to the column player? And what if we reverse the order of the players?

75. von Neumann's MiniMax theorem [1928]
These questions are answered by von Neumann's MiniMax theorem.
Theorem.
$$\max_u \min_v u^T H v = \min_v \max_u u^T H v$$
The theorem states that in a mixed-strategy zero-sum game, knowing the other player's strategy brings no gain, and the order of play does not change the value. A mixed-strategy Nash equilibrium exists for any two-person zero-sum game with finite sets of actions.
A Nash equilibrium in a two-player game is a pair of strategies, each of which is a best response to the other, i.e., each gives the player using it the highest possible payoff given the other player's strategy.

76. von Neumann's MiniMax theorem: proof
Let's consider the auxiliary game $\Gamma_1$ first, in which the strategy of the row player, i.e., $u$, is exposed to the column player. This is of course beneficial to the column player, since he can select the optimal strategy $v$ to minimize $u^T H v$, which is
$$\inf\{u^T H v \mid v \ge 0, \ 1^T v = 1\} = \min_{j=1,\dots,n} (u^T H)_j$$
Thus the row player should select $u$ to maximize the above value, which can be formulated as a linear program:
$$\max \min_{j=1,\dots,n} (u^T H)_j \quad \text{s.t.} \quad 1^T u = 1, \ u \ge 0$$

77. von Neumann's MiniMax theorem: proof (cont'd)
The linear program can be rewritten as:
$$\max s \quad \text{s.t.} \quad u^T H \ge s 1^T, \ 1^T u = 1, \ u \ge 0$$
Similarly, we consider the auxiliary game $\Gamma_2$ and calculate the optimal strategy $v$ by solving the following linear program:
$$\min t \quad \text{s.t.} \quad H v \le t 1, \ 1^T v = 1, \ v \ge 0$$
These two linear programs are both feasible and form a (Lagrangian) primal-dual pair; thus they have the same optimal objective value by the strong duality property.

78. An example: the paper-rock-scissors game
For the paper-rock-scissors game, we have the following two linear programs.
Linear program for $\Gamma_1$:
$$\max s \quad \text{s.t.} \quad 0u_1 - u_2 + u_3 \ge s; \ u_1 + 0u_2 - u_3 \ge s; \ -u_1 + u_2 + 0u_3 \ge s; \ u_1 + u_2 + u_3 = 1; \ u_1, u_2, u_3 \ge 0$$
Linear program for $\Gamma_2$:
$$\min t \quad \text{s.t.} \quad 0v_1 + v_2 - v_3 \le t; \ -v_1 + 0v_2 + v_3 \le t; \ v_1 - v_2 + 0v_3 \le t; \ v_1 + v_2 + v_3 = 1; \ v_1, v_2, v_3 \ge 0$$
The mixed-strategy equilibrium is $u^T = [\frac{1}{3}, \frac{1}{3}, \frac{1}{3}]$ and $v^T = [\frac{1}{3}, \frac{1}{3}, \frac{1}{3}]$, with game value 0.
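Both programs can be fed to an LP solver; a scipy sketch for the row player's program (the column player's program is symmetric):

```python
import numpy as np
from scipy.optimize import linprog

H = np.array([[0, 1, -1],
              [-1, 0, 1],
              [1, -1, 0]], dtype=float)    # row player's payoff matrix

n = 3
# variables (u1, u2, u3, s); maximize s <=> minimize -s
c = np.r_[np.zeros(n), -1.0]
A_ub = np.c_[-H.T, np.ones(n)]             # s - (u^T H)_j <= 0 for every column j
A_eq = [np.r_[np.ones(n), 0.0]]            # u1 + u2 + u3 = 1
res = linprog(c, A_ub=A_ub, b_ub=np.zeros(n), A_eq=A_eq, b_eq=[1.0],
              bounds=[(0, None)] * n + [(None, None)])
print(res.x[:n], res.x[n])                 # u = (1/3, 1/3, 1/3), game value s = 0
```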

79. Comments on the mixed-strategy equilibrium by von Neumann
Note that a mixed-strategy equilibrium always exists, no matter whether the payoff matrix $H$ has a saddle point or not.
Regardless of the column player's selection, the row player can select an appropriate strategy to guarantee his gain is at least 0; regardless of the row player's selection, the column player can select an appropriate strategy to guarantee the row player's gain is at most 0.
Using the strategy $u^T = [\frac{1}{3}, \frac{1}{3}, \frac{1}{3}]$, the row player can guarantee that in expectation he won't lose, i.e., the probability of losing is no greater than the probability of winning. The strategy $u^T = [\frac{1}{3}, \frac{1}{3}, \frac{1}{3}]$ is designed for protecting himself rather than attacking his opponent, i.e., it cannot be used to benefit from the opponent's mistakes.

80. Application 3: Yao's MiniMax principle [1977]

81. Yao's MiniMax principle
Consider a problem $\Pi$. Let $A = \{A_1, A_2, \dots, A_n\}$ be the algorithms for $\Pi$, and $I = \{I_1, I_2, \dots, I_m\}$ be the inputs of a given size. Let $T(A_i, I_j)$ be the running time of algorithm $A_i$ on input $I_j$.

        A1      A2      ...
I1      T11     T12
I2      T21     T22
...

Thus $\max_{I_j \in I} T(A_i, I_j)$ represents the worst-case running time of the deterministic algorithm $A_i$.
For a randomized algorithm, however, it is usually difficult to bound its expected running time on worst-case inputs. Yao's MiniMax principle provides a technique to establish lower bounds on the expected running time of any randomized algorithm on its worst-case input.

82. Expected running time of a randomized algorithm $A_q$
A Las Vegas randomized algorithm can be viewed as a distribution over all deterministic algorithms $A = \{A_1, A_2, \dots, A_n\}$. Specifically, let $q$ be a distribution over $A$, and let $A_q$ be a randomized algorithm chosen according to $q$, i.e., $A_q$ refers to the deterministic algorithm $A_i$ with probability $q_i$. Given an input $I_j$, the expected running time of $A_q$ can be written as
$$E[T(A_q, I_j)] = \sum_{i=1}^n q_i T(A_i, I_j)$$
Thus $\max_{I_j \in I} E[T(A_q, I_j)]$ represents the expected running time of $A_q$ on its worst-case input.

83. Expected running time of a deterministic algorithm $A_i$ on a random input
Now consider a deterministic algorithm $A_i$ running on a random input. Let $p$ be a distribution over $I$, and let $I_p$ be a random input chosen from $I$, i.e., $I_p$ refers to $I_j$ with probability $p_j$. Given a deterministic algorithm $A_i$, its expected running time on the random input $I_p$ can be written as
$$E[T(A_i, I_p)] = \sum_{j=1}^m p_j T(A_i, I_j)$$
Thus $\min_{A_i \in A} E[T(A_i, I_p)]$ represents the expected running time of the best deterministic algorithm on the random input $I_p$.

84. Yao's MiniMax principle
Theorem. For any random input $I_p$ and randomized algorithm $A_q$,
$$\min_{A_i \in A} E[T(A_i, I_p)] \le \max_{I_j \in I} E[T(A_q, I_j)]$$
To establish a lower bound on the expected running time of a randomized algorithm on its worst-case input, it suffices to find an appropriate distribution over inputs and prove that on this random input, no deterministic algorithm can do better than the randomized one. The power of this technique lies in the fact that one can choose any distribution over inputs, and the lower bound is constructed based on deterministic algorithms.

85. Yao's MiniMax principle: proof
Proof.
$$\min_{A_i \in A} E[T(A_i, I_p)] \le \max_{u \in \Delta_m} \min_{A_i \in A} E[T(A_i, I_u)] \quad (1)$$
$$= \max_{u \in \Delta_m} \min_{v \in \Delta_n} E[T(A_v, I_u)] \quad (2)$$
$$= \min_{v \in \Delta_n} \max_{u \in \Delta_m} E[T(A_v, I_u)] \quad (3)$$
$$= \min_{v \in \Delta_n} \max_{I_j \in I} E[T(A_v, I_j)] \quad (4)$$
$$\le \max_{I_j \in I} E[T(A_q, I_j)] \quad (5)$$
Here $\Delta_n$ denotes the set of $n$-dimensional probability vectors. Equation (3) follows from von Neumann's MiniMax theorem.

86. Application 4: Revisiting the ShortestPath algorithm

87. The ShortestPath problem
INPUT: $n$ cities and a collection of roads; a road from city $i$ to city $j$ has distance $d(i, j)$; two specific cities $s$ and $t$.
OUTPUT: the shortest path from city $s$ to $t$.
[Figure: example graph with edges $s \to u$ (5), $s \to v$ (8), $u \to v$ (1), $u \to t$ (6), $v \to t$ (2)]

88. The ShortestPath problem: primal problem
[Figure: the same graph with edge variables $x_1 = (s, u)$, $x_2 = (s, v)$, $x_3 = (u, v)$, $x_4 = (u, t)$, $x_5 = (v, t)$]
Primal problem: set a variable for each road (intuition: $x_i = 0/1$ indicates whether edge $i$ appears in the shortest path); each constraint says that if we enter a node through an edge, we also leave it through another edge.
$$\min 5x_1 + 8x_2 + 1x_3 + 6x_4 + 2x_5$$
$$\text{s.t.} \quad x_1 + x_2 = 1 \ (\text{vertex } s); \quad -x_4 - x_5 = -1 \ (\text{vertex } t); \quad -x_1 + x_3 + x_4 = 0 \ (\text{vertex } u); \quad -x_2 - x_3 + x_5 = 0 \ (\text{vertex } v); \quad x_1, \dots, x_5 \in \{0, 1\}$$

89. The ShortestPath problem: primal problem (cont'd)
Primal problem: relax the 0/1 integer linear program into a linear program; the relaxation is exact by the total unimodularity of the constraint matrix.
$$\min 5x_1 + 8x_2 + 1x_3 + 6x_4 + 2x_5$$
$$\text{s.t.} \quad x_1 + x_2 = 1 \ (s); \quad -x_4 - x_5 = -1 \ (t); \quad -x_1 + x_3 + x_4 = 0 \ (u); \quad -x_2 - x_3 + x_5 = 0 \ (v); \quad 0 \le x_i \le 1$$

90. The ShortestPath problem: dual problem
[Figure: the same graph, with a height $y_i$ attached to each city]
Dual problem: set a variable for each city (intuition: $y_i$ is the height of city $i$; thus $y_s - y_t$ denotes the height difference between $s$ and $t$, providing a lower bound on the shortest-path length).
$$\max y_s - y_t$$
$$\text{s.t.} \quad y_s - y_u \le 5 \ (x_1: \text{edge } (s, u)); \quad y_s - y_v \le 8 \ (x_2: \text{edge } (s, v)); \quad y_u - y_v \le 1 \ (x_3: \text{edge } (u, v)); \quad -y_t + y_u \le 6 \ (x_4: \text{edge } (u, t)); \quad -y_t + y_v \le 2 \ (x_5: \text{edge } (v, t))$$
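For this small graph, the primal LP can be solved directly, confirming that the LP optimum equals the shortest-path length (scipy sketch):

```python
from scipy.optimize import linprog

# columns: x1=(s,u), x2=(s,v), x3=(u,v), x4=(u,t), x5=(v,t)
c = [5, 8, 1, 6, 2]
A_eq = [[ 1,  1,  0,  0,  0],    # vertex s: one unit of flow leaves
        [ 0,  0,  0,  1,  1],    # vertex t: one unit of flow arrives
        [-1,  0,  1,  1,  0],    # vertex u: flow conservation
        [ 0, -1, -1,  0,  1]]    # vertex v: flow conservation
res = linprog(c, A_eq=A_eq, b_eq=[1, 1, 0, 0], bounds=[(0, 1)] * 5)
print(res.x, res.fun)   # x = (1, 0, 1, 0, 1), length 8: the path s -> u -> v -> t
```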

91. Dual simplex method

92. Revisiting the primal simplex algorithm
Consider the following primal problem P:
$$\min x_1 + 14x_2 + 6x_3$$
$$\text{s.t.} \quad x_1 + x_2 + x_3 \ge 4; \quad x_1 \ge 2; \quad x_3 \ge 3; \quad 3x_2 + x_3 \ge 6; \quad x_1, x_2, x_3 \ge 0$$
Primal simplex tableau (slack variables $x_4, \dots, x_7$; the rows for the basic variables $x_{B1}, \dots, x_{B4}$ have their numeric entries elided):

       x1     x2      x3     x4    x5    x6    x7
-z=0   c1=1   c2=14   c3=6   c4=0  c5=0  c6=0  c7=0

Primal variables: $x$; feasible: $B^{-1}b \ge 0$. A basis $B$ is called primal feasible if all elements of $B^{-1}b$ (the first column except $-z$) are non-negative.

93. Revisiting the primal simplex algorithm (cont'd)
Now let's consider the dual problem D:
$$\max 4y_1 + 2y_2 + 3y_3 + 6y_4$$
$$\text{s.t.} \quad y_1 + y_2 \le 1; \quad y_1 + 3y_4 \le 14; \quad y_1 + y_3 + y_4 \le 6; \quad y_1, y_2, y_3, y_4 \ge 0$$
(Primal simplex tableau as on the previous slide.)
Dual variables: $y^T = c_B^T B^{-1}$; feasible: $y^T A \le c^T$. A basis $B$ is called dual feasible if all elements of the reduced-cost row $\bar{c}^T = c^T - c_B^T B^{-1} A = c^T - y^T A$ (the first row except $-z$) are non-negative.

94. Another viewpoint of the primal simplex algorithm
Thus, the primal simplex algorithm can also be described as follows:
1. Starting point: the primal simplex algorithm starts with a primal feasible solution ($x_B = B^{-1}b \ge 0$);
2. Maintenance: throughout the process we maintain primal feasibility and move towards dual feasibility;
3. Stopping criterion: $\bar{c}^T = c^T - c_B^T B^{-1} A \ge 0$, i.e., $y^T A \le c^T$.
In other words, the iteration process ends when the basis is both primal feasible and dual feasible.

95. The dual simplex works in just the opposite fashion
Dual simplex:
1. Starting point: the dual simplex algorithm starts with a dual feasible solution ($\bar{c}^T \ge 0$);
2. Maintenance: throughout the process we maintain dual feasibility and move towards primal feasibility;
3. Stopping criterion: $x_B = B^{-1}b \ge 0$.
In other words, the iteration process ends when the basis is both primal feasible and dual feasible.

96. Primal simplex vs. dual simplex
Both the primal simplex and the dual simplex terminate at the same condition, i.e., the basis is both primal feasible and dual feasible. However, the final objective is achieved in totally opposite fashions: the primal simplex method keeps primal feasibility while the dual simplex method keeps dual feasibility during the pivoting process.
The primal simplex algorithm first selects an entering variable and then determines the leaving variable. In contrast, the dual simplex method does the opposite: it first selects a leaving variable and then determines an entering variable.

97 Dual simplex(b I, z, A, b, c) 1: //Dual simplex starts with a dual feasible basis. Here, B I contains the indices of the basic variables. 2: while TRUE do 3: if there is no index l (1 l m) has b l < 0 then 4: x =CalculateX(B I, A, b, c); 5: return (x, z); 6: end if; 7: choose an index l having b l < 0 according to a certain rule; 8: for each index j (1 i n) do 9: if a lj < 0 then 10: j = c j a lj ; 11: else 12: j = ; 13: end if 14: end for 15: choose an index e that minimizes j; 16: if e = then 17: return no feasible solution ; 18: end if 19: (B I, A, b, c, z) = Pivot(B I, A, b, c, z, e, l); 20: end while 97 / 136

98. An example
Standard form:
$$\min 5x_1 + 35x_2 + 20x_3 \quad \text{s.t.} \quad x_1 - x_2 - x_3 \le -2; \quad -x_1 - 3x_2 \le -3; \quad x_1, x_2, x_3 \ge 0$$
Slack form:
$$\min 5x_1 + 35x_2 + 20x_3 \quad \text{s.t.} \quad x_1 - x_2 - x_3 + x_4 = -2; \quad -x_1 - 3x_2 + x_5 = -3; \quad x_1, \dots, x_5 \ge 0$$

99. Step 1

         x1     x2      x3      x4    x5
-z=0     c1=5   c2=35   c3=20   c4=0  c5=0
xB1=x4   b1=-2  1       -1      -1    1     0
xB2=x5   b2=-3  -1      -3      0     0     1

Basis (in blue): $B = \{a_4, a_5\}$. Solution: $x = [0, 0, 0, -2, -3]^T$ (dual feasible, primal infeasible).
Pivoting: choose $a_5$ to leave the basis since $b_2 = -3 < 0$; choose $a_1$ to enter the basis since $\min_{j: a_{2j} < 0} c_j / |a_{2j}| = c_1 / |a_{21}|$.

100. Step 2

         x1     x2      x3      x4    x5
-z=-15   c1=0   c2=20   c3=20   c4=0  c5=5
xB1=x4   b1=-5  0       -4      -1    1     1
xB2=x1   b2=3   1       3       0     0     -1

Basis (in blue): $B = \{a_1, a_4\}$. Solution: $x = [3, 0, 0, -5, 0]^T$.
Pivoting: choose $a_4$ to leave the basis since $b_1 = -5 < 0$; choose $a_2$ to enter the basis since $\min_{j: a_{1j} < 0} c_j / |a_{1j}| = c_2 / |a_{12}|$.

101. Step 3

         x1      x2     x3      x4      x5
-z=-40   c1=0    c2=0   c3=15   c4=5    c5=10
xB1=x2   b1=5/4  0      1       1/4     -1/4   -1/4
xB2=x1   b2=-3/4 1      0       -3/4    3/4    -1/4

Basis (in blue): $B = \{a_1, a_2\}$. Solution: $x = [-3/4, 5/4, 0, 0, 0]^T$.
Pivoting: choose $a_1$ to leave the basis since $b_2 = -3/4 < 0$; choose $a_3$ to enter the basis since $\min_{j: a_{2j} < 0} c_j / |a_{2j}| = c_3 / |a_{23}|$.

102. Step 4

         x1      x2     x3     x4      x5
-z=-55   c1=20   c2=0   c3=0   c4=20   c5=5
xB1=x2   b1=1    1/3    1      0       0      -1/3
xB2=x3   b2=1    -4/3   0      1       -1     1/3

Basis (in blue): $B = \{a_2, a_3\}$. Solution: $x = [0, 1, 1, 0, 0]^T$, objective value 55. Both primal and dual feasible: done!
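The end result of the four pivots can be cross-checked with an LP solver (scipy sketch):

```python
from scipy.optimize import linprog

c = [5, 35, 20]
A_ub = [[ 1, -1, -1],    #  x1 - x2 - x3 <= -2
        [-1, -3,  0]]    # -x1 - 3x2    <= -3
res = linprog(c, A_ub=A_ub, b_ub=[-2, -3], bounds=[(0, None)] * 3)
print(res.x, res.fun)    # x = (0, 1, 1), objective 55, matching the final tableau
```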

103. When is the dual simplex method useful?
1. The dual simplex algorithm is most suited for problems for which an initial dual feasible solution is easily available.
2. It is particularly useful for reoptimizing a problem after a constraint has been added or some parameters have been changed, so that the previously optimal basis is no longer feasible.
3. Trying the dual simplex is also worthwhile if your LP appears to be highly degenerate, i.e., there are many vertices of the feasible region for which the associated basis is degenerate; in such cases the primal simplex may perform a large number of iterations (moves between adjacent vertices) with little or no improvement.
(References: Operations Research Models and Methods, Paul A. Jensen and Jonathan F. Bard; OR-Notes, J. E. Beasley.)

104. Primal-Dual: another improvement approach

105. The Primal-Dual method: a brief history
In 1955, H. Kuhn proposed the Hungarian method for the MaxWeightedMatching problem. This method effectively exploits the duality property of linear programming.
In 1956, G. Dantzig, L. R. Ford, and D. R. Fulkerson extended this idea to solve general linear programming problems.
In 1957, L. R. Ford and D. R. Fulkerson applied this idea to the network-flow problem and the Hitchcock problem.
In 1957, J. Munkres applied this idea to the transportation problem.

106. The Primal-Dual method: basic idea

107. The Primal-Dual method
The Primal-Dual method is a dual method that exploits lower-bound information in subsequent linear programming operations.
Advantages:
1. Unlike the dual simplex, which starts from a dual basic feasible solution, the primal-dual method requires only a dual feasible solution.
2. An optimal solution to DRP usually has a combinatorial explanation, especially for graph-theory problems.
[Diagram: Primal P ($x$) and Dual D ($y$). Step 1: $y \Rightarrow \bar{x}$, giving the Restricted Primal RP ($\bar{x}$); Step 2: $\bar{x} \Rightarrow \bar{y}$ via the dual of RP, DRP ($\bar{y}$); Step 3: $y = y + \theta\bar{y}$.]

108. Basic idea of the primal-dual method
Primal P:
$$\min c_1 x_1 + c_2 x_2 + \dots + c_n x_n$$
$$\text{s.t.} \quad a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n = b_1 \ (y_1); \quad a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n = b_2 \ (y_2); \quad \dots; \quad a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n = b_m \ (y_m); \quad x_1, x_2, \dots, x_n \ge 0$$
Dual D:
$$\max b_1 y_1 + b_2 y_2 + \dots + b_m y_m$$
$$\text{s.t.} \quad a_{11}y_1 + a_{21}y_2 + \dots + a_{m1}y_m \le c_1; \quad a_{12}y_1 + a_{22}y_2 + \dots + a_{m2}y_m \le c_2; \quad \dots; \quad a_{1n}y_1 + a_{2n}y_2 + \dots + a_{mn}y_m \le c_n$$
Basic idea: suppose we are given a dual feasible solution $y$. Let's verify whether $y$ is an optimal solution or not:
1. If $y$ is an optimal solution to the dual problem D, then the corresponding primal variables $x$ should satisfy a restricted primal problem called RP;
2. Furthermore, even if $y$ is not optimal, the solution to the dual of RP (called DRP) still provides invaluable information: it can be used to improve $y$.

109. Step 1: $y \Rightarrow \bar{x}$ (I)
Dual problem D:
$$\max b_1 y_1 + b_2 y_2 + \dots + b_m y_m$$
$$\text{s.t.} \quad a_{11}y_1 + a_{21}y_2 + \dots + a_{m1}y_m \le c_1 \ (= \ \Rightarrow x_1 \ge 0 \text{ allowed}); \quad \dots; \quad a_{1n}y_1 + a_{2n}y_2 + \dots + a_{mn}y_m \le c_n \ (< \ \Rightarrow x_n = 0)$$
$y$ provides information about the corresponding primal variables $x$:
1. Given a dual feasible solution $y$, let's check whether $y$ is an optimal solution or not.
2. If $y$ is optimal, we have the following restrictions on $x$: $a_{1i}y_1 + a_{2i}y_2 + \dots + a_{mi}y_m < c_i \Rightarrow x_i = 0$. (Reason: complementary slackness; an optimal solution $y$ satisfies $(a_{1i}y_1 + a_{2i}y_2 + \dots + a_{mi}y_m - c_i)\,x_i = 0$.)
3. Let $J$ record the indices of the tight constraints, where equality holds.

110. Step 1: $y \Rightarrow \bar{x}$ (II)

111. Step 1: $y \Rightarrow \bar{x}$ (III)
4. Thus the corresponding primal solution $\bar{x}$ should satisfy the following restricted primal (RP):
5. RP:
$$a_{11}x_1 + a_{12}x_2 + \dots + a_{1n}x_n = b_1; \quad a_{21}x_1 + a_{22}x_2 + \dots + a_{2n}x_n = b_2; \quad \dots; \quad a_{m1}x_1 + a_{m2}x_2 + \dots + a_{mn}x_n = b_m;$$
$$x_i = 0 \ (i \notin J); \quad x_i \ge 0 \ (i \in J)$$
6. In other words, the optimality of $y$ is determined by solving RP.

112. But how do we solve RP? (I)
RP:
$$a_{11}x_1 + \dots + a_{1n}x_n = b_1; \quad \dots; \quad a_{m1}x_1 + \dots + a_{mn}x_n = b_m; \quad x_i = 0 \ (i \notin J); \quad x_i \ge 0 \ (i \in J)$$
How do we solve RP? Recall that a system $Ax = b, \ x \ge 0$ can be solved via an extended LP.

113. But how do we solve RP? (II)
RP (extended by introducing auxiliary variables):
$$\min \epsilon = s_1 + s_2 + \dots + s_m$$
$$\text{s.t.} \quad s_1 + a_{11}x_1 + \dots + a_{1n}x_n = b_1; \quad s_2 + a_{21}x_1 + \dots + a_{2n}x_n = b_2; \quad \dots; \quad s_m + a_{m1}x_1 + \dots + a_{mn}x_n = b_m;$$
$$x_i = 0 \ (i \notin J); \quad x_i \ge 0 \ (i \in J); \quad s_i \ge 0$$
1. If $\epsilon_{OPT} = 0$, then we have found a feasible solution to RP, implying that $y$ is an optimal solution;
2. If $\epsilon_{OPT} > 0$, $y$ is not an optimal solution.

114. Step 2: $\bar{x} \Rightarrow \bar{y}$ (I)
Alternatively, we can solve the dual of RP, called DRP:
$$\max w = b_1 \bar{y}_1 + b_2 \bar{y}_2 + \dots + b_m \bar{y}_m$$
$$\text{s.t.} \quad a_{1j}\bar{y}_1 + a_{2j}\bar{y}_2 + \dots + a_{mj}\bar{y}_m \le 0 \ (j \in J); \quad \bar{y}_1, \bar{y}_2, \dots, \bar{y}_m \le 1$$
1. If $w_{OPT} = 0$, $y$ is an optimal solution.
2. If $w_{OPT} > 0$, $y$ is not an optimal solution. However, the optimal solution to DRP still provides useful information: it can be used to improve $y$.

115. The difference between DRP and D
Dual problem D:
$$\max b_1 y_1 + \dots + b_m y_m \quad \text{s.t.} \quad a_{1j}y_1 + a_{2j}y_2 + \dots + a_{mj}y_m \le c_j, \ j = 1, \dots, n$$
DRP:
$$\max w = b_1 \bar{y}_1 + \dots + b_m \bar{y}_m \quad \text{s.t.} \quad a_{1j}\bar{y}_1 + a_{2j}\bar{y}_2 + \dots + a_{mj}\bar{y}_m \le 0, \ j \in J; \quad \bar{y}_1, \dots, \bar{y}_m \le 1$$
How to write DRP from D: replace $c_j$ with 0; keep only the $|J|$ restrictions; add the additional restriction $\bar{y}_1, \dots, \bar{y}_m \le 1$.

116. Step 3: $y \Rightarrow y + \theta\bar{y}$ (I)
Why can $\bar{y}$ be used to improve $y$? Consider an improved dual solution $y' = y + \theta\bar{y}$, $\theta > 0$. We have:
Objective function: since $\bar{y}^T b = w_{OPT} > 0$, we get $y'^T b = y^T b + \theta w_{OPT} > y^T b$. In other words, $y + \theta\bar{y}$ is better than $y$.
Constraints: dual feasibility requires that:
For any $j \in J$: $a_{1j}\bar{y}_1 + a_{2j}\bar{y}_2 + \dots + a_{mj}\bar{y}_m \le 0$; thus $y'^T a_j = y^T a_j + \theta \bar{y}^T a_j \le c_j$ for any $\theta > 0$.
For any $j \notin J$, there are two cases (next slide).

117. Step 3: $y \Rightarrow y + \theta\bar{y}$ (II)
1. If $\bar{y}^T a_j \le 0$ for all $j \notin J$: then $y'$ is feasible for any $\theta > 0$, since for $1 \le j \le n$,
$$y'^T a_j = y^T a_j + \theta \bar{y}^T a_j \le y^T a_j \le c_j$$
Hence the dual problem D is unbounded and the primal problem P is infeasible.
2. If $\bar{y}^T a_j > 0$ for some $j \notin J$: we can safely set
$$\theta \le \min_{j \notin J:\ \bar{y}^T a_j > 0} \frac{c_j - y^T a_j}{\bar{y}^T a_j}$$
to guarantee that $y'^T a_j = y^T a_j + \theta \bar{y}^T a_j \le c_j$.

118. The Primal-Dual algorithm
1: Infeasible = "No"; Optimal = "No"; y = y_0; // y_0 is a feasible solution to the dual problem D
2: while TRUE do
3:   find the tight-constraint index set J, and set x_j = 0 for j ∉ J;
4:   // this yields a smaller RP
5:   solve DRP; denote its optimal solution by ȳ;
6:   if the DRP objective w_OPT = 0 then
7:     Optimal = "Yes";
8:     return y;
9:   end if
10:  if ȳ^T a_j ≤ 0 for all j ∉ J then
11:    Infeasible = "Yes";
12:    return;
13:  end if
14:  set θ = min { (c_j − y^T a_j) / (ȳ^T a_j) : ȳ^T a_j > 0, j ∉ J };
15:  update y = y + θȳ;
16: end while

119. Advantages of the Primal-Dual algorithm
Some facts:
The primal-dual algorithm terminates if an anti-cycling rule is used. (Reason: the objective value $y^T b$ increases if there is no degeneracy.)
Neither RP nor DRP explicitly relies on $c$; the information of $c$ is captured in $J$. This leads to another advantage of the primal-dual technique: RP is usually a purely combinatorial problem. Take ShortestPath as an example: its RP corresponds to a connectivity problem.
More and more constraints become tight during the primal-dual process. (See Lecture 10 for a primal-dual algorithm for the MaximumFlow problem.)

120. ShortestPath: Dijkstra's algorithm is essentially a primal-dual algorithm

121. The ShortestPath problem (recap)
[Figure: the example graph with edge variables $x_1, \dots, x_5$ as before]
Primal problem: relax the 0/1 integer linear program into a linear program by total unimodularity.
$$\min 5x_1 + 8x_2 + 1x_3 + 6x_4 + 2x_5$$
$$\text{s.t.} \quad x_1 + x_2 = 1 \ (s); \quad -x_4 - x_5 = -1 \ (t); \quad -x_1 + x_3 + x_4 = 0 \ (u); \quad -x_2 - x_3 + x_5 = 0 \ (v); \quad 0 \le x_i \le 1$$

122. Dual of the ShortestPath problem
[Figure: the graph with heights $y_s, y_u, y_v, y_t$]
Dual problem: set a variable for each city (intuition: $y_i$ is the height of city $i$; thus $y_s - y_t$ denotes the height difference between $s$ and $t$, providing a lower bound on the shortest-path length).
$$\max y_s - y_t$$
$$\text{s.t.} \quad y_s - y_u \le 5 \ (x_1: \text{edge } (s, u)); \quad y_s - y_v \le 8 \ (x_2: \text{edge } (s, v)); \quad y_u - y_v \le 1 \ (x_3: \text{edge } (u, v)); \quad -y_t + y_u \le 6 \ (x_4: \text{edge } (u, t)); \quad -y_t + y_v \le 2 \ (x_5: \text{edge } (v, t))$$

123. A simplified version
[Figure: the same graph with heights]
Dual problem: simplify by setting $y_t = 0$ (and remove the second constraint from the primal problem P accordingly):
$$\max y_s$$
$$\text{s.t.} \quad y_s - y_u \le 5 \ (x_1: \text{edge } (s, u)); \quad y_s - y_v \le 8 \ (x_2: \text{edge } (s, v)); \quad y_u - y_v \le 1 \ (x_3: \text{edge } (u, v)); \quad y_u \le 6 \ (x_4: \text{edge } (u, t)); \quad y_v \le 2 \ (x_5: \text{edge } (v, t))$$

124. Iteration 1 (I)
Dual feasible solution: $y^T = (y_s, y_u, y_v) = (0, 0, 0)$. Let's check the constraints in D:
$$y_s - y_u = 0 < 5 \Rightarrow x_1 = 0; \quad y_s - y_v = 0 < 8 \Rightarrow x_2 = 0; \quad y_u - y_v = 0 < 1 \Rightarrow x_3 = 0; \quad y_u = 0 < 6 \Rightarrow x_4 = 0; \quad y_v = 0 < 2 \Rightarrow x_5 = 0$$
Identifying the tight constraints in D: $J = \emptyset$, implying that $x_1 = x_2 = x_3 = x_4 = x_5 = 0$.
RP:
$$\min s_1 + s_2 + s_3$$
$$\text{s.t.} \quad s_1 + x_1 + x_2 = 1 \ (\text{node } s); \quad s_2 - x_1 + x_3 + x_4 = 0 \ (\text{node } u); \quad s_3 - x_2 - x_3 + x_5 = 0 \ (\text{node } v); \quad s_1, s_2, s_3 \ge 0; \quad x_1, \dots, x_5 = 0$$

125. Iteration 1 (II)
DRP:
$$\max \bar{y}_s \quad \text{s.t.} \quad \bar{y}_s \le 1, \ \bar{y}_u \le 1, \ \bar{y}_v \le 1$$
Solving DRP by a combinatorial technique gives the optimal solution $\bar{y}^T = (1, 0, 0)$. (Note: the optimal solution is not unique.)
Step length $\theta$:
$$\theta = \min\left\{ \frac{c_1 - y^T a_1}{\bar{y}^T a_1}, \frac{c_2 - y^T a_2}{\bar{y}^T a_2} \right\} = \min\{5, 8\} = 5$$
Update $y$: $y^T = y^T + \theta \bar{y}^T = (5, 0, 0)$.

126. Iteration 1 (III)
[Figure: heights after the update: $y_s = 5$, $y_u = y_v = y_t = 0$]
From the point of view of Dijkstra's algorithm:
The optimal solution to DRP, $\bar{y}^T = (1, 0, 0)$, corresponds to the explored vertex set $S = \{s\}$ in Dijkstra's algorithm. In fact, DRP is solved by identifying the nodes reachable from $s$.
The step length $\theta = \min\{\frac{c_1 - y^T a_1}{\bar{y}^T a_1}, \frac{c_2 - y^T a_2}{\bar{y}^T a_2}\} = \min\{5, 8\} = 5$ corresponds to finding the vertex closest to the nodes in $S$ by comparing all edges going out of $S$.

127. Iteration 2 (I)
Dual feasible solution: $y^T = (5, 0, 0)$. Let's check the constraints in D:
$$y_s - y_u = 5 \ (\text{tight}); \quad y_s - y_v = 5 < 8 \Rightarrow x_2 = 0; \quad y_u - y_v = 0 < 1 \Rightarrow x_3 = 0; \quad y_u = 0 < 6 \Rightarrow x_4 = 0; \quad y_v = 0 < 2 \Rightarrow x_5 = 0$$
Identifying the tight constraints in D: $J = \{1\}$, implying that $x_2 = x_3 = x_4 = x_5 = 0$.
RP:
$$\min s_1 + s_2 + s_3$$
$$\text{s.t.} \quad s_1 + x_1 + x_2 = 1 \ (\text{node } s); \quad s_2 - x_1 + x_3 + x_4 = 0 \ (\text{node } u); \quad s_3 - x_2 - x_3 + x_5 = 0 \ (\text{node } v); \quad s_1, s_2, s_3 \ge 0; \quad x_1 \ge 0; \quad x_2, \dots, x_5 = 0$$


More information

Support Vector Machines and Kernel Methods

Support Vector Machines and Kernel Methods 2018 CS420 Machine Learning, Lecture 3 Hangout from Prof. Andrew Ng. http://cs229.stanford.edu/notes/cs229-notes3.pdf Support Vector Machines and Kernel Methods Weinan Zhang Shanghai Jiao Tong University

More information

Support Vector Machines

Support Vector Machines Wien, June, 2010 Paul Hofmarcher, Stefan Theussl, WU Wien Hofmarcher/Theussl SVM 1/21 Linear Separable Separating Hyperplanes Non-Linear Separable Soft-Margin Hyperplanes Hofmarcher/Theussl SVM 2/21 (SVM)

More information

Statistical Machine Learning from Data

Statistical Machine Learning from Data Samy Bengio Statistical Machine Learning from Data 1 Statistical Machine Learning from Data Support Vector Machines Samy Bengio IDIAP Research Institute, Martigny, Switzerland, and Ecole Polytechnique

More information

Foundation of Intelligent Systems, Part I. SVM s & Kernel Methods

Foundation of Intelligent Systems, Part I. SVM s & Kernel Methods Foundation of Intelligent Systems, Part I SVM s & Kernel Methods mcuturi@i.kyoto-u.ac.jp FIS - 2013 1 Support Vector Machines The linearly-separable case FIS - 2013 2 A criterion to select a linear classifier:

More information

Machine Learning Support Vector Machines. Prof. Matteo Matteucci

Machine Learning Support Vector Machines. Prof. Matteo Matteucci Machine Learning Support Vector Machines Prof. Matteo Matteucci Discriminative vs. Generative Approaches 2 o Generative approach: we derived the classifier from some generative hypothesis about the way

More information

Yinyu Ye, MS&E, Stanford MS&E310 Lecture Note #06. The Simplex Method

Yinyu Ye, MS&E, Stanford MS&E310 Lecture Note #06. The Simplex Method The Simplex Method Yinyu Ye Department of Management Science and Engineering Stanford University Stanford, CA 94305, U.S.A. http://www.stanford.edu/ yyye (LY, Chapters 2.3-2.5, 3.1-3.4) 1 Geometry of Linear

More information

Math 5593 Linear Programming Week 1

Math 5593 Linear Programming Week 1 University of Colorado Denver, Fall 2013, Prof. Engau 1 Problem-Solving in Operations Research 2 Brief History of Linear Programming 3 Review of Basic Linear Algebra Linear Programming - The Story About

More information

Motivating examples Introduction to algorithms Simplex algorithm. On a particular example General algorithm. Duality An application to game theory

Motivating examples Introduction to algorithms Simplex algorithm. On a particular example General algorithm. Duality An application to game theory Instructor: Shengyu Zhang 1 LP Motivating examples Introduction to algorithms Simplex algorithm On a particular example General algorithm Duality An application to game theory 2 Example 1: profit maximization

More information

Convex Optimization Boyd & Vandenberghe. 5. Duality

Convex Optimization Boyd & Vandenberghe. 5. Duality 5. Duality Convex Optimization Boyd & Vandenberghe Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

Convex Optimization and Support Vector Machine

Convex Optimization and Support Vector Machine Convex Optimization and Support Vector Machine Problem 0. Consider a two-class classification problem. The training data is L n = {(x 1, t 1 ),..., (x n, t n )}, where each t i { 1, 1} and x i R p. We

More information

Introduction to Support Vector Machines

Introduction to Support Vector Machines Introduction to Support Vector Machines Shivani Agarwal Support Vector Machines (SVMs) Algorithm for learning linear classifiers Motivated by idea of maximizing margin Efficient extension to non-linear

More information

Max Margin-Classifier

Max Margin-Classifier Max Margin-Classifier Oliver Schulte - CMPT 726 Bishop PRML Ch. 7 Outline Maximum Margin Criterion Math Maximizing the Margin Non-Separable Data Kernels and Non-linear Mappings Where does the maximization

More information

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012

Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology. M. Soleymani Fall 2012 Support Vector Machine (SVM) & Kernel CE-717: Machine Learning Sharif University of Technology M. Soleymani Fall 2012 Linear classifier Which classifier? x 2 x 1 2 Linear classifier Margin concept x 2

More information

14. Duality. ˆ Upper and lower bounds. ˆ General duality. ˆ Constraint qualifications. ˆ Counterexample. ˆ Complementary slackness.

14. Duality. ˆ Upper and lower bounds. ˆ General duality. ˆ Constraint qualifications. ˆ Counterexample. ˆ Complementary slackness. CS/ECE/ISyE 524 Introduction to Optimization Spring 2016 17 14. Duality ˆ Upper and lower bounds ˆ General duality ˆ Constraint qualifications ˆ Counterexample ˆ Complementary slackness ˆ Examples ˆ Sensitivity

More information

Chapter 1 Linear Programming. Paragraph 5 Duality

Chapter 1 Linear Programming. Paragraph 5 Duality Chapter 1 Linear Programming Paragraph 5 Duality What we did so far We developed the 2-Phase Simplex Algorithm: Hop (reasonably) from basic solution (bs) to bs until you find a basic feasible solution

More information

Support Vector Machine (continued)

Support Vector Machine (continued) Support Vector Machine continued) Overlapping class distribution: In practice the class-conditional distributions may overlap, so that the training data points are no longer linearly separable. We need

More information

OPTIMISATION /09 EXAM PREPARATION GUIDELINES

OPTIMISATION /09 EXAM PREPARATION GUIDELINES General: OPTIMISATION 2 2008/09 EXAM PREPARATION GUIDELINES This points out some important directions for your revision. The exam is fully based on what was taught in class: lecture notes, handouts and

More information

Outline. Basic concepts: SVM and kernels SVM primal/dual problems. Chih-Jen Lin (National Taiwan Univ.) 1 / 22

Outline. Basic concepts: SVM and kernels SVM primal/dual problems. Chih-Jen Lin (National Taiwan Univ.) 1 / 22 Outline Basic concepts: SVM and kernels SVM primal/dual problems Chih-Jen Lin (National Taiwan Univ.) 1 / 22 Outline Basic concepts: SVM and kernels Basic concepts: SVM and kernels SVM primal/dual problems

More information

Optimization 4. GAME THEORY

Optimization 4. GAME THEORY Optimization GAME THEORY DPK Easter Term Saddle points of two-person zero-sum games We consider a game with two players Player I can choose one of m strategies, indexed by i =,, m and Player II can choose

More information

Computational Game Theory Spring Semester, 2005/6. Lecturer: Yishay Mansour Scribe: Ilan Cohen, Natan Rubin, Ophir Bleiberg*

Computational Game Theory Spring Semester, 2005/6. Lecturer: Yishay Mansour Scribe: Ilan Cohen, Natan Rubin, Ophir Bleiberg* Computational Game Theory Spring Semester, 2005/6 Lecture 5: 2-Player Zero Sum Games Lecturer: Yishay Mansour Scribe: Ilan Cohen, Natan Rubin, Ophir Bleiberg* 1 5.1 2-Player Zero Sum Games In this lecture

More information

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem:

Motivation. Lecture 2 Topics from Optimization and Duality. network utility maximization (NUM) problem: CDS270 Maryam Fazel Lecture 2 Topics from Optimization and Duality Motivation network utility maximization (NUM) problem: consider a network with S sources (users), each sending one flow at rate x s, through

More information

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings

Structural and Multidisciplinary Optimization. P. Duysinx and P. Tossings Structural and Multidisciplinary Optimization P. Duysinx and P. Tossings 2018-2019 CONTACTS Pierre Duysinx Institut de Mécanique et du Génie Civil (B52/3) Phone number: 04/366.91.94 Email: P.Duysinx@uliege.be

More information

(Kernels +) Support Vector Machines

(Kernels +) Support Vector Machines (Kernels +) Support Vector Machines Machine Learning Torsten Möller Reading Chapter 5 of Machine Learning An Algorithmic Perspective by Marsland Chapter 6+7 of Pattern Recognition and Machine Learning

More information

EE/AA 578, Univ of Washington, Fall Duality

EE/AA 578, Univ of Washington, Fall Duality 7. Duality EE/AA 578, Univ of Washington, Fall 2016 Lagrange dual problem weak and strong duality geometric interpretation optimality conditions perturbation and sensitivity analysis examples generalized

More information

Lecture: Duality.

Lecture: Duality. Lecture: Duality http://bicmr.pku.edu.cn/~wenzw/opt-2016-fall.html Acknowledgement: this slides is based on Prof. Lieven Vandenberghe s lecture notes Introduction 2/35 Lagrange dual problem weak and strong

More information

Linear and Combinatorial Optimization

Linear and Combinatorial Optimization Linear and Combinatorial Optimization The dual of an LP-problem. Connections between primal and dual. Duality theorems and complementary slack. Philipp Birken (Ctr. for the Math. Sc.) Lecture 3: Duality

More information

Game Theory: Lecture 2

Game Theory: Lecture 2 Game Theory: Lecture 2 Tai-Wei Hu June 29, 2011 Outline Two-person zero-sum games normal-form games Minimax theorem Simplex method 1 2-person 0-sum games 1.1 2-Person Normal Form Games A 2-person normal

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2014 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

15-780: LinearProgramming

15-780: LinearProgramming 15-780: LinearProgramming J. Zico Kolter February 1-3, 2016 1 Outline Introduction Some linear algebra review Linear programming Simplex algorithm Duality and dual simplex 2 Outline Introduction Some linear

More information

EE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015

EE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015 EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized

More information

Appendix A Taylor Approximations and Definite Matrices

Appendix A Taylor Approximations and Definite Matrices Appendix A Taylor Approximations and Definite Matrices Taylor approximations provide an easy way to approximate a function as a polynomial, using the derivatives of the function. We know, from elementary

More information

Cheng Soon Ong & Christian Walder. Canberra February June 2018

Cheng Soon Ong & Christian Walder. Canberra February June 2018 Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression

More information

Lecture Support Vector Machine (SVM) Classifiers

Lecture Support Vector Machine (SVM) Classifiers Introduction to Machine Learning Lecturer: Amir Globerson Lecture 6 Fall Semester Scribe: Yishay Mansour 6.1 Support Vector Machine (SVM) Classifiers Classification is one of the most important tasks in

More information

Part IB Optimisation

Part IB Optimisation Part IB Optimisation Based on lectures by F. A. Fischer Notes taken by Dexter Chua Easter 015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.

More information

"SYMMETRIC" PRIMAL-DUAL PAIR

SYMMETRIC PRIMAL-DUAL PAIR "SYMMETRIC" PRIMAL-DUAL PAIR PRIMAL Minimize cx DUAL Maximize y T b st Ax b st A T y c T x y Here c 1 n, x n 1, b m 1, A m n, y m 1, WITH THE PRIMAL IN STANDARD FORM... Minimize cx Maximize y T b st Ax

More information

CO 250 Final Exam Guide

CO 250 Final Exam Guide Spring 2017 CO 250 Final Exam Guide TABLE OF CONTENTS richardwu.ca CO 250 Final Exam Guide Introduction to Optimization Kanstantsin Pashkovich Spring 2017 University of Waterloo Last Revision: March 4,

More information

Today: Linear Programming (con t.)

Today: Linear Programming (con t.) Today: Linear Programming (con t.) COSC 581, Algorithms April 10, 2014 Many of these slides are adapted from several online sources Reading Assignments Today s class: Chapter 29.4 Reading assignment for

More information

Support Vector Machines

Support Vector Machines Support Vector Machines Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/32 Margin Classifiers margin b = 0 Sridhar Mahadevan: CMPSCI 689 p.

More information

g(x,y) = c. For instance (see Figure 1 on the right), consider the optimization problem maximize subject to

g(x,y) = c. For instance (see Figure 1 on the right), consider the optimization problem maximize subject to 1 of 11 11/29/2010 10:39 AM From Wikipedia, the free encyclopedia In mathematical optimization, the method of Lagrange multipliers (named after Joseph Louis Lagrange) provides a strategy for finding the

More information

A Brief Review on Convex Optimization

A Brief Review on Convex Optimization A Brief Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one convex, two nonconvex sets): A Brief Review

More information

Lecture: Algorithms for LP, SOCP and SDP

Lecture: Algorithms for LP, SOCP and SDP 1/53 Lecture: Algorithms for LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html wenzw@pku.edu.cn Acknowledgement:

More information

Support Vector Machines for Regression

Support Vector Machines for Regression COMP-566 Rohan Shah (1) Support Vector Machines for Regression Provided with n training data points {(x 1, y 1 ), (x 2, y 2 ),, (x n, y n )} R s R we seek a function f for a fixed ɛ > 0 such that: f(x

More information

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)

Lagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST) Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual

More information

Lecture 6: Conic Optimization September 8

Lecture 6: Conic Optimization September 8 IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions

More information

Support Vector Machines for Classification and Regression

Support Vector Machines for Classification and Regression CIS 520: Machine Learning Oct 04, 207 Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may

More information

Introduction to Mathematical Programming IE406. Lecture 10. Dr. Ted Ralphs

Introduction to Mathematical Programming IE406. Lecture 10. Dr. Ted Ralphs Introduction to Mathematical Programming IE406 Lecture 10 Dr. Ted Ralphs IE406 Lecture 10 1 Reading for This Lecture Bertsimas 4.1-4.3 IE406 Lecture 10 2 Duality Theory: Motivation Consider the following

More information

Support Vector Machine

Support Vector Machine Support Vector Machine Kernel: Kernel is defined as a function returning the inner product between the images of the two arguments k(x 1, x 2 ) = ϕ(x 1 ), ϕ(x 2 ) k(x 1, x 2 ) = k(x 2, x 1 ) modularity-

More information

Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Module - 5 Lecture - 22 SVM: The Dual Formulation Good morning.

More information

Kernel Methods and Support Vector Machines

Kernel Methods and Support Vector Machines Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete

More information

Kernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning

Kernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning Kernel Machines Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 SVM linearly separable case n training points (x 1,, x n ) d features x j is a d-dimensional vector Primal problem:

More information

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014

Convex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014 Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,

More information

Data Mining. Linear & nonlinear classifiers. Hamid Beigy. Sharif University of Technology. Fall 1396

Data Mining. Linear & nonlinear classifiers. Hamid Beigy. Sharif University of Technology. Fall 1396 Data Mining Linear & nonlinear classifiers Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction

More information

An introductory example

An introductory example CS1 Lecture 9 An introductory example Suppose that a company that produces three products wishes to decide the level of production of each so as to maximize profits. Let x 1 be the amount of Product 1

More information

CONSTRAINED NONLINEAR PROGRAMMING

CONSTRAINED NONLINEAR PROGRAMMING 149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach

More information

Support Vector Machines for Classification and Regression. 1 Linearly Separable Data: Hard Margin SVMs

Support Vector Machines for Classification and Regression. 1 Linearly Separable Data: Hard Margin SVMs E0 270 Machine Learning Lecture 5 (Jan 22, 203) Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in

More information

Game Theory. Greg Plaxton Theory in Programming Practice, Spring 2004 Department of Computer Science University of Texas at Austin

Game Theory. Greg Plaxton Theory in Programming Practice, Spring 2004 Department of Computer Science University of Texas at Austin Game Theory Greg Plaxton Theory in Programming Practice, Spring 2004 Department of Computer Science University of Texas at Austin Bimatrix Games We are given two real m n matrices A = (a ij ), B = (b ij

More information

OPTIMISATION 2007/8 EXAM PREPARATION GUIDELINES

OPTIMISATION 2007/8 EXAM PREPARATION GUIDELINES General: OPTIMISATION 2007/8 EXAM PREPARATION GUIDELINES This points out some important directions for your revision. The exam is fully based on what was taught in class: lecture notes, handouts and homework.

More information

Lecture 2: Linear SVM in the Dual

Lecture 2: Linear SVM in the Dual Lecture 2: Linear SVM in the Dual Stéphane Canu stephane.canu@litislab.eu São Paulo 2015 July 22, 2015 Road map 1 Linear SVM Optimization in 10 slides Equality constraints Inequality constraints Dual formulation

More information

Support Vector Machines. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington

Support Vector Machines. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington Support Vector Machines CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 A Linearly Separable Problem Consider the binary classification

More information

Convex Optimization M2

Convex Optimization M2 Convex Optimization M2 Lecture 8 A. d Aspremont. Convex Optimization M2. 1/57 Applications A. d Aspremont. Convex Optimization M2. 2/57 Outline Geometrical problems Approximation problems Combinatorial

More information

Chap 2. Optimality conditions

Chap 2. Optimality conditions Chap 2. Optimality conditions Version: 29-09-2012 2.1 Optimality conditions in unconstrained optimization Recall the definitions of global, local minimizer. Geometry of minimization Consider for f C 1

More information

Support Vector Machine (SVM) and Kernel Methods

Support Vector Machine (SVM) and Kernel Methods Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2016 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin

More information

DEPARTMENT OF STATISTICS AND OPERATIONS RESEARCH OPERATIONS RESEARCH DETERMINISTIC QUALIFYING EXAMINATION. Part I: Short Questions

DEPARTMENT OF STATISTICS AND OPERATIONS RESEARCH OPERATIONS RESEARCH DETERMINISTIC QUALIFYING EXAMINATION. Part I: Short Questions DEPARTMENT OF STATISTICS AND OPERATIONS RESEARCH OPERATIONS RESEARCH DETERMINISTIC QUALIFYING EXAMINATION Part I: Short Questions August 12, 2008 9:00 am - 12 pm General Instructions This examination is

More information

III. Linear Programming

III. Linear Programming III. Linear Programming Thomas Sauerwald Easter 2017 Outline Introduction Standard and Slack Forms Formulating Problems as Linear Programs Simplex Algorithm Finding an Initial Solution III. Linear Programming

More information

LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition

LINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition LINEAR CLASSIFIERS Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification, the input

More information