CS711008Z Algorithm Design and Analysis
1. CS711008Z Algorithm Design and Analysis. Lecture 9: Lagrangian duality and SVM. Dongbo Bu, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China. 1 / 136
2. Outline: Classification problem and maximum margin strategy; solving the maximum margin problem using Lagrangian duality; the SMO technique; kernel tricks. 2 / 136
3. Classification problem and maximum margin strategy 3 / 136
4. Classification problem. Given a set of samples with their category labels, denoted as (x_1, y_1), (x_2, y_2), ..., (x_n, y_n) with y_i ∈ {−1, +1}, the goal of the classification problem is to find an appropriate function f(x) that describes the dependency between y_i and x_i; thus, for a new sample x', we can infer its category based on f(x'). A great variety of classification algorithms have been designed, including Fisher's linear discriminant, logistic regression, decision trees, neural networks, and SVM. 4 / 136
5. Linear classifier. Unlike decision trees, SVM adopts a classifier of the following type: if f(x) > 0 then y = +1; if f(x) < 0 then y = −1. Let's first restrict f(x) to be linear, i.e. f(x) = ω^T x + b. The hyperplane ω^T x + b = 0 is called the separating hyperplane. 5 / 136
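As a concrete sketch, the decision rule above can be written in a few lines of Python (the function and variable names are illustrative, not from the slides):

```python
# Minimal sketch of the linear decision rule f(x) = w^T x + b (illustrative names).
def predict(w, b, x):
    """Return +1 when f(x) > 0 and -1 when f(x) < 0."""
    f = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if f > 0 else -1

# Example: the hyperplane x1 - x2 = 0 in R^2.
label = predict([1.0, -1.0], 0.0, [2.0, 0.5])   # f = 1.5 > 0, so label is +1
```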
6. The objective of the training procedure is to find an appropriate setting of ω and b such that all samples in the training set are correctly labelled by the classifier. We will consider tolerance of a few mislabelled samples later. 6 / 136
7. Maximum margin strategy. There are always multiple settings of ω and b for which the corresponding classifier works perfectly on all samples. Which one should we use? We prefer the one that maximizes the margin between positive and negative samples: the wider the margin, the better the generalization performance on new samples. Thus, we need to solve the following optimization problem: max_{ω,b} 2/‖ω‖ s.t. y_i(ω^T x_i + b) − 1 ≥ 0, i = 1, 2, ..., n. Note: the restriction f(x) > 0 for a positive sample x is implemented as f(x) ≥ 1. The distance from any point x to the hyperplane ω^T x + b = 0 is |ω^T x + b| / ‖ω‖; thus, the margin is 2/‖ω‖. 7 / 136
8. An equivalent form with a quadratic objective function. An equivalent form is: min_{ω,b} (1/2)‖ω‖² s.t. y_i(ω^T x_i + b) − 1 ≥ 0, i = 1, 2, ..., n. Question: how do we solve this optimization problem subject to inequality constraints? Of course we could solve the problem (called the primal problem hereafter) directly using convex quadratic programming techniques; however, considering its dual problem brings great benefits. Let's review the conditions of the optimal solution first.8 / 136
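As an illustration, the primal problem can be handed to an off-the-shelf solver. The sketch below uses scipy's SLSQP on a tiny one-dimensional dataset (the dataset and all names are my own, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize

# Toy 1-D dataset (illustrative): x = 2 labelled +1, x = 0 labelled -1.
X = np.array([[2.0], [0.0]])
y = np.array([1.0, -1.0])

def objective(v):                  # v = [w, b]; minimize (1/2) w^2
    return 0.5 * v[0] ** 2

constraints = [{"type": "ineq",    # y_i (w x_i + b) - 1 >= 0
                "fun": lambda v, i=i: y[i] * (v[0] * X[i, 0] + v[1]) - 1.0}
               for i in range(len(y))]

res = minimize(objective, x0=[0.0, 0.0], constraints=constraints)
w, b = res.x   # for this dataset both constraints are active, giving margin 2/|w|
```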
9. Lagrangian multiplier method, KKT conditions, and Lagrangian dual 9 / 136
10. A brief introduction to Lagrangian multipliers and KKT conditions. It is relatively easy to optimize an objective function without any constraint, say: min f(x). But how do we optimize an objective function with equality constraints? min f(x) s.t. g_i(x) = 0, i = 1, 2, ..., m. And how do we optimize an objective function with inequality constraints? min f(x) s.t. g_i(x) = 0, i = 1, 2, ..., m; h_i(x) ≤ 0, i = 1, 2, ..., p. 10 / 136
11. Lagrangian multiplier: under equality constraints. Consider the following optimization problem: min f(x, y) s.t. g(x, y) = 0. Intuition: suppose (x*, y*) is the optimum point. Then at (x*, y*), f(x, y) does not change when we walk along the curve g(x, y) = 0; otherwise, we could follow the curve to make f(x, y) smaller, meaning that the starting point (x*, y*) is not optimal. [Figure: the contour line f(x, y) = 1 together with the constraint curve g(x, y) = 0] 11 / 136
12. [Figure: contour lines f(x, y) = 1 and f(x, y) = 2 together with the constraint curve g(x, y) = 0] So at (x*, y*), the constraint curve tangentially touches a contour, i.e. there exists a λ such that ∇f(x*, y*) = λ∇g(x*, y*). Lagrange cleverly noticed that the equation above looks like the partial derivatives of some larger scalar function: L(x, y, λ) = f(x, y) − λg(x, y). Necessary condition of an optimum point: if (x*, y*) is a local optimum, then there exists λ such that ∇L(x*, y*, λ) = 0. 12 / 136
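The tangency condition ∇f = λ∇g can be checked numerically on a standard textbook instance (min x² + y² subject to x + y = 1, which is my own example, not from the slides); its optimum is (1/2, 1/2) with λ = 1:

```python
# Check grad f = lambda * grad g at the optimum of: min x^2 + y^2 s.t. x + y = 1.
def grad_f(x, y):
    return (2 * x, 2 * y)          # gradient of f(x, y) = x^2 + y^2

def grad_g(x, y):
    return (1.0, 1.0)              # gradient of g(x, y) = x + y - 1

x_star, y_star, lam = 0.5, 0.5, 1.0
tangent = all(abs(df - lam * dg) < 1e-12
              for df, dg in zip(grad_f(x_star, y_star), grad_g(x_star, y_star)))
```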
13. Understanding the Lagrangian function. Lagrangian function: a combination of the original objective function and the constraints: L(x, y, λ) = f(x, y) − λg(x, y). Note: ∂L(x, y, λ)/∂λ = −g(x, y) = 0 is essentially the equality constraint. The critical point of the Lagrangian L(x, y, λ) occurs at a saddle point rather than a local minimum (or maximum). To utilize numerical optimization techniques, we must first transform the problem such that the critical points lie at local minima. This is done by calculating the magnitude of the gradient of the Lagrangian. 13 / 136
14. KKT conditions: under inequality constraints. Consider the following optimization problem: min f(x) s.t. g_i(x) ≤ 0, i = 1, 2, ..., m. [Figure: contour lines f(x) = 1 and f(x) = 2 with the feasible region g(x) ≤ 0] Case 1: the optimum point lies on the curve g(x) = 0; thus the Lagrangian condition ∇L(x*, λ) = 0 applies. Case 2: the optimum point lies within the region g(x) < 0; thus we have ∇f(x*) = 0 at x*. 14 / 136
15. Understanding complementary slackness. Recall the two cases: Case 1, the optimum point lies on the curve g(x) = 0, so the Lagrangian condition ∇L(x*, λ) = 0 applies; Case 2, the optimum point lies within the region g(x) < 0, so ∇f(x*) = 0 at x*. These two cases are summarized as the following two conditions: (Stationary point) ∇L(x*, λ) = 0; (Complementary slackness) λ_i g_i(x*) = 0, i = 1, 2, ..., m. (Note that in Case 2, g(x*) < 0 ⇒ λ = 0 ⇒ ∇f(x*) = 0.) 15 / 136
16. KKT conditions. Lagrangian: L(x, λ) = f(x) − Σ_{i=1}^m λ_i g_i(x). Necessary conditions of an optimum point: if x* is a local optimum satisfying some regularity conditions, then there exists λ such that: (1) (Stationary point) ∇L(x*, λ) = 0; (2) (Primal feasibility) g_i(x*) ≤ 0; (3) (Dual feasibility) λ_i ≥ 0, i = 1, 2, ..., m; (4) (Complementary slackness) λ_i g_i(x*) = 0, i = 1, 2, ..., m. Note: KKT conditions are usually not solved directly in optimization; instead, iterative successive approximation is most often used. However, the final result should meet the KKT conditions. (KKT conditions were named after William Karush, Harold W. Kuhn, and Albert W. Tucker.) 16 / 136
17. Three strategies to solve an optimization problem under inequality constraints. Solve the primal problem subject to primal feasibility; solve the dual problem subject to dual feasibility; interior-point method: complementary slackness, also called orthogonality by Gomory, is essentially equivalent to strong duality, i.e., the primal and dual problems have identical objective values. A relaxation of this condition, i.e., λ_i g_i(x*) = µ, is used in the interior-point method to reflect the duality gap. 17 / 136
18. Applying Lagrangian duality to the maximum margin problem 18 / 136
19. An example. Primal problem: min x s.t. x ≥ 2, x ≥ 0. Lagrangian: L(x, y, z) = x − y(x − 2) − zx = 2y + x(1 − y − z). Notice that when y ≥ 0, z ≥ 0, and x is feasible, L(x, y, z) is a lower bound of the primal objective function x. Lagrangian dual function: g(y, z) = inf_x L(x, y, z) = 2y if 1 − y − z = 0, and −∞ otherwise. Dual problem: max 2y s.t. y ≤ 1, y ≥ 0. 19 / 136
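The toy problem above can be checked numerically: the primal optimum is x* = 2, and scanning the dual objective 2y over the feasible interval [0, 1] recovers the same value, i.e. a zero duality gap (the grid scan is purely illustrative):

```python
# Primal: min x s.t. x >= 2, x >= 0  ->  optimum x* = 2.
primal_opt = 2.0

# Dual: max 2y s.t. 0 <= y <= 1 (from the condition 1 - y - z = 0 with z >= 0).
dual_opt = max(2 * (k / 1000) for k in range(1001))   # grid scan over y in [0, 1]

gap = primal_opt - dual_opt     # strong duality: the gap is zero
```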
20. Lagrangian connects primal and dual. Primal objective: c^T x = x. Dual objective: g(y) = 2y. Observation: primal objective x ≥ Lagrangian L(x, y) ≥ dual objective 2y in the feasible region. 20 / 136
21. Lagrangian dual explanation of LP duality. Consider an LP model: min c^T x s.t. Ax ≥ b, x ≥ 0. Lagrangian: L(x, λ, s) = c^T x − Σ_{i=1}^m λ_i(a_{i1}x_1 + ... + a_{in}x_n − b_i) − Σ_{i=1}^n s_i x_i. Notice that the Lagrangian is a lower bound of the primal objective function, i.e. c^T x ≥ L(x, λ, s), when λ ≥ 0, s ≥ 0 and x is feasible. Furthermore, we have c^T x ≥ L(x, λ, s) ≥ inf_x L(x, λ, s) when λ ≥ 0, s ≥ 0 and x is feasible. Denote the Lagrangian dual g(λ, s) = inf_x L(x, λ, s). The above inequality can be rewritten as: c^T x ≥ L(x, λ, s) ≥ g(λ, s). 21 / 136
22. Lagrangian dual explanation of LP duality (cont'd). What is the Lagrangian dual g(λ, s)? g(λ, s) = inf_x L(x, λ, s) = inf_x (c^T x − Σ_{i=1}^m λ_i(a_{i1}x_1 + ... + a_{in}x_n − b_i) − Σ_{i=1}^n s_i x_i) = inf_x (λ^T b + (c^T − λ^T A − s^T)x) = λ^T b if c^T = λ^T A + s^T, and −∞ otherwise. Thus λ^T b is a lower bound of c^T x when c^T = λ^T A + s^T, λ ≥ 0, and s ≥ 0. Note that g(λ, s) is always concave even if the constraints are not convex. 22 / 136
23. Finding a tight bound. Now let's try to find the best lower bound of c^T x. The tightest lower bound can be described as: max λ^T b s.t. λ^T A ≤ c^T, λ ≥ 0. Notes: (1) This is actually the dual form of the LP if we replace λ by y; thus, we have another explanation of the dual variables y: the Lagrangian multipliers. (2) The Lagrangian dual is a lower bound of the primal optimum under some conditions. In addition, the Lagrangian dual is concave; thus, the dual problem is always a convex programming problem even if the primal problem is not. 23 / 136
24. Lagrangian dual explanation of the maximum margin problem. Primal problem: min_{w,b} (1/2)‖w‖² s.t. y_i(w^T x_i + b) ≥ 1, i ∈ {1, ..., n}. Lagrangian: L(w, b, α) = (1/2)‖w‖² − Σ_{i=1}^n α_i y_i(w^T x_i + b) + Σ_{i=1}^n α_i. Notice that the Lagrangian is a lower bound of the primal objective function, i.e. (1/2)‖w‖² ≥ L(w, b, α), when α ≥ 0 and (w, b) is feasible. Furthermore, we have (1/2)‖w‖² ≥ L(w, b, α) ≥ inf_{w,b} L(w, b, α) when α ≥ 0 and (w, b) is feasible. Denote the Lagrangian dual g(α) = inf_{w,b} L(w, b, α). The above inequality can be rewritten as: (1/2)‖w‖² ≥ L(w, b, α) ≥ g(α). 24 / 136
25. Lagrangian dual explanation of the maximum margin problem (cont'd). What is the Lagrangian dual g(α)? g(α) = inf_{w,b} L(w, b, α). To calculate this infimum, we set the derivatives of L(w, b, α) to 0, i.e., ∂L(w, b, α)/∂w = w − Σ_{i=1}^n α_i y_i x_i = 0; ∂L(w, b, α)/∂b = −Σ_{i=1}^n α_i y_i = 0. 25 / 136
26. Lagrangian dual function. Substituting back, the Lagrangian dual function is: g(α) = −(1/2) Σ_{i=1}^n Σ_{j=1}^n α_i α_j y_i y_j (x_i^T x_j) + Σ_{i=1}^n α_i. Thus g(α) is a lower bound of (1/2)‖w‖² when Σ_{i=1}^n α_i y_i = 0 and α ≥ 0. 26 / 136
27. Lagrangian dual problem. Now let's find the tightest lower bound of (1/2)‖w‖², which can be calculated by solving the following Lagrangian dual problem: max −(1/2) Σ_{i=1}^n Σ_{j=1}^n α_i α_j y_i y_j (x_i^T x_j) + Σ_{i=1}^n α_i s.t. Σ_{i=1}^n α_i y_i = 0, α ≥ 0. 27 / 136
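On a tiny illustrative dataset (x_1 = 2 with y_1 = +1, x_2 = 0 with y_2 = −1; my own example, not from the slides), the constraint Σ_i α_i y_i = 0 forces α_1 = α_2 = α, so the dual collapses to a one-dimensional problem that can be scanned directly:

```python
# For this dataset only the (1,1) term of the double sum survives (x_2 = 0),
# so g(alpha) = -(1/2) * alpha^2 * (2 * 2) + 2 * alpha = -2 alpha^2 + 2 alpha.
def dual_obj(a):
    return -0.5 * (a * a * 2.0 * 2.0) + 2 * a

alpha = max((k / 1000 for k in range(2001)), key=dual_obj)   # scan alpha in [0, 2]
w = alpha * 1 * 2.0 + alpha * (-1) * 0.0    # w = sum_i alpha_i y_i x_i
```

The scan finds α = 1/2, and the dual optimum g(α) = 1/2 matches the primal optimum (1/2)w² with w = 1, illustrating the zero duality gap claimed for the maximum margin problem.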
28. Two explanations of dual variables y: L. Kantorovich vs. T. Koopmans. (1) Price interpretation: constrained optimization plays an important role in economics. Dual variables are also called shadow prices (by T. Koopmans), i.e. the instantaneous change in the objective function when a constraint is relaxed, or the marginal cost when a constraint is strengthened. (2) Lagrangian multiplier: the effect of constraints on the objective function (by L. Kantorovich). For example, when b_i increases to b_i + Δb_i, how much does the objective function value change? In fact, we have ∂L(x, λ)/∂b_i = λ_i. 28 / 136
29. Explanation of dual variables y, using Diet as an example. Optimal solution to the primal problem with b_1 = 2000, b_2 = 55, b_3 = 800: x = (14.24, 2.70, 0, 0). Optimal solution to the dual problem: y = (0.0269, 0, ...). Let's make a slight change to b and watch the effect on max c^T x: (1) b_1 = 2001: max c^T x increases by y_1 = 0.0269; (2) b_2 = 56: max c^T x is unchanged, since y_2 = 0; (3) b_3 = 801: max c^T x increases by y_3. (See extra slides.) 29 / 136
30. Weak duality, strong duality, and Slater conditions 30 / 136
31. Weak duality and strong duality. Sometimes the optimal value of the primal problem is not identical to that of the dual problem. The difference between them is called the duality gap, and this case is called weak duality. The opposite case is called strong duality. Conditions under which strong duality holds are called constraint qualifications. 31 / 136
32. Slater's conditions for strong duality. The best-known sufficient condition for strong duality is: the primal problem is a convex program, i.e., min f(x) s.t. g_i(x) ≤ 0, i = 1, 2, ..., m; Ax = b, where f(x) and the g_i(x) are convex; and Slater's condition holds, i.e. there exists some strictly feasible point x ∈ relint(D) such that Ax = b and g_i(x) < 0, i = 1, 2, ..., m. Here D represents the feasible domain of the primal problem, and relint(D) denotes the relative interior of D. 32 / 136
33. Slater's conditions for strong duality (cont'd). Slater's conditions can be weakened if some constraints of the primal problem are affine. Consider the following convex programming problem: min f(x) s.t. g_i(x) ≤ 0, i = 1, 2, ..., m; Ax = b, where f(x) and the g_i(x) are convex. Refined Slater's condition: if the g_i(x) are affine, the inequalities are no longer required to be strict, i.e. it suffices that there exists some feasible point x ∈ relint(D) such that Ax = b and g_i(x) ≤ 0, i = 1, 2, ..., m. 33 / 136
34. Strong duality in SVM. For the maximum margin problem, the dual problem has an identical optimal objective function value: the primal is a convex program with affine constraints, so strong duality holds. 34 / 136
35. Kernel method to overcome linear inseparability 35 / 136
36. Linearly inseparable. Some sample sets are linearly inseparable. What should we do? Kernel method: if the samples are linearly inseparable in the original input space, map them into a higher-dimensional space (called the feature space) in which they become linearly separable. In other words, nonlinearity is introduced into the original input space through the map. 36 / 136
37. An example: the XOR problem. Let's define a map Φ : R² → R³ as below: [x^(1), x^(2)] ↦ Φ([x^(1), x^(2)]) = [x^(1), x^(2), x^(1)x^(2)]. These samples are linearly separable after being mapped into R³ using Φ. 37 / 136
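The claim can be verified directly: after applying Φ, the four ±1-encoded XOR points are separated by a hyperplane with w = (0, 0, 1), b = 0 in R³ (a sketch with illustrative names):

```python
def phi(x1, x2):
    return (x1, x2, x1 * x2)              # the map Phi: R^2 -> R^3

points = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
labels = [x1 * x2 for x1, x2 in points]   # XOR-style labels: +1 iff signs agree

w, b = (0.0, 0.0, 1.0), 0.0               # candidate hyperplane in feature space
separable = all(
    (sum(wi * zi for wi, zi in zip(w, phi(*p))) + b) * y > 0
    for p, y in zip(points, labels)
)
```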
38. Replace the inner product in the SVM optimization problem. Remember that in the optimization problem for SVM, the objective function contains inner products x_i^T x_j: max −(1/2) Σ_{i=1}^N Σ_{j=1}^N α_i α_j y_i y_j (x_i^T x_j) + Σ_{i=1}^N α_i s.t. Σ_{i=1}^N α_i y_i = 0, α ≥ 0. After mapping x into Φ(x), the inner product x_i^T x_j changes into Φ(x_i)^T Φ(x_j), and the optimization problem changes accordingly: max −(1/2) Σ_{i=1}^N Σ_{j=1}^N α_i α_j y_i y_j (Φ(x_i)^T Φ(x_j)) + Σ_{i=1}^N α_i s.t. Σ_{i=1}^N α_i y_i = 0, α ≥ 0. Note that we simply replace the inner product with a kernel function k(x_i, x_j) = Φ(x_i)^T Φ(x_j), without even needing to evaluate the map Φ explicitly. 38 / 136
39. The separating hyperplane. Remember that for SVM the separating hyperplane is: f(x) = ω^T x + b = Σ_i α_i y_i x_i^T x + b = 0. After mapping x into Φ(x), the separating hyperplane changes into: f(x) = ω^T Φ(x) + b = Σ_i α_i y_i Φ(x_i)^T Φ(x) + b = 0. Note that Φ(x_i)^T Φ(x) = k(x_i, x). 39 / 136
40. Deriving a kernel function from a map 40 / 136
41. Example 1. Consider the map Φ : R² → R⁴: [x^(1), x^(2)] ↦ Φ([x^(1), x^(2)]) = [x^(1)², x^(2)², x^(1)x^(2), x^(1)x^(2)]. The corresponding kernel function is: k(x, z) = x^(1)²z^(1)² + x^(2)²z^(2)² + 2x^(1)x^(2)z^(1)z^(2) = ⟨x, z⟩². 41 / 136
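A quick numeric check that the map and the kernel agree, i.e. Φ(x)^T Φ(z) = ⟨x, z⟩² (the test point values are arbitrary):

```python
def phi(x):
    # Phi([x1, x2]) = [x1^2, x2^2, x1*x2, x1*x2]
    return [x[0] ** 2, x[1] ** 2, x[0] * x[1], x[0] * x[1]]

def kernel(x, z):
    return (x[0] * z[0] + x[1] * z[1]) ** 2   # <x, z>^2, computed in input space

x, z = [1.5, -2.0], [0.5, 3.0]
lhs = sum(a * b for a, b in zip(phi(x), phi(z)))   # inner product in feature space
rhs = kernel(x, z)
```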
42. Example 2. Consider the map Φ : R³ → R¹³: [x^(1), x^(2), x^(3)] ↦ Φ([x^(1), x^(2), x^(3)]) = [x^(1)², x^(1)x^(2), ..., x^(3)², √(2c)x^(1), √(2c)x^(2), √(2c)x^(3), c]. The corresponding kernel function is: k(x, z) = (x^T z + c)². 42 / 136
44. The power of the polynomial kernel. Using a polynomial kernel of degree d, we can implicitly map into a feature space whose dimension grows rapidly with d, while only ever computing inner products in the input space. 44 / 136
45. The power of the polynomial kernel 45 / 136
46. The requirement of kernel functions 46 / 136
47. Definition of kernel function. A function k : X × X → R is a kernel function if: k is symmetric, i.e. k(x, z) = k(z, x); and for any m ∈ N and any x_1, x_2, ..., x_m ∈ X, the matrix K defined by K_{i,j} = k(x_i, x_j) is positive semidefinite. 47 / 136
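The positive-semidefiniteness requirement can be spot-checked numerically. The sketch below builds a Gram matrix for the Gaussian kernel (a known valid kernel, used here only as an illustration; the sample points are arbitrary) and checks v^T K v ≥ 0 on random vectors:

```python
import math
import random

def k(x, z):
    return math.exp(-(x - z) ** 2)     # Gaussian kernel on R

xs = [0.0, 0.7, 1.3, 2.9]
K = [[k(a, b) for b in xs] for a in xs]

symmetric = all(abs(K[i][j] - K[j][i]) < 1e-12
                for i in range(4) for j in range(4))

random.seed(0)
psd_ok = all(
    sum(v[i] * K[i][j] * v[j] for i in range(4) for j in range(4)) > -1e-9
    for v in ([random.uniform(-1, 1) for _ in xs] for _ in range(200))
)
```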
48. Question: could we map from the input space into a feature space with infinite dimension? 48 / 136
49. Map into an infinite-dimensional feature space 49 / 136
50. Let's consider a feature space whose elements are functions, spanned by the functions Φ(x_i) = k(·, x_i). Thus, any element can be represented as f = Σ_i α_i k(·, x_i). For two elements f = Σ_i α_i k(·, x_i) and g = Σ_j β_j k(·, z_j), we define the inner product as ⟨f, g⟩ = Σ_i Σ_j α_i β_j k(x_i, z_j). 50 / 136
51. Reproducing kernel Hilbert space. Now we have found a map Φ(x) = k(·, x) such that ⟨Φ(x), Φ(z)⟩ = k(x, z) (the reproducing property). 51 / 136
53. Theorem (Farkas' lemma). Given vectors a_1, a_2, ..., a_m, c ∈ R^n, exactly one of the following holds: (1) c ∈ C(a_1, a_2, ..., a_m); or (2) there is a vector y ∈ R^n such that y^T a_i ≥ 0 for all i, but y^T c < 0. [Figure: Case 1, c ∈ C(a_1, a_2); Case 2, c ∉ C(a_1, a_2)] Here, C(a_1, ..., a_m) denotes the cone spanned by a_1, ..., a_m, i.e. C(a_1, ..., a_m) = {x : x = Σ_{i=1}^m λ_i a_i, λ_i ≥ 0}. 53 / 136
54. Proof. Suppose that for any vector y ∈ R^n with y^T a_i ≥ 0 (i = 1, 2, ..., m), we always have y^T c ≥ 0. We will show that c must lie within the cone C(a_1, a_2, ..., a_m). Consider the following primal problem: min c^T y s.t. a_i^T y ≥ 0, i = 1, 2, ..., m. It is obvious that the primal problem has a feasible solution y = 0, and is bounded since c^T y ≥ 0. Thus the dual problem also has a bounded optimal solution: max 0 s.t. x^T A^T = c^T, x ≥ 0. In other words, there exists a vector x such that c = Σ_{i=1}^m x_i a_i and x_i ≥ 0. 54 / 136
55. Variants of Farkas' lemma. Farkas' lemma lies at the core of linear optimization. Using Farkas' lemma, we can prove the Separation theorem, and the MiniMax theorem in game theory. Theorem. Let A be an m × n matrix, and b ∈ R^m. Then exactly one of the following holds: (1) Ax = b, x ≥ 0 has a feasible solution; or (2) there is a vector y ∈ R^m such that y^T A ≥ 0 but y^T b < 0. 55 / 136
56. Variants of Farkas' lemma. Theorem. Let A be an m × n matrix, and b ∈ R^m. Then exactly one of the following holds: (1) Ax ≤ b has a feasible solution; or (2) there is a vector y ∈ R^m such that y ≥ 0, y^T A = 0, but y^T b < 0. 56 / 136
57. Caratheodory's theorem. Theorem. Given vectors a_1, a_2, ..., a_m ∈ R^n, if x ∈ C(a_1, a_2, ..., a_m), then there is a linearly independent subset of {a_1, a_2, ..., a_m}, say a_{i_1}, a_{i_2}, ..., a_{i_r}, such that x ∈ C(a_{i_1}, a_{i_2}, ..., a_{i_r}). 57 / 136
58. Separation theorem. Theorem. Let C ⊆ R^n be a closed, convex set, and let x ∈ R^n. If x ∉ C, then there exists a hyperplane separating x from C. 58 / 136
59. Application 2: von Neumann's MiniMax theorem in game theory 59 / 136
60. Game theory. Game theory studies competing and cooperative behaviours among intelligent and rational decision-makers. In 1928, John von Neumann proved the existence of mixed-strategy equilibria in two-person zero-sum games. In 1950, John Forbes Nash Jr. developed a criterion of mutual consistency of players' strategies, which applies to a wider range of games than that proposed by von Neumann: he proved the existence of a Nash equilibrium in every n-player, non-zero-sum, non-cooperative game (not just 2-player, zero-sum games). Game theory has been widely applied in mathematical economics, in biology (e.g., analysis of evolution and stability), and in computer science (e.g., analysis of interactive computations, lower bounds on the complexity of randomized algorithms, and the equivalence between linear programming and two-person zero-sum games). 60 / 136
61. Paper-rock-scissors: an example of a two-player zero-sum game. Paper-rock-scissors is a hand game usually played by two players, denoted the row player and the column player: each player selects one of the three hand shapes, paper, rock, or scissors; then the players show their selections simultaneously. It has two possible outcomes other than a tie: one player wins and the other loses, which can be formally described using the following payoff matrix.

              Paper    Rock     Scissors
  Paper       0, 0     1, -1    -1, 1
  Rock        -1, 1    0, 0     1, -1
  Scissors    1, -1    -1, 1    0, 0

Each player attempts to select an appropriate action to maximize his gain. 61 / 136
62. Matching pennies: another example of a two-person zero-sum game. Matching pennies is a game played by two players, namely the row player and the column player. Each player has a penny and secretly turns it to head or tail. The players then reveal their selections simultaneously. If the pennies match, the row player keeps both pennies; otherwise, the column player keeps both. The payoff matrix is as follows.

          Head     Tail
  Head    1, -1    -1, 1
  Tail    -1, 1    1, -1

Each player tries to maximize his gain by making an appropriate selection. 62 / 136
63. Simultaneous games vs. sequential games. Simultaneous games are games in which all players move simultaneously; thus, no player has information about the others' selections in advance. Sequential games are games in which a later player has some information, although maybe imperfect, about previous actions of the other players. A complete plan of action for every stage of the game, regardless of whether the action actually arises in play, is called a (pure) strategy. Normal form is used to describe simultaneous games, while extensive form is used to describe sequential games. J. von Neumann proposed an approach to transform strategies in sequential games into actions in simultaneous games. Note that the transformation is one-way, i.e., multiple sequential games might correspond to the same simultaneous game, and it may result in an exponential blowup in the size of the representation. 63 / 136
64. Normal form. A game Γ in normal form among m players contains the following items: each player k has a finite number of pure strategies S_k = {1, 2, ..., n_k}; each player k is associated with a payoff function H_k : S_1 × S_2 × ... × S_m → R. To play the game, each player selects a strategy without information about the others, and then all reveal their selections simultaneously. The players' gains are calculated using the corresponding payoff functions. Each player attempts to maximize his gain by selecting an appropriate strategy. 64 / 136
65. Two-person zero-sum games in normal form. In a two-person zero-sum game Γ, a player's gain or loss is exactly balanced by the other player's loss or gain, i.e., H_1(s_1, s_2) + H_2(s_1, s_2) = 0. Thus we can define a single function H(s_1, s_2) = H_1(s_1, s_2) = −H_2(s_1, s_2) and represent it using a payoff matrix.

          Head    Tail
  Head    1       -1
  Tail    -1      1

The row player aims to maximize H(s_1, s_2) by selecting an appropriate strategy s_1, while the column player aims to minimize H(s_1, s_2) by selecting an appropriate strategy s_2. 65 / 136
66. von Neumann's MiniMax theorem: motivation. When analyzing a two-person zero-sum game Γ, von Neumann noticed that the difficulty comes from the difference between games and ordinary optimization problems: the row player tries to maximize H(s_1, s_2); however, he can control s_1 only, as he has no information about the other player's selection s_2, and the same holds for the column player. Thus von Neumann suggested investigating two auxiliary games without this difficulty, denoted Γ_1 and Γ_2, before attacking the challenging game Γ. (1) Γ_1: the row player selects a strategy s_1 first, and exposes his selection to the column player before the column player selects a strategy s_2. (2) Γ_2: the column player selects a strategy s_2 first, and exposes his selection to the row player before the row player selects a strategy s_1. The two auxiliary games are much easier than the original game Γ, and more importantly, they provide upper and lower bounds for Γ. 66 / 136
67. Auxiliary game Γ_1. Let's consider the column player first. As he knows the row player's selection s_1, the objective function H(s_1, s_2) becomes an ordinary optimization function over the single variable s_2, and the column player can simply select a strategy s_2 with the minimum objective function value min_{s_2} H(s_1, s_2).

          Head    Tail    Row minimum
  Head    -2      1       -2
  Tail    -1      2       -1    (v_1 = -1)

Now consider the row player. When he selects a strategy s_1, he can definitely predict the selection of the column player. Since min_{s_2} H(s_1, s_2) is an ordinary function over the single variable s_1, it is easy for the row player to select a strategy s_1 with the maximum objective function value v_1 = max_{s_1} min_{s_2} H(s_1, s_2). 67 / 136
68. Auxiliary game Γ_2. Let's consider the row player first. As he knows the column player's selection s_2, the objective function H(s_1, s_2) becomes an ordinary optimization function over the single variable s_1, and the row player can simply select a strategy s_1 with the maximum objective function value max_{s_1} H(s_1, s_2).

                    Head    Tail
  Head              -2      1
  Tail              -1      2
  Column maximum    -1      2    (v_2 = -1)

Now consider the column player. When he selects a strategy s_2, he can definitely predict the selection of the row player. Since max_{s_1} H(s_1, s_2) is an ordinary function over the single variable s_2, it is easy for the column player to select a strategy s_2 with the minimum objective function value v_2 = min_{s_2} max_{s_1} H(s_1, s_2). 68 / 136
69. Γ_1 and Γ_2 bound Γ. For the row player, Γ_1 is clearly disadvantageous, as he must expose his selection s_1 to the column player. On the contrary, Γ_2 is beneficial to the row player, as he knows the column player's selection s_2 before making his decision.

                    Head    Tail    Row minimum
  Head              -2      1       -2
  Tail              -1      2       -1    (v_1 = -1)
  Column maximum    -1      2       (v_2 = -1)

Thus these two auxiliary games provide lower and upper bounds: v_1 ≤ v ≤ v_2, where v denotes the row player's gain in the original game Γ. 69 / 136
70. Case 1: v_1 = v_2. For a game with the following payoff matrix, we have v_1 = v = v_2 and call this game strictly determined.

                    Head    Tail    Row minimum
  Head              -2      1       -2
  Tail              -1      2       -1    (v_1 = -1)
  Column maximum    -1      2       (v_2 = -1)

The saddle point of the payoff matrix H(s_1, s_2) represents a pure-strategy equilibrium. In this equilibrium, each player has nothing to gain by changing only his own strategy; in addition, knowing the opponent's selection brings no gain. von Neumann proved the existence of the optimal strategy in perfect-information two-person zero-sum games, e.g., chess. L. S. Shapley further showed that a finite two-person zero-sum game has a pure-strategy equilibrium if every 2 × 2 submatrix of the game has a pure-strategy equilibrium. 70 / 136
71. Case 2: v_1 < v_2. In contrast, matching pennies does not have a pure-strategy equilibrium, as there is no saddle point in the payoff matrix; the same holds for the paper-rock-scissors game.

                    Head    Tail    Row minimum
  Head              1       -1      -1
  Tail              -1      1       -1    (v_1 = -1)
  Column maximum    1       1       (v_2 = 1)

This fact implies that knowing the opponent's selection might bring gain; however, it is impossible to know the opponent's selection, as the players reveal their selections simultaneously. In this case, let's play a mixed strategy rather than a pure strategy. 71 / 136
72. From pure strategy to mixed strategy. A mixed strategy is an assignment of probabilities to pure strategies, allowing a player to randomly select a pure strategy. Consider the payoff matrix below. If the row player selects strategy A with probability 1, he is said to play a pure strategy. If he tosses a coin and selects strategy A if the coin lands heads and B otherwise, then he is said to play a mixed strategy.

       A     B
  A    1     -1
  B    -1    1

72 / 136
73. Two interpretations of mixed strategy. From the player's viewpoint: J. von Neumann described the motivation underlying the introduction of mixed strategies as follows: since it is impossible to know the opponent's selection exactly, a player can protect himself by randomly selecting his own strategy, making it difficult for the opponent to know the player's selection. However, this interpretation came under heavy fire for lacking behavioural support: seldom do people make choices by following a lottery. From the opponent's viewpoint: Robert Aumann and Adam Brandenburger interpreted a player's mixed strategy as the opponent's belief about the player's selection. Thus, a Nash equilibrium is an equilibrium of beliefs rather than of actions. 73 / 136
74. Existence of mixed-strategy equilibrium. Consider a mixed-strategy game: the row player has m strategies available and selects a strategy s_1 according to a distribution u, while the column player has n strategies available and selects a strategy s_2 according to a distribution v, i.e., Pr(s_1 = i) = u_i, i = 1, ..., m, and Pr(s_2 = j) = v_j, j = 1, ..., n. Here, u and v are independent. Thus the expected gain of the row player is: Σ_{i=1}^m Σ_{j=1}^n u_i H_{ij} v_j = u^T Hv. The row player attempts to maximize u^T Hv by selecting an appropriate u, while the column player attempts to minimize it by selecting an appropriate v. Now let's consider the two auxiliary games Γ_1 and Γ_2 again and answer the following questions: what happens if the row player exposes his mixed strategy to the column player? And if we reverse the order of the players? 74 / 136
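For instance, the expected gain u^T Hv for matching pennies under uniform mixed strategies works out to 0 (uniform play is chosen here purely as an illustration):

```python
# Row player's payoff matrix for matching pennies.
H = [[1, -1], [-1, 1]]
u = [0.5, 0.5]     # row player's mixed strategy
v = [0.5, 0.5]     # column player's mixed strategy

expected_gain = sum(u[i] * H[i][j] * v[j]
                    for i in range(2) for j in range(2))   # u^T H v
```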
75. von Neumann's MiniMax theorem [1928]. This question is answered by von Neumann's MiniMax theorem. Theorem. max_u min_v u^T Hv = min_v max_u u^T Hv. The theorem states that knowing the other player's strategy brings no gain in a mixed-strategy zero-sum game, and the order of play doesn't change the value. A mixed-strategy Nash equilibrium exists for any two-person zero-sum game with a finite set of actions. A Nash equilibrium in a two-player game is a pair of strategies, each of which is a best response to the other; i.e., each gives the player using it the highest possible payoff, given the other player's strategy. 75 / 136
76. von Neumann's MiniMax theorem: proof. Let's consider the auxiliary game Γ_1 first, in which the strategy of the row player, i.e., u, is exposed to the column player. This is of course beneficial to the column player, since he can select the optimal strategy v to minimize u^T Hv, which is inf{u^T Hv : v ≥ 0, 1^T v = 1} = min_{j=1,...,n} (u^T H)_j. Thus the row player should select u to maximize this value, which can be formulated as a linear program: max min_{j=1,...,n} (u^T H)_j s.t. 1^T u = 1, u ≥ 0. 76 / 136
77. von Neumann's MiniMax theorem: proof (cont'd). The linear program can be rewritten as below: max s s.t. u^T H ≥ s1^T, 1^T u = 1, u ≥ 0. Similarly, we consider the auxiliary game Γ_2 and calculate the optimal strategy v by solving the following linear program: min t s.t. Hv ≤ t1, 1^T v = 1, v ≥ 0. These two linear programs are both feasible and are Lagrangian duals of each other. Thus they have the same optimal objective value by strong duality. 77 / 136
78. An example: the paper-rock-scissors game. For the paper-rock-scissors game, we have the following two linear programs. Linear program for Γ_1: max s s.t. 0u_1 − u_2 + u_3 ≥ s; u_1 + 0u_2 − u_3 ≥ s; −u_1 + u_2 + 0u_3 ≥ s; u_1 + u_2 + u_3 = 1; u_1, u_2, u_3 ≥ 0. Linear program for Γ_2: min t s.t. 0v_1 + v_2 − v_3 ≤ t; −v_1 + 0v_2 + v_3 ≤ t; v_1 − v_2 + 0v_3 ≤ t; v_1 + v_2 + v_3 = 1; v_1, v_2, v_3 ≥ 0. The mixed-strategy equilibrium is u^T = [1/3, 1/3, 1/3] and v^T = [1/3, 1/3, 1/3], with game value 0. 78 / 136
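The Γ_1 linear program can be handed to an LP solver. The sketch below uses scipy's linprog (variable names are mine) and recovers the uniform strategy with game value 0:

```python
import numpy as np
from scipy.optimize import linprog

# Row player's payoff matrix for paper-rock-scissors.
H = np.array([[0, 1, -1],
              [-1, 0, 1],
              [1, -1, 0]], dtype=float)

# Variables: (u1, u2, u3, s); maximize s  <=>  minimize -s.
c = np.array([0.0, 0.0, 0.0, -1.0])
A_ub = np.hstack([-H.T, np.ones((3, 1))])   # s - sum_i u_i H[i,j] <= 0 per column
b_ub = np.zeros(3)
A_eq = np.array([[1.0, 1.0, 1.0, 0.0]])     # u1 + u2 + u3 = 1
b_eq = np.array([1.0])
bounds = [(0, None)] * 3 + [(None, None)]   # u >= 0, s free

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
u, value = res.x[:3], res.x[3]
```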
79. Comments on the mixed-strategy equilibrium by von Neumann. Note that a mixed-strategy equilibrium always exists, no matter whether the payoff matrix H has a saddle point or not. Regardless of the column player's selection, the row player can select an appropriate strategy to guarantee his gain is at least 0. Regardless of the row player's selection, the column player can select an appropriate strategy to guarantee the row player's gain is at most 0. Using the strategy u^T = [1/3, 1/3, 1/3], the row player can guarantee that he won't lose in expectation, i.e., the probability of losing is no greater than the probability of winning. The strategy u^T = [1/3, 1/3, 1/3] is designed for protecting himself rather than attacking his opponent, i.e., it cannot be used to benefit from the opponent's mistakes. 79 / 136
80. Application 3: Yao's MiniMax principle [1977] 80 / 136
81. Yao's MiniMax principle. Consider a problem Π. Let A = {A_1, A_2, ..., A_n} be the deterministic algorithms for Π, and I = {I_1, I_2, ..., I_m} be the inputs of a given size. Let T(A_i, I_j) be the running time of algorithm A_i on input I_j.

          A_1     A_2     ...
  I_1     T_11    T_12
  I_2     T_21    T_22

Thus max_{I_j ∈ I} T(A_i, I_j) represents the worst-case time of the deterministic algorithm A_i. For a randomized algorithm, however, it is usually difficult to bound its expected running time on worst-case inputs. Yao's MiniMax principle provides a technique to establish a lower bound on the expected running time of any randomized algorithm on its worst-case input. 81 / 136
82. Expected running time of a randomized algorithm A_q. A Las Vegas randomized algorithm can be viewed as a distribution over the deterministic algorithms A = {A_1, A_2, ..., A_n}. Specifically, let q be a distribution over A, and A_q be a randomized algorithm chosen according to q, i.e., A_q runs the deterministic algorithm A_i with probability q_i. Given an input I_j, the expected running time of A_q can be written as E[T(A_q, I_j)] = Σ_{i=1}^n q_i T(A_i, I_j). Thus max_{I_j ∈ I} E[T(A_q, I_j)] represents the expected running time of A_q on its worst-case input. 82 / 136
83. Expected running time of a deterministic algorithm A_i on a random input. Now consider a deterministic algorithm A_i running on a random input. Let p be a distribution over I, and I_p be a random input chosen from I, i.e., I_p is I_j with probability p_j. Given a deterministic algorithm A_i, its expected running time on the random input I_p can be written as E[T(A_i, I_p)] = Σ_{j=1}^m p_j T(A_i, I_j). Thus min_{A_i ∈ A} E[T(A_i, I_p)] represents the expected running time of the best deterministic algorithm on the random input I_p. 83 / 136
84 . Yao's MiniMax principle
Theorem. For any random input I_p and any randomized algorithm A_q,
min_{A_i ∈ A} E[T(A_i, I_p)] ≤ max_{I_j ∈ I} E[T(A_q, I_j)]
To establish a lower bound for the expected running time of a randomized algorithm on its worst-case input, it suffices to find an appropriate distribution over inputs and prove that, on this random input, no deterministic algorithm can do better than the randomized one.
The power of this technique lies in the fact that one can choose any distribution over inputs, and the lower bound is constructed based on deterministic algorithms. 84 / 136
85 . Yao's MiniMax principle: proof
Proof.
min_{A_i ∈ A} E[T(A_i, I_p)] ≤ max_{u ∈ Δ_m} min_{A_i ∈ A} E[T(A_i, I_u)]   (1)
= max_{u ∈ Δ_m} min_{v ∈ Δ_n} E[T(A_v, I_u)]   (2)
= min_{v ∈ Δ_n} max_{u ∈ Δ_m} E[T(A_v, I_u)]   (3)
= min_{v ∈ Δ_n} max_{I_j ∈ I} E[T(A_v, I_j)]   (4)
≤ max_{I_j ∈ I} E[T(A_q, I_j)]   (5)
Here, Δ_n denotes the set of n-dimensional probability vectors. Equation (3) follows from von Neumann's MiniMax theorem. 85 / 136
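The theorem's inequality is easy to check numerically on a toy instance. The running-time matrix and the distributions below are made-up numbers, not taken from the lecture:

```python
# A toy numeric check of Yao's inequality
#   min_i E[T(A_i, I_p)]  <=  max_j E[T(A_q, I_j)]
# on a small, arbitrary running-time matrix.
T = [[3, 7, 2],      # T[i][j] = running time of algorithm A_i on input I_j
     [5, 1, 6],
     [4, 4, 4]]
p = [0.2, 0.5, 0.3]  # distribution over inputs I_1..I_3
q = [0.6, 0.1, 0.3]  # distribution over algorithms A_1..A_3

# left side: the best deterministic algorithm on the random input I_p
lhs = min(sum(p[j] * T[i][j] for j in range(3)) for i in range(3))
# right side: the randomized algorithm A_q on its worst-case input
rhs = max(sum(q[i] * T[i][j] for i in range(3)) for j in range(3))
print(lhs, rhs)
assert lhs <= rhs
```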
86 . Application 4: Revisiting the ShortestPath algorithm 86 / 136
87 . ShortestPath problem
INPUT: n cities, and a collection of roads. A road from city i to j has a distance d(i, j). Two specific cities: s and t.
OUTPUT: the shortest path from city s to t.
[Figure: a directed graph with edges s→u (5), s→v (8), u→v (1), u→t (6), v→t (2)]
87 / 136
88 . ShortestPath problem: Primal problem
[Figure: the example graph with edge variables x_1 (s→u, 5), x_2 (s→v, 8), x_3 (u→v, 1), x_4 (u→t, 6), x_5 (v→t, 2)]
Primal problem: set variables for roads (intuition: x_i = 0/1 indicates whether edge i appears in the shortest path), and a constraint for each node means that we enter a node through an edge and leave it through another edge.
min 5x_1 + 8x_2 + 1x_3 + 6x_4 + 2x_5
s.t. x_1 + x_2 = 1   (vertex s)
−x_4 − x_5 = −1   (vertex t)
−x_1 + x_3 + x_4 = 0   (vertex u)
−x_2 − x_3 + x_5 = 0   (vertex v)
x_1, x_2, x_3, x_4, x_5 ∈ {0, 1}
88 / 136
89 . ShortestPath problem: Primal problem
[Figure: the example graph with edge variables x_1 (s→u, 5), x_2 (s→v, 8), x_3 (u→v, 1), x_4 (u→t, 6), x_5 (v→t, 2)]
Primal problem: relax the 0/1 integer linear program into a linear program; by the total unimodularity of the constraint matrix, the relaxation has an integral optimal solution.
min 5x_1 + 8x_2 + 1x_3 + 6x_4 + 2x_5
s.t. x_1 + x_2 = 1   (vertex s)
−x_4 − x_5 = −1   (vertex t)
−x_1 + x_3 + x_4 = 0   (vertex u)
−x_2 − x_3 + x_5 = 0   (vertex v)
x_1, x_2, x_3, x_4, x_5 ≥ 0
x_1, x_2, x_3, x_4, x_5 ≤ 1
89 / 136
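Since the instance is tiny, the 0/1 program can be checked by brute force. The sketch below enumerates all 0/1 vectors against the flow-conservation constraints (written with +1 for an edge leaving a node and −1 for an edge entering it) and keeps the cheapest feasible one:

```python
# Brute-force check of the 0/1 shortest-path program: the cheapest feasible
# 0/1 vector should be the path s -> u -> v -> t of length 5 + 1 + 2 = 8.
from itertools import product

cost = [5, 8, 1, 6, 2]          # edges (s,u), (s,v), (u,v), (u,t), (v,t)

def feasible(x):
    x1, x2, x3, x4, x5 = x
    return (x1 + x2 == 1 and          # vertex s: one unit leaves
            -x4 - x5 == -1 and        # vertex t: one unit arrives
            -x1 + x3 + x4 == 0 and    # vertex u: flow in = flow out
            -x2 - x3 + x5 == 0)       # vertex v: flow in = flow out

best = min((sum(c * xi for c, xi in zip(cost, x)), x)
           for x in product((0, 1), repeat=5) if feasible(x))
print(best)   # (8, (1, 0, 1, 0, 1)): edges (s,u), (u,v), (v,t)
```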
90 . ShortestPath problem: Dual problem
[Figure: the example graph with node heights y_s, y_u, y_v, y_t]
Dual problem: set variables for cities. (Intuition: y_i means the height of city i; thus y_s − y_t denotes the height difference between s and t, providing a lower bound on the shortest path length.)
max y_s − y_t
s.t. y_s − y_u ≤ 5   (x_1: edge (s, u))
y_s − y_v ≤ 8   (x_2: edge (s, v))
y_u − y_v ≤ 1   (x_3: edge (u, v))
y_u − y_t ≤ 6   (x_4: edge (u, t))
y_v − y_t ≤ 2   (x_5: edge (v, t))
90 / 136
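Weak duality is easy to observe on this instance: any feasible height assignment gives y_s − y_t ≤ the length of every s-t path. The heights below are hand-picked (an assumption, not from the slides) and happen to attain the optimum:

```python
# Sketch: check one feasible "height" assignment against the dual
# constraints. By weak duality, y_s - y_t lower-bounds the shortest path
# length; the assumed heights below attain the optimum 8.
y = {'s': 8, 'u': 3, 'v': 2, 't': 0}   # hand-picked heights (assumption)
edges = [('s', 'u', 5), ('s', 'v', 8), ('u', 'v', 1),
         ('u', 't', 6), ('v', 't', 2)]

assert all(y[a] - y[b] <= d for a, b, d in edges)   # dual feasibility
print(y['s'] - y['t'])   # 8 = length of the shortest path s -> u -> v -> t
```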
91 . Dual simplex method 91 / 136
92 . Revisiting the Primal Simplex algorithm
Consider the following primal problem P:
min x_1 + 14x_2 + 6x_3
s.t. x_1 + x_2 + x_3 ≥ 4
x_1 ≥ 2
x_3 ≥ 3
3x_2 + x_3 ≥ 6
x_1, x_2, x_3 ≥ 0
Primal simplex tableau:
[Tableau: columns x_1 ... x_7; first row −z = 0 with reduced costs c̄ = (1, 14, 6, 0, 0, 0, 0); rows x_{B_1} = b̄_1, ..., x_{B_4} = b̄_4; numeric entries not recoverable from the transcript]
Primal variables: x. Feasible: B^{-1}b ≥ 0.
A basis B is called primal feasible if all elements of B^{-1}b (the first column except for −z) are non-negative. 92 / 136
93 . Revisiting the Primal Simplex algorithm cont'd
Now let's consider the dual problem D:
max 4y_1 + 2y_2 + 3y_3 + 6y_4
s.t. y_1 + y_2 ≤ 1
y_1 + 3y_4 ≤ 14
y_1 + y_3 + y_4 ≤ 6
y_1, y_2, y_3, y_4 ≥ 0
Primal simplex tableau: [as on the previous slide]
Dual variables: y^T = c_B^T B^{-1}. Feasible: y^T A ≤ c^T.
A basis B is called dual feasible if all elements of c̄^T = c^T − c_B^T B^{-1}A = c^T − y^T A (the first row except for −z) are non-negative. 93 / 136
94 . Another view point of the Primal Simplex algorithm
Thus another view point of the Primal Simplex algorithm can be described as:
1. Starting point: the Primal Simplex algorithm starts with a primal feasible solution (x_B = B^{-1}b ≥ 0);
2. Maintenance: throughout the process we maintain primal feasibility and move towards dual feasibility;
3. Stopping criterion: c̄^T = c^T − c_B^T B^{-1}A ≥ 0, i.e., y^T A ≤ c^T.
In other words, the iteration process ends when the basis is both primal feasible and dual feasible. 94 / 136
95 . Dual simplex works in just the opposite fashion
Dual simplex:
1. Starting point: the Dual Simplex algorithm starts with a dual feasible solution (c̄^T ≥ 0);
2. Maintenance: throughout the process we maintain dual feasibility and move towards primal feasibility;
3. Stopping criterion: x_B = B^{-1}b ≥ 0.
In other words, the iteration process ends when the basis is both primal feasible and dual feasible. 95 / 136
96 . Primal simplex vs. Dual simplex
Both primal simplex and dual simplex terminate under the same condition, i.e., the basis is both primal feasible and dual feasible. However, the final objective is reached in totally opposite fashions: the primal simplex method keeps primal feasibility while the dual simplex method keeps dual feasibility during the pivoting process.
The primal simplex algorithm first selects an entering variable and then determines the leaving variable. In contrast, the dual simplex method does the opposite: it first selects a leaving variable and then determines an entering variable. 96 / 136
97 Dual simplex(b I, z, A, b, c) 1: //Dual simplex starts with a dual feasible basis. Here, B I contains the indices of the basic variables. 2: while TRUE do 3: if there is no index l (1 l m) has b l < 0 then 4: x =CalculateX(B I, A, b, c); 5: return (x, z); 6: end if; 7: choose an index l having b l < 0 according to a certain rule; 8: for each index j (1 i n) do 9: if a lj < 0 then 10: j = c j a lj ; 11: else 12: j = ; 13: end if 14: end for 15: choose an index e that minimizes j; 16: if e = then 17: return no feasible solution ; 18: end if 19: (B I, A, b, c, z) = Pivot(B I, A, b, c, z, e, l); 20: end while 97 / 136
98 . An example
Standard form:
min 5x_1 + 35x_2 + 20x_3
s.t. x_1 − x_2 − x_3 ≤ −2
−x_1 − 3x_2 ≤ −3
x_1, x_2, x_3 ≥ 0
Slack form:
min 5x_1 + 35x_2 + 20x_3
s.t. x_1 − x_2 − x_3 + x_4 = −2
−x_1 − 3x_2 + x_5 = −3
x_1, x_2, x_3, x_4, x_5 ≥ 0
98 / 136
99 . Step 1
[Tableau: −z = 0; c̄ = (5, 35, 20, 0, 0); rows for x_{B_1} = x_4 and x_{B_2} = x_5]
Basis (in blue): B = {a_4, a_5}
Solution: x = B^{-1}b on the basis, i.e., x = [0, 0, 0, −2, −3]^T.
Pivoting: choose a_5 to leave the basis since b̄_2 = −3 < 0; choose a_1 to enter the basis since min_{j: ā_2j < 0} c̄_j/(−ā_2j) = c̄_1/(−ā_21).
99 / 136
100 . Step 2
[Tableau: −z = −15; c̄ = (0, 20, 20, 0, 5)]
Basis (in blue): B = {a_1, a_4}
Solution: x = [3, 0, 0, −5, 0]^T.
Pivoting: choose a_4 to leave the basis since b̄_1 = −5 < 0; choose a_2 to enter the basis since min_{j: ā_1j < 0} c̄_j/(−ā_1j) = c̄_2/(−ā_12).
100 / 136
101 . Step 3
[Tableau: −z = −40; c̄ = (0, 0, 15, 5, 10)]
Basis (in blue): B = {a_1, a_2}
Solution: x = [−3/4, 5/4, 0, 0, 0]^T.
Pivoting: choose a_1 to leave the basis since b̄_2 = −3/4 < 0; choose a_3 to enter the basis since min_{j: ā_2j < 0} c̄_j/(−ā_2j) = c̄_3/(−ā_23).
101 / 136
102 . Step 4
[Tableau: −z = −55; c̄ = (20, 0, 0, 20, 5)]
Basis (in blue): B = {a_2, a_3}
Solution: x = [0, 1, 1, 0, 0]^T. All b̄_i ≥ 0: done!
102 / 136
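The four steps above can be replayed with a small tableau implementation over exact rationals. The right-hand sides b = (−2, −3) are reconstructed from the pivots shown in the steps, so treat the data as an assumption; the row rule below picks the most negative b̄_l, which reproduces the slide's pivot sequence:

```python
# A compact dual-simplex sketch over exact rationals, replaying the worked
# example: leave on the most negative b_l, enter on the minimum ratio
# c_j / (-a_lj) over columns with a_lj < 0, then pivot.
from fractions import Fraction as F

def dual_simplex(A, b, c):
    """A, b: slack-form rows whose last m columns are the slack basis; c: costs."""
    m, n = len(A), len(A[0])
    A = [[F(v) for v in row] for row in A]
    b = [F(v) for v in b]
    c = [F(v) for v in c]           # reduced costs (all >= 0: dual feasible start)
    basis = list(range(n - m, n))   # the slack variables form the initial basis
    z = F(0)                        # objective value of the current basic solution
    while True:
        neg = [l for l in range(m) if b[l] < 0]
        if not neg:                 # primal feasible as well -> optimal
            x = [F(0)] * n
            for l in range(m):
                x[basis[l]] = b[l]
            return x, z
        l = min(neg, key=lambda i: b[i])      # leave on the most negative b_l
        ratios = [(c[j] / -A[l][j], j) for j in range(n) if A[l][j] < 0]
        if not ratios:
            raise ValueError('no feasible solution')
        _, e = min(ratios)          # enter on the minimum ratio
        piv = A[l][e]               # pivot on (l, e)
        A[l] = [v / piv for v in A[l]]
        b[l] /= piv
        for i in range(m):
            if i != l and A[i][e] != 0:
                f = A[i][e]
                A[i] = [v - f * w for v, w in zip(A[i], A[l])]
                b[i] -= f * b[l]
        f = c[e]
        c = [v - f * w for v, w in zip(c, A[l])]
        z += f * b[l]
        basis[l] = e

# the example in slack form: x1 - x2 - x3 + x4 = -2,  -x1 - 3x2 + x5 = -3
A = [[1, -1, -1, 1, 0],
     [-1, -3, 0, 0, 1]]
b = [-2, -3]
c = [5, 35, 20, 0, 0]
x, z = dual_simplex(A, b, c)
assert x == [0, 1, 1, 0, 0] and z == 55   # matches Step 4: x2 = x3 = 1, z = 55
```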
103 . When is the dual simplex method useful?
1. The dual simplex algorithm is best suited for problems for which an initial dual feasible solution is easily available.
2. It is particularly useful for reoptimizing a problem after a constraint has been added or some parameters have been changed, so that the previously optimal basis is no longer feasible.
3. Trying dual simplex is also worthwhile if your LP appears to be highly degenerate, i.e., there are many vertices of the feasible region for which the associated basis is degenerate. We may find that a large number of iterations (moves between adjacent vertices) occur with little or no improvement. 2
2 References: Operations Research Models and Methods, Paul A. Jensen and Jonathan F. Bard; OR-Notes, J. E. Beasley 103 / 136
104 .. Primal Dual: another improvement approach 104 / 136
105 . Primal Dual method: a brief history
In 1955, H. Kuhn proposed the Hungarian method for the MaxWeightedMatching problem. This method effectively exploits the duality property of linear programming.
In 1956, G. Dantzig, R. Ford, and D. Fulkerson extended this idea to solve general linear programming problems.
In 1957, R. Ford and D. Fulkerson applied this idea to solve the network-flow problem and the Hitchcock problem.
In 1957, J. Munkres applied this idea to solve the transportation problem. 105 / 136
106 . Primal Dual method: basic idea 106 / 136
107 . Primal Dual Method
The Primal Dual method is a dual method: it exploits the lower-bound information in subsequent linear programming operations.
Advantages:
1. Unlike dual simplex, which starts from a dual basic feasible solution, the primal dual method requires only a dual feasible solution.
2. An optimal solution to DRP usually has a combinatorial explanation, especially for graph-theory problems.
[Diagram: Primal P (x) and Dual D (y); Step 1: y → x yields the Restricted Primal RP (x); Step 2: RP → its dual DRP (ȳ); Step 3: update y = y + θȳ]
107 / 136
108 . Basic idea of the primal dual method
Primal P:
min c_1x_1 + c_2x_2 + ... + c_nx_n
s.t. a_11x_1 + a_12x_2 + ... + a_1nx_n = b_1   (y_1)
a_21x_1 + a_22x_2 + ... + a_2nx_n = b_2   (y_2)
...
a_m1x_1 + a_m2x_2 + ... + a_mnx_n = b_m   (y_m)
x_1, x_2, ..., x_n ≥ 0
Dual D:
max b_1y_1 + b_2y_2 + ... + b_my_m
s.t. a_11y_1 + a_21y_2 + ... + a_m1y_m ≤ c_1
a_12y_1 + a_22y_2 + ... + a_m2y_m ≤ c_2
...
a_1ny_1 + a_2ny_2 + ... + a_mny_m ≤ c_n
Basic idea: suppose we are given a dual feasible solution y. Let's verify whether y is an optimal solution or not:
1. If y is an optimal solution to the dual problem D, then the corresponding primal variables x should satisfy a restricted primal problem called RP;
2. Furthermore, even if y is not optimal, the solution to the dual of RP (called DRP) still provides invaluable information: it can be used to improve y.
108 / 136
109 . Step 1: y → x (I)
Dual problem D:
max b_1y_1 + b_2y_2 + ... + b_my_m
s.t. a_11y_1 + a_21y_2 + ... + a_m1y_m = c_1   (tight: x_1 ≥ 0 allowed)
...
a_1ny_1 + a_2ny_2 + ... + a_mny_m < c_n   (slack: x_n = 0)
y provides information about the corresponding primal variables x:
1. Given a dual feasible solution y, let's check whether y is optimal or not.
2. If y is optimal, we have the following restrictions on x:
a_1iy_1 + a_2iy_2 + ... + a_miy_m < c_i  ⇒  x_i = 0
(Reason: complementary slackness. An optimal solution y satisfies (a_1iy_1 + a_2iy_2 + ... + a_miy_m − c_i) · x_i = 0.)
3. Let's use J to record the indices of the tight constraints, i.e., those where "=" holds.
109 / 136
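Complementary slackness can be illustrated on the shortest-path instance used later in this lecture. The heights below are hand-picked optimal values (an assumption, not taken from the slides):

```python
# Sketch of complementary slackness on the shortest-path example: with an
# optimal height assignment, the set J of tight dual constraints consists
# exactly of the edges a shortest path may use.
y = {'s': 8, 'u': 3, 'v': 2, 't': 0}          # assumed optimal heights
c = {('s', 'u'): 5, ('s', 'v'): 8, ('u', 'v'): 1,
     ('u', 't'): 6, ('v', 't'): 2}

J = {e for e, d in c.items() if y[e[0]] - y[e[1]] == d}
print(sorted(J))   # the tight edges
# the optimal path s -> u -> v -> t uses only tight edges:
assert {('s', 'u'), ('u', 'v'), ('v', 't')} <= J
```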
110 . Step 1: y → x (II) 110 / 136
111 . Step 1: y → x (III)
4. Thus the corresponding primal solution x should satisfy the following restricted primal (RP):
5. RP:
a_11x_1 + a_12x_2 + ... + a_1nx_n = b_1
a_21x_1 + a_22x_2 + ... + a_2nx_n = b_2
...
a_m1x_1 + a_m2x_2 + ... + a_mnx_n = b_m
x_i = 0 for i ∉ J
x_i ≥ 0 for i ∈ J
6. In other words, the optimality of y is determined via solving RP.
111 / 136
112 . But how to solve RP? (I)
RP:
a_11x_1 + a_12x_2 + ... + a_1nx_n = b_1
a_21x_1 + a_22x_2 + ... + a_2nx_n = b_2
...
a_m1x_1 + a_m2x_2 + ... + a_mnx_n = b_m
x_i = 0 for i ∉ J
x_i ≥ 0 for i ∈ J
How to solve RP? Recall that a system Ax = b, x ≥ 0 can be solved via solving an extended LP.
112 / 136
113 . But how to solve RP? (II)
RP (extended through introducing auxiliary variables s_i):
min ϵ = s_1 + s_2 + ... + s_m
s.t. s_1 + a_11x_1 + ... + a_1nx_n = b_1
s_2 + a_21x_1 + ... + a_2nx_n = b_2
...
s_m + a_m1x_1 + ... + a_mnx_n = b_m
x_i = 0 for i ∉ J
x_i ≥ 0 for i ∈ J
s_i ≥ 0
1. If ϵ_OPT = 0, then we find a feasible solution to RP, implying that y is an optimal solution;
2. If ϵ_OPT > 0, y is not an optimal solution.
113 / 136
114 . Step 2: x → ȳ (I)
Alternatively, we can solve the dual of RP, called DRP:
max w = b_1ȳ_1 + b_2ȳ_2 + ... + b_mȳ_m
s.t. a_1jȳ_1 + a_2jȳ_2 + ... + a_mjȳ_m ≤ 0   for each j ∈ J
ȳ_1, ȳ_2, ..., ȳ_m ≤ 1
1. If w_OPT = 0, y is an optimal solution.
2. If w_OPT > 0, y is not an optimal solution. However, the optimal solution ȳ to DRP still provides useful information: it can be used to improve y.
114 / 136
115 . The difference between DRP and D
Dual problem D:
max b_1y_1 + b_2y_2 + ... + b_my_m
s.t. a_1jy_1 + a_2jy_2 + ... + a_mjy_m ≤ c_j   for j = 1, ..., n
DRP:
max w = b_1ȳ_1 + b_2ȳ_2 + ... + b_mȳ_m
s.t. a_1jȳ_1 + a_2jȳ_2 + ... + a_mjȳ_m ≤ 0   for j ∈ J
ȳ_1, ȳ_2, ..., ȳ_m ≤ 1
How to write DRP from D?
Replace each c_j with 0;
Keep only the |J| restrictions with j ∈ J;
Add the restrictions ȳ_1, ȳ_2, ..., ȳ_m ≤ 1.
115 / 136
116 . Step 3: y → y′ (I)
Why can ȳ (the optimal solution to DRP) be used to improve y? Consider the improved dual solution y′ = y + θȳ, θ > 0. We have:
Objective function: since ȳ^T b = w_OPT > 0, we get y′^T b = y^T b + θ w_OPT > y^T b. In other words, y + θȳ is better than y.
Constraints: dual feasibility requires that:
For any j ∈ J, a_1jȳ_1 + a_2jȳ_2 + ... + a_mjȳ_m ≤ 0; thus y′^T a_j = y^T a_j + θȳ^T a_j ≤ c_j for any θ > 0.
For any j ∉ J, there are two cases:
116 / 136
117 . Step 3: y → y′ (II)
1. For all j ∉ J, a_1jȳ_1 + a_2jȳ_2 + ... + a_mjȳ_m ≤ 0: then y′ is feasible for any θ > 0, since for 1 ≤ j ≤ n,
a_1jy′_1 + a_2jy′_2 + ... + a_mjy′_m   (6)
= a_1jy_1 + a_2jy_2 + ... + a_mjy_m   (7)
+ θ(a_1jȳ_1 + a_2jȳ_2 + ... + a_mjȳ_m)   (8)
≤ c_j   (9)
Hence the dual problem D is unbounded and the primal problem P is infeasible.
2. There exists j ∉ J with a_1jȳ_1 + a_2jȳ_2 + ... + a_mjȳ_m > 0: we can safely set
θ ≤ (c_j − (a_1jy_1 + a_2jy_2 + ... + a_mjy_m)) / (a_1jȳ_1 + a_2jȳ_2 + ... + a_mjȳ_m) = (c_j − y^T a_j) / (ȳ^T a_j)
for every such j, to guarantee that y′^T a_j = y^T a_j + θȳ^T a_j ≤ c_j.
117 / 136
118 . Primal Dual algorithm
Infeasible = "No"; Optimal = "No"; y = y_0; // y_0 is a feasible solution to the dual problem D
while TRUE do
    Find the tight constraint index set J, and set x_j = 0 for each j ∉ J.
    Thus we have a smaller RP.
    Solve DRP. Denote the solution as ȳ.
    if the DRP objective w_OPT = 0 then
        Optimal = "Yes";
        return y;
    end if
    if ȳ^T a_j ≤ 0 for all j ∉ J then
        Infeasible = "Yes";
        return;
    end if
    Set θ = min { (c_j − y^T a_j) / (ȳ^T a_j) : ȳ^T a_j > 0, j ∉ J }.
    Update y as y = y + θȳ;
end while
118 / 136
119 . Advantages of the Primal Dual algorithm
Some facts:
The primal dual algorithm terminates if an anti-cycling rule is used. (Reason: the objective value y^T b increases if there is no degeneracy.)
Both RP and DRP do not explicitly rely on c; in fact, the information of c is represented in J. This leads to another advantage of the primal dual technique: RP is usually a purely combinatorial problem. Take ShortestPath as an example: RP corresponds to a connectivity problem.
More and more constraints become tight during the primal dual process. (See Lecture 10 for a primal dual algorithm for the MaximumFlow problem.)
119 / 136
120 . ShortestPath: Dijkstra's algorithm is essentially a Primal Dual algorithm 120 / 136
121 . ShortestPath problem
[Figure: the example graph with edge variables x_1 (s→u, 5), x_2 (s→v, 8), x_3 (u→v, 1), x_4 (u→t, 6), x_5 (v→t, 2)]
Primal problem: relax the 0/1 integer linear program into a linear program; by the total unimodularity of the constraint matrix, the relaxation has an integral optimal solution.
min 5x_1 + 8x_2 + 1x_3 + 6x_4 + 2x_5
s.t. x_1 + x_2 = 1   (vertex s)
−x_4 − x_5 = −1   (vertex t)
−x_1 + x_3 + x_4 = 0   (vertex u)
−x_2 − x_3 + x_5 = 0   (vertex v)
x_1, ..., x_5 ≥ 0;  x_1, ..., x_5 ≤ 1
121 / 136
122 . Dual of the ShortestPath problem
[Figure: the example graph with node heights y_s, y_u, y_v, y_t]
Dual problem: set variables for cities. (Intuition: y_i means the height of city i; thus y_s − y_t denotes the height difference between s and t, providing a lower bound on the shortest path length.)
max y_s − y_t
s.t. y_s − y_u ≤ 5   (x_1: edge (s, u))
y_s − y_v ≤ 8   (x_2: edge (s, v))
y_u − y_v ≤ 1   (x_3: edge (u, v))
y_u − y_t ≤ 6   (x_4: edge (u, t))
y_v − y_t ≤ 2   (x_5: edge (v, t))
122 / 136
123 . A simplified version
[Figure: the example graph with node heights; y_t fixed to 0]
Dual problem: simplify by setting y_t = 0 (and remove the second constraint of the primal problem P accordingly):
max y_s
s.t. y_s − y_u ≤ 5   (x_1: edge (s, u))
y_s − y_v ≤ 8   (x_2: edge (s, v))
y_u − y_v ≤ 1   (x_3: edge (u, v))
y_u ≤ 6   (x_4: edge (u, t))
y_v ≤ 2   (x_5: edge (v, t))
123 / 136
124 . Iteration 1 (I)
Dual feasible solution: y^T = (0, 0, 0). Let's check the constraints in D:
y_s − y_u < 5  ⇒  x_1 = 0
y_s − y_v < 8  ⇒  x_2 = 0
y_u − y_v < 1  ⇒  x_3 = 0
y_u < 6  ⇒  x_4 = 0
y_v < 2  ⇒  x_5 = 0
Identifying the tight constraints in D: J = ∅, implying that x_1 = x_2 = x_3 = x_4 = x_5 = 0.
RP:
min s_1 + s_2 + s_3
s.t. s_1 + x_1 + x_2 = 1   (node s)
s_2 − x_1 + x_3 + x_4 = 0   (node u)
s_3 − x_2 − x_3 + x_5 = 0   (node v)
s_1, s_2, s_3 ≥ 0
x_1, x_2, x_3, x_4, x_5 = 0
124 / 136
125 . Iteration 1 (II)
DRP:
max ȳ_s
s.t. ȳ_s ≤ 1
ȳ_u ≤ 1
ȳ_v ≤ 1
Solve DRP using a combinatorial technique: an optimal solution is ȳ^T = (1, 0, 0). Note: the optimal solution is not unique.
Step length θ:
θ = min{ (c_1 − y^T a_1)/(ȳ^T a_1), (c_2 − y^T a_2)/(ȳ^T a_2) } = min{5, 8} = 5
Update y: y^T = y^T + θȳ^T = (5, 0, 0).
125 / 136
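The step-length computation of this iteration can be written out directly. The columns a_j below encode the simplified dual constraints y^T a_j ≤ c_j with y = (y_s, y_u, y_v); since J is empty here, every column is a candidate, but only a_1 and a_2 have ȳ^T a_j > 0:

```python
# Sketch: the step length of Iteration 1 of the primal dual method on the
# shortest-path example, with ybar = (1, 0, 0) the DRP solution.
cols = {                      # j -> (a_j, c_j) for the constraint y^T a_j <= c_j
    1: ([1, -1, 0], 5),      # edge (s, u):  y_s - y_u <= 5
    2: ([1, 0, -1], 8),      # edge (s, v):  y_s - y_v <= 8
    3: ([0, 1, -1], 1),      # edge (u, v):  y_u - y_v <= 1
    4: ([0, 1, 0], 6),       # edge (u, t):  y_u <= 6   (y_t fixed to 0)
    5: ([0, 0, 1], 2),       # edge (v, t):  y_v <= 2
}
y = [0, 0, 0]                # current dual solution (y_s, y_u, y_v)
ybar = [1, 0, 0]             # optimal solution to DRP

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

theta = min((c - dot(y, a)) / dot(ybar, a)
            for a, c in cols.values() if dot(ybar, a) > 0)
y = [yi + theta * di for yi, di in zip(y, ybar)]
print(theta, y)   # 5.0 [5.0, 0.0, 0.0]
```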
126 . Iteration 1 (III)
[Figure: heights y_s = 5, y_u = y_v = y_t = 0 on the example graph]
From the point of view of Dijkstra's algorithm:
The optimal solution to DRP is ȳ^T = (1, 0, 0): this corresponds to the explored vertex set S = {s} in Dijkstra's algorithm. In fact, DRP is solved via identifying the nodes reachable from s.
Step length θ = min{ (c_1 − y^T a_1)/(ȳ^T a_1), (c_2 − y^T a_2)/(ȳ^T a_2) } = min{5, 8} = 5: this finds the vertex closest to the nodes in S via comparing all edges going out of S.
126 / 136
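The correspondence can be checked by running plain Dijkstra on the example graph; vertices are settled in the order s, u, v, t, and the final labels should mirror the height differences produced by the primal dual iterations (measured from s):

```python
# Sketch: plain Dijkstra on the example graph. The settlement order of the
# explored set S and the final distance labels match the primal dual view
# described above.
import heapq

graph = {'s': [('u', 5), ('v', 8)], 'u': [('v', 1), ('t', 6)],
         'v': [('t', 2)], 't': []}

def dijkstra(graph, src):
    dist = {src: 0}
    heap = [(0, src)]
    order = []                       # explored set S, in order of settlement
    while heap:
        d, node = heapq.heappop(heap)
        if node in order:
            continue                 # stale heap entry
        order.append(node)
        for nxt, w in graph[node]:
            if d + w < dist.get(nxt, float('inf')):
                dist[nxt] = d + w
                heapq.heappush(heap, (d + w, nxt))
    return dist, order

dist, order = dijkstra(graph, 's')
print(dist, order)   # {'s': 0, 'u': 5, 'v': 6, 't': 8} ['s', 'u', 'v', 't']
```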
127 . Iteration 2 (I)
Dual feasible solution: y^T = (5, 0, 0). Let's check the constraints in D:
y_s − y_u = 5   (tight)
y_s − y_v < 8  ⇒  x_2 = 0
y_u − y_v < 1  ⇒  x_3 = 0
y_u < 6  ⇒  x_4 = 0
y_v < 2  ⇒  x_5 = 0
Identifying the tight constraints in D: J = {1}, implying that x_2 = x_3 = x_4 = x_5 = 0.
RP:
min s_1 + s_2 + s_3
s.t. s_1 + x_1 + x_2 = 1   (node s)
s_2 − x_1 + x_3 + x_4 = 0   (node u)
s_3 − x_2 − x_3 + x_5 = 0   (node v)
s_1, s_2, s_3 ≥ 0
x_2, x_3, x_4, x_5 = 0
127 / 136
More information15-780: LinearProgramming
15-780: LinearProgramming J. Zico Kolter February 1-3, 2016 1 Outline Introduction Some linear algebra review Linear programming Simplex algorithm Duality and dual simplex 2 Outline Introduction Some linear
More informationEE613 Machine Learning for Engineers. Kernel methods Support Vector Machines. jean-marc odobez 2015
EE613 Machine Learning for Engineers Kernel methods Support Vector Machines jean-marc odobez 2015 overview Kernel methods introductions and main elements defining kernels Kernelization of k-nn, K-Means,
More informationSupport Vector Machines
Support Vector Machines Le Song Machine Learning I CSE 6740, Fall 2013 Naïve Bayes classifier Still use Bayes decision rule for classification P y x = P x y P y P x But assume p x y = 1 is fully factorized
More informationAppendix A Taylor Approximations and Definite Matrices
Appendix A Taylor Approximations and Definite Matrices Taylor approximations provide an easy way to approximate a function as a polynomial, using the derivatives of the function. We know, from elementary
More informationCheng Soon Ong & Christian Walder. Canberra February June 2018
Cheng Soon Ong & Christian Walder Research Group and College of Engineering and Computer Science Canberra February June 2018 Outlines Overview Introduction Linear Algebra Probability Linear Regression
More informationLecture Support Vector Machine (SVM) Classifiers
Introduction to Machine Learning Lecturer: Amir Globerson Lecture 6 Fall Semester Scribe: Yishay Mansour 6.1 Support Vector Machine (SVM) Classifiers Classification is one of the most important tasks in
More informationPart IB Optimisation
Part IB Optimisation Based on lectures by F. A. Fischer Notes taken by Dexter Chua Easter 015 These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures.
More information"SYMMETRIC" PRIMAL-DUAL PAIR
"SYMMETRIC" PRIMAL-DUAL PAIR PRIMAL Minimize cx DUAL Maximize y T b st Ax b st A T y c T x y Here c 1 n, x n 1, b m 1, A m n, y m 1, WITH THE PRIMAL IN STANDARD FORM... Minimize cx Maximize y T b st Ax
More informationCO 250 Final Exam Guide
Spring 2017 CO 250 Final Exam Guide TABLE OF CONTENTS richardwu.ca CO 250 Final Exam Guide Introduction to Optimization Kanstantsin Pashkovich Spring 2017 University of Waterloo Last Revision: March 4,
More informationToday: Linear Programming (con t.)
Today: Linear Programming (con t.) COSC 581, Algorithms April 10, 2014 Many of these slides are adapted from several online sources Reading Assignments Today s class: Chapter 29.4 Reading assignment for
More informationSupport Vector Machines
Support Vector Machines Sridhar Mahadevan mahadeva@cs.umass.edu University of Massachusetts Sridhar Mahadevan: CMPSCI 689 p. 1/32 Margin Classifiers margin b = 0 Sridhar Mahadevan: CMPSCI 689 p.
More informationg(x,y) = c. For instance (see Figure 1 on the right), consider the optimization problem maximize subject to
1 of 11 11/29/2010 10:39 AM From Wikipedia, the free encyclopedia In mathematical optimization, the method of Lagrange multipliers (named after Joseph Louis Lagrange) provides a strategy for finding the
More informationA Brief Review on Convex Optimization
A Brief Review on Convex Optimization 1 Convex set S R n is convex if x,y S, λ,µ 0, λ+µ = 1 λx+µy S geometrically: x,y S line segment through x,y S examples (one convex, two nonconvex sets): A Brief Review
More informationLecture: Algorithms for LP, SOCP and SDP
1/53 Lecture: Algorithms for LP, SOCP and SDP Zaiwen Wen Beijing International Center For Mathematical Research Peking University http://bicmr.pku.edu.cn/~wenzw/bigdata2018.html wenzw@pku.edu.cn Acknowledgement:
More informationSupport Vector Machines for Regression
COMP-566 Rohan Shah (1) Support Vector Machines for Regression Provided with n training data points {(x 1, y 1 ), (x 2, y 2 ),, (x n, y n )} R s R we seek a function f for a fixed ɛ > 0 such that: f(x
More informationLagrange Duality. Daniel P. Palomar. Hong Kong University of Science and Technology (HKUST)
Lagrange Duality Daniel P. Palomar Hong Kong University of Science and Technology (HKUST) ELEC5470 - Convex Optimization Fall 2017-18, HKUST, Hong Kong Outline of Lecture Lagrangian Dual function Dual
More informationLecture 6: Conic Optimization September 8
IE 598: Big Data Optimization Fall 2016 Lecture 6: Conic Optimization September 8 Lecturer: Niao He Scriber: Juan Xu Overview In this lecture, we finish up our previous discussion on optimality conditions
More informationSupport Vector Machines for Classification and Regression
CIS 520: Machine Learning Oct 04, 207 Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are designed to be a supplement to the lecture. They may
More informationIntroduction to Mathematical Programming IE406. Lecture 10. Dr. Ted Ralphs
Introduction to Mathematical Programming IE406 Lecture 10 Dr. Ted Ralphs IE406 Lecture 10 1 Reading for This Lecture Bertsimas 4.1-4.3 IE406 Lecture 10 2 Duality Theory: Motivation Consider the following
More informationSupport Vector Machine
Support Vector Machine Kernel: Kernel is defined as a function returning the inner product between the images of the two arguments k(x 1, x 2 ) = ϕ(x 1 ), ϕ(x 2 ) k(x 1, x 2 ) = k(x 2, x 1 ) modularity-
More informationIntroduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur
Introduction to Machine Learning Prof. Sudeshna Sarkar Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Module - 5 Lecture - 22 SVM: The Dual Formulation Good morning.
More informationKernel Methods and Support Vector Machines
Kernel Methods and Support Vector Machines Oliver Schulte - CMPT 726 Bishop PRML Ch. 6 Support Vector Machines Defining Characteristics Like logistic regression, good for continuous input features, discrete
More informationKernel Machines. Pradeep Ravikumar Co-instructor: Manuela Veloso. Machine Learning
Kernel Machines Pradeep Ravikumar Co-instructor: Manuela Veloso Machine Learning 10-701 SVM linearly separable case n training points (x 1,, x n ) d features x j is a d-dimensional vector Primal problem:
More informationConvex Optimization. Dani Yogatama. School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. February 12, 2014
Convex Optimization Dani Yogatama School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA February 12, 2014 Dani Yogatama (Carnegie Mellon University) Convex Optimization February 12,
More informationData Mining. Linear & nonlinear classifiers. Hamid Beigy. Sharif University of Technology. Fall 1396
Data Mining Linear & nonlinear classifiers Hamid Beigy Sharif University of Technology Fall 1396 Hamid Beigy (Sharif University of Technology) Data Mining Fall 1396 1 / 31 Table of contents 1 Introduction
More informationAn introductory example
CS1 Lecture 9 An introductory example Suppose that a company that produces three products wishes to decide the level of production of each so as to maximize profits. Let x 1 be the amount of Product 1
More informationCONSTRAINED NONLINEAR PROGRAMMING
149 CONSTRAINED NONLINEAR PROGRAMMING We now turn to methods for general constrained nonlinear programming. These may be broadly classified into two categories: 1. TRANSFORMATION METHODS: In this approach
More informationSupport Vector Machines for Classification and Regression. 1 Linearly Separable Data: Hard Margin SVMs
E0 270 Machine Learning Lecture 5 (Jan 22, 203) Support Vector Machines for Classification and Regression Lecturer: Shivani Agarwal Disclaimer: These notes are a brief summary of the topics covered in
More informationGame Theory. Greg Plaxton Theory in Programming Practice, Spring 2004 Department of Computer Science University of Texas at Austin
Game Theory Greg Plaxton Theory in Programming Practice, Spring 2004 Department of Computer Science University of Texas at Austin Bimatrix Games We are given two real m n matrices A = (a ij ), B = (b ij
More informationOPTIMISATION 2007/8 EXAM PREPARATION GUIDELINES
General: OPTIMISATION 2007/8 EXAM PREPARATION GUIDELINES This points out some important directions for your revision. The exam is fully based on what was taught in class: lecture notes, handouts and homework.
More informationLecture 2: Linear SVM in the Dual
Lecture 2: Linear SVM in the Dual Stéphane Canu stephane.canu@litislab.eu São Paulo 2015 July 22, 2015 Road map 1 Linear SVM Optimization in 10 slides Equality constraints Inequality constraints Dual formulation
More informationSupport Vector Machines. CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington
Support Vector Machines CSE 6363 Machine Learning Vassilis Athitsos Computer Science and Engineering Department University of Texas at Arlington 1 A Linearly Separable Problem Consider the binary classification
More informationConvex Optimization M2
Convex Optimization M2 Lecture 8 A. d Aspremont. Convex Optimization M2. 1/57 Applications A. d Aspremont. Convex Optimization M2. 2/57 Outline Geometrical problems Approximation problems Combinatorial
More informationChap 2. Optimality conditions
Chap 2. Optimality conditions Version: 29-09-2012 2.1 Optimality conditions in unconstrained optimization Recall the definitions of global, local minimizer. Geometry of minimization Consider for f C 1
More informationSupport Vector Machine (SVM) and Kernel Methods
Support Vector Machine (SVM) and Kernel Methods CE-717: Machine Learning Sharif University of Technology Fall 2016 Soleymani Outline Margin concept Hard-Margin SVM Soft-Margin SVM Dual Problems of Hard-Margin
More informationDEPARTMENT OF STATISTICS AND OPERATIONS RESEARCH OPERATIONS RESEARCH DETERMINISTIC QUALIFYING EXAMINATION. Part I: Short Questions
DEPARTMENT OF STATISTICS AND OPERATIONS RESEARCH OPERATIONS RESEARCH DETERMINISTIC QUALIFYING EXAMINATION Part I: Short Questions August 12, 2008 9:00 am - 12 pm General Instructions This examination is
More informationIII. Linear Programming
III. Linear Programming Thomas Sauerwald Easter 2017 Outline Introduction Standard and Slack Forms Formulating Problems as Linear Programs Simplex Algorithm Finding an Initial Solution III. Linear Programming
More informationLINEAR CLASSIFIERS. J. Elder CSE 4404/5327 Introduction to Machine Learning and Pattern Recognition
LINEAR CLASSIFIERS Classification: Problem Statement 2 In regression, we are modeling the relationship between a continuous input variable x and a continuous target variable t. In classification, the input
More information