Surrogate models for Single and Multi-Objective Stochastic Optimization: Integrating Support Vector Machines and Covariance-Matrix Adaptation-ES


1 Surrogate models for Single and Multi-Objective Stochastic Optimization: Integrating Support Vector Machines and Covariance-Matrix Adaptation-ES. Ilya Loshchilov, Marc Schoenauer, Michèle Sebag. TAO, CNRS, INRIA, Univ. Paris-Sud. May 23rd, 2011.

2 Motivations. Find Argmin{F : X → R}. Context: ill-posed optimization problems: the fitness function F is defined on X ⊆ R^d, its gradient is not available or not useful, and F is only available as a continuous black-box oracle. Goal: build {x_1, x_2, ...} → Argmin(F). Black-box approaches: + applicable, + robust (comparison-based approaches are invariant); − high computational cost (number of function evaluations).

3 Surrogate optimization. Principle: gather E = {(x_i, F(x_i))} (training set) and build a surrogate model F̂ from E (learning). Use the surrogate model F̂ for some time: Optimization: use F̂ instead of the true F in a standard algorithm; Filtering: select promising x_i based on F̂ in a population-based algorithm. Compute F(x_i) for some x_i, update F̂, and iterate.

4 Surrogate optimization, cont. Issues. Learning: hypothesis space (polynomials, neural nets, Gaussian processes, ...); selection of the training set (prune, update, ...); what is the learning target? Interaction of the Learning & Optimization modules: schedule (when to relearn); how to use F̂ to support the optimization search; how to use search results to support learning F̂. This talk: using Covariance-Matrix Estimation within Support Vector Machines; using SVM for multi-objective optimization.

5 Content 1 Covariance Matrix Adaptation-Evolution Strategy Evolution Strategies CMA-ES The state-of-the-art of (Stochastic) Optimization 2 Statistical Machine Learning Linear classifiers The kernel trick 3 Previous Work Mixing Rank-SVM and Local Information Experiments 4 Background Dominance-based Surrogate Experimental Validation Michèle Sebag Surrogate optimization: SVM for CMA 5/ 47

6 Stochastic Search. A black-box search template to minimize f : R^n → R. Initialize the distribution parameters θ and set the sample size λ ∈ N. While not terminate: 1. Sample the distribution P(x | θ) → x_1, ..., x_λ ∈ R^n; 2. Evaluate x_1, ..., x_λ on f; 3. Update the parameters θ ← F_θ(θ, x_1, ..., x_λ, f(x_1), ..., f(x_λ)). This template covers deterministic algorithms, Evolutionary Algorithms, PSO, DE (P implicitly defined by the variation operators), and Estimation of Distribution Algorithms.
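A minimal Python sketch of this sample-evaluate-update template (illustration only, not from the slides): the sampling distribution is assumed Gaussian with parameters (m, σ, C), and the update step is a crude placeholder rather than the actual CMA-ES rule.

```python
import numpy as np

def black_box_search(f, m, sigma, C, lam=10, n_iters=100):
    """Sample-evaluate-update loop with a Gaussian search distribution N(m, sigma^2 C).
    The parameter update below is a crude placeholder (move the mean to the best half);
    CMA-ES replaces it with principled updates of m, sigma and C."""
    for _ in range(n_iters):
        X = np.random.multivariate_normal(m, sigma ** 2 * C, size=lam)  # 1. sample
        fitness = np.array([f(x) for x in X])                           # 2. evaluate
        best = X[np.argsort(fitness)[: lam // 2]]                       # ranked selection
        m = best.mean(axis=0)                                           # 3. update theta
    return m
```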

7 The (µ, λ) Evolution Strategy. Gaussian mutations: x_i = m + σ N_i(0, C) for i = 1, ..., λ, as perturbations of m, where x_i, m ∈ R^n, σ ∈ R_+, and C ∈ R^{n×n}. The mean vector m ∈ R^n represents the favourite solution, the so-called step-size σ ∈ R_+ controls the step length, and the covariance matrix C ∈ R^{n×n} determines the shape of the distribution ellipsoid. How to update m, σ, and C?
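A sketch of how such offspring can be drawn (illustrative, names are ad hoc): the eigendecomposition of C is used so that the samples have covariance σ²C.

```python
import numpy as np

def sample_offspring(m, sigma, C, lam, rng=None):
    """Draw lam offspring x_i = m + sigma * N_i(0, C).
    Sampling uses the eigendecomposition C = B diag(d^2) B^T, so that
    B @ (d * z) has covariance C when z ~ N(0, I)."""
    rng = rng or np.random.default_rng()
    eigvals, B = np.linalg.eigh(C)
    d = np.sqrt(np.maximum(eigvals, 0.0))      # standard deviations along the eigen-axes
    Z = rng.standard_normal((lam, len(m)))     # isotropic samples, one row per offspring
    return m + sigma * (Z * d) @ B.T           # row i is offspring x_i
```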

8 History. The one-fifth rule (Rechenberg, 73): one single parameter σ for the whole population; measure the empirical success rate; increase σ if the success rate is too large, decrease σ if it is too small. Often wrong in non-smooth landscapes. Self-adaptive mutations (Schwefel, 81): each individual carries its own mutation parameters; log-normal mutation of the mutation parameters, then (normal) mutation of the individual. Adaptation is slow in the full covariance case, where the number of mutation parameters grows from 1 to order n².
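A tiny sketch of the 1/5 success rule (illustrative constants; the classical rule compares the measured success rate to the target 1/5):

```python
def one_fifth_rule(sigma, success_rate, factor=1.5, target=0.2):
    """Rechenberg's 1/5 success rule: enlarge the step-size when more than ~1/5 of the
    mutations improve the parent, shrink it when fewer do."""
    if success_rate > target:
        return sigma * factor
    if success_rate < target:
        return sigma / factor
    return sigma
```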

9 Cumulative Step-Size Adaptation (CSA). x_i = m + σ y_i, m ← m + σ y_w. Measure the length of the evolution path, i.e. the pathway of the mean vector m over the generation sequence: decrease σ when the path is short (steps cancel each other out), increase σ when the path is long (steps point in similar directions). Loosely speaking, consecutive steps are perpendicular under random selection (in expectation), and perpendicular in the desired situation (to be most efficient).
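The slide does not spell out the CSA formulas; a hedged sketch as commonly written in the CMA-ES literature, assuming y_w is already expressed in the whitened coordinate system (i.e. C^{-1/2} has been applied) and using common default constants:

```python
import numpy as np

def csa_update(p_sigma, sigma, y_w, mu_eff, n, c_sigma=None, d_sigma=None):
    """One cumulative step-size adaptation step: accumulate the evolution path p_sigma
    and compare its length to the expected length of a random path, E||N(0, I)||."""
    if c_sigma is None:
        c_sigma = (mu_eff + 2) / (n + mu_eff + 5)
    if d_sigma is None:
        d_sigma = 1 + 2 * max(0.0, np.sqrt((mu_eff - 1) / (n + 1)) - 1) + c_sigma
    chi_n = np.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n ** 2))   # E||N(0, I)||
    p_sigma = (1 - c_sigma) * p_sigma + np.sqrt(c_sigma * (2 - c_sigma) * mu_eff) * y_w
    sigma = sigma * np.exp((c_sigma / d_sigma) * (np.linalg.norm(p_sigma) / chi_n - 1))
    return p_sigma, sigma
```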

10 Covariance Matrix Adaptation: Rank-One Update. m ← m + σ y_w, y_w = Σ_{i=1}^{µ} w_i y_{i:λ}, y_i ~ N_i(0, C). Starting from the initial distribution with C = I, the movement of the population mean m (disregarding σ) gives the step y_w; the new distribution mixes the old distribution C with this step: C ← 0.8 C + 0.2 y_w y_w^T. Ruling principle: the adaptation increases the probability of the successful step y_w appearing again.
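In code, the rank-one update of the slide is a one-liner (sketch; the 0.8/0.2 mixing weights are the slide's illustrative values):

```python
import numpy as np

def rank_one_update(C, y_w, c1=0.2):
    """Rank-one covariance update C <- (1 - c1) C + c1 * y_w y_w^T.
    c1 = 0.2 matches the slide's illustration; CMA-ES derives c1 from the dimension."""
    return (1 - c1) * C + c1 * np.outer(y_w, y_w)
```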

17 Rank-µ Update. x_i = m + σ y_i, y_i ~ N_i(0, C), and m ← m + σ y_w with y_w = Σ_{i=1}^{µ} w_i y_{i:λ}. Rank-µ update: C_µ = (1/µ) Σ y_{i:λ} y_{i:λ}^T, C ← (1 − 1) C + 1 · C_µ (learning rate set to 1 for the illustration), m_new ← m + (1/µ) Σ y_{i:λ}. Illustration: sampling of λ = 150 solutions with C = I and σ = 1; calculating C_µ from the µ = 50 best points with w_1 = ... = w_µ = 1/µ; new distribution. Remark: the old (sample) distribution shape has a great influence on the new distribution and on the number of iterations needed.
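A sketch of the rank-µ update with equal weights (c_mu = 1 reproduces the slide's illustration; in practice CMA-ES uses a dimension-dependent learning rate):

```python
import numpy as np

def rank_mu_update(C, Y_best, c_mu=1.0):
    """Rank-mu update from the mu best steps (rows of Y_best), with equal weights
    w_i = 1/mu: C_mu is the empirical covariance of the selected steps.
    c_mu = 1 reproduces the slide's illustration (the old C is fully replaced)."""
    mu = Y_best.shape[0]
    C_mu = Y_best.T @ Y_best / mu          # (1/mu) * sum_i y_{i:lambda} y_{i:lambda}^T
    return (1 - c_mu) * C + c_mu * C_mu
```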

18 Covariance Matrix Adaptation-Evolution Strategy. Invariance: a guarantee for generalization. Invariance properties of CMA-ES: invariance to order-preserving transformations in function space (like all comparison-based algorithms); invariance to rigid transformations of the search space (translation and rotation). CMA-ES is almost parameterless: tuning on a small set of functions (Hansen & Ostermeier 2001); default values generalize to whole classes of problems. Exception: the population size for multi-modal functions, but try the Restart-CMA-ES (Auger & Hansen, 2005).

19 State-of-the-art Results Evolution Strategies CMA-ES The state-of-the-art of (Stochastic) Optimization BBOB Black-Box Optimization Benchmarking ACM-GECCO workshop, in 2009 and 2010 Set of 25 benchmark functions, dimensions 2 to 40 With known difficulties (ill-conditioning, non-separability,... ) Noisy and non-noisy versions Competitors include BFGS (Matlab version), Fletcher-Powell, DFO (Derivative-Free Optimization, Powell 04) Differential Evolution Particle Swarm Optimization and many more Michèle Sebag Surrogate optimization: SVM for CMA 13/ 47

20 Contents Statistical Machine Learning Linear classifiers The kernel trick 1 Covariance Matrix Adaptation-Evolution Strategy Evolution Strategies CMA-ES The state-of-the-art of (Stochastic) Optimization 2 Statistical Machine Learning Linear classifiers The kernel trick 3 Previous Work Mixing Rank-SVM and Local Information Experiments 4 Background Dominance-based Surrogate Experimental Validation Michèle Sebag Surrogate optimization: SVM for CMA 14/ 47

21 Supervised Machine Learning. Context: the universe provides an instance x_i, the oracle provides a label y_i. Input: training set E = {(x_i, y_i), i = 1...n, x_i ∈ X, y_i ∈ Y}. Output: hypothesis h : X → Y. Criterion: quality of h.

22 Supervised Machine Learning, 2. Definitions: E = {(x_i, y_i), x_i ∈ X, y_i ∈ Y, i = 1...n}. Classification: Y finite (e.g. failure/ok). Regression: Y ⊆ R (e.g. time to failure). Hypothesis space H : X → Y. Tasks: select H (model selection); assess h ∈ H (expected accuracy E[h(x) ≠ y]); find h* in H minimizing the error cost in expectation, h* = Arg min{E[ℓ(h(x), y)], h ∈ H}.

23 Dilemma Statistical Machine Learning Linear classifiers The kernel trick Fitting the data Bias variance tradeoff Michèle Sebag Surrogate optimization: SVM for CMA 17/ 47

24 Statistical Machine Learning. Minimize the expected loss E[ℓ(h(x), y)]. Principle: if h is well-behaved on the training set, if the training set is representative, and if h is regular, then h is well-behaved in expectation: E[F] ≤ (1/n) Σ_{i=1}^{n} F(x_i) + c(F, n).

25 Linear classification; the noiseless case. H : X ⊆ R^d → R, prediction = sgn(h(x)), with h(x) = ⟨w, x⟩ + b. (Figure: several separating hyperplanes L_1, L_2, L_3, with normal vector w and offset b.)

26 Linear classification; the noiseless case, 2. Example constraint: y_i (⟨w, x_i⟩ + b) ≥ 0 (correct side of the margin). Maximize the minimum margin 2/||w||. Formalisation: minimize (1/2)||w||² subject to ∀i, y_i (⟨w, x_i⟩ + b) ≥ 1. (Figure: support vectors on the margin hyperplanes ⟨w, x⟩ + b = ±1, at distance 2/||w||.)

27 Linear classification; the noiseless case, 3. Primal form: minimize (1/2)||w||² subject to ∀i, y_i (⟨w, x_i⟩ + b) ≥ 1. Using Lagrange multipliers: min_{w,b} max_{α ≥ 0} { (1/2)||w||² − Σ_{i=1}^{n} α_i [y_i (⟨w, x_i⟩ + b) − 1] }. Dual form: maximize_α Σ_{i=1}^{n} α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j ⟨x_i, x_j⟩ subject to α_i ≥ 0, i = 1...n, with w = Σ_i α_i y_i x_i. Optimization: quadratic programming.

28 Linear classification; the noisy case. Allow constraint violations via slack variables ξ_i. Primal form: minimize (1/2)||w||² + C Σ_{i=1}^{n} ξ_i subject to ∀i, y_i (⟨w, x_i⟩ + b) ≥ 1 − ξ_i, ξ_i ≥ 0. Lagrange multipliers: min_{w,b,ξ} max_{α,β} { (1/2)||w||² + C Σ_i ξ_i − Σ_i α_i [y_i (⟨w, x_i⟩ + b) − 1 + ξ_i] − Σ_i β_i ξ_i }. Dual form: maximize_α Σ_i α_i − (1/2) Σ_{i,j} α_i α_j y_i y_j ⟨x_i, x_j⟩ subject to 0 ≤ α_i ≤ C, i = 1...n, and Σ_i α_i y_i = 0. Solution (support vectors): h(x) = ⟨w, x⟩ = Σ_i α_i y_i ⟨x_i, x⟩.
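A small illustration with scikit-learn's SVC (toy data, not from the slides), showing that the learned decision function is indeed carried by a few support vectors with coefficients α_i y_i:

```python
import numpy as np
from sklearn.svm import SVC

# Toy soft-margin linear SVM: the decision function is h(x) = sum_i alpha_i y_i <x_i, x> + b,
# carried only by the support vectors.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.5], [3.0, 3.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=10.0).fit(X, y)
print(clf.support_)              # indices of the support vectors
print(clf.dual_coef_)            # alpha_i * y_i for each support vector
print(clf.decision_function(X))  # h(x) on the training points
```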

29 The kernel trick. Intuition: map the input space X into a feature space Ω, e.g. Φ : x = (x_1, x_2) ↦ (x_1², √2 x_1 x_2, x_2²). Principle: choose Φ and K such that ⟨Φ(x), Φ(x′)⟩ = K(x, x′).

30 The kernel trick, 2. SVM only considers the scalar product: h(x) = Σ_i α_i y_i ⟨x_i, x⟩ (linear case), h(x) = Σ_i α_i y_i K(x_i, x) (kernel trick). PROS: a rich hypothesis space; no computational overhead (no explicit mapping onto the feature space). Open problem: kernel design.

31 The kernel trick, 3. Kernels: Polynomial: k(x_i, x_j) = (⟨x_i, x_j⟩ + 1)^d. Gaussian or Radial Basis Function: k(x_i, x_j) = exp(−||x_i − x_j||² / (2σ²)). Hyperbolic tangent: k(x_i, x_j) = tanh(κ ⟨x_i, x_j⟩ + c). (Figure: examples of decision boundaries for polynomial (left) and Gaussian (right) kernels.)
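Direct transcriptions of these three kernels in Python (illustrative parameter defaults):

```python
import numpy as np

def polynomial_kernel(xi, xj, d=3):
    return (np.dot(xi, xj) + 1) ** d

def rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.linalg.norm(xi - xj) ** 2 / (2 * sigma ** 2))

def tanh_kernel(xi, xj, kappa=1.0, c=0.0):
    return np.tanh(kappa * np.dot(xi, xj) + c)
```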

32 Rank-based SVM. Learning to order things: on a training set E = {x_i, i = 1...n}, the expert gives preferences (x_{i_k} ≻ x_{j_k}), k = 1...K (an underconstrained regression problem). Order constraints, primal form: minimize (1/2)||w||² + C Σ_{k=1}^{K} ξ_k subject to ∀k, ⟨w, x_{i_k}⟩ − ⟨w, x_{j_k}⟩ ≥ 1 − ξ_k.
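One common way to solve these ordering constraints is to reduce them to a classification problem on pairwise differences; a linear sketch of that reduction (illustrative only, not the kernelized Rank-SVM used later in the talk):

```python
import numpy as np
from sklearn.svm import LinearSVC

def fit_rank_svm(X, preference_pairs, C=1.0):
    """Toy linear Rank-SVM via the classical reduction to classification:
    each preference (i preferred to j) yields the constraint <w, x_i - x_j> >= 1,
    encoded as a positive example x_i - x_j (and its negation as a negative one)."""
    diffs = np.array([X[i] - X[j] for i, j in preference_pairs])
    X_pairs = np.vstack([diffs, -diffs])
    y_pairs = np.hstack([np.ones(len(diffs)), -np.ones(len(diffs))])
    clf = LinearSVC(C=C, fit_intercept=False).fit(X_pairs, y_pairs)
    return clf.coef_.ravel()   # w; candidates are then ranked by the score <w, x>
```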

33 Contents Previous Work Mixing Rank-SVM and Local Information Experiments 1 Covariance Matrix Adaptation-Evolution Strategy Evolution Strategies CMA-ES The state-of-the-art of (Stochastic) Optimization 2 Statistical Machine Learning Linear classifiers The kernel trick 3 Previous Work Mixing Rank-SVM and Local Information Experiments 4 Background Dominance-based Surrogate Experimental Validation Michèle Sebag Surrogate optimization: SVM for CMA 27/ 47

34 Surrogate Models for CMA-ES: lmm-CMA-ES. Build a full quadratic meta-model around the current point, weighted by the Mahalanobis distance induced by the covariance matrix. Speed-up: a factor of 2-3 for n 4. Complexity: from O(n^4) to O(n^6) (intractable for n > 16). Rank-invariance is lost. S. Kern et al. (2006), "Local Meta-Models for Optimization Using Evolution Strategies"; Z. Bouzarkouna et al. (2010), "Investigating the lmm-CMA-ES for Large Population Sizes".

35 Surrogate Models for CMA-ES, cont. Using Rank-SVM: build a global model using Rank-SVM, with x_i ≻ x_j iff F(x_i) < F(x_j); kernel and parameters are highly problem-dependent; note: no use of the information from the current state of CMA-ES (T. Runarsson (2006), "Ordinal Regression in Evolutionary Computation"). ACM algorithm: use the covariance matrix C from CMA-ES inside the Gaussian kernel (I. Loshchilov et al. (2010), "Comparison-based optimizers need comparison-based surrogates").

36 Model Learning: Non-separable Ellipsoid problem. Rank-SVM regression in the original coordinate system vs. Rank-SVM regression in the transformed coordinate system given by the current covariance matrix C and mean m: x′ = C^{−1/2}(x − m). (Figure: the learned model in both coordinate systems.)
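The change of coordinates can be sketched as follows (C^{−1/2} computed via the eigendecomposition of C; names are ad hoc):

```python
import numpy as np

def to_cma_coordinates(X, C, m):
    """Map points into the coordinate system suggested by CMA-ES, x' = C^{-1/2}(x - m),
    with C^{-1/2} computed from the eigendecomposition of C."""
    eigvals, B = np.linalg.eigh(C)
    C_inv_sqrt = B @ np.diag(1.0 / np.sqrt(np.maximum(eigvals, 1e-12))) @ B.T
    return (X - m) @ C_inv_sqrt   # one transformed point per row (C^{-1/2} is symmetric)
```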

37 Using the Surrogate Model. Optimization: significant speed-up... if the model is global and accurate. Filtering: "guaranteed" speed-up: prescreen the pre-children with the surrogate, then evaluate only λ of them, retained according to their surrogate rank. (Figure: probability density of retaining an offspring as a function of its surrogate rank.)

38 ACM-ES Optimization Loop. A. Select training points: 1. select the best k training points; 2. the change of coordinates, defined from the current covariance matrix and the current mean value, reads [4]: x′ = C^{−1/2}(x − m). B. Build a surrogate model: 3. build a surrogate model using Rank-SVM. C. Generate pre-children: 4. generate pre-children and rank them according to the surrogate fitness function. D. Select the most promising children: prescreen the pre-children, evaluate the retained λ of them, add the new λ training points and update the parameters of CMA-ES.

39 Parameters. SVM learning: number of training points N_training = 30 d for all problems, except Rosenbrock and Rastrigin, where N_training = 70 (d); number of iterations N_iter = d; kernel function: RBF with σ equal to the average distance between the training points; cost of constraint violation: C_i = 10^6 (N_training − i)^2.0. Offspring selection: number of test points N_test = 500; number of evaluated offspring: λ = λ 3; offspring selection pressure parameters: σ²_sel0 = 2σ²_sel1 = 0.8.

40 Results: Speed-up. (Figure: speed-up vs. problem dimension for Schwefel, Schwefel^{1/4}, Rosenbrock, Noisy Sphere, Ackley and Ellipsoid.) (Table: for each function, dimension n, λ, λ_e, ACM-ES result, speed-up (spu) and CMA-ES result, for Schwefel, Schwefel^{1/4}, Rosenbrock, NoisySphere, Ackley, Ellipsoid and Rastrigin.)

41 Results: Learning Time. The cost of model learning/testing increases quasi-linearly with d on the Sphere function. (Figure: learning time (ms) vs. problem dimension, with the fitted slope indicated.)

42 ACM-ES: conclusion. ACM-ES is from 2 to 4 times faster on uni-modal problems; invariance to rank-preserving transformations is preserved; the computational complexity is O(n). Available online at. Open issues: extension to multi-modal optimization; on-line adaptation of the selection pressure and of the surrogate model complexity.

43 Contents Background Dominance-based Surrogate Experimental Validation 1 Covariance Matrix Adaptation-Evolution Strategy Evolution Strategies CMA-ES The state-of-the-art of (Stochastic) Optimization 2 Statistical Machine Learning Linear classifiers The kernel trick 3 Previous Work Mixing Rank-SVM and Local Information Experiments 4 Background Dominance-based Surrogate Experimental Validation Michèle Sebag Surrogate optimization: SVM for CMA 37/ 47

44 Multi-objective CMA-ES (MO-CMA-ES). MO-CMA-ES = µ_mo independent (1+1)-CMA-ES. Each (1+1)-CMA-ES samples a new offspring, so the size of the temporary population is 2µ_mo. Only the µ_mo best solutions are chosen for the new population, after hypervolume-based non-dominated sorting; then the update of the CMA individuals takes place. (Figure: dominated points and Pareto front in objective space.)

45 A Multi-Objective Surrogate Model: Rationale. Find a single function F(x) that defines the aggregated quality of a solution x in the multi-objective case. The idea was originally proposed using a mixture of One-Class SVM and regression SVM (I. Loshchilov, M. Schoenauer, M. Sebag (GECCO 2010), "A Mono Surrogate for Multiobjective Optimization"). (Figure: dominated points, current Pareto front and new Pareto front in objective space; F_SVM level sets p − ε and p + ε, 2ε apart, in decision space.)

46 Using the Surrogate Model: Filtering. Generate N_inform pre-children. For each pre-child A, with B its nearest parent, calculate Gain(A, B) = F_svm(A) − F_svm(B). The new child is the pre-child with the maximum value of Gain. (Figure: true Pareto front vs. the Pareto front of the SVM surrogate in decision space.)
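A direct sketch of this filtering rule (names are ad hoc; f_svm stands for the learned aggregated surrogate):

```python
import numpy as np

def select_child(pre_children, parents, f_svm):
    """Keep the pre-child with the largest surrogate gain over its nearest parent,
    Gain(A, B) = F_svm(A) - F_svm(B), where B is the parent closest to A."""
    best, best_gain = None, -np.inf
    for child in pre_children:
        nearest = min(parents, key=lambda p: np.linalg.norm(child - p))
        gain = f_svm(child) - f_svm(nearest)
        if gain > best_gain:
            best, best_gain = child, gain
    return best
```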

47 Dominance-Based Surrogate Using Rank-SVM: which ordered pairs? Considering all possible relations may be too expensive. Primary constraints: x and its nearest dominated point. Secondary constraints: any 2 points not belonging to the same front (according to non-dominated sorting). Use all primary constraints and a limited number of secondary constraints. (Figure: points a-f in objective space, with primary and secondary "≻" constraints and the F_SVM level sets.)
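A sketch of how the primary constraints could be collected (assuming minimization; "nearest" is taken here in decision space, which is an assumption rather than something stated on the slide):

```python
import numpy as np

def primary_constraints(points, objectives):
    """Pair each point with its nearest dominated point (minimization assumed);
    each pair (i, j) encodes the preference 'point i is better than point j'."""
    pairs = []
    for i, fi in enumerate(objectives):
        dominated = [j for j, fj in enumerate(objectives)
                     if j != i and np.all(fi <= fj) and np.any(fi < fj)]
        if dominated:
            j = min(dominated, key=lambda j: np.linalg.norm(points[i] - points[j]))
            pairs.append((i, j))
    return pairs
```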

48 Dominance-Based Surrogate (2). Construction of the surrogate model: initialize the archive Ω_active as the set of primary constraints and Ω_passive as the set of secondary constraints; learn the model for 1000 |Ω_active| iterations; add the most violated passive constraint from Ω_passive to Ω_active and optimize the model for 10 |Ω_active| iterations; repeat the last step 0.1 |Ω_active| times.
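The schedule above, written out as a sketch: optimize and violation are placeholders for the Rank-SVM optimizer and its constraint-violation measure, and since the slide does not say whether |Ω_active| is re-measured as it grows, the repetition count below uses its initial size.

```python
def build_surrogate(primary, secondary, optimize, violation):
    """Incremental constraint handling: start from the primary constraints, then
    repeatedly promote the most violated secondary constraint into the active set."""
    active, passive = list(primary), list(secondary)
    optimize(active, 1000 * len(active))
    for _ in range(int(0.1 * len(active))):      # repeat 0.1 * |active| times
        if not passive:
            break
        worst = max(passive, key=violation)
        passive.remove(worst)
        active.append(worst)
        optimize(active, 10 * len(active))
    return active
```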

49 Experimental Validation: Parameters. Surrogate models: ASM, the aggregated surrogate model based on One-Class SVM and regression SVM; RASM, the proposed Rank-based SVM model. SVM learning: number of training points: at most N_training = 1000 points; number of iterations: 1000 |Ω_active| + |Ω_active|² ≤ 2 N²_training; kernel function: RBF with σ equal to the average distance between the training points; cost of constraint violation: C = 1000. Offspring selection: number of pre-children: p = 2 and p = 10.

50 Experimental Validation: Comparative Results. ASM and Rank-based ASM applied on top of NSGA-II (with the hypervolume secondary criterion) and MO-CMA-ES, on the ZDT and IHR functions. (Table: N = how many more true evaluations are needed than by the best performer.)

51 Discussion. Results on the ZDT and IHR problems: the Rank-SVM versions are about 1.5 times faster (p = 2 and p = 10 settings) before the algorithm reaches nearly-optimal Pareto points; afterwards, premature convergence of the approximation of the optimal µ-distribution: the surrogate model only enforces convergence toward the Pareto front, but does not care about diversity.

52 Dominance-based Surrogate: Conclusion. The proposed aggregated surrogate model is invariant under order-preserving transformations of the objective functions. The speed-up is significant, but limited to the convergence phase toward the optimal Pareto front. Thanks to the flexibility of SVM, "any" kind of preference could be taken into account: extreme points, "=" relations, Hypervolume Contribution, Decision-Maker-defined relations. (Figure: points a-g in objective space with primary and secondary "≻" and "=" constraints and the F_SVM level sets.)

53 Machine Learning for Optimization: Discussion. Learning about the landscape, using the available samples and using prior knowledge / constraints: easy. ...Using it to speed up the search (dilemma: exploration vs. exploitation; multi-modal optimization): very doable, but more difficult.
