Fundamentals of Metaheuristics

1 Fundamentals of Metaheuristics Part I - Basic concepts and Single-State Methods A seminar for Neural Networks Simone Scardapane Academic year

2 ABOUT THIS SEMINAR The seminar is divided into three main parts of 2 hours each:
1. Part I (this one): Basic concepts and single-state algorithms (Simulated Annealing and Tabu Search).
2. Part II: Population algorithms (Evolutionary Methods and Particle Swarm Optimization).
3. Part III: Advanced topics, including multi-objective optimization and parallelization.

5 REFERENCE MATERIAL Slides are self-contained (as much as possible!). If you want to expand on the subject: Essentials of Metaheuristics (Sean Luke), available online at sean/book/metaheuristics/. Part of this seminar is built on it. Selected papers are listed at the end of each Part.

6 PROJECTS Metaheuristic optimization is a vast field, which leaves room for a great range of small projects. Contact me at simonescardapane [at] gmail [dot] com if you're interested. You can also find me at the ISPAMM lab at DIET.

7 TABLE OF CONTENTS
INTRODUCTION
  Definition: what are metaheuristics?
  Applicability
BASIC CONCEPTS
  Hill Climbing
  Random Search
  Exploration vs. Exploitation
  Model representation
SIMULATED ANNEALING
  General Algorithm
  Cooling functions
  Variations
TABU SEARCH
  Algorithm Description
  Feature-based Tabu Search

11 WHAT IS A METAHEURISTIC? A metaheuristic is an algorithm for global approximation that employs a certain degree of randomness (it belongs to the wider class of stochastic optimizers) and makes as few assumptions as possible. Metaheuristics are also known as black-box optimizers. They are divided into single-state methods (only one solution is analyzed at a time) and population methods.

12 DRAWBACKS Some of the main downsides of metaheuristics are: 1. No guarantee on global convergence 2. Hard to study in general 3. Difficult to choose the right one

15 WHEN CAN YOU USE THEM To apply a metaheuristic, only two elements are needed:
1. A representation of your hypothesis space $H$.
2. The capability of assessing the goodness (or badness) of a hypothesis $h \in H$. This is equivalent to having an objective function $f(h)$.
There is no need for an explicit representation of the objective function, nor for any constraints on $H$.

16 WHEN SHOULD YOU USE THEM Metaheuristics are useful only when one (or more) of the following statements is true:
1. There is no explicit representation of the objective function (e.g. a soccer robot player).
2. It is hard (or impossible) to compute first- and second-order derivatives.
3. $H$ is too vast to be searched thoroughly.
In any other case, using a metaheuristic is overkill.

17 WHY SHOULD YOU CARE Metaheuristics have a lot in common with machine learning, so it should not be surprising that they also have a lot of possible applications there. For example, in the case of neural networks, you can use a metaheuristic as a tool for:
1. Learning the topology of a network.
2. Pruning a network to improve performance and generalization capabilities.
3. Learning the weights, as an alternative to backpropagation.

22 HILL CLIMBING Maybe the simplest possible algorithm:
1. Choose a starting hypothesis $h$.
2. Randomly tweak $h$ to get a similar hypothesis $h_2 = \mathrm{tweak}(h)$. Sometimes $h_2$ is known as a neighbour of $h$.
3. If $f(h_2) < f(h)$, keep $h_2$; otherwise keep $h$.
4. Repeat steps 2-3 until $f(h)$ stops improving, or a certain amount of time has passed.
This is for minimizing $f$. Maximizing $f(h)$ is equivalent to minimizing $-f(h)$: you just change the sign.
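
The steps above can be sketched in a few lines of Python. This is only an illustrative sketch: the objective `sphere`, the Gaussian `gaussian_tweak` and the fixed iteration budget are assumptions made for the example, not part of the seminar material.

```python
import random

def hill_climb(f, h0, tweak, max_iters=10_000, rng=None):
    """Minimize f by repeatedly tweaking the current hypothesis."""
    rng = rng or random.Random()
    h, fh = h0, f(h0)                # step 1: starting hypothesis
    for _ in range(max_iters):       # step 4: repeat until the budget is spent
        h2 = tweak(h, rng)           # step 2: sample a neighbour of h
        fh2 = f(h2)
        if fh2 < fh:                 # step 3: keep the neighbour only if it is better
            h, fh = h2, fh2
    return h, fh

# Hypothetical 1-D objective with its minimum at x = 3.
def sphere(x):
    return (x - 3.0) ** 2

# Hypothetical tweak: add a small amount of Gaussian noise.
def gaussian_tweak(x, rng, sigma=0.1):
    return x + rng.gauss(0.0, sigma)

best, value = hill_climb(sphere, h0=0.0, tweak=gaussian_tweak, rng=random.Random(0))
```

To maximize some function `g` with the same routine, one would pass `lambda h: -g(h)` as the objective.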

24 LOCAL MINIMA Note the similarities with gradient descent. Hill Climbing is highly sensitive to local minima! A possible solution is random restarts. Then, with infinite time, it will converge to the global optimum.
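
A minimal sketch of Hill Climbing with random restarts follows. The multimodal objective `bumpy`, the sampler passed as `sample`, and the restart and iteration budgets are hypothetical choices for illustration only.

```python
import math
import random

def hill_climb(f, h, tweak, iters, rng):
    """Plain Hill Climbing from a given starting hypothesis h."""
    fh = f(h)
    for _ in range(iters):
        h2 = tweak(h, rng)
        fh2 = f(h2)
        if fh2 < fh:
            h, fh = h2, fh2
    return h, fh

def random_restarts(f, sample, tweak, restarts=20, iters=500, rng=None):
    """Run Hill Climbing from several random starts and keep the best result."""
    rng = rng or random.Random()
    best_h, best_f = None, math.inf
    for _ in range(restarts):
        h, fh = hill_climb(f, sample(rng), tweak, iters, rng)
        if fh < best_f:
            best_h, best_f = h, fh
    return best_h, best_f

# Hypothetical objective: many local minima, global minimum at x = 0.
def bumpy(x):
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))

def gaussian_tweak(x, rng, sigma=0.1):
    return x + rng.gauss(0.0, sigma)

best, value = random_restarts(
    bumpy, lambda rng: rng.uniform(-5.0, 5.0), gaussian_tweak, rng=random.Random(0)
)
```

With a finite number of restarts there is still no guarantee that one of them starts in the basin of the global optimum; more restarts only make it more likely.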

25 BIAS AND ASSUMPTIONS What are the assumptions of Hill Climbing? The only one is smoothness: similar solutions behave in a similar way. This makes sense: what happens when this assumption is violated?

26 RANDOM SEARCH Without a-priori knowledge, the only possibility is Random Search. There is actually a large family of random search algorithms.
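
The simplest member of this family just samples hypotheses independently and remembers the best one seen so far. The sketch below assumes a user-supplied `sample` function that draws a random hypothesis; that name and the iteration budget are illustrative.

```python
import math
import random

def random_search(f, sample, iters=1000, rng=None):
    """Pure Random Search: no hypothesis is derived from a previous one."""
    rng = rng or random.Random()
    best_h, best_f = None, math.inf
    for _ in range(iters):
        h = sample(rng)              # draw a fresh hypothesis, ignoring the past
        fh = f(h)
        if fh < best_f:              # only remember the best one found so far
            best_h, best_f = h, fh
    return best_h, best_f

# Hypothetical usage: minimize (x - 3)^2 over the interval [-10, 10].
best, value = random_search(
    lambda x: (x - 3.0) ** 2,
    lambda rng: rng.uniform(-10.0, 10.0),
    rng=random.Random(0),
)
```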

27 EXPLORATION VS. EXPLOITATION Why are these basic concepts? Almost every metaheuristic is an intelligent combination of Hill Climbing and Random Search. Why is that? Every metaheuristic needs to balance exploitative behaviour against explorative behaviour.

31 THE CLASSICAL DILEMMA Exploitation means using the current knowledge to find a (possibly sub-optimal) solution. Exploration means taking a chance in the hypothesis space. Random Search is purely explorative, while Hill Climbing is purely exploitative. We will see many variations throughout the seminar.

36 HYPOTHESIS SPACE SELECTION How can you represent your hypothesis? There is a vast range of possibilities:
1. Real numbers, integers, vectors,
2. Strings,
3. Trees, graphs,
4. Rules...
For each possible choice, there are many neighbouring choices (different tweak functions).
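
To make the link between representation and tweak function concrete, here are three possible tweak functions for three representations. All three are illustrative assumptions (Gaussian noise, bit flips, position swaps); many other neighbourhoods are possible for each representation.

```python
import random

def tweak_vector(v, rng, sigma=0.1):
    """Real-valued vector: add Gaussian noise to every component."""
    return [x + rng.gauss(0.0, sigma) for x in v]

def tweak_bitstring(s, rng, p=None):
    """Bit string: flip each bit independently with probability p (default 1/len)."""
    p = 1.0 / len(s) if p is None else p
    return "".join(
        ("1" if c == "0" else "0") if rng.random() < p else c
        for c in s
    )

def tweak_permutation(perm, rng):
    """Permutation (e.g. a tour): swap two distinct, randomly chosen positions."""
    i, j = rng.sample(range(len(perm)), 2)
    out = list(perm)
    out[i], out[j] = out[j], out[i]
    return out

rng = random.Random(0)
print(tweak_vector([0.0, 1.0, 2.0], rng))
print(tweak_bitstring("01011010", rng))
print(tweak_permutation([0, 1, 2, 3, 4], rng))
```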

38 ALGORITHM SELECTION The third step is the choice of the metaheuristic itself. However, remember that choosing the right space is as important as choosing the right metaheuristic. Sadly, this is much more an art than a science, and depends on experience and intuition. That's why we're here, by the way.

42 A-PRIORI INFORMATION In case you possess additional information about your target solution, as a final step you can customize your algorithm. There are various techniques:
1. You can use a multi-objective function (see Part III).
2. You can include some bias in the tweak function.
3. Other possibilities given by the metaheuristic itself (we discuss these case-by-case).

43 SIMULATED ANNEALING Same idea as Hill Climbing, however:
1. If $f(h_2) \le f(h)$, we keep $h_2$ as in Hill Climbing.
2. If $f(h_2) > f(h)$, we still keep it with probability
$$p(h, h_2, t) = e^{-\frac{f(h_2) - f(h)}{t}}$$
$t$ is known as the temperature and $p$ as the acceptance probability; the rule by which $t$ decreases over time is the cooling schedule. This acceptance rule is also known as the Metropolis criterion.
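
The acceptance rule can be sketched as follows. The exponential cooling with factor `alpha`, the test function `bumpy` and the tweak are assumptions for illustration; the sketch also remembers the best hypothesis seen so far, which is a common convenience rather than part of the rule itself.

```python
import math
import random

def accept(f_old, f_new, t, rng):
    """Metropolis criterion for minimization at temperature t > 0."""
    if f_new <= f_old:
        return True                                   # never reject an improvement
    return rng.random() < math.exp(-(f_new - f_old) / t)

def simulated_annealing(f, h0, tweak, t0=1.0, alpha=0.99, iters=5000, rng=None):
    rng = rng or random.Random()
    h, fh = h0, f(h0)
    best_h, best_f = h, fh
    t = t0
    for _ in range(iters):
        h2 = tweak(h, rng)
        fh2 = f(h2)
        if accept(fh, fh2, t, rng):
            h, fh = h2, fh2
            if fh < best_f:
                best_h, best_f = h, fh
        t *= alpha                                    # assumed exponential cooling
    return best_h, best_f

# Hypothetical multimodal objective with its global minimum at x = 0.
def bumpy(x):
    return x * x + 10.0 * (1.0 - math.cos(2.0 * math.pi * x))

best, value = simulated_annealing(
    bumpy, 4.0, lambda x, rng: x + rng.gauss(0.0, 0.5), t0=10.0, rng=random.Random(0)
)
```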

44 EXPLANATION Inspired by the annealing process in metallurgy. The temperature decreases with time, down to 0, thus decreasing the probability of accepting a worse hypothesis. If the cooling schedule is slow enough, the algorithm is proven to converge to the global optimum (a result that is not useful in practice). The proof relies on Markov chains.

45 HEAT AND NOISE The heat works as noise injected into Hill Climbing to help it escape local minima. Thanks to Rohit Ray and for the image.

46 EXAMPLES OF COOLING FUNCTION
1. $t_k = \alpha^k t_0$. This results in an exponential cooling schedule.
2. $t_k = t_0 - \alpha k$, also known as a linear schedule.
3. $t_k = t_0 \left(1 - \frac{k}{K}\right)^{\alpha}$.
4. See [6] for a small review of classical strategies.
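
The three schedules can be written as small functions. This sketch assumes they read as t_k = alpha^k * t_0, t_k = t_0 - alpha * k and t_k = t_0 * (1 - k/K)^alpha, with k the iteration index and K the total number of iterations; the slide does not define K, so that reading is an assumption. The clamp at zero in the linear schedule is also an addition, made so the temperature never becomes negative.

```python
def exponential(t0, alpha, k):
    """t_k = alpha^k * t0, with 0 < alpha < 1."""
    return t0 * alpha ** k

def linear(t0, alpha, k):
    """t_k = t0 - alpha * k, clamped at zero (the clamp is an assumption)."""
    return max(t0 - alpha * k, 0.0)

def polynomial(t0, alpha, k, K):
    """t_k = t0 * (1 - k/K)^alpha, assuming K is the total number of iterations."""
    return t0 * (1.0 - k / K) ** alpha

# Print a few temperatures for each schedule over K = 10 iterations.
for k in (0, 5, 10):
    print(k, exponential(1.0, 0.8, k), linear(1.0, 0.1, k), polynomial(1.0, 2, k, 10))
```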

47 EXAMPLES OF COOLING FUNCTION (2) Figure: Comparison of three different cooling strategies.

48 GENERAL BEHAVIOUR Simulated Annealing has a strongly explorative behaviour in the beginning, but switches to a strongly exploitative one toward the end. Choosing the cooling schedule is often not trivial. For this reason, methods have been devised for adjusting it; see, for example, Thermodynamic Simulated Annealing.

49 SIMULATED ANNEALING IN MATLAB A widely used implementation is available in the Global Optimization Toolbox. Usage: simulannealbnd(objectivefunction, StartingPoint). Many options are available; see the documentation on the MathWorks website. By default it implements a reannealing technique (the temperature is raised at certain points).

50 VARIATIONS Some variations of Simulated Annealing use a probabilistic criterion even if the new hypothesis is better. In Threshold Accepting, a new solution is kept if
$$f(h_2) - f(h) < Q(k)$$
where $Q(k)$ is the threshold value at iteration $k$, generally taken as a monotonically decreasing function. This eliminates the need for a random number generator in the acceptance step.
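
A minimal sketch of Threshold Accepting for minimization, assuming the rule reads "accept when f(h_2) - f(h) < Q(k)". The linearly decreasing threshold used for Q(k) is a hypothetical choice; any monotonically decreasing function could be substituted. Note that the tweak may still be random: only the acceptance decision is deterministic.

```python
import random

def accepts(f_old, f_new, q):
    """Deterministic acceptance: the loss may get worse by less than q."""
    return f_new - f_old < q

def threshold_accepting(f, h0, tweak, q0=1.0, iters=1000, rng=None):
    rng = rng or random.Random()
    h, fh = h0, f(h0)
    best_h, best_f = h, fh
    for k in range(iters):
        q = q0 * (1.0 - k / iters)       # hypothetical linearly decreasing Q(k)
        h2 = tweak(h, rng)
        fh2 = f(h2)
        if accepts(fh, fh2, q):          # no random number drawn here
            h, fh = h2, fh2
            if fh < best_f:
                best_h, best_f = h, fh
    return best_h, best_f

best, value = threshold_accepting(
    lambda x: (x - 3.0) ** 2,
    0.0,
    lambda x, rng: x + rng.gauss(0.0, 0.1),
    rng=random.Random(0),
)
```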

51 TABU SEARCH Again, same as Hill Climbing, but with a small twist. It keeps track of the last N visited elements and marks them as taboo. When evaluating the objective function, taboo elements are discarded.
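
One common way to realize this idea is sketched below: at each step, move to the best neighbour that is not taboo, even if it is worse than the current hypothesis. The `neighbours` function and the small integer landscape are hypothetical, and the seminar may have a slightly different variant in mind.

```python
from collections import deque

def tabu_search(f, h0, neighbours, tabu_size=10, iters=200):
    h = h0
    best_h, best_f = h0, f(h0)
    tabu = deque([h0], maxlen=tabu_size)      # the last N visited hypotheses
    for _ in range(iters):
        candidates = [n for n in neighbours(h) if n not in tabu]
        if not candidates:                     # every neighbour is taboo: stop
            break
        h = min(candidates, key=f)             # best non-taboo neighbour, even if worse
        tabu.append(h)                         # the oldest entry drops out automatically
        if f(h) < best_f:
            best_h, best_f = h, f(h)
    return best_h, best_f

# Hypothetical landscape: local minimum at index 2, global minimum at index 8.
landscape = [4, 3, 2, 3, 4, 5, 4, 1, 0, 2]

def line_neighbours(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)]

best, value = tabu_search(lambda i: landscape[i], 0, line_neighbours)
print(best, value)
```

On this landscape plain Hill Climbing started at index 0 would stop at index 2, since both of its neighbours are worse; the tabu list forces the search to keep moving to the right.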

52 TABOO AND MEMORY Despite its simplicity, Tabu Search is known for being efficient in many practical applications. It tries to be as exploitative as possible without being trapped in local minima. It is like having a primitive form of memory. Many variations exist; we look at two of them here.

53 CONTINUOUS FUNCTIONS If the function is continuous, we may also want to discard elements that are sufficiently similar to a taboo one. The similarity measure depends on the problem; for example, an $L_p$ norm can be used:
$$L_p(h, h_2) = \left( \sum_i \left| h_{2,i} - h_i \right|^p \right)^{1/p}$$
(However, with continuous spaces it would be better to use other metaheuristics.)
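
A small sketch of such a similarity test, assuming hypotheses are sequences of numbers and that "sufficiently similar" means "closer than a chosen radius". The `radius` parameter and the function names are hypothetical.

```python
def lp_distance(h, h2, p=2):
    """L_p distance between two hypotheses given as equal-length sequences."""
    return sum(abs(b - a) ** p for a, b in zip(h, h2)) ** (1.0 / p)

def is_tabu(h, tabu_list, radius, p=2):
    """A hypothesis is taboo if it lies within `radius` of any taboo element."""
    return any(lp_distance(h, t, p) < radius for t in tabu_list)

print(lp_distance([0.0, 0.0], [3.0, 4.0]))             # Euclidean distance
print(is_tabu([0.05, 0.0], [[0.0, 0.0]], radius=0.1))  # close to a taboo point
```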

54 FEATURE-BASED TABU SEARCH If the search space is too vast, we may want to mark as taboo not a single hypothesis, but the change made to it. A classical example is the Traveling Salesman Problem, where the hypothesis is a path on the current graph. Whenever you delete the edge from A to B, you mark that edge as taboo. Then, for a given number of iterations, the edge cannot be added again. Some categorize this as intermediate-term memory, as opposed to the classical tabu list (short-term memory).
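
The bookkeeping for taboo edges can be sketched as below. The class name `EdgeTabu`, the `tenure` parameter (the number of iterations an edge stays taboo) and the treatment of edges as undirected are all assumptions for illustration; the move generation for the tour itself is left out.

```python
def edge(a, b):
    """Canonical form of an undirected edge, so (A, B) and (B, A) match."""
    return (a, b) if a <= b else (b, a)

class EdgeTabu:
    """Remembers deleted edges and forbids re-adding them for `tenure` iterations."""

    def __init__(self, tenure):
        self.tenure = tenure
        self.expires = {}            # edge -> first iteration at which it is allowed again

    def forbid(self, a, b, iteration):
        """Call when the edge (a, b) is deleted from the current tour."""
        self.expires[edge(a, b)] = iteration + self.tenure

    def is_tabu(self, a, b, iteration):
        """True if the edge (a, b) may not be added at this iteration."""
        return iteration < self.expires.get(edge(a, b), 0)

tabu = EdgeTabu(tenure=3)
tabu.forbid("A", "B", iteration=10)
print(tabu.is_tabu("B", "A", iteration=11))
print(tabu.is_tabu("A", "B", iteration=13))
```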

55 SELECTED BIBLIOGRAPHY I
[1] Rachid Chelouah and Patrick Siarry. Tabu search applied to global optimization. European Journal of Operational Research, 123(2).
[2] Fred Glover and Manuel Laguna. Tabu Search. Kluwer Academic Publishers, Norwell, MA, USA.
[3] L. Ingber. Simulated Annealing: Practice versus Theory. Mathematical and Computer Modelling, 18(11):29-57.
[4] W. G. Macready and D. H. Wolpert, II. Bandit problems and the exploration/exploitation tradeoff. Trans. Evol. Comp, 2(1):2-22, April 1998.

56 SELECTED BIBLIOGRAPHY II
[5] Debasis Mitra, Fabio Romeo, and Alberto S. Vincentelli. Convergence and Finite-Time Behavior of Simulated Annealing. Advances in Applied Probability, 18(3).
[6] Yaghout Nourani and Bjarne Andresen. A comparison of simulated annealing cooling strategies. Journal of Physics A: Mathematical and General, 31(41):8373, 1998.
