How to Treat Evolutionary Algorithms as Ordinary Randomized Algorithms


How to Treat Evolutionary Algorithms as Ordinary Randomized Algorithms
Technical University of Denmark
(Workshop "Computational Approaches to Evolution", March 19, 2014) 1/22

Context
Running time/complexity analysis of randomized search heuristics (RSHs), e.g., evolutionary algorithms (EAs), local search, simulated annealing (SA), ant colony optimization, particle swarm optimization.
[Diagram: Initialization → Selection → Variation → Selection → Stop? (loop while "no")]
Purpose: find the optimum of an objective/fitness function f: D → R, x ↦ f(x).
Study the (expected) optimization time. The field started in the late 1980s/early 1990s with the analysis of SA, using and adapting techniques from classical randomized algorithms. Nowadays a body of results and methods for the analysis is available, especially in combinatorial optimization. 2/22

Contents
Challenge: very innocent-looking settings are surprisingly hard to analyze, since EAs (and other RSHs) were not designed with analysis in mind.
Aim for today: show selected (state-of-the-art) results and techniques; discuss challenges and opportunities of the approach.
1 Introducing an important technique: drift analysis
2 An application to weighted hillclimbing with EAs: linear functions
3 Back to classical TCS? 3/22

Drift Analysis: Idea
EAs are randomized algorithms → analyze their running time = the time to reach the optimum (or some approximation).
EAs induce very complex stochastic processes. In addition, we want to analyze a complex random variable related to a global property (a first hitting time). We can rarely understand the global behavior completely.
But we can roughly understand what happens from one step to the next, a local property. Drift analysis is the tool that translates the one-step local behavior into a global statement on the running time. 4/22

A Simple Scenario: Additive Drift
[Figure: a walk toward 0 taking steps with probability 1/2 each, such that E(X_t − X_{t+1}) = 2.]
You start off at distance X_0 = n (a discrete random variable) from the optimum. In each step, the distance is reduced by at least d in expectation,
E(X_t − X_{t+1} | X_t > 0) ≥ d,
maybe sometimes more or less. Find (or bound) the expected time E(T_0) to reach 0.
The obvious answer is correct: E(T_0) ≤ n/d. Analogously, E(X_t − X_{t+1} | X_t > 0) ≤ d implies E(T_0) ≥ n/d.
This scenario is called additive drift. It was brought into the theory of EAs by He/Yao (2001). 5/22
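A minimal simulation (my own illustration, not from the slides) of the additive drift scenario: a toy walk decreases its distance by 1 or 3 with probability 1/2 each, so the drift is d = 2, and the additive drift theorem predicts an expected hitting time of about n/d. Function and parameter names are illustrative.

```python
import random

def additive_drift_walk(n, trials=2000, seed=1):
    """Average hitting time of 0 for a walk starting at distance n
    that subtracts 1 or 3 (prob. 1/2 each, capped at 0) per step,
    i.e., a process with additive drift d = 2."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x, t = n, 0
        while x > 0:
            x = max(0, x - rng.choice((1, 3)))
            t += 1
        total += t
    return total / trials

# The additive drift theorem predicts E(T_0) <= n/d = n/2
# (the cap at 0 can only slow the walk down slightly near the end).
```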

Multiplicative Drift
Still X_0 = n. In each step, the distance is reduced by an expected δ-factor of the current one:
E(X_t − X_{t+1} | X_t = s) ≥ δs.
Find E(T_0). Example: E(X_{t+1} | X_t = s) = s/2, i.e., δ = 1/2. Your guess for E(T_0)?
Answer: E(T_0) ≤ 2(ln n + 1); in general, E(T_0) ≤ (ln X_0 + 1)/δ.
Lower bound? Say Pr(X_{t+1} = 0) = Pr(X_{t+1} = X_t) = 1/2. Then still E(X_{t+1} | X_t = s) = s/2; however, E(T_0) = 2. A matching lower bound E(T_0) ≥ (ln X_0)/δ requires concentration of the one-step progress, which is often met in EAs.
This scenario is called multiplicative drift, more typical than additive drift in the analysis of EAs. Due to Doerr/Johannsen/Winzen and Doerr/Goldberg (2010). 6/22
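A small sketch (my own, not from the slides) of the multiplicative drift scenario: each step keeps every unit of the current distance independently with probability 1 − δ, so E(X_{t+1} | X_t = s) = (1 − δ)s, and the theorem bounds the expected hitting time by (ln n + 1)/δ.

```python
import random

def multiplicative_drift_walk(n, delta=0.5, trials=2000, seed=2):
    """Average hitting time of 0 for a process in which each unit of
    the current distance survives a step independently with probability
    1 - delta, giving E(X_{t+1} | X_t = s) = (1 - delta) * s."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x, t = n, 0
        while x > 0:
            # binomial thinning: each unit survives with prob. 1 - delta
            x = sum(1 for _ in range(x) if rng.random() < 1 - delta)
            t += 1
        total += t
    return total / trials

# Multiplicative drift theorem: E(T_0) <= (ln n + 1) / delta,
# here about (ln 100 + 1)/0.5, roughly 11, for n = 100.
```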

A Warm-up Example Using Multiplicative Drift
An extremely simple EA: the (1+1) EA
1 Choose x ∈ {0,1}^n uniformly at random.
2 Repeat: create y by flipping each bit in x independently with probability 1/n; if f(y) ≤ f(x), set x := y.
3 Until happy.
An extremely simple problem: OneMax(x_1, ..., x_n) = x_1 + ... + x_n (in fact "OneMin", since we minimize).
Analysis: say X_t = OneMax(x) = s, where s ∈ {1, ..., n}. There are s ways of flipping a single one-bit, each with probability (1/n)(1 − 1/n)^{n−1} ≥ 1/(en), so
E(X_t − X_{t+1} | X_t = s) ≥ s/(en), i.e., δ = 1/(en) in the multiplicative drift theorem, and
E(T_0) ≤ (ln X_0 + 1)/δ ≤ (ln n + 1)·en = en ln n + en = O(n log n). 7/22
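The (1+1) EA and OneMax from this slide are easy to make executable; a runnable sketch (helper names are mine) that measures the hitting time bounded above by en ln n + en in expectation:

```python
import random

def one_plus_one_ea(n, f, max_iters=10**6, seed=0):
    """(1+1) EA minimizing f over {0,1}^n: standard bit mutation with
    probability 1/n per bit, accept the offspring if it is not worse."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    for t in range(1, max_iters + 1):
        y = [b ^ (rng.random() < 1 / n) for b in x]  # flip each bit w.p. 1/n
        if f(y) <= f(x):
            x = y
        if f(x) == 0:  # optimum of the minimization problem reached
            return t
    return max_iters

def onemax(x):  # "OneMin": we minimize the number of one-bits
    return sum(x)

# Expected hitting time is O(n log n) by the multiplicative drift argument.
```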

Further Applications of Drift Analysis
Bounds on the running time of population-based EAs, e.g., the Simple GA (SGA):
1 Create a population P of μ randomly chosen individuals.
2 C := ∅.
3 While |C| < μ do:
  Fitness-proportional selection: select two parents x and x′ from P proportional to their fitness.
  Uniform crossover: create an offspring x″ by setting each bit x″_i = x_i with probability 1/2 and x″_i = x′_i otherwise, for 1 ≤ i ≤ n.
  Standard bit mutation: flip each bit x″_i of x″ with probability 1/n.
  C := C ∪ {x″}.
4 Set P := C and go to 2.
Also: EAs in combinatorial optimization (minimum spanning trees, maximum matchings, makespan scheduling, ...); EAs in noisy optimization (drift methods are really crucial here); ... 8/22
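One generation of the SGA above could be sketched as follows (a sketch under the assumption that f is maximized and non-negative, as fitness-proportional selection requires; function names are illustrative):

```python
import random

def sga_generation(P, f, rng):
    """One generation of the Simple GA: fitness-proportional selection,
    uniform crossover, standard bit mutation with probability 1/n."""
    mu, n = len(P), len(P[0])
    weights = [f(x) for x in P]  # selection probability proportional to fitness
    C = []
    while len(C) < mu:
        x1, x2 = rng.choices(P, weights=weights, k=2)
        child = [a if rng.random() < 0.5 else b
                 for a, b in zip(x1, x2)]                 # uniform crossover
        child = [b ^ (rng.random() < 1 / n) for b in child]  # bit mutation
        C.append(child)
    return C

rng = random.Random(3)
P = [[rng.randint(0, 1) for _ in range(20)] for _ in range(10)]
P = sga_generation(P, sum, rng)  # OneMax as the fitness function
```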

Linear Functions: Generalized OneMax
Aim: a case study in drift analysis: bounds for linear functions. Increases the understanding of an innocent-looking problem.
Impact: prove optimality of a parameter setting used in practice; prove optimality of the (1+1) EA within a larger class of EAs. 9/22

The Setting
Task: minimize a linear pseudo-Boolean function f: {0,1}^n → R,
f(x_1, ..., x_n) = w_1 x_1 + ... + w_n x_n, w.l.o.g. 0 < w_1 ≤ ... ≤ w_n.
Randomized Local Search (RLS)
1 Choose x ∈ {0,1}^n uniformly at random.
2 Repeat: create y by flipping exactly one bit in x, chosen uniformly at random; if f(y) ≤ f(x), set x := y.
3 Until happy. 10/22

The Setting
Task: minimize a linear pseudo-Boolean function f: {0,1}^n → R,
f(x_1, ..., x_n) = w_1 x_1 + ... + w_n x_n, w.l.o.g. 0 < w_1 ≤ ... ≤ w_n.
(1+1) EA
1 Choose x ∈ {0,1}^n uniformly at random.
2 Repeat: create y by flipping each bit in x independently with probability p (default p = 1/n); if f(y) ≤ f(x), set x := y.
3 Until happy.
Expected time for minimization? Can't we just reuse the OneMax result? Or use another classical method (the coupon collector problem)? Yes, BUT only for RLS: n ln n + O(n) iterations in expectation. 10/22
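RLS never loses a 0-bit on a linear function with positive weights (flipping a 0-bit to 1 strictly increases f and is rejected), which is why the coupon-collector bound applies. A runnable sketch under those assumptions, with BinVal as an example weight vector (helper names are mine):

```python
import random

def rls(n, f, seed=4):
    """Randomized Local Search: flip exactly one uniformly chosen bit,
    accept if not worse; returns the number of iterations until f = 0."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    t = 0
    while f(x) > 0:
        y = x[:]
        y[rng.randrange(n)] ^= 1  # flip exactly one bit
        if f(y) <= f(x):
            x = y
        t += 1
    return t

def binval(x):  # linear function with weights 1, 2, 4, ..., 2^(n-1)
    return sum((2 ** i) * b for i, b in enumerate(x))

# Coupon collector: n ln n + O(n) iterations in expectation, for any
# linear function with positive weights.
```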

What Makes the Analysis of the (1+1) EA Difficult
Many light bits may be exchanged for few heavy ones: (0, ..., 0, 1) might be "improved" to (1, ..., 1, 0). How does this affect the running time?
Some prior work: expected running time of the (1+1) EA with p = 1/n on linear functions:
O(n^2) (not too difficult)
O(n ln n) (Droste, Jansen, Wegener 1998/2002)
(1 + o(1))·2.02·en ln n (Jägersküpper 2008/2011)
(1 + o(1))·1.39·en ln n (Doerr, Johannsen, Winzen 2010)
en ln n ± O(n) (DJW 2010 + Doerr, Fouz, W. 2011)
Main new result (W. 2012/13): running time bound (1 ± o(1))·(e^c/c)·n ln n if p = c/n; optimal for p = 1/n, which is an often-recommended mutation probability in practice.
Note: ((c/n)·(1 − c/n)^{n−1})^{−1} ≈ (e^c/c)·n is the waiting time for a single-bit flip. 11/22

Discussion of the Distance Function
OneMax(x_1, ..., x_n) = x_1 + ... + x_n
BinVal(x_1, ..., x_n) = x_1 + 2x_2 + 4x_3 + ... + 2^{n−1} x_n
OneMax: 0-bits are never lost → the distance (measured in 1-bits) is non-increasing → multiplicative drift → en(ln n + 1).
BinVal: can lose almost all 0-bits. Consider Δ(1-bits) := (no. of 1-bits at time t) − (no. of 1-bits at time t + 1):
in general random, possibly negative (= bad), but with positive expectation of at least (1/n)(1 − 1/n)^{n−1} ≥ 1/(en) (additive drift).
Problem: a drift of order 1/n is not sufficient for our purpose; it gives only O(n^2). We need a different potential/distance function w.r.t. search points: the distance from the optimum is not measured in the number of 1-bits. 12/22
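A tiny concrete check of the BinVal effect described above (weights 1, 2, 4, ... assumed): the (1+1) EA accepts a move that trades the single heavy one-bit for all the light ones, so the number of one-bits jumps up even though f decreases.

```python
def binval(x):
    """BinVal with weights 1, 2, 4, ..., 2^(n-1)."""
    return sum((2 ** i) * b for i, b in enumerate(x))

n = 10
x = [0] * (n - 1) + [1]  # only the heavy bit set: f(x) = 2^(n-1) = 512
y = [1] * (n - 1) + [0]  # all light bits set:     f(y) = 2^(n-1) - 1 = 511

accepted = binval(y) <= binval(x)  # the (1+1) EA would accept y
# accepted is True, yet the number of one-bits grows from 1 to n - 1 = 9:
# counting one-bits is a poor distance measure for BinVal.
```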

Finding the Right Potential Function
From now on: f(x_1, ..., x_n) = w_1 x_1 + ... + w_n x_n arbitrary. The number of one-bits is usually not a good potential function.
Classical potential functions: could try X_t := the f-value at time t, which is a weighted sum of 1-bits; this leads to good multiplicative drift, but ln(X_0) might be huge. The potential function should compress steeply increasing weights. Previous work: X_t := Σ_{i=n/2+1}^{n} 2x_i^{(t)} + Σ_{i=1}^{n/2} x_i^{(t)} or similar.
Adaptive potential functions: do not compress all f in the same way. If the linear function at hand is OneMax, why not use OneMax itself? If it is BinVal, can we use more slowly increasing weights? First used by Doerr and Goldberg (2010). 13/22

New Potential Function Proves the Upper Bound
Define g(x) = g_1 x_1 + ... + g_n x_n by g_1 := 1 = γ_1 and, for 2 ≤ i ≤ n,
g_i := min{ γ_i, g_{i−1}·(w_i/w_{i−1}) }, where γ_i := (1 + (ln ln n)·p/(1 − p)^{n−1})^{i−1}.
Properties: g includes the mutation probability p. If f has steep coefficients (e.g., BinVal), then g_i = γ_i. In flat regions, g reproduces f (up to scaling).
Crucial: E(X_t − X_{t+1} | X_t = s) ≥ (1 − o(1))·s·p·(1 − p)^{n−1}, where X_t is the g-value at time t. 14/22
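The definition of g can be computed directly. The sketch below implements the formula as reconstructed above, so treat the exact constants as an assumption; it illustrates the two regimes: for OneMax the weight ratios are 1 and g collapses to f, while for BinVal the minimum is attained by γ_i.

```python
import math

def potential_weights(w, p):
    """g_1..g_n for the adaptive potential function: g_1 = 1 and
    g_i = min(gamma_i, g_{i-1} * w_i / w_{i-1}) with
    gamma_i = (1 + (ln ln n) * p / (1 - p)**(n - 1))**(i - 1)."""
    n = len(w)
    base = 1 + math.log(math.log(n)) * p / (1 - p) ** (n - 1)
    g = [1.0]
    for i in range(1, n):
        g.append(min(base ** i, g[-1] * w[i] / w[i - 1]))
    return g

g_onemax = potential_weights([1] * 16, 1 / 16)                     # stays at 1: g = f
g_binval = potential_weights([2 ** i for i in range(16)], 1 / 16)  # g_i = gamma_i
```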

Further Results
OneMax: stochastically smallest optimization time within the class.
Tight lower bounds using multiplicative drift.
Phase transition from polynomial to superpolynomial running time in the regime p = Θ((ln n)/n).
The (1+1) EA is an optimal mutation-based, population-based algorithm within the scheme:
for t := 0 to μ − 1 do create x_t ∈ {0,1}^n uniformly at random end for
repeat
  select x ∈ {x_0, ..., x_t} according to t and f(x_0), ..., f(x_t)
  create x_{t+1} by flipping each bit in x independently with probability p
  t := t + 1
until happy 15/22

Recent Developments
Is the expected running time the right measure of efficiency?
Where is drift analysis going?
Is drift analysis useful for purposes other than the analysis of RSHs? 16/22

Why Expected Time is Not Enough
(1+1) EA on Distance (DJW 2002):
[Figure: fitness plotted against the number of one-bits; the function has a local optimum far from the global one.]
Each with probability 1/2: the local optimum is reached, requiring time n^n to escape, or the global optimum is found in O(n log n) steps.
Expected running time Ω(n^n) for the (1+1) EA, but the function is not really difficult (an expected 2 restarts suffice) → results on the distribution of the running time are required. 17/22

Sharp Concentration of Measure
Classical example: let X be the number of 1-bits in a uniformly random x ∈ {0,1}^n. Then (Chernoff)
Pr(|X − E(X)| ≥ δ√n) ≤ 2e^{−δ²/3},
i.e., a deviation in the lower-order term is exponentially unlikely.
Similar results for running times? Known: for the running time T of the (1+1) EA on OneMax,
Pr(T > E(T) + δn) ≤ e^{−δ/e}
(by multiplicative drift analysis). No corresponding lower tail (on Pr(T < E(T) − δn)) was known before.
We will also get lower tails with new theorems. Running times are identified as almost deterministic, despite the algorithms being heavily randomized. 18/22
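A quick empirical check (my own, with illustrative parameters) of the Chernoff statement on the slide: sample random bit strings and estimate the probability of a δ√n deviation from the mean.

```python
import math
import random

def deviation_probability(n, delta, trials=4000, seed=5):
    """Empirical Pr(|X - n/2| >= delta * sqrt(n)) for X = number of
    one-bits of a uniformly random bit string of length n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        ones = sum(rng.randint(0, 1) for _ in range(n))
        if abs(ones - n / 2) >= delta * math.sqrt(n):
            hits += 1
    return hits / trials

# The slide's Chernoff bound caps this at 2 * exp(-delta**2 / 3);
# the empirical value is typically far smaller.
```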

Sharp Concentration of Running Times
Drift analysis with tail bounds (Lehre and W., 2014): suppose we know (a bound on) the drift h(s) = E(X_t − X_{t+1} | X_t = s). Consider the potential function
g(s) = 1/h(1) + ∫_1^s 1/h(x) dx.
If E(e^{−λ(g(s) − g(X_{t+1}))} | X_t = s) ≤ β < 1 for some λ > 0, then Pr(T_0 > t) ≤ β^t · e^{λ g(X_0)}.
An analogous bound on Pr(T_0 < t) is available.
Implications within the theory of RSHs: for the (1+1) EA on OneMax, Pr(|T − en ln n| ≥ δn) ≤ e^{−Ω(δ)}; also valid for linear functions! Similar results hold for other benchmark functions (e.g., expected time Θ(n²), where a deviation of δn^{3/2} has probability e^{−Ω(δ)}). 19/22

Drift Analysis and Probabilistic Recurrences
Scenario: consider a recurrence T(n) = 1 + T(h(n)), where h(n) ≤ n is random with known expectation E(h(n)); T(0) = 0.
Karp (JACM 1994) bounds Pr(T(n) ≥ u(n) + δ), where u(n) is the solution of the de-randomized recurrence T(n) = 1 + T(E(h(n))).
The recurrence is a stochastic process: X_0 = n, X_{t+1} = h(X_t). It can be treated by drift analysis.
Application to a random combinatorial structure: draw a random permutation π of {1, ..., n} and consider its cycles, e.g., 1 → 5 → 8 → 3 → 1. How to determine T(n), the number of cycles? Recurrence: T(n) = 1 + T(h(n)), where h(n) ~ Unif(0, ..., n − 1).
Karp: Pr(T(n) > log₂(n + 1) + δ) ≤ 2^{−δ+1}. For comparison, E(T(n)) = ln n + Θ(1); note log₂ n ≈ 1.44 ln n. 20/22
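The permutation-cycle example can be checked empirically; a sketch (helper names are mine) that counts cycles directly and compares the average with ln n + Θ(1):

```python
import random

def count_cycles(perm):
    """Number of cycles of a permutation of {0, ..., n-1} given as a list."""
    seen = [False] * len(perm)
    cycles = 0
    for i in range(len(perm)):
        if not seen[i]:
            cycles += 1
            j = i
            while not seen[j]:  # walk along the cycle containing i
                seen[j] = True
                j = perm[j]
    return cycles

def average_cycles(n, trials=2000, seed=6):
    """Average number of cycles of a uniformly random permutation."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        total += count_cycles(perm)
    return total / trials

# E(T(n)) = H_n = ln n + Theta(1); for n = 100 this is about 5.19.
```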

Sharp Tail Bound on the Number of Cycles
New result: using drift analysis with tail bounds, we get
Pr(T(n) < (1 − δ) ln n) ≤ e^{−(δ²/4)(1 − o(1)) ln n} and
Pr(T(n) > (1 + δ)(ln n + 1)) ≤ e^{−(min{δ, δ²}/6)·ln n}.
Compared to Karp's technique, this needs knowledge of the distribution of h(n), not only its expectation.
Obvious topics for future research: T(n) = a(n) + T(h(n)) for a(n) > 1; more than one recursive call? 21/22

Summary
Drift analysis for the analysis of EAs:
Introduced additive and multiplicative drift.
Applied drift analysis to bound the running time of the (1+1) EA on linear functions: tight bounds up to lower-order terms; structural insight that multiple-bit flips neither help nor harm; the practically used mutation probability is theoretically optimal; the greedy (1+1) EA is an optimal mutation-based, population-based EA.
Adaptive multiplicative drift analysis is a very powerful technique, but finding the right potential function can be difficult.
Drift analysis can yield sharp-concentration results.
Looking into EAs from a classical CS perspective may be promising: the techniques developed here may be useful for problems from classical TCS.
Thank you for your attention! 22/22