The Genetic Algorithm is Useful to Fitting Input Probability Distributions for Simulation Models

Johann Christoph Strelen
Rheinische Friedrich-Wilhelms-Universität Bonn
Römerstr. 164, 53117 Bonn, Germany
E-mail: strelen@cs.uni-bonn.de

ASTC 2003
Influences from outside a stochastic discrete-event model are modelled as random variates with a suitable distribution.
How can this distribution be identified?
How can the genetic algorithm be used for this purpose?
Problem

Given: data x_1, x_2, ..., measured for a specific aspect of a real system
Assumption: the data can be modelled as independent realizations of a random variable
Wanted: which distribution?

Solution in three steps:

Step                                 Classical method        Genetic algorithm method
1. Which theoretical distribution?   Graphical methods       Weighted sum of distributions
2. Parameter values?                 Maximum likelihood      Genetic algorithm
3. Does the distribution fit?        Goodness-of-fit tests   (Objective function)

Graphical methods: histogram, quantile summaries (box plots)
Theoretical distribution: family of distributions with parameters, e.g. exponential, Weibull, lognormal
Outline

Five purposes, solved with the genetic algorithm (GA)
Parameter estimation
Selection of theoretical distributions
Examples

The genetic algorithm mimics the metaphor of natural biological evolution: the fittest individuals (= tuples of parameter values) survive. This is an optimization method.
Objective function

Z(d) = [F̂(x_1) − F_d(x_1)]² + [F̂(x_2) − F_d(x_2)]² + ...

data x_1, x_2, ...
selected distribution function F_d(x) with parameter tuple d
empirical distribution function F̂(x)

Z(d) measures accuracy: the smaller, the better.
The genetic algorithm searches for a minimum in the parameter space.
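The objective function translates directly into code. A minimal sketch in Python; the step-function convention used for the empirical distribution function (fraction of observations ≤ x) is an assumption, since the slides do not spell it out:

```python
import bisect
import math

def empirical_cdf(data):
    """Empirical distribution function F_hat built from the data:
    F_hat(x) = (number of observations <= x) / n  (assumed convention)."""
    xs = sorted(data)
    n = len(xs)
    return lambda x: bisect.bisect_right(xs, x) / n

def objective(data, F_d):
    """Z(d): sum over all data points of [F_hat(x_i) - F_d(x_i)]^2."""
    F_hat = empirical_cdf(data)
    return sum((F_hat(x) - F_d(x)) ** 2 for x in data)
```

For example, a Weibull candidate with shape 2 and scale 3 would be evaluated as `objective(data, lambda x: 1 - math.exp(-(x / 3) ** 2))`; the GA then varies the two parameters to drive this value down.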
The Genetic Algorithm

Generations of populations are generated.
Population: individuals
Individual: concatenated encoded parameter values, the chromosome
Gray code: binary encoding, given a minimum and a maximum; adjacent values differ in just one bit
Fitness of an individual: measure of accuracy (the objective function)
After a specified number of generations, the individual with the best objective function value is the result.
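The Gray-code encoding of a bounded parameter can be sketched as follows; the bit width and the min/max mapping are illustrative choices, not values from the talk:

```python
def gray_encode(n):
    """Reflected Gray code of a non-negative integer:
    adjacent integers map to codes differing in exactly one bit."""
    return n ^ (n >> 1)

def gray_decode(g):
    """Invert the Gray code by cascading XORs of the shifted code."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def encode_param(value, lo, hi, bits):
    """Quantize a real parameter in [lo, hi] to a bits-wide Gray-coded integer."""
    steps = (1 << bits) - 1
    return gray_encode(round((value - lo) / (hi - lo) * steps))

def decode_param(g, lo, hi, bits):
    """Recover the (quantized) real parameter value from its Gray code."""
    steps = (1 << bits) - 1
    return lo + gray_decode(g) / steps * (hi - lo)
```

The chromosome of an individual is then the concatenation of such encoded values, one per parameter; the one-bit property means a single mutation moves a parameter to an adjacent quantization step.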
The Genetic Algorithm (2)

A new generation with new individuals:
Generation gap: the fittest old individuals remain unchanged
Selection: the others are selected randomly, according to their fitness, for breeding offspring
Recombination of parts of two chromosomes (crossover)
Mutation: single bits change their state with some probability
Reinsertion: the least fit parents are replaced with offspring
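The generation loop above can be sketched in Python. Population size, mutation probability, the rank-based selection weights, and the 10% generation gap are illustrative assumptions, not values from the talk:

```python
import random

def run_ga(fitness, n_bits, pop_size=40, n_generations=200,
           gap=0.1, p_mut=0.02, seed=3):
    """Minimal GA over bit-string chromosomes, minimizing `fitness`."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    n_elite = max(1, int(gap * pop_size))          # generation gap
    for _ in range(n_generations):
        pop.sort(key=fitness)                      # best individuals first
        elite = [ind[:] for ind in pop[:n_elite]]  # survive unchanged
        # rank-based selection: fitter ranks are chosen more often
        weights = [pop_size - i for i in range(pop_size)]
        offspring = []
        while len(offspring) < pop_size - n_elite:
            p1, p2 = rng.choices(pop, weights=weights, k=2)
            cut = rng.randrange(1, n_bits)         # one-point crossover
            child = p1[:cut] + p2[cut:]
            # mutation: flip each bit with probability p_mut
            child = [b ^ (rng.random() < p_mut) for b in child]
            offspring.append(child)
        pop = elite + offspring  # reinsertion: offspring replace the least fit
    return min(pop, key=fitness)
```

For distribution fitting, `fitness` would decode the chromosome into a parameter tuple d and return Z(d); here any function of a bit list works.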
Purpose 1, Parameter Estimation

Given data and a family of distributions, calculate parameter values such that the given data are fitted.
The number of parameters is not as critical as with nonlinear maximum-likelihood equations.
Example 1, Fitting Data Drawn from a Weibull Distribution

800 realizations, distribution function F(x) = 1 − exp[−(x/3)²]
Fitted: F(x) = 1 − exp[−(x/3.031)^2.047]
200 generations, accuracy Z(2.047, 3.031) = 0.0293

[Figure: smooth curve = fitted theoretical distribution function; jagged curve = empirical distribution function]
Sometimes the distribution function has no closed form, e.g. the Gamma or lognormal distribution.
Then calculate the distribution function F_d(x) at the values x_i, i = 1, ..., n, approximately, with simple numerical integration of the density f_d(x):

F̃(x_1) = 1/n,
F̃(x_i) = Σ_{j=2}^{i} (x_j − x_{j−1}) f_d(x_{j−1}),  i = 2, ..., n,
F_d(x_i) ≈ F̃(x_i) / F̃(x_n),  i = 1, ..., n.
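The integration recipe above translates into a few lines of Python; the function name and the exponential test density in the usage note are illustrative, not from the talk:

```python
import math

def cdf_from_density(f_d, xs):
    """Approximate the distribution function F_d at the sorted points xs:
    F~(x_1) = 1/n,
    F~(x_i) = sum_{j=2..i} (x_j - x_{j-1}) * f_d(x_{j-1}),
    then normalize so that F_d(x_n) = 1."""
    n = len(xs)
    F = [1.0 / n]
    s = 0.0
    for i in range(1, n):
        s += (xs[i] - xs[i - 1]) * f_d(xs[i - 1])  # left-endpoint rule
        F.append(s)
    return [v / F[-1] for v in F]
```

On a fine grid this reproduces, e.g., the exponential CDF 1 − exp(−x) to within the discretization error; inside the GA it is called once per fitness evaluation with the candidate parameter values plugged into f_d.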
Example 3, Gamma Distribution with Numerical Integration

800 realizations of a Gamma random variable
Density f(x) = β^{−α} x^{α−1} exp[−x/β] / Γ(α), with α = 3 and β = 3
After 100 generations: fitted density with α = 3.01 and β = 2.94
Accuracy 0.031
Purpose 2, Similar to Purpose 1 but with Multi-Mode Distributions

F(x) = p_1 F_1(x) + p_2 F_2(x) + ...,  p_1 + p_2 + ... = 1,
F_1, F_2, ... from the same family of distributions but with different parameter values
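Such a multi-mode distribution function is just a convex combination of component CDFs. A small sketch, with the Weibull parameterization taken from the examples in this talk and the function names as illustrative assumptions:

```python
import math

def weibull_cdf(shape, scale):
    """Weibull distribution function F(x) = 1 - exp[-(x/scale)^shape]."""
    def F(x):
        return 1.0 - math.exp(-((x / scale) ** shape)) if x > 0 else 0.0
    return F

def mixture_cdf(weights, components):
    """Multi-mode CDF: F(x) = p_1 F_1(x) + p_2 F_2(x) + ...,
    where the weights p_i must sum to 1."""
    assert abs(sum(weights) - 1.0) < 1e-9
    def F(x):
        return sum(p * Fi(x) for p, Fi in zip(weights, components))
    return F
```

For the GA, the weights p_i and all component parameters are simply concatenated into one chromosome, so fitting a mixture needs no new machinery beyond Purpose 1.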
Example 2, Two-Mode Weibull Distribution

800 realizations, distribution function
F(x) = 1 − 0.5 exp[−(x/3)²] − 0.5 exp[−(x/17)⁵]
Fitted: F(x) = 1 − 0.51 exp[−(x/2.95)^2.05] − 0.49 exp[−(x/16.98)^5.14]
Accuracy 0.024
Purpose 3, Mixed Different Distributions

Purpose 4, Falsification of a Theoretical Distribution

One tries to fit a theoretical distribution to the data.
If the objective function Z(d) remains large ⇒ the theoretical distribution is a bad fit.
Example 4, Fitting a Wrong Distribution

3200 realizations of a Weibull random variable
Tried to fit these data with a Gamma distribution
After 800 generations: accuracy only 0.76 ⇒ the Gamma distribution is not suitable
Purpose 5, Automatic Selection of a Theoretical Distribution

One tries to fit the data with a mixed distribution
F(x) = p_1 F_1(x) + p_2 F_2(x) + ...,  p_1 + p_2 + ... = 1,
where F_1, F_2, ... are different theoretical distributions.
If p_1 ≈ 1, p_2 ≈ 0, p_3 ≈ 0, ... ⇒ F_1(x) alone is a good fit.
Or, if several p_i are significantly greater than zero ⇒ the corresponding mixed distribution is good.
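The decision rule above can be sketched as a small helper; the cutoffs 0.95 (dominant weight) and 0.05 (negligible weight) are hypothetical thresholds chosen for illustration, the slides only say "≈ 1" and "significantly greater than zero":

```python
def select_family(weights, names, dominant=0.95, negligible=0.05):
    """Interpret fitted mixture weights: if one p_i dominates, that single
    family explains the data; otherwise keep the mixture of all components
    whose weight is not negligible. Thresholds are illustrative assumptions."""
    for p, name in zip(weights, names):
        if p >= dominant:
            return name
    kept = [name for p, name in zip(weights, names) if p > negligible]
    return "mixture: " + " + ".join(kept)
```

With the fitted weight p = 0.998 from Example 5 below, this rule would report the Weibull family alone.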
Example 5, Decision between Gamma and Weibull Distribution

400 realizations of a Weibull random variable
Tried to fit with a mixed Weibull and Gamma distribution function:
F(x) = p F^(Weibull)(x) + (1 − p) F^(Gamma)(x)
After 400 generations: p = 0.998, accuracy 0.047 ⇒ the Weibull distribution is well suited
Comparing the Maximum-Likelihood Method with the Genetic Algorithm Method

Genetic algorithm vs. maximum likelihood:
- Selection of a theoretical distribution: possible with the GA; with maximum likelihood another technique must be used (e.g. graphical)
- Uniform procedure for all theoretical distributions, with only two variants (closed-form distribution function, or numerically integrated density); maximum likelihood needs different nonlinear equations for each theoretical distribution
- Many parameters are no problem for the GA; difficult with maximum likelihood
- Hence multi-mode and mixed distributions are straightforward with the GA; difficult with maximum likelihood

Future work: genetic algorithms for dependent data, stochastic processes.