From Mating Pool Distributions to Model Overfitting

Size: px

Start display at page:

Download "From Mating Pool Distributions to Model Overfitting"

Bernadette Newman
6 years ago
Views:

1 From Mating Pool Distributions to Model Overfitting Claudio F. Lima 1 Fernando G. Lobo 1 Martin Pelikan 2 1 Department of Electronics and Computer Science Engineering University of Algarve, Portugal 2 Missouri Estimation of Distribution Algorithm Laboratory University of Missouri at St. Louis, USA GECCO 2008, Atlanta, USA

2 Probabilistic Modeling in EDAs Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup The core EDA is defined by its model complexity (expressiveness). The spectrum Simpler EDAs only learn the parameters of the model. More powerful EDAs learn both structure and parameters. Bayesian EDAs Solve a broad class of problems efficiently. But oftentimes models do not exactly reflect the underlying problem structure.

3 Motivation Introduction Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup Importance of model accuracy Model-based efficiency enhancement techniques. Substructural local search. Fitness estimation (surrogate fitness function). Offline interpretation of probabilistic models. Bias structural learning in EDAs (Mark Hauschild s talk). Design structure-aware variation operators. Empirical findings with BOA: Tournament vs. truncation Population size requirements. Model quality difference.

4 Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup Bayesian Optimization Algorithm (BOA) BOA EDA which uses Bayesian networks (BNs). Model selected solutions to generate new ones. Similar algorithms: Estimation of Bayesian networks algorithm (EBNA). Learning factorized distribution algorithm (LFDA). Learning BNs Decision trees express conditional probabilities. Greedy algorithm for learning. Scoring metrics considered: K2 and BIC.

5 Experimental Setup Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup Test problem To investigate MSA we use a problem of known structure. Clear identification of: Necessary dependencies successful tractability. Unnecessary dependencies poor model interpretability. m k trap problem Consists of m concatenated k-bit trap functions. For k 3, the model should be able to maintain k order statistics, otherwise exponential scalability. k = 5 used in experiments.

Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup Ideal Model for the m k trap problem Joint distribution in BNs p(x a, X b, X c, X d ) =

6 Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup Ideal Model for the m k trap problem Joint distribution in BNs p(x a, X b, X c, X d ) = p(x a )p(x b X a )p(x c X a, X b )p(x d X a, X b, X c ) Clique between variables of the same trap function. While between different traps there are no dependencies.

7 Motivation BOA: The Bayesian Optimization Algorithm Experimental Setup Measuring Model Structural Accuracy Definition 1 The model structural accuracy (MSA) is defined as the ratio of correct edges over the total number of edges in the BN. Definition 2 An edge is correct if it connects two variables that are linked according to the objective function definition. Definition 3 Model overfitting is defined as the inclusion of incorrect (or unnecessary) edges to the BN.

8 Selection Methods Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator Selection methods considered Tournament selection. Randomly pick s individuals and choose the best. Repeat n times. Truncation selection. Choose the best τ proportion of the population. Equivalent selection intensity Selection intensity, I Tournament size, s Truncation threshold, τ(%)

9 Selection-Metric Comparison Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator Model Structural Accuracy, MSA Tournament w/ K2 Tournament w/ BIC Truncation w/ K2 Truncation w/ BIC Number of function evaluations, n fe x Tournament w/ K2 Tournament w/ BIC Truncation w/ K2 Truncation w/ BIC Selection intensity, I Selection intensity, I Observations 1 Truncation performs better than tournament selection. 2 K2 metric performs better than BIC metric.

10 Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator A Simple Analysis of Tournament Selection Probability of an individual with rank i to win a tournament p i = s 1 j=1 Number of copies in the mating pool i j, for s 2. (1) n j c i = s p i. (2) Which can be approximated by a power distribution with p.d.f. f (x) = α x α 1, 0 < x 1, α = s, x = i/n. (3)

11 Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator A Simple Analysis of Truncation Selection Number of copies in the mating pool { 0, if i < n (1 (τ/100)) c i = 1, otherwise. (4)

12 Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator Distribution of the Expected Number of Copies Expected number of copies, c i s = 2 s = 3 s = 4 s = 5 s = 7 s = 10 s = 20 Expected number of copies, c i 1 τ = 66% τ = 47% τ = 36% τ = 30% τ = 22% τ = 15% τ = 8% Rank, i (in percentile) Rank, i (in percentile) Tournament and truncation selection differ in: 1 Window size. 2 Distribution shape.

13 Mating Pool Distribution: An Example Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator Function f mk trap = f 1 (X 1 X 2 X 3 ) + f 2 (X 4 X 5 X 6 ) + f 3 (X 7 X 8 X 9 ) Example Rank Individual Copies Mating pool distributions: Uniform Linear Exponential X 6 X 7 freq. X 6 X 7 freq. X 6 X 7 freq

14 Influence of Selection in Model Accuracy Selection: The Mating-Pool Distribution Generator Mating Pool w/ Non-Uniform Distribution What s good about it? Detection of existent patterns is achieved with less data. Tournament selection requires smaller population sizes than truncation. What s bad about it? Irrelevant features can also be detected as patterns. Excessive complexity model overfitting. So what? Recognize that we are learning from imbalanced data sets. Change scoring metric accordingly!

15 Modeling Metric Gain When Overfitting Adaptive Scoring Metric for Tournament Agenda 1 Compute how the power-law-based frequencies differ from uniform ones in the mating pool. 2 Use computed frequency deviation to estimate scoring metric gain due to overfitting. 3 Change the complexity penalty to counterbalance the misleading metric gain.

16 Modeling Metric Gain When Overfitting Adaptive Scoring Metric Deviation of power-law-based frequencies Deviation is linear w.r.t. tournament size Expected market share of top-ranked individuals after selection is given by the (right-side area of the) c.d.f. of the power distribution. F(x) = x s, 0 < x < 1, s 2. (5) In the worst case, market share grows linearly with tournament size. Thus, misleading frequencies deviate linearly from the true ones (uniform distribution). More details in the paper.

17 Computing Metric Gain Modeling Metric Gain When Overfitting Adaptive Scoring Metric Consider the decision of adding an edge from X 2 to X 1 Metric gain must be calculated: G metric = ScoreAfter ScoreBefore ComplexityPenalty Uniform mating pool reveals p 00 = p 01 = p 10 = p 11 = 0.25 G metric < 0. Simulate tournament mating pool by assigning p (s 1), p (s 1), p (s 1), p (s 1). Where is related to the proportion of top-individuals which lead to overfitting (more info in the paper).

18 Computing Metric Gain (2) Modeling Metric Gain When Overfitting Adaptive Scoring Metric Approximated metric gain G 2 (0.25 (s 1)) log 2 (0.25 (s 1)) + 1. (6) Metric gain, G BIC Approximation O(s) O(s 2 ) Tournament size, s

19 Modeling Metric Gain When Overfitting Adaptive Scoring Metric Comparison of Different Penalty Corrections Complexity penalty for the K2 metric p(b) = 2 0.5cs log 2 (n) P l i=1 L i c s = {1, s, s, and s log 2 (s)} Model Structural Accuracy, MSA c =1 s c s =sqrt(s) c s =s c s =s*log 2 (s) Number of function evaluations, n fe 5 x c =1 s c s =sqrt(s) c s =s c s =s*log 2 (s) Tournament size, s Tournament size, s

20 Results for the s penalty (c s = s) Modeling Metric Gain When Overfitting Adaptive Scoring Metric 14 x 105 Model Structural Accuracy, MSA l = 40 l = 60 l = 80 l = 100 Number of function evaluations, n fe l = 40 l = 60 l = 80 l = Tournament size, s Tournament size, s Remark Model structural accuracy is superior to 99%!

21 Modeling Metric Gain When Overfitting Adaptive Scoring Metric Results for Truncation with the Standard Penalty Model Structural Accuracy, MSA l = 40 l = 60 l = 80 l = 100 Number of function evaluations, n fe 2 x l = 40 l = 60 l = 80 l = Truncation threshold, τ Truncation threshold, τ Remark Good results, but not as good as for tournament selection with the s penalty.

22 Summary & Conclusions Modeling Metric Gain When Overfitting Adaptive Scoring Metric Selection addressed as a source of overfitting. Influence of selection method in BN learning: Truncation selection uniform mating pool is more appropriate for learning (standard scoring metrics). Tournament selection non-uniform mating pool leads to model overfitting. Scoring metric adapted to the power distribution of the mating pool generated by tournament selection. Model accuracy has been considerably improved! Improved model interpretability to assist efficiency enhancement techiniques and human researchers.

23 Modeling Metric Gain When Overfitting Adaptive Scoring Metric From Mating Pool Distributions to Model Overfitting Claudio F. Lima 1 Fernando G. Lobo 1 Martin Pelikan 2 1 Department of Electronics and Computer Science Engineering University of Algarve, Portugal 2 Missouri Estimation of Distribution Algorithm Laboratory University of Missouri at St. Louis, USA GECCO 2008, Atlanta, USA

24 Scoring Metrics Introduction Modeling Metric Gain When Overfitting Adaptive Scoring Metric (Bayesian-Dirichlet) K2 metric K 2(B) = p(b) l 1 Γ(m i (x i, l) + 1), (7) Γ(m i (l) + 2) l L i x i i=1 where p(b) = log 2 (n) P l i=1 L i. Bayesian information criteria (BIC) metric l BIC(B) = i=1 l Li x i ( ) m i (x i, l) m i (x i, l) log 2 m i (l) L i log 2 (n). 2 (8)

From Mating Pool Distributions to Model Overfitting

From Mating Pool Distributions to Model Overfitting Claudio F. Lima DEEI-FCT University of Algarve Campus de Gambelas 85-39, Faro, Portugal clima.research@gmail.com Fernando G. Lobo DEEI-FCT University