Estimation-of-Distribution Algorithms. Discrete Domain.
Petr Pošík

Contents:
- Introduction to EDAs: Genetic Algorithms and Epistasis; Genetic Algorithms; GA vs EDA; Content of the lectures
- Motivation Example: Example; Selection, Modeling, Sampling; UMDA behaviour for the OneMax problem; What about a different fitness?; UMDA behaviour on concatenated traps; What can be done about traps?; Good news!
- Discrete EDAs: Overview; EDAs without interactions
- Pairwise Interactions: From single bits to pairwise models; Example with pairwise dependencies: dependency tree; Example of dependency tree learning; Dependency tree: probabilities; EDAs with pairwise interactions; Summary
- Multivariate Interactions: ECGA; ECGA: Evaluation metric; BOA; BOA: Learning the structure
- Scalability Analysis: Test functions; Scalability analysis; OneMax; Non-decomposable Equal Pairs; Decomposable Equal Pairs; Non-decomposable Sliding XOR; Decomposable Sliding XOR; Decomposable Trap; Model structure during evolution
- Conclusions: Summary; Suggestions for discrete EDAs
Introduction to EDAs

Genetic Algorithms and Epistasis

From the lecture on epistasis: x_best = 1...1, f(x_best) = 40.

- GA works: no dependencies exist (f_40x1bitOneMax).
- GA fails: dependencies exist, and the GA is not able to work with them (f_8x5bitTrap).
- GA works again: dependencies exist, but the GA knows about them (f_8x5bitTrap).

[Figures: best and average fitness, and the mean of x, over generations for several population sizes, on f_40x1bitOneMax and f_8x5bitTrap.]

P. Pošík © 2014, A0M33EOA: Evolutionary Optimization Algorithms

Genetic Algorithms

Algorithm 1: Genetic Algorithm
1 begin
2     Initialize the population.
3     while termination criteria are not met do
4         Select parents from the population.
5         Cross over the parents, create offspring.
6         Mutate offspring.
7         Incorporate offspring into the population.

Conventional GA operators are not adaptive, and cannot (or usually do not) discover and use the interactions among solution components. This is the select-crossover-mutate approach.

What does an interaction mean?
- We would like to create a new offspring by mutation.
- We would like the offspring to have better, or at least the same, quality as the parent.
- If we must modify x_i together with x_j to reach the desired goal (if it is not possible to improve the solution by modifying either x_i or x_j alone), then x_i interacts with x_j.

The goal of recombination operators: intensify the search in areas which contained good individuals in previous iterations. To do so, they must be able to take the interactions into account. Why not directly describe the distribution of good individuals?
GA vs EDA

Algorithm 1: Genetic Algorithm
1 begin
2     Initialize the population.
3     while termination criteria are not met do
4         Select parents from the population.
5         Cross over the parents, create offspring.
6         Mutate offspring.
7         Incorporate offspring into the population.

The select-crossover-mutate approach.

Algorithm 2: Estimation-of-Distribution Algorithm
1 begin
2     Initialize the population.
3     while termination criteria are not met do
4         Select parents from the population.
5         Learn a model of their distribution.
6         Sample new individuals.
7         Incorporate offspring into the population.

The select-model-sample approach.

Explicit probabilistic model:
- a principled way of working with dependencies
- adaptation ability (different behavior in different stages of evolution)

Names:
- EDA: Estimation-of-Distribution Algorithm
- PMBGA: Probabilistic Model-Building Genetic Algorithm
- IDEA: Iterated Density Estimation Algorithm

Content of the lectures

1. EDA for discrete domains (e.g. binary): motivation example; without interactions; pairwise interactions; higher-order interactions.
2. EDA for the real domain (vectors of real numbers): evolution strategies; histograms; Gaussian distribution and its mixtures.
Motivation Example

Example

5-bit OneMax (CountOnes) problem:

f_Dx1bitOneMax(x) = Σ_{d=1..D} x_d

Optimum: 11111, fitness: 5.

Algorithm: Univariate Marginal Distribution Algorithm (UMDA)
- Population size: 6
- Tournament selection: t = 2
- Model: vector of probabilities p = (p_1, ..., p_D); each p_d is the probability of observing a 1 at the d-th element.
- Model learning: compute p from the selected individuals.
- Model sampling: generate a 1 at the d-th position with probability p_d (independently of the other positions).

Selection, Modeling, Sampling

- Old population: six 5-bit individuals with fitnesses (number of 1s) 3, 1, 4, 4, 1, 2.
- Tournaments (the fitter individual wins): (4) vs. (4), (4) vs. (4), (4) vs. (1), (2) vs. (1), (1) vs. (1), (1) vs. (3).
- Selected parents: fitnesses 4, 4, 4, 2, 1, 3.
- Offspring sampled from the model: fitnesses 2, 2, 4, 3, 3, 4.
- Probability vector p: computed position-wise from the selected parents.
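The selection → modeling → sampling cycle above can be sketched as a complete UMDA in Python. This is a minimal illustration, not the lecture's code; the population size and generation count are my own choices so that a run converges quickly.

```python
import random

def onemax(x):
    # Fitness = the number of 1s in the bit string.
    return sum(x)

def umda(fitness, dim=5, pop_size=20, generations=30, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        # Tournament selection with t = 2.
        parents = []
        for _ in range(pop_size):
            a, b = rng.choice(pop), rng.choice(pop)
            parents.append(a if fitness(a) >= fitness(b) else b)
        # Model learning: marginal probability of a 1 at each position.
        p = [sum(x[d] for x in parents) / pop_size for d in range(dim)]
        # Model sampling: each position is generated independently.
        pop = [[1 if rng.random() < p[d] else 0 for d in range(dim)]
               for _ in range(pop_size)]
        best = max(pop + [best], key=fitness)
    return best
```

On OneMax the marginals quickly drift towards 1, and the best-so-far individual reaches the optimum 11111.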
UMDA behaviour for the OneMax problem

[Figure: probability vector entries over generations.]

- 1s are better than 0s on average, so selection increases the proportion of 1s.
- Recombination preserves and combines 1s; the ratio of 1s increases over time.
- If we have many 1s in the population, we cannot miss the optimum.

The number of evaluations needed for reliable convergence:

- UMDA: O(D ln D)
- Hill-Climber: O(D ln D)
- GA with uniform xover: approx. O(D ln D)
- GA with 1-point xover: a bit slower

UMDA behaves similarly to a GA with uniform crossover!

What about a different fitness?

For the OneMax function, UMDA works well; all the bits eventually converge to the right value. Will UMDA be similarly successful for other fitness functions? Well, no. :-(

Problem: concatenated 5-bit traps

f = f_trap(x_1, x_2, x_3, x_4, x_5) + f_trap(x_6, x_7, x_8, x_9, x_10)

The trap function is defined as

f_trap(x) = 5 if u(x) = 5,
            4 − u(x) otherwise,

where u(x) is the so-called unitation function and returns the number of 1s in x (it is actually the OneMax function).

[Figure: trap(x) as a function of the number of 1s in the chromosome, u(x).]
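The concatenated 5-bit trap can be written out directly (a small sketch; the function names are mine):

```python
def u(x):
    # Unitation: the number of 1s in the block.
    return sum(x)

def trap5(x):
    # f_trap: 5 at the optimum 11111, otherwise 4 - u(x),
    # so adding 1s *decreases* fitness everywhere except at the optimum.
    return 5 if u(x) == 5 else 4 - u(x)

def concatenated_traps(x):
    # f = f_trap(x1..x5) + f_trap(x6..x10) for a 10-bit chromosome.
    return trap5(x[0:5]) + trap5(x[5:10])
```

Note that the all-zeros string scores 4 per block while the optimum scores 5, which is what makes the function deceptive for bit-wise statistics.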
UMDA behaviour on concatenated traps

- Traps: optimum in 11111...
- But the schema averages are f̄_trap(0****) = 2 while f̄_trap(1****) = 1.375.
- 1-dimensional probabilities lead the GA the wrong way!
- An exponentially increasing population size is needed, otherwise the GA will not find the optimum reliably.

[Figure: probability vector entries over generations.]

What can be done about traps?

The f_trap function is deceptive:
- Statistics over 1**** and 0**** do not lead us to the right solution.
- The same holds for statistics over 11*** and 00***, 111** and 000**, 1111* and 0000*.
- This is harder than the needle-in-the-haystack problem: a regular haystack simply does not provide any information about where to search for the needle; the f_trap haystack actively lies to you, pointing you to the wrong part of the haystack.

But: f_trap(00000) < f_trap(11111), so 11111 will be better than 00000 on average. 5-bit statistics should work for 5-bit traps in the same way as 1-bit statistics work for the OneMax problem!

Model learning: build a model for each 5-tuple of bits; compute p(00000), p(00001), ..., p(11111).
Model sampling: each 5-tuple of bits is generated independently; generate 00000 with probability p(00000), 00001 with probability p(00001), ...
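The schema averages quoted above can be checked by brute force over the 16 strings matching each schema (a quick sketch; `schema_average` is an illustrative helper name):

```python
from itertools import product

def trap5(x):
    # 5-bit trap: 5 at 11111, otherwise 4 - (number of 1s).
    ones = sum(x)
    return 5 if ones == 5 else 4 - ones

def schema_average(first_bit):
    # Average trap fitness over all 5-bit strings whose first bit is fixed,
    # i.e. over the schema 0**** or 1****.
    vals = [trap5((first_bit,) + rest) for rest in product((0, 1), repeat=4)]
    return sum(vals) / len(vals)
```

This reproduces f̄_trap(0****) = 2.0 and f̄_trap(1****) = 1.375: a model built from 1-bit marginals prefers the wrong bit value.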
Good news!

Good statistics work great!

[Figure: relative frequency of 11111 over generations.]

What shall we do next? If we were able to find good statistics with a small overhead, and use them in the UMDA framework, we would be able to solve order-k separable problems using O(D^2) evaluations... and there are many problems of this type. The solution of this problem is closely related to so-called linkage learning, i.e. discovering and using statistical dependencies among variables.

- UMDA with 5-bit BBs: O(D ln D) (WOW!)
- Hill-Climber: O(D^k ln D), k = 5
- GA with uniform xover: approx. O(2^D)
- GA with 1-point xover: similar to uniform xover

Discrete EDAs

Discrete EDAs: Overview

1. Overview:
   (a) Univariate models (without interactions)
   (b) Bivariate models (pairwise dependencies)
   (c) Multivariate models (higher-order interactions)
2. Conclusions
EDAs without interactions

1. Population-based incremental learning (PBIL), Baluja, 1994
2. Univariate marginal distribution algorithm (UMDA), Mühlenbein and Paaß, 1996
3. Compact genetic algorithm (cGA), Harik, Lobo, Goldberg, 1998

Similarities: all of them use a vector of probabilities.

Differences:
- PBIL and cGA do not use a population (only the vector p); UMDA does.
- PBIL and cGA use different rules for the adaptation of p.

Advantages: simplicity; speed; simple simulation of large populations.
Limitations: reliably solves only order-1 decomposable problems.

EDAs with Pairwise Interactions

From single bits to pairwise models

How to describe two positions together?

Using the joint probability distribution (number of free parameters: 3):

p(A, B):   B = 0     B = 1
A = 0      p(0, 0)   p(0, 1)
A = 1      p(1, 0)   p(1, 1)

Using statistical dependence (number of free parameters: 3):

p(A, B) = p(B|A) p(A), parameterized by p(B = 1 | A = 0), p(B = 1 | A = 1), p(A = 1).

Question: what is the number of parameters in the case of the following models over A, B, C? [Three graph structures over A, B, C are shown in the slides.]
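The two parameterizations above carry the same information with three free numbers each, which can be verified in a few lines. The probability values here are made up purely for the illustration:

```python
# Joint table: four entries that sum to 1, hence 3 free parameters.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

# Equivalent factored form p(A, B) = p(B|A) * p(A): also 3 numbers.
p_a1 = joint[(1, 0)] + joint[(1, 1)]                             # p(A=1)
p_b1_given_a0 = joint[(0, 1)] / (joint[(0, 0)] + joint[(0, 1)])  # p(B=1|A=0)
p_b1_given_a1 = joint[(1, 1)] / p_a1                             # p(B=1|A=1)

def factored(a, b):
    # Rebuild the joint probability from the three conditional parameters.
    pa = p_a1 if a == 1 else 1 - p_a1
    pb1 = p_b1_given_a1 if a == 1 else p_b1_given_a0
    return pa * (pb1 if b == 1 else 1 - pb1)
```

The factored form reproduces every entry of the joint table exactly.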
Example with pairwise dependencies: dependency tree

- Nodes: binary variables (loci of the chromosome)
- Edges: dependencies among variables

Features:
- Each node depends on at most one other node.
- The graph does not contain cycles.
- The graph is connected.

Learning the structure of a dependency tree:
1. Score the edges using mutual information:

   I(X, Y) = Σ_{x,y} p(x, y) log [ p(x, y) / (p(x) p(y)) ]

2. Use any algorithm to determine the maximum spanning tree of the graph, e.g. Prim (1957):
   (a) Start building the tree from any node.
   (b) Repeatedly add the node that is connected to the tree by the edge with the maximum score.

Example of dependency tree learning

[Figure: successive steps of building the maximum spanning tree over the variables x_1, ..., x_7.]
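The two learning steps above can be sketched as follows. The probabilities in the mutual-information formula are estimated from a sample of bit strings, and the Prim implementation is a minimal O(n^2) sketch with my own function names:

```python
import math

def mutual_information(xs, ys):
    # I(X, Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) ),
    # with all probabilities estimated as relative frequencies.
    n = len(xs)
    mi = 0.0
    for x in (0, 1):
        px = xs.count(x) / n
        for y in (0, 1):
            py = ys.count(y) / n
            pxy = sum(1 for a, b in zip(xs, ys) if (a, b) == (x, y)) / n
            if pxy > 0:
                mi += pxy * math.log(pxy / (px * py))
    return mi

def maximum_spanning_tree(nodes, score):
    # Prim's algorithm: grow the tree from an arbitrary node, always adding
    # the outside node attached by the highest-scoring edge.
    in_tree = [nodes[0]]
    edges = []
    while len(in_tree) < len(nodes):
        s, t = max(((s, t) for s in in_tree for t in nodes if t not in in_tree),
                   key=lambda e: score(*e))
        edges.append((s, t))
        in_tree.append(t)
    return edges
```

For example, if variable 1 is a copy of variable 0 and variable 2 is independent of both, the mutual information I(X_0, X_1) is high, I(X_0, X_2) is zero, and the edge (0, 1) ends up in the tree.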
Dependency tree: probabilities

[Figure: the dependency tree learned in the previous example.]

The root variable needs 1 parameter, p(X_root = 1). Every other node needs 2 parameters, p(X = 1 | X_parent = 0) and p(X = 1 | X_parent = 1). For the 5-variable tree of the example (with x_4 conditioned on its parent, x_2 conditioned on X_4, and x_3 conditioned on X_2), the whole model has 1 + 4 × 2 = 9 parameters.

EDAs with pairwise interactions

1. MIMIC (sequences): Mutual Information Maximization for Input Clustering, de Bonet et al., 1997
2. COMIT (trees): Combining Optimizers with Mutual Information Trees, Baluja and Davies, 1997
3. BMDA (forests): Bivariate Marginal Distribution Algorithm, Pelikan and Mühlenbein, 1998
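Sampling from a dependency tree is ancestral sampling: draw the root, then each child conditioned on its already-drawn parent. A sketch, assuming an illustrative tree structure and made-up probability values; only the 1 + 4 × 2 = 9 parameter count comes from the slide:

```python
import random

# Illustrative tree over five binary variables:
# x5 is the root; x4 depends on x5; x1 and x2 depend on x4; x3 depends on x2.
parent = {4: 5, 1: 4, 2: 4, 3: 2}
p_root = 0.6                 # p(x5 = 1): 1 parameter
p_cond = {                   # (p(x=1 | parent=0), p(x=1 | parent=1)): 2 each
    4: (0.2, 0.9),
    1: (0.3, 0.7),
    2: (0.1, 0.8),
    3: (0.5, 0.5),
}
# Total: 1 + 4 * 2 = 9 parameters, matching the slide.

def sample(rng):
    # Ancestral sampling: the root first, then children after their parents.
    x = {5: int(rng.random() < p_root)}
    for child in (4, 1, 2, 3):
        p_one = p_cond[child][x[parent[child]]]
        x[child] = int(rng.random() < p_one)
    return x
```

Over many samples the relative frequency of x5 = 1 approaches p_root, and each child matches its conditional table.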
Summary

Advantages: still simple; still fast; can learn something about the structure.
Limitations: reliably solves only order-2 decomposable problems.

EDAs with Multivariate Interactions

ECGA

Extended Compact GA, Harik, 1999

Marginal Product Model (MPM):
- Variables are treated in groups.
- Variables in different groups are considered statistically independent.
- Each group is modeled by its joint probability distribution.
- The algorithm adaptively searches for the groups during evolution.

Ideal group configurations:
- OneMax: [1][2][3][4][5][6][7][8][9][10]
- 5-bit traps: [1 2 3 4 5][6 7 8 9 10]

Learning the structure:
1. Evaluation metric: Minimum Description Length (MDL)
2. Search procedure: greedy
   (a) Start with each variable belonging to its own group.
   (b) Perform the join of two groups which improves the score most.
   (c) Finish if no join improves the score.
ECGA: Evaluation metric

Minimum description length: minimize the number of bits needed to store the model and the data encoded using the model:

DL(Model, Data) = DL_Model + DL_Data

Model description length: each group g has |g| dimensions, i.e. 2^|g| − 1 free frequencies, each of which can take on values up to N:

DL_Model = log N · Σ_{g∈G} (2^|g| − 1)

Data description length using the model, defined using the entropy of the marginal distributions (X_g is a |g|-dimensional random vector, x_g is its realization):

DL_Data = N Σ_{g∈G} h(X_g) = −N Σ_{g∈G} Σ_{x_g} p(X_g = x_g) log p(X_g = x_g)

BOA

Bayesian Optimization Algorithm: Pelikan, Goldberg, Cantú-Paz, 1999

- Bayesian network (BN)
- Conditional dependencies (instead of groups)
- Sequences, trees, and forests are special cases of BNs

[Figure: the BN structure learned for the trap function.]

The same model was used independently in:
- Estimation of Bayesian Network Algorithm (EBNA), Etxeberria et al., 1999
- Learning Factorized Distribution Algorithm (LFDA), Mühlenbein et al., 1999
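The MDL score above can be computed from a population in a few lines. This is a sketch: the exact constant (log N vs. log(N + 1)) and the logarithm base vary between ECGA descriptions, so treat the absolute values as illustrative:

```python
import math
from collections import Counter

def mdl(population, groups):
    # DL(Model, Data) = DL_Model + DL_Data for a marginal product model.
    n = len(population)
    # Model part: each group g stores 2^|g| - 1 frequencies,
    # each representable in about log2(N) bits.
    dl_model = math.log2(n) * sum(2 ** len(g) - 1 for g in groups)
    # Data part: N times the entropy of each group's marginal distribution.
    dl_data = 0.0
    for g in groups:
        counts = Counter(tuple(x[i] for i in g) for x in population)
        dl_data -= n * sum((c / n) * math.log2(c / n) for c in counts.values())
    return dl_model + dl_data
```

On a population in which bits 0 and 1 are always equal and bit 2 is independent, the linkage-respecting grouping [0 1][2] gets a lower (better) score than both the fully split grouping [0][1][2] and the over-merged grouping [0 1 2], which is exactly what the greedy join search exploits.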
BOA: Learning the structure

1. Evaluation metric: the Bayesian-Dirichlet metric, or the Bayesian information criterion (BIC).
2. Search procedure: greedy
   (a) Start with a graph with no edges (the univariate marginal product model).
   (b) Perform the operation which improves the score most: add an edge, delete an edge, or reverse an edge.
   (c) Finish if no operation improves the score.

BOA solves order-k decomposable problems in less than O(D^2) evaluations! n_evals = O(D^1.55) to O(D^2).

Scalability Analysis

Test functions

OneMax:
f_Dx1bitOneMax(x) = Σ_{d=1..D} x_d

Trap:
f_DbitTrap(x) = D if u(x) = D, and D − 1 − u(x) otherwise.

Equal Pairs:
f_DbitEqualPairs(x) = 1 + Σ_{d=2..D} f_EqualPair(x_{d−1}, x_d), where
f_EqualPair(x_1, x_2) = 1 if x_1 = x_2, and 0 if x_1 ≠ x_2.

Sliding XOR:
f_DbitSlidingXOR(x) = 1 + f_AllEqual(x) + Σ_{d=3..D} f_XOR(x_{d−2}, x_{d−1}, x_d), where
f_AllEqual(x) = 1 if x = (1, ..., 1) or x = (0, ..., 0), and 0 otherwise,
f_XOR(x_1, x_2, x_3) = 1 if x_1 XOR x_2 = x_3, and 0 otherwise.

Concatenated short basis functions:
f_NxKbitBasisFunction(x) = Σ_{k=1..N} f_BasisFunction(x_{K(k−1)+1}, ..., x_{Kk})
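The test functions above, written as straightforward Python (the names are mine; each function takes a list of 0/1 values):

```python
def onemax(x):
    return sum(x)

def trap(x):
    # D at the all-ones optimum, otherwise D - 1 - u(x).
    d, ones = len(x), sum(x)
    return d if ones == d else d - 1 - ones

def equal_pairs(x):
    # 1 plus one point for every pair of equal neighbouring bits.
    return 1 + sum(1 for a, b in zip(x, x[1:]) if a == b)

def sliding_xor(x):
    # 1, plus 1 if all bits are equal, plus one point for every
    # window of three bits satisfying x1 XOR x2 = x3.
    all_equal = 1 if len(set(x)) == 1 else 0
    xors = sum(1 for a, b, c in zip(x, x[1:], x[2:]) if a ^ b == c)
    return 1 + all_equal + xors

def concatenated(basis, k, x):
    # Sum of a K-bit basis function applied to consecutive blocks.
    return sum(basis(x[i:i + k]) for i in range(0, len(x), k))
```

For instance, `concatenated(trap, 5, x)` gives the f_8x5bitTrap function for a 40-bit x.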
Test functions (cont.)

1. f_40x1bitOneMax: order-1 decomposable function, no interactions.
2. f_1x40bitEqualPairs: non-decomposable function; weak interactions: the optimal setting of each bit depends on the value of the preceding bit.
3. f_8x5bitEqualPairs: order-5 decomposable function.
4. f_1x40bitSlidingXOR: non-decomposable function; stronger interactions: the optimal setting of each bit depends on the values of the 2 preceding bits.
5. f_8x5bitSlidingXOR: order-5 decomposable function.
6. f_8x5bitTrap: order-5 decomposable function; the interactions in each 5-bit block are very strong, and the basis function is deceptive.

Scalability analysis

Facts:
- Using a small population size, population-based optimizers can solve only easy problems.
- Increasing the population size, the optimizers can solve increasingly harder problems...
- ...but using a too big population is a waste of resources.

Scalability analysis:
- determines the optimal (smallest) population size with which the algorithm solves the given problem reliably (reliably: the algorithm finds the optimum in 24 out of 25 runs);
- for each problem complexity, the optimal population size is determined e.g. using the bisection method;
- studies the influence of the problem complexity (dimensionality) on the optimal population size and on the number of needed evaluations.
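The bisection step can be sketched as follows. Here `reliable(n)` stands for the whole experiment of running the optimizer with population size n and checking the 24-out-of-25 success criterion; in practice that check is stochastic and only approximately monotone in n, so this is a sketch rather than a robust implementation:

```python
def minimal_population_size(reliable, low=2):
    # Doubling phase: grow until some population size succeeds reliably.
    high = low
    while not reliable(high):
        low, high = high, high * 2
    # Bisection phase: shrink the [failing, succeeding] bracket to width 1.
    while high - low > 1:
        mid = (low + high) // 2
        if reliable(mid):
            high = mid
        else:
            low = mid
    return high
```

With a deterministic stand-in such as `lambda n: n >= 37`, the function returns exactly 37 after a handful of probes instead of testing every size.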
Scalability on the One Max function

[Figure: number of evaluations vs. dimension for UMDA, COMIT, ECGA, BOA.]

Scalability on the non-decomposable Equal Pairs function

[Figure: number of evaluations vs. dimension for UMDA, COMIT, ECGA, BOA.]

Scalability on the decomposable Equal Pairs function

[Figure: number of evaluations vs. dimension for UMDA, COMIT, ECGA, BOA.]

Scalability on the non-decomposable Sliding XOR function

[Figure: number of evaluations vs. dimension for UMDA, COMIT, ECGA, BOA.]

Scalability on the decomposable Sliding XOR function

[Figure: number of evaluations vs. dimension for UMDA, COMIT, ECGA, BOA.]

Scalability on the decomposable Trap function

[Figure: number of evaluations vs. dimension for UMDA, COMIT, ECGA, BOA.]
Model structure during evolution

"During the evolution, the model structure becomes increasingly precise, and at the end of the evolution the model structure describes the problem structure exactly." NO! That's not true! Why?

- In the beginning, the distribution patterns are not very discernible; models similar to uniform distributions are used.
- In the end, the population converges and contains many copies of the same individual (or of a few individuals). No interactions among variables can be learned. The model structure is wrong (all bits independent), but the model describes the position of the optimum very precisely.
- The model with the best-matching structure is found somewhere in the middle of the evolution.
- Even though the right structure is never found during the evolution, the problem can be solved successfully.

Conclusions

Summary

Models:
- Bayesian networks are general models of joint probability.
- High-dimensional models are hard to train.
- High-dimensional models are very flexible.

Advantages: reliably solves problems decomposable into subproblems of bounded order.
Limitations: does not solve problems decomposable into logarithmic subproblems (hierarchical problems).
Suggestions for discrete EDAs

- For simple problems: PBIL, UMDA, cGA; they behave similarly to simple GAs.
- For harder problems: MIMIC, COMIT, BMDA; they are able to account for bivariate dependencies.
- For hard problems: BOA, ECGA, EBNA, LFDA; they can take more general dependencies into account.
- For even harder problems, e.g. problems with hierarchical structures: hBOA (hierarchical BOA).
More informationFeature selection. c Victor Kitov August Summer school on Machine Learning in High Energy Physics in partnership with
Feature selection c Victor Kitov v.v.kitov@yandex.ru Summer school on Machine Learning in High Energy Physics in partnership with August 2015 1/38 Feature selection Feature selection is a process of selecting
More informationLecture 6: Graphical Models: Learning
Lecture 6: Graphical Models: Learning 4F13: Machine Learning Zoubin Ghahramani and Carl Edward Rasmussen Department of Engineering, University of Cambridge February 3rd, 2010 Ghahramani & Rasmussen (CUED)
More informationMatt Heavner CSE710 Fall 2009
Matt Heavner mheavner@buffalo.edu CSE710 Fall 2009 Problem Statement: Given a set of cities and corresponding locations, what is the shortest closed circuit that visits all cities without loops?? Fitness
More informationBayesian Networks to design optimal experiments. Davide De March
Bayesian Networks to design optimal experiments Davide De March davidedemarch@gmail.com 1 Outline evolutionary experimental design in high-dimensional space and costly experimentation the microwell mixture
More informationUpper Bounds on the Time and Space Complexity of Optimizing Additively Separable Functions
Upper Bounds on the Time and Space Complexity of Optimizing Additively Separable Functions Matthew J. Streeter Computer Science Department and Center for the Neural Basis of Cognition Carnegie Mellon University
More informationBayesian Networks. Motivation
Bayesian Networks Computer Sciences 760 Spring 2014 http://pages.cs.wisc.edu/~dpage/cs760/ Motivation Assume we have five Boolean variables,,,, The joint probability is,,,, How many state configurations
More informationoptimization problem x opt = argmax f(x). Definition: Let p(x;t) denote the probability of x in the P population at generation t. Then p i (x i ;t) =
Wright's Equation and Evolutionary Computation H. Mühlenbein Th. Mahnig RWCP 1 Theoretical Foundation GMD 2 Laboratory D-53754 Sankt Augustin Germany muehlenbein@gmd.de http://ais.gmd.de/οmuehlen/ Abstract:
More informationhow should the GA proceed?
how should the GA proceed? string fitness 10111 10 01000 5 11010 3 00011 20 which new string would be better than any of the above? (the GA does not know the mapping between strings and fitness values!)
More informationComputational statistics
Computational statistics Combinatorial optimization Thierry Denœux February 2017 Thierry Denœux Computational statistics February 2017 1 / 37 Combinatorial optimization Assume we seek the maximum of f
More informationEvolutionary Functional Link Interval Type-2 Fuzzy Neural System for Exchange Rate Prediction
Evolutionary Functional Link Interval Type-2 Fuzzy Neural System for Exchange Rate Prediction 3. Introduction Currency exchange rate is an important element in international finance. It is one of the chaotic,
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 5 Bayesian Learning of Bayesian Networks CS/CNS/EE 155 Andreas Krause Announcements Recitations: Every Tuesday 4-5:30 in 243 Annenberg Homework 1 out. Due in class
More information2-bit Flip Mutation Elementary Fitness Landscapes
RN/10/04 Research 15 September 2010 Note 2-bit Flip Mutation Elementary Fitness Landscapes Presented at Dagstuhl Seminar 10361, Theory of Evolutionary Algorithms, 8 September 2010 Fax: +44 (0)171 387 1397
More informationProbabilistic Graphical Models (I)
Probabilistic Graphical Models (I) Hongxin Zhang zhx@cad.zju.edu.cn State Key Lab of CAD&CG, ZJU 2015-03-31 Probabilistic Graphical Models Modeling many real-world problems => a large number of random
More informationMachine Learning Summer School
Machine Learning Summer School Lecture 3: Learning parameters and structure Zoubin Ghahramani zoubin@eng.cam.ac.uk http://learning.eng.cam.ac.uk/zoubin/ Department of Engineering University of Cambridge,
More informationPATTERN CLASSIFICATION
PATTERN CLASSIFICATION Second Edition Richard O. Duda Peter E. Hart David G. Stork A Wiley-lnterscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim Brisbane Singapore Toronto CONTENTS
More informationCSC 411 Lecture 3: Decision Trees
CSC 411 Lecture 3: Decision Trees Roger Grosse, Amir-massoud Farahmand, and Juan Carrasquilla University of Toronto UofT CSC 411: 03-Decision Trees 1 / 33 Today Decision Trees Simple but powerful learning
More informationClustering using Mixture Models
Clustering using Mixture Models The full posterior of the Gaussian Mixture Model is p(x, Z, µ,, ) =p(x Z, µ, )p(z )p( )p(µ, ) data likelihood (Gaussian) correspondence prob. (Multinomial) mixture prior
More informationIntroduction to Optimization
Introduction to Optimization Blackbox Optimization Marc Toussaint U Stuttgart Blackbox Optimization The term is not really well defined I use it to express that only f(x) can be evaluated f(x) or 2 f(x)
More informationBayesian Networks: Construction, Inference, Learning and Causal Interpretation. Volker Tresp Summer 2014
Bayesian Networks: Construction, Inference, Learning and Causal Interpretation Volker Tresp Summer 2014 1 Introduction So far we were mostly concerned with supervised learning: we predicted one or several
More informationLearning Bayes Net Structures
Learning Bayes Net Structures KF, Chapter 15 15.5 (RN, Chapter 20) Some material taken from C Guesterin (CMU), K Murphy (UBC) 1 2 Learning Bayes Nets Known Structure Unknown Data Complete Missing Easy
More informationFinding Ground States of SK Spin Glasses with hboa and GAs
Finding Ground States of Sherrington-Kirkpatrick Spin Glasses with hboa and GAs Martin Pelikan, Helmut G. Katzgraber, & Sigismund Kobe Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
More informationLearning P-maps Param. Learning
Readings: K&F: 3.3, 3.4, 16.1, 16.2, 16.3, 16.4 Learning P-maps Param. Learning Graphical Models 10708 Carlos Guestrin Carnegie Mellon University September 24 th, 2008 10-708 Carlos Guestrin 2006-2008
More informationMixtures of Gaussians. Sargur Srihari
Mixtures of Gaussians Sargur srihari@cedar.buffalo.edu 1 9. Mixture Models and EM 0. Mixture Models Overview 1. K-Means Clustering 2. Mixtures of Gaussians 3. An Alternative View of EM 4. The EM Algorithm
More informationAn Evolutionary Programming Based Algorithm for HMM training
An Evolutionary Programming Based Algorithm for HMM training Ewa Figielska,Wlodzimierz Kasprzak Institute of Control and Computation Engineering, Warsaw University of Technology ul. Nowowiejska 15/19,
More informationBayesian Networks Structure Learning (cont.)
Koller & Friedman Chapters (handed out): Chapter 11 (short) Chapter 1: 1.1, 1., 1.3 (covered in the beginning of semester) 1.4 (Learning parameters for BNs) Chapter 13: 13.1, 13.3.1, 13.4.1, 13.4.3 (basic
More informationAn instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if. 2 l i. i=1
Kraft s inequality An instantaneous code (prefix code, tree code) with the codeword lengths l 1,..., l N exists if and only if N 2 l i 1 Proof: Suppose that we have a tree code. Let l max = max{l 1,...,
More informationDiscrete stochastic optimization with continuous auxiliary variables. Rodolphe Le Riche*, Alexis Lasseigne**, François-Xavier Irisarri**
Discrete stochastic optimization with continuous auxiliary variables Rodolphe Le Riche*, Alexis Lasseigne**, François-Xavier Irisarri** *CNRS and Ecole des Mines de St-Etienne, Fr. ** ONERA, Composite
More informationSTA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! rsalakhu@utstat.toronto.edu! http://www.utstat.utoronto.ca/~rsalakhu/ Sidney Smith Hall, Room 6002 Lecture 11 Project
More informationLearning Bayesian Networks Does Not Have to Be NP-Hard
Learning Bayesian Networks Does Not Have to Be NP-Hard Norbert Dojer Institute of Informatics, Warsaw University, Banacha, 0-097 Warszawa, Poland dojer@mimuw.edu.pl Abstract. We propose an algorithm for
More informationComputer Vision Group Prof. Daniel Cremers. 14. Clustering
Group Prof. Daniel Cremers 14. Clustering Motivation Supervised learning is good for interaction with humans, but labels from a supervisor are hard to obtain Clustering is unsupervised learning, i.e. it
More informationBayesian Optimization Algorithm, Population Sizing, and Time to Convergence
Preprint UCRL-JC-137172 Bayesian Optimization Algorithm, Population Sizing, and Time to Convergence M. Pelikan, D. E. Goldberg and E. Cantu-Paz This article was submitted to Genetic and Evolutionary Computation
More informationSeries 7, May 22, 2018 (EM Convergence)
Exercises Introduction to Machine Learning SS 2018 Series 7, May 22, 2018 (EM Convergence) Institute for Machine Learning Dept. of Computer Science, ETH Zürich Prof. Dr. Andreas Krause Web: https://las.inf.ethz.ch/teaching/introml-s18
More informationLearning Objectives. c D. Poole and A. Mackworth 2010 Artificial Intelligence, Lecture 7.2, Page 1
Learning Objectives At the end of the class you should be able to: identify a supervised learning problem characterize how the prediction is a function of the error measure avoid mixing the training and
More informationStructure Design of Neural Networks Using Genetic Algorithms
Structure Design of Neural Networks Using Genetic Algorithms Satoshi Mizuta Takashi Sato Demelo Lao Masami Ikeda Toshio Shimizu Department of Electronic and Information System Engineering, Faculty of Science
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 4 Learning Bayesian Networks CS/CNS/EE 155 Andreas Krause Announcements Another TA: Hongchao Zhou Please fill out the questionnaire about recitations Homework 1 out.
More informationCS540 ANSWER SHEET
CS540 ANSWER SHEET Name Email 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18. 19. 20. 1 2 Final Examination CS540-1: Introduction to Artificial Intelligence Fall 2016 20 questions, 5 points
More informationParametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a
Parametric Unsupervised Learning Expectation Maximization (EM) Lecture 20.a Some slides are due to Christopher Bishop Limitations of K-means Hard assignments of data points to clusters small shift of a
More informationECE 521. Lecture 11 (not on midterm material) 13 February K-means clustering, Dimensionality reduction
ECE 521 Lecture 11 (not on midterm material) 13 February 2017 K-means clustering, Dimensionality reduction With thanks to Ruslan Salakhutdinov for an earlier version of the slides Overview K-means clustering
More informationExploration of population fixed-points versus mutation rates for functions of unitation
Exploration of population fixed-points versus mutation rates for functions of unitation J Neal Richter 1, Alden Wright 2, John Paxton 1 1 Computer Science Department, Montana State University, 357 EPS,
More informationLecture 4 October 18th
Directed and undirected graphical models Fall 2017 Lecture 4 October 18th Lecturer: Guillaume Obozinski Scribe: In this lecture, we will assume that all random variables are discrete, to keep notations
More informationLearning Bayesian Networks (part 1) Goals for the lecture
Learning Bayesian Networks (part 1) Mark Craven and David Page Computer Scices 760 Spring 2018 www.biostat.wisc.edu/~craven/cs760/ Some ohe slides in these lectures have been adapted/borrowed from materials
More informationProbabilistic Graphical Models
Probabilistic Graphical Models Lecture 11 CRFs, Exponential Family CS/CNS/EE 155 Andreas Krause Announcements Homework 2 due today Project milestones due next Monday (Nov 9) About half the work should
More informationBN Semantics 3 Now it s personal! Parameter Learning 1
Readings: K&F: 3.4, 14.1, 14.2 BN Semantics 3 Now it s personal! Parameter Learning 1 Graphical Models 10708 Carlos Guestrin Carnegie Mellon University September 22 nd, 2006 1 Building BNs from independence
More informationScalability of Selectorecombinative Genetic Algorithms for Problems with Tight Linkage
Scalability of Selectorecombinative Genetic Algorithms for Problems with Tight Linkage Kumara Sastry 1,2 and David E. Goldberg 1,3 1 Illinois Genetic Algorithms Laboratory (IlliGAL) 2 Department of Material
More informationRepresentation. Stefano Ermon, Aditya Grover. Stanford University. Lecture 2
Representation Stefano Ermon, Aditya Grover Stanford University Lecture 2 Stefano Ermon, Aditya Grover (AI Lab) Deep Generative Models Lecture 2 1 / 32 Learning a generative model We are given a training
More informationKoza s Algorithm. Choose a set of possible functions and terminals for the program.
Step 1 Koza s Algorithm Choose a set of possible functions and terminals for the program. You don t know ahead of time which functions and terminals will be needed. User needs to make intelligent choices
More informationNaïve Bayes classification
Naïve Bayes classification 1 Probability theory Random variable: a variable whose possible values are numerical outcomes of a random phenomenon. Examples: A person s height, the outcome of a coin toss
More information