Computational Intelligence Seminar (8 November 2005, Waseda University, Tokyo, Japan)
Statistical Physics Approaches to Genetic Algorithms
Joe Suzuki

Joe Suzuki (Osaka University, 1-1 Machikaneyama, Toyonaka, Osaka 560-0043, Japan; E-mail: suzuki@math.sci.osaka-u.ac.jp)

Abstract
We address genetic algorithms (GAs) in a different way. A GA selects individuals with high fitness values, which corresponds to decreasing the temperature of a Boltzmann distribution at each generation; in the limit, only the individuals with the highest fitness values survive. This limit cannot be computed directly, however, because the length L of each individual is assumed to be large (otherwise exhaustive search could be executed). We consider several strategies for estimating the Boltzmann distribution without spending computation exponential in L, and discuss how to apply mean field/Bethe/Kikuchi approximations and factorizations of the distribution. The same problem can be solved by differentiating the Kullback-Leibler divergence under several constraints, as is done in probabilistic inference based on generalized belief propagation.

1 Genetic algorithms and the Boltzmann distribution

A GA maintains a population of M individuals and produces the next generation by selection, crossover, and mutation; an Estimation of Distribution Algorithm (EDA) instead selects N (\le M) individuals, estimates a distribution (for example a Bayesian network, BN) from them, and samples M new individuals. Selecting individuals i with fitness f(i) ever more strongly corresponds to the Boltzmann distribution

  p_t(i) \propto e^{\beta_t \log f(i)} = (f(i))^{\beta_t},

whose normalizing constant \sum_{i \in \{0,1\}^L} (f(i))^{\beta_t} requires computation exponential in the string length L. This talk examines how the Boltzmann distribution can be estimated without that exponential cost, following Heinz Muhlenbein, "The Estimation of Distributions and the Minimum Relative Entropy Principle" (to appear in Evolutionary Computation, 2005).

The talk is organized as follows. Section 2 formulates the GA. Section 3 reinterprets the GA in terms of the Boltzmann distribution (the "Boltzmann GA") and introduces EDAs. Section 4 reviews Bayesian networks (BNs). Section 5 describes a BN-based EDA using fully factorized (univariate) marginals. Section 6 describes a BN-based EDA using Chow-Liu dependence trees. Section 7 presents Heinz Muhlenbein's Factorized Distribution Algorithm. Section 8 treats Muhlenbein's free-energy (mean field/Bethe) approach. Sections 2-6 minimize the Kullback-Leibler divergence D(p||q) of the approximation q from the target p, whereas Sections 7-8 minimize D(q||p); in general D(p||q) \neq D(q||p).

2 The GA

Each individual is a binary string of length L. Writing the k-th bit as x^{(k)} \in \{0,1\} (k = 1, 2, \dots, L), the set of individuals is

  J := \{00\cdots0, 00\cdots01, \dots, 11\cdots1\} = \{0,1\}^L,

a population consists of M individuals, and the fitness function is f : J \to R_+, with f(i) > 0 the fitness of i \in J.

2.1 Selection, crossover, and mutation

Let c[i] denote the number of copies of i \in J in the current population. One generation applies three operations: selection of individuals according to fitness, crossover of two individuals, and mutation that flips each of the L bits independently.

Selection: an individual i \in J is drawn with probability

  c[i] f(i) / \sum_{j \in J} c[j] f(j).   (1)

Crossover: given two individuals i, j \in J, a crossover position k is chosen and the bits of i and j are exchanged at that position, each of the two resulting offspring being kept with probability 1/2.

Mutation: each of the L bits of an individual is flipped independently with probability \mu > 0, so that i \in J becomes i' \in J with probability \mu^{|i \oplus i'|} (1 - \mu)^{L - |i \oplus i'|}, where |i \oplus i'| is the number of positions at which i and i' differ.

Algorithm 1 (one generation of the GA):
1. Draw two individuals i, j \in J according to (1).
2. Apply crossover to i and j to obtain k \in J.
3. Apply mutation to k to obtain k' \in J.
4. Repeat steps 1-3 M times to obtain the next population of M individuals.

2.2 The population as a state

The state of the population is the vector (c[i])_{i \in J} with \sum_{i \in J} c[i] = M and c[i] \ge 0, so the number of states is the number of ways of distributing M individuals over the 2^L strings. For L = 2, J = \{00, 01, 10, 11\}, and M = 3, there are 20 states (c[00], c[01], c[10], c[11]):

  (0,0,0,3), (0,0,1,2), (0,0,2,1), (0,0,3,0), (0,1,0,2), (0,1,1,1), (0,1,2,0), (0,2,0,1), (0,2,1,0), (0,3,0,0),
  (1,0,0,2), (1,0,1,1), (1,0,2,0), (1,1,0,1), (1,1,1,0), (1,2,0,0), (2,0,0,1), (2,0,1,0), (2,1,0,0), (3,0,0,0).

For example, the population consisting of the M = 3 individuals 10, 10, 11 corresponds to the state (c[00], c[01], c[10], c[11]) = (0,0,2,1).
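For concreteness, here is a minimal Python sketch of one generation of Algorithm 1. The fitness function (a shifted OneMax), the string length L, the population size M, one-point crossover, and the mutation rate mu are illustrative choices, not taken from the text.

# A minimal sketch of one GA generation (Algorithm 1); all concrete numbers,
# the OneMax fitness, and the one-point crossover rule are illustrative.
import random

L, M, mu = 8, 20, 0.02
f = lambda i: sum(i) + 1.0          # fitness f: J -> R_+ (shifted OneMax, stays positive)

def select(pop):
    # fitness-proportional selection, eq. (1)
    weights = [f(i) for i in pop]
    return random.choices(pop, weights=weights, k=1)[0]

def crossover(i, j):
    # one-point crossover: exchange the first k bits of i and j, keep one offspring
    k = random.randrange(1, L)
    return i[:k] + j[k:] if random.random() < 0.5 else j[:k] + i[k:]

def mutate(i):
    # flip each bit independently with probability mu
    return tuple(b ^ (random.random() < mu) for b in i)

def next_generation(pop):
    # steps 1-4 of Algorithm 1, repeated M times
    return [mutate(crossover(select(pop), select(pop))) for _ in range(M)]

pop = [tuple(random.randint(0, 1) for _ in range(L)) for _ in range(M)]
for t in range(30):
    pop = next_generation(pop)
best = max(pop, key=f)
print(best, f(best))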

The population thus evolves over a finite state space S as a Markov chain: writing s, t \in S for states, the transition matrix is Q = (Q(t|s))_{t,s \in S}, and a stationary distribution q = (q(s))_{s \in S} satisfies qQ = q. Let U \subseteq S be the set of states whose individuals all maximize f. Replace f in the selection rule (1) by g_p(f(\cdot)), where the family g_p : R_+ \to R_+, p \in R_+, satisfies for 0 < a < b: (i) g_p(a) < g_p(b) for every p > 0, and (ii) g_p(b)/g_p(a) \to \infty as p \to \infty (for example g_p(x) = x^p), so that selection becomes

  c[i] g_p(f(i)) / \sum_{j \in J} c[j] g_p(f(j)).   (2)

Then

  \lim_{p \to \infty} \lim_{\mu \to 0} q(s) > 0 for s \in U, and = 0 for s \in S \setminus U   (3)

(De Silva-Suzuki, 2005). In words, as the selection pressure p grows and the mutation probability \mu vanishes, the stationary distribution of the GA concentrates on the populations consisting only of optimal individuals. This motivates viewing the GA as a procedure that sharpens a Boltzmann distribution, the "Boltzmann GA" of this section.

3 The Boltzmann GA

3.1 Boltzmann selection

For a temperature T > 0 and i \in J, define

  g(i) := (1/T) \log f(i),   (4)

so that maximizing f is equivalent to maximizing g. If the population of M individuals at generation t follows the distribution p_t(i), Boltzmann selection updates it by

  p_{t+1}(i) = p_t(i) e^{\Delta\beta\, g(i)} / \sum_{j \in J} p_t(j) e^{\Delta\beta\, g(j)},   (5)

so that, starting from \beta_0 > 0 and putting \beta_t := \beta_0 + t \Delta\beta,

  p_t(i) = e^{\beta_t g(i)} / \sum_{j \in J} e^{\beta_t g(j)}.   (6)
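The following sketch computes the Boltzmann distribution (6) by brute force, enumerating all 2^L strings; this is exactly the exponential cost that rules out the direct computation for realistic L. The fitness function and the schedule beta_t = beta_0 + t*dbeta are illustrative assumptions.

# A small brute-force sketch of the Boltzmann distribution (6); the 2^L
# enumeration is the exponential cost the text warns about.
import itertools, math

L, beta0, dbeta = 10, 0.0, 0.5
f = lambda i: sum(i) + 1.0
g = lambda i: math.log(f(i))                 # g(i) = log f(i), eq. (4) with T = 1

def boltzmann(t):
    # p_t(i) = exp(beta_t g(i)) / sum_j exp(beta_t g(j)), eq. (6)
    beta_t = beta0 + t * dbeta
    w = {i: math.exp(beta_t * g(i)) for i in itertools.product((0, 1), repeat=L)}
    Z = sum(w.values())                      # 2^L terms: infeasible for large L
    return {i: w_i / Z for i, w_i in w.items()}

for t in (0, 5, 50):
    p = boltzmann(t)
    print(t, max(p.values()))                # the mass concentrates on the maximizer of f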

As t \to \infty (that is, as the temperature 1/\beta_t \to 0), p_t concentrates on the maximizers of g, equivalently of f. Evaluating (5) or (6) exactly, however, requires the sum \sum_{j \in J} e^{\beta_t g(j)} over J = \{0,1\}^L, and for a GA the length L is assumed to be too large for such an exponential computation (otherwise the maximum could be found by exhaustive search).

3.2 EDA

Instead of computing p_t exactly, one can estimate it from the better individuals of the current population.

Algorithm 2 (EDA at generation t):
1. From the M individuals of generation t, select N (\le M) individuals.
2. Estimate p_t(i) from the N selected individuals.
3. Sample M individuals from the estimate to form generation t+1.

Replacing the GA's selection/crossover/mutation by this estimation-and-sampling step gives the class of Estimation of Distribution Algorithms (EDAs); the name refers to the estimation in step 2.

3.3 The schema theorem

The classical analysis of the GA is Holland's schema theorem ("Adaptation in Natural and Artificial Systems"); see also De Silva-Suzuki (2005). A schema H \subseteq J is the set of strings sharing fixed values at some positions. For generation t with counts (c_t[j])_{j \in J}, define the mean fitness of H:

  m(H, t) := \sum_{i \in H} c_t[i] f(i) / \sum_{j \in H} c_t[j],   (7)

so that m(J, t) is the mean fitness of the whole population.
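A minimal sketch of the loop of Algorithm 2 follows. The probability model is a deliberately trivial placeholder (it just resamples the selected individuals); Sections 5-7 replace the estimate/sample pair by UMDA, Chow-Liu trees, and the FDA. All concrete numbers are illustrative assumptions.

# A minimal sketch of the EDA loop (Algorithm 2): truncation-select N of M,
# fit a model to the selection, sample M new individuals.
import random

L, M, N = 16, 50, 25
f = lambda i: sum(i) + 1.0

def estimate(selected):
    # placeholder model: the empirical distribution of the selected individuals
    return selected

def sample(model):
    return random.choice(model)

pop = [tuple(random.randint(0, 1) for _ in range(L)) for _ in range(M)]
for t in range(20):
    selected = sorted(pop, key=f, reverse=True)[:N]    # step 1: select N of M
    model = estimate(selected)                         # step 2: estimate p_t
    pop = [sample(model) for _ in range(M)]            # step 3: sample M individuals
print(max(f(i) for i in pop))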

The schema theorem bounds the expected mean fitness of a schema after one generation:

  E[m(H, t+1)] \ge m(H, t) (1 - \varepsilon(H)) (1 - \mu)^{d(H)},   (8)

where E denotes expectation given the counts (c[j])_{j \in J} at generation t, \varepsilon(H) is the probability that crossover disrupts membership in H, and d(H) is the defining length of H. Thus, for short schemata that crossover rarely disrupts, the mean fitness does not decrease in expectation.

Holland also considered the infinite-population, selection-only dynamics. Writing P(H, t) := \sum_{i \in H} p_t(i) for the proportion of the schema H,

  dP(H, t)/dt = P(H, t) [m(H, t) - m(J, t)],   (9)

so a schema whose mean fitness exceeds the population average grows. Equation (9) follows from the Boltzmann dynamics: take \beta_0 = 0 and \Delta\beta = 1 in (6), identify p_t(i) with c[i]/M, and define m by (7) with g in place of f. The continuous-time limit of (5) is the replicator equation dp_i(t)/dt = p_i(t) [g(i) - \sum_{j \in J} g(j) p_j(t)], and summing over i \in H gives

  dP(H, t)/dt = \sum_{i \in H} p_t(i) \{ g(i) - \sum_{j \in J} g(j) p_j(t) \} = P(H, t) (m(H, t) - m(J, t)),

which is (9).

4 Bayesian networks (BNs)

Consider L binary random variables X^{(1)}, \dots, X^{(L)} taking values (x^{(1)}, \dots, x^{(L)}) \in \{0,1\}^L with joint distribution P(X^{(1)} = x^{(1)}, \dots, X^{(L)} = x^{(L)}). Fix the ordering X^{(1)} < X^{(2)} < \dots < X^{(L)} and assign to each i a parent set \pi(i) \subseteq \{1, 2, \dots, i-1\} (with \pi(1) = \emptyset). A BN assumes that, given \{X^{(k)}\}_{k \in \pi(i)}, the variable X^{(i)} is conditionally independent of the remaining predecessors \{X^{(k)}\}_{k \in \{1,\dots,i-1\} \setminus \pi(i)}, so that

  \prod_{i=1}^{L} P(X^{(i)} = x^{(i)} | X^{(1)} = x^{(1)}, \dots, X^{(i-1)} = x^{(i-1)}) = \prod_{i=1}^{L} P(X^{(i)} = x^{(i)} | (X^{(k)} = x^{(k)})_{k \in \pi(i)}).

The structure of the BN is \pi = (\pi(1), \pi(2), \dots, \pi(L)); for each i = 1, 2, \dots, L and each parent configuration (x^{(k)})_{k \in \pi(i)}, the conditional probabilities satisfy \sum_{x^{(i)}} P(X^{(i)} = x^{(i)} | (X^{(k)} = x^{(k)})_{k \in \pi(i)}) = 1.
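As a small illustration of the factorization, the sketch below evaluates the joint probability of a hypothetical 3-variable BN as the product of its conditional probability tables; the structure pi and the table values are made-up examples, not taken from the text.

# A small sketch of the BN factorization: the joint is the product of
# P(x^(i) | parents).  The chain structure and the tables are hypothetical.
import itertools

pi = {1: (), 2: (1,), 3: (2,)}          # parent sets pi(i): a chain 1 -> 2 -> 3
theta = {1: {(): 0.6},                  # theta[i][parent values] = P(x^(i) = 1 | parents)
         2: {(0,): 0.2, (1,): 0.7},
         3: {(0,): 0.5, (1,): 0.9}}

def joint(x):
    # P(x^(1),...,x^(L)) = prod_i P(x^(i) | (x^(k))_{k in pi(i)})
    p = 1.0
    for i in pi:
        parent_vals = tuple(x[k - 1] for k in pi[i])
        p1 = theta[i][parent_vals]
        p *= p1 if x[i - 1] == 1 else 1.0 - p1
    return p

print(sum(joint(x) for x in itertools.product((0, 1), repeat=3)))  # sums to 1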

The parameters of the BN are the conditional probability tables \theta^{(i)} = (P(X^{(i)} = x^{(i)} | (X^{(k)} = x^{(k)})_{k \in \pi(i)})), indexed by x^{(i)} and by the parent configurations (x^{(k)})_{k \in \pi(i)}, with \theta = (\theta^{(1)}, \theta^{(2)}, \dots, \theta^{(L)}); a BN is the pair (structure, parameters). In an EDA, \theta is estimated from the M selected individuals, arranged as an array x \in \{0,1\}^{L \times M} whose rows are the samples (x^{(1)}_m, \dots, x^{(L)}_m), m = 1, \dots, M. If the structure is complete, \pi(j) = \{1, \dots, j-1\}, the number of parameters is \sum_{j=1}^{L} 2^{j-1} = 2^L - 1, so estimating the full BN is as hopeless as exhaustive search; a "BN-based GA" must restrict the structure. The simplest restriction, treating all L bits as independent, is the subject of this section; dependence trees follow in Section 6.

5 EDA with independent marginals (UMDA)

For an individual i = (x^{(1)}, \dots, x^{(L)}) \in \{0,1\}^L, the marginal of the k-th bit is

  q_t^{(k)}(x^{(k)}) = \sum_{(x^{(j)})_{j \neq k}} p_t(x^{(1)}, \dots, x^{(L)}).

From the selected individuals it is estimated as

  \hat{q}_t^{(k)}(1) := (c_1 + a_1) / (M + a_1 + a_0),   (10)

where c_1 is the number of selected individuals at generation t with x^{(k)} = 1 and a_0, a_1 > 0 are smoothing constants (for large M one may take a_0 = a_1 = 1/2). Steps 2 and 3 of Algorithm 2 become:
2. For each k = 1, 2, \dots, L, estimate \hat{q}_t^{(k)} by (10).
3. Sample each bit x^{(k)}, k = 1, 2, \dots, L, independently from \hat{q}_t^{(k)}.
This EDA is the Univariate Marginal Distribution Algorithm (UMDA). It approximates p_t(i) by the product q_t(i) = \prod_{k=1}^{L} q_t^{(k)}(x^{(k)}), estimated by \hat{q}_t(i) = \prod_{k=1}^{L} \hat{q}_t^{(k)}(x^{(k)}); among all product distributions, the product of the marginals of p_t minimizes the Kullback-Leibler divergence

  D(p_t || q_t) = \sum_{i \in J} p_t(i) \log ( p_t(i) / q_t(i) ),

and it requires only O(L) parameters.

Let W(t) = \sum_{i \in J} q_t(i) g(i) and write q_t^{(k)} = q_t^{(k)}(1). UMDA behaves like gradient ascent on the mean fitness W: it (approximately) follows Wright's equation

  q_{t+1}^{(k)} = q_t^{(k)} + q_t^{(k)} (1 - q_t^{(k)}) \, \partial W(t) / \partial q_t^{(k)}.   (11)
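A minimal sketch of one UMDA step follows: the smoothed marginals (10) are estimated from truncation-selected individuals, and new individuals are sampled bit by bit. Besides a_0 = a_1 = 1/2, the concrete numbers (L, M, N, the fitness) are illustrative assumptions.

# A minimal sketch of UMDA: estimate univariate marginals (10), sample bits independently.
import random

L, M, N, a0, a1 = 16, 60, 30, 0.5, 0.5
f = lambda i: sum(i) + 1.0

def estimate_marginals(selected):
    # q_hat^(k)(1) = (c_1 + a_1) / (n + a_1 + a_0), eq. (10), from the n selected individuals
    n = len(selected)
    return [(sum(ind[k] for ind in selected) + a1) / (n + a1 + a0) for k in range(L)]

def sample(q):
    return tuple(int(random.random() < q[k]) for k in range(L))

pop = [tuple(random.randint(0, 1) for _ in range(L)) for _ in range(M)]
for t in range(30):
    selected = sorted(pop, key=f, reverse=True)[:N]
    q = estimate_marginals(selected)
    pop = [sample(q) for _ in range(M)]
print(max(f(i) for i in pop))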

6 EDA with dependence trees (Chow-Liu)

Let V = \{1, 2, \dots, L\} and let E \subseteq \{\{j,k\} \mid j, k \in V, j \neq k\} be the edge set of a tree on V. Approximate p_t by

  q_t(i) = \prod_{\{k,l\} \in E} q_t^{(k,l)}(x^{(k)}, x^{(l)}) \;/\; \prod_{k \in V} [ q_t^{(k)}(x^{(k)}) ]^{N(k)-1},   (12)

where N(k) is the number of edges e \in E incident to k. The pairwise marginals q_t^{(k,l)}(x^{(k)}, x^{(l)}) = \sum_{(x^{(j)})_{j \neq k,l}} p_t(x^{(1)}, \dots, x^{(L)}) are estimated by

  \hat{q}_t^{(k,l)}(j, h) := (c_{j,h} + a_{j,h}) / (M + a_{0,0} + a_{0,1} + a_{1,0} + a_{1,1}),   (13)

where c_{j,h} is the number of selected individuals at generation t with x^{(k)} = j and x^{(l)} = h, and a_{j,h} > 0 (for large M one may take a_{j,h} = 1/4).

Which edge set E minimizes D(p_t || q_t)? For k, l \in V, k \neq l, define the mutual information

  I(k, l) := \sum_{x^{(k)}, x^{(l)}} q_t^{(k,l)}(x^{(k)}, x^{(l)}) \log \frac{ q_t^{(k,l)}(x^{(k)}, x^{(l)}) }{ q_t^{(k)}(x^{(k)}) \, q_t^{(l)}(x^{(l)}) }.   (14)

Algorithm 3 (Chow-Liu):
1. Compute I(k, l) for every pair k, l \in V, k \neq l, and sort the pairs in decreasing order of I(k, l).
2. Take the pair e = \{k, l\} \notin E with the largest I(k, l) not yet considered; add it to E unless E \cup \{e\} contains a cycle.
3. Repeat step 2 while some pair \{k, l\} \notin E with k \neq l remains.

Indeed, with the single-bit entropies

  H(k) := -\sum_{x^{(k)}} q_t^{(k)}(x^{(k)}) \log q_t^{(k)}(x^{(k)}),   (15)

one has D(p_t || q_t) = \sum_{k \in V} H(k) - \sum_{\{k,l\} \in E} I(k, l) - H(p_t), so minimizing D(p_t || q_t) over trees amounts to maximizing \sum_{\{k,l\} \in E} I(k, l).
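The sketch below implements Algorithm 3 on hypothetical samples: empirical mutual informations (14) are computed for all pairs, and edges are added greedily, using a union-find structure to reject edges that would close a cycle.

# A sketch of Algorithm 3 (Chow-Liu) with maximum-likelihood pairwise marginals.
import itertools, math, random
from collections import Counter

def mutual_info(samples, k, l):
    # empirical I(k,l), eq. (14)
    n = len(samples)
    pkl = Counter((s[k], s[l]) for s in samples)
    pk = Counter(s[k] for s in samples)
    pl = Counter(s[l] for s in samples)
    return sum((c / n) * math.log((c / n) / ((pk[a] / n) * (pl[b] / n)))
               for (a, b), c in pkl.items())

def chow_liu(samples, L):
    parent = list(range(L))                       # union-find for cycle detection
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v
    pairs = sorted(itertools.combinations(range(L), 2),
                   key=lambda e: mutual_info(samples, *e), reverse=True)
    E = []
    for (k, l) in pairs:                          # heaviest edges first
        if find(k) != find(l):                    # skip edges that would close a cycle
            parent[find(k)] = find(l)
            E.append((k, l))
    return E

# hypothetical samples over L = 4 correlated bits
random.seed(0)
samples = []
for _ in range(500):
    x0 = random.randint(0, 1)
    samples.append((x0, x0 ^ (random.random() < 0.1),
                    x0 ^ (random.random() < 0.4), random.randint(0, 1)))
print(chow_liu(samples, 4))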

Algorithm 3 is a greedy procedure of Kruskal type: among spanning trees, it yields the q_t of the form (12) minimizing the Kullback-Leibler divergence D(p_t || q_t).

Examples. For V = \{1,2,3\} with I(1,2) > I(2,3) > I(3,1) > 0, the algorithm returns E = \{\{1,2\}, \{2,3\}\}. For V = \{1,2,3,4\} with I(1,2) > I(1,3) > I(2,3) > I(1,4) > I(2,4) > I(3,4), it returns E = \{\{1,2\}, \{1,3\}, \{1,4\}\}: the pair \{2,3\} is rejected because it would close a cycle.

If the marginals in q_t are the empirical (maximum-likelihood) estimates

  \hat{q}_t^{(k,l)}(j, h) := c_{j,h} / M,   (16)

one may instead choose the edge set E, not necessarily a spanning tree, that minimizes the penalized code length

  \sum_{i \in J} -c[i] \log \hat{q}_t(i) + \frac{k(E)}{2} d_n,   (17)

where k(E) is the number of parameters of q_t and \{d_n\} is a penalty sequence; d_n = \log n gives the MDL criterion. Since each x^{(k)} here takes \gamma_k = 2 values, (17) is minimized by running Algorithm 3 with the mutual information replaced by

  I'(k, l) := I(k, l) - (\gamma_k - 1)(\gamma_l - 1) \frac{d_n}{2},   (18)

adding only edges with I'(k, l) > 0 (Suzuki, 1993). With (18) the resulting graph G = (V, E) may be a forest rather than a spanning tree: d_n = 0 recovers the Chow-Liu tree, while d_n \neq 0 balances fit against the number of parameters.

7 The Factorized Distribution Algorithm (FDA)

In general p_t factorizes as

  p_t(x^{(1)}, \dots, x^{(L)}) = \prod_{j=1}^{L} p_t(x^{(j)} | x^{(1)}, \dots, x^{(j-1)}),

but for a GA this full factorization is unusable (its tables are exponential in L), so the structure of the fitness function itself must be exploited.

7.1 Additively decomposed functions and the running intersection property

Let g : \{0,1\}^L \to R_+ and I := \{1, 2, \dots, L\}. Suppose there are subsets s_1, s_2, \dots, s_m \subseteq I such that

  g(x) = \sum_{k} g_k(x_{s_k}),   (19)

where x_{s_k} denotes the subvector (x^{(j)})_{j \in s_k} (the decomposition \{g_k\}, \{s_k\} in (19) is assumed to be known). The FDA is the EDA that exploits the sets s_1, s_2, \dots, s_m \subseteq I.

Given the sequence s_1, s_2, \dots, s_m, define the histories \{d_k\}, residuals \{b_k\}, and separators \{c_k\}:

  d_0 := \emptyset, \quad d_k := \bigcup_{j=1}^{k} s_j, \quad b_k := s_k \setminus d_{k-1}, \quad c_k := s_k \cap d_{k-1}.

Definition. The sequence s_1, s_2, \dots, s_m satisfies the running intersection property (RIP) if
1. b_k \neq \emptyset for k = 1, 2, \dots, m;
2. d_m = I;
3. for every k = 2, 3, \dots, m there exists j < k with c_k \subseteq s_j.

If the RIP fails for the given order, it may hold after reordering the s_k, or after merging some of them, s_{k_1}, \dots, s_{k_l} \mapsto s_{k_1} \cup \dots \cup s_{k_l}; in the extreme case m = 1 (everything merged) the RIP holds trivially.

For the Boltzmann distribution p_t(x) := e^{\beta_t g(x)} / \sum_{x'} e^{\beta_t g(x')}, the RIP yields the factorization

  p_t(x) = \prod_{k=1}^{m} p_t(x_{b_k} | x_{c_k}) = \prod_{k=1}^{m} \frac{ p_t(x_{b_k}, x_{c_k}) }{ p_t(x_{c_k}) }.   (20)

Example 1. For J_{23}, J_{31}, J_{12} \in R_+ and

  g(x^{(1)}, x^{(2)}, x^{(3)}) = J_{23} x^{(2)} x^{(3)} + J_{31} x^{(3)} x^{(1)} + J_{12} x^{(1)} x^{(2)},

take I = \{1,2,3\}, s_1 = \{2,3\}, s_2 = \{3,1\}, s_3 = \{1,2\}. Then d_0 = \emptyset, d_1 = \{2,3\}, d_2 = d_3 = \{1,2,3\}; b_1 = \{2,3\}, b_2 = \{1\}, b_3 = \emptyset; c_1 = \emptyset, c_2 = \{3\} \subseteq s_1, c_3 = \{1,2\}. Since b_3 = \emptyset, the RIP fails, and no reordering of s_1, s_2, s_3 repairs it; some of the sets must be merged (for example, replacing s_2 and s_3 by s_2 \cup s_3 = \{1,2,3\} makes the sequence s_1, s_2 \cup s_3 satisfy the RIP).

Example 2. For

  g(x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}) = J_{23} x^{(2)} x^{(3)} + J_{31} x^{(3)} x^{(1)} + J_4 x^{(4)},

take I = \{1,2,3,4\}, s_1 = \{2,3\}, s_2 = \{3,1\}, s_3 = \{4\}. Then d_0 = \emptyset, d_1 = \{2,3\}, d_2 = \{1,2,3\}, d_3 = \{1,2,3,4\}; b_1 = \{2,3\}, b_2 = \{1\}, b_3 = \{4\}; c_1 = \emptyset, c_2 = \{3\} \subseteq s_1, c_3 = \emptyset \subseteq s_1, so the RIP holds and

  p_t(x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}) = p_t(x^{(2)}, x^{(3)}) \, p_t(x^{(1)} | x^{(3)}) \, p_t(x^{(4)}) = \frac{ p_t(x^{(2)}, x^{(3)}) \, p_t(x^{(3)}, x^{(1)}) \, p_t(x^{(4)}) }{ p_t(x^{(3)}) }.
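The following sketch computes the histories, residuals, and separators and checks the three RIP conditions; it reproduces the conclusions of Examples 1 and 2 above.

# A small sketch of the RIP check for a sequence of subsets s_1,...,s_m.
def rip(subsets, I):
    d = set()
    for k, s in enumerate(subsets, start=1):
        b, c = s - d, s & d                  # b_k = s_k minus d_{k-1}; c_k = s_k intersect d_{k-1}
        if not b:                            # condition 1: b_k must be nonempty
            return False
        if k > 1 and not any(c <= sj for sj in subsets[:k - 1]):
            return False                     # condition 3: c_k contained in some earlier s_j
        d |= s                               # d_k = d_{k-1} union s_k
    return d == I                            # condition 2: the s_k cover I

# Example 1: g = J23 x2 x3 + J31 x3 x1 + J12 x1 x2  ->  RIP fails (b_3 is empty)
print(rip([{2, 3}, {3, 1}, {1, 2}], {1, 2, 3}))   # False
# Example 2: g = J23 x2 x3 + J31 x3 x1 + J4 x4     ->  RIP holds
print(rip([{2, 3}, {3, 1}, {4}], {1, 2, 3, 4}))   # True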

7.2 Graphical models and junction trees

Which sets must be merged, and how large the merged sets become, is answered by the theory of undirected graphical models. Let X, Y, Z be disjoint subsets of the variable set V (X \cup Y \cup Z \subseteq V, X \cap Y = Y \cap Z = Z \cap X = \emptyset). Write I(X, Z, Y) when X and Y are conditionally independent given Z, that is, p(x | y, z) = p(x | z) for all x, y, z with p(y, z) > 0. For an undirected graph G = (V, E), E \subseteq \{\{u, v\} \mid u, v \in V, u \neq v\}, write <X | Z | Y>_G when every path in G from a vertex x \in X to a vertex y \in Y passes through Z (Z separates X from Y). G is an independence map (I-map) of the distribution if

  <X | Z | Y>_G \implies I(X, Z, Y).   (21)

A minimal I-map, from which no edge can be removed without violating (21), is the Markov network of the distribution (Pearl, 1988).

Let V = \{x^{(1)}, \dots, x^{(L)}\} and let G = (V, E) be an I-map. A junction tree T for G is a tree whose nodes are subsets (clusters) of V such that
1. every clique of G is contained in some node of T, and
2. for any two nodes C_1, C_2 of T, every node C on the path between C_1 and C_2 in T contains C_1 \cap C_2.

A junction tree exists exactly when G is chordal (triangulated), that is, every cycle of length at least 4 has a chord. A junction tree is obtained as follows:
1. Triangulate G, adding edges until every cycle of length at least 4 has a chord.
2. List the maximal cliques C_1, \dots, C_m of the triangulated graph.

3. Let \mathcal{V} := \{C_1, \dots, C_m\}. For C_1, C_2 \in \mathcal{V} define the weight c(C_1, C_2) := |C_1 \cap C_2|, and build a maximum-weight spanning tree T over \mathcal{V} with Kruskal's greedy algorithm (Kruskal, 1956), which repeatedly adds the heaviest remaining edge that does not close a cycle.

With the junction tree's clusters \mathcal{V} and separators \mathcal{E} (the intersections attached to the tree edges), the distribution factorizes as

  p_t(V) = \prod_{v \in \mathcal{V}} p_t(v) \;/\; \prod_{e \in \mathcal{E}} p_t(e).

Figure 1: an undirected graph G = (V, E) over V = \{x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}\} containing the chord \{x^{(1)}, x^{(3)}\}.

In the graph of Figure 1, the maximal cliques are C_1 = \{x^{(1)}, x^{(2)}, x^{(3)}\} and C_2 = \{x^{(1)}, x^{(3)}, x^{(4)}\}, the separator is C_1 \cap C_2 = \{x^{(1)}, x^{(3)}\} (so \mathcal{V} = \{C_1, C_2\} and \mathcal{E} = \{C_1 \cap C_2\}), and

  p_t(x^{(1)}, x^{(2)}, x^{(3)}, x^{(4)}) = \frac{ p_t(x^{(1)}, x^{(2)}, x^{(3)}) \, p_t(x^{(1)}, x^{(3)}, x^{(4)}) }{ p_t(x^{(1)}, x^{(3)}) }.

The cliques C \in \mathcal{V} of the triangulated graph play the role of the (merged) sets s_k, but finding a triangulation whose largest clique is as small as possible is NP-hard.
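The sketch below illustrates the junction-tree factorization on the two-clique example of Figure 1: a hypothetical joint that satisfies the corresponding conditional independence (x^{(2)} and x^{(4)} independent given x^{(1)}, x^{(3)}) is rebuilt exactly from the clique marginals and the separator marginal.

# A sketch of the junction-tree factorization for the Figure 1 example; the
# joint p is a hypothetical one of the form a(x1,x2,x3) * b(x1,x3,x4).
import itertools, random

random.seed(1)
a = {k: random.random() for k in itertools.product((0, 1), repeat=3)}
b = {k: random.random() for k in itertools.product((0, 1), repeat=3)}
p = {x: a[(x[0], x[1], x[2])] * b[(x[0], x[2], x[3])]
     for x in itertools.product((0, 1), repeat=4)}
Z = sum(p.values())
p = {x: v / Z for x, v in p.items()}

def marg(p, idx):
    # marginal of p over the variables with (0-based) indices idx
    out = {}
    for x, v in p.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + v
    return out

c1, c2, sep = marg(p, (0, 1, 2)), marg(p, (0, 2, 3)), marg(p, (0, 2))
for x in itertools.product((0, 1), repeat=4):
    rebuilt = c1[(x[0], x[1], x[2])] * c2[(x[0], x[2], x[3])] / sep[(x[0], x[2])]
    assert abs(rebuilt - p[x]) < 1e-12
print("junction-tree factorization reproduces the joint")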

7.3 Consistency of local beliefs and iterative proportional fitting

Suppose we are given s_1, s_2, \dots, s_m \subseteq I and, for each k, a local distribution (belief) \tilde{q}_k(x_{s_k}) = \tilde{q}_k(x_{b_k}, x_{c_k}) satisfying the normalization

  \sum_{x_{b_k}, x_{c_k}} \tilde{q}_k(x_{b_k}, x_{c_k}) = 1   (22)

and the marginalization condition

  \sum_{x_{b_k}} \tilde{q}_k(x_{b_k}, x_{c_k}) = \tilde{q}_k(x_{c_k}).   (23)

The beliefs \tilde{q}_1, \dots, \tilde{q}_m are called consistent if they are the marginals of a single joint distribution. If the RIP holds (in particular b_k \neq \emptyset for k = 1, \dots, m), then the product q(x) := \prod_{k=1}^{m} \tilde{q}_k(x_{b_k} | x_{c_k}) satisfies \sum_x q(x) = 1 and

  q_k(x_{b_k} | x_{c_k}) = \tilde{q}_k(x_{b_k} | x_{c_k}),   (24)

that is, the conditionals of q reproduce the given beliefs. When the RIP fails, in general

  q_k(x_{b_k}, x_{c_k}) \neq \tilde{q}_k(x_{b_k}, x_{c_k}).   (25)

For instance, with s_1 = \{2,3\}, s_2 = \{3,1\}, s_3 = \{1,2\} we have b_3 = \emptyset, so q(x) = \tilde{q}_1(x_{b_1}) \tilde{q}_2(x_{b_2} | x_{c_2}) does not involve \tilde{q}_3 at all, and marginalizing q over x^{(3)} need not give q(x^{(1)}, x^{(2)}) = \tilde{q}_3(x^{(1)}, x^{(2)}).

Conversely, given \{b_k\}, \{c_k\} and the beliefs \tilde{q}_1, \dots, \tilde{q}_m, we may look for a joint distribution q whose marginals reproduce them:

  \tilde{q}_k(x_{s_k}) = \sum_{(x^{(j)})_{j \in I \setminus s_k}} q(x).   (26)

Among all q satisfying (26), take the one maximizing the entropy

  H(q) := -\sum_x q(x) \log q(x).   (27)

This maximum-entropy q is obtained by Iterative Proportional Fitting (IPF) (Csiszar, 1975): start from the uniform distribution q^{(0)}(x) = 2^{-L}, x \in \{0,1\}^L, and iterate

  q^{(\tau+1)}(x) = q^{(\tau)}(x) \, \frac{ \tilde{q}_k(x_{s_k}) }{ \sum_{y} q^{(\tau)}(x_{s_k}, y) }, \quad k = (\tau \bmod m) + 1,   (28)

where y runs over the configurations of the variables in I \setminus s_k, so the denominator is the current marginal of q^{(\tau)} over s_k.

Example. For I = \{1,2,3\}, s_1 = \{2,3\}, s_2 = \{3,1\}, s_3 = \{1,2\} (m = 3): q^{(0)}(x^{(1)}, x^{(2)}, x^{(3)}) = 2^{-3}, q^{(1)}(x^{(1)}, x^{(2)}, x^{(3)}) = \tilde{q}_1(x^{(2)}, x^{(3)}) \cdot 2^{-1},

  q^{(2)}(x^{(1)}, x^{(2)}, x^{(3)}) = \frac{ \tilde{q}_1(x^{(2)}, x^{(3)}) \, \tilde{q}_2(x^{(3)}, x^{(1)}) }{ \sum_{x^{(2)}} \tilde{q}_1(x^{(2)}, x^{(3)}) },

q^{(3)} then incorporates \tilde{q}_3(x^{(1)}, x^{(2)}), and the cycle through k = 1, 2, 3 is repeated until convergence.

Example. For I = \{1,2,3\}, s_1 = \{2,3\}, s_2 = \{3,1\} (m = 2): q^{(0)} = 2^{-3}, q^{(1)} = \tilde{q}_1(x^{(2)}, x^{(3)}) \cdot 2^{-1}, and q^{(2)} = q^{(3)} = \cdots; because the RIP holds (and provided \tilde{q}_1 and \tilde{q}_2 agree on their common variable x^{(3)}), the iteration reaches a fixed point after one pass \tau = 1, \dots, m through the beliefs, and the resulting q is the product \tilde{q}_1(x_{b_1}) \tilde{q}_2(x_{b_2} | x_{c_2}), the analogue of the factorization (20).
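A minimal sketch of IPF (28) for L = 3 with s_1 = \{2,3\}, s_2 = \{3,1\}, s_3 = \{1,2\} follows; the prescribed pairwise beliefs are hypothetical (and mutually consistent), and the run checks that q stays normalized and reproduces them.

# A minimal sketch of Iterative Proportional Fitting, eq. (28).
import itertools

L = 3
subsets = [(1, 2), (2, 0), (0, 1)]                     # s_1={2,3}, s_2={3,1}, s_3={1,2}, 0-based
# hypothetical consistent pairwise beliefs (equal bits with probability 0.8)
qt = [{(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4} for _ in subsets]

q = {x: 2.0 ** (-L) for x in itertools.product((0, 1), repeat=L)}   # q^(0): uniform
for tau in range(60):
    k = tau % len(subsets)
    s = subsets[k]
    m = {}                                             # current marginal of q over s_k
    for x, v in q.items():
        key = tuple(x[i] for i in s)
        m[key] = m.get(key, 0.0) + v
    # q^(tau+1)(x) = q^(tau)(x) * qt_k(x_{s_k}) / q^(tau)(x_{s_k}), eq. (28)
    q = {x: v * qt[k][tuple(x[i] for i in s)] / m[tuple(x[i] for i in s)]
         for x, v in q.items()}

print(sum(q.values()))                                  # stays 1
for k, s in enumerate(subsets):                         # each prescribed marginal is matched
    m = {}
    for x, v in q.items():
        key = tuple(x[i] for i in s)
        m[key] = m.get(key, 0.0) + v
    print(k, max(abs(m[key] - qt[k][key]) for key in qt[k]))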

8 Minimizing D(q||p): free energies

The FDA of the previous section constructs q from local beliefs \{\tilde{q}_k\}. In this section, q is instead chosen to minimize the Kullback-Leibler divergence D(q||p) of the approximation q from the target p, as in variational inference for BNs. For the GA, put g(x) = \beta \log f(x); in statistical-physics terms, -g(x) is an energy. Define the average energy and the variational free energy

  U(q) := -\sum_x q(x) g(x), \qquad G(q) := U(q) - H(q).

Then

  D(q||p) = U(q) - H(q) + \ln Z,   (29)

where Z = \sum_x e^{g(x)} and p(x) = e^{g(x)} / Z. Since D(q||p) \ge 0 with equality iff q = p, we have G(q) \ge -\ln Z with equality iff q = p. Although \ln Z itself is intractable, it does not depend on q, so minimizing D(q||p) over a restricted family of q, described by local beliefs q_k, k = 1, 2, \dots, m, amounts to minimizing U(q) - H(q); from here on we drop the generation index t and simply write p(x) and q(x).

If g is additively decomposed as in (19), say into monomials

  g(x) = \sum_{k=1}^{m} a_{s_k} \prod_{j \in s_k} x^{(j)},   (30)

then the average energy requires only the marginals over the sets s_k:

  U(q) = -\sum_x q(x) g(x) = -\sum_{k=1}^{m} a_{s_k} \, q_k(x_{s_k} = (1, \dots, 1)).   (31)

8.1 Mean field approximation

Restrict q to fully factorized distributions over the single bits (m = L):

  q(x) = \prod_{k=1}^{L} q_k(x^{(k)}),   (32)

so that

  H(q) = -\sum_{k=1}^{L} \sum_{x^{(k)}} q_k(x^{(k)}) \log q_k(x^{(k)}).   (33)

Writing q_k := q_k(1), the stationarity condition

  \frac{\partial D(q||p)}{\partial q_k} = \log \frac{q_k}{1 - q_k} + \frac{\partial U}{\partial q_k} = 0

gives

  q_k = \frac{1}{1 + \exp[\partial U / \partial q_k]},   (34)

and the mean field approximation iterates (34) over k = 1, \dots, L until the \{q_k\} reach a fixed point.
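The sketch below iterates the mean-field fixed point (34) for a small additively decomposed g with the three pairwise terms of the earlier example; the coefficient values are illustrative assumptions.

# A minimal sketch of the mean-field iteration (34) for
# g(x) = sum_s a_s * prod_{j in s} x^(j); q[k] denotes q_k(x^(k) = 1).
import math

terms = {(1, 2): 1.0, (2, 0): 0.5, (0, 1): 2.0}   # a_s for s = {2,3}, {3,1}, {1,2}, 0-based
L = 3
q = [0.5] * L                                     # initial marginals

def dU_dq(k, q):
    # dU/dq_k = - sum over terms containing k of a_s * prod_{j in s, j != k} q_j
    total = 0.0
    for s, a in terms.items():
        if k in s:
            prod = 1.0
            for j in s:
                if j != k:
                    prod *= q[j]
            total -= a * prod
    return total

for it in range(200):                             # iterate (34) to a fixed point
    for k in range(L):
        q[k] = 1.0 / (1.0 + math.exp(dU_dq(k, q)))
print(q)                                          # all q_k > 1/2: mass moves toward the maximizer (1,1,1)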

q k = 1 1 + exp[@u=@q k ] fq k g (34) (34) 8.2 Bethe fs k g q(x) = P m ~q k(x bk jx ck ) H(q) =; m x bk x ck q k (x bk x ck ) log ~q k (x bk jx ck ) (35) RIP q k (x bk x ck ) 6= ~q k (x bk x ck ) H(q) ; m x bk x ck ~q k (x bk x ck ) log ~q k (x bk jx ck ) (36) L = U(q) ; H(q) ; ; m m x ck k (x ck )[ x bk ~q k (x bk x ck ) ; ~q k (x ck )] ; k [ ~q k (x bk x ck ) ; 1] ; x bk x ck m k [ x c k m x bk k (x bk )[ x ck ~q k (x bk x ck ) ; ~q k (x bk )] ~q k (x ck ) ; 1] (37) ~q k (x bk x ck ) ~q k (x ck ) BN BN ( ) y BN L ( ) x y x P (xjy) g(x) =; log P (xjy) U(q) =; x q(x)logp (xjy) I = f1 Lg L fs k gi 2 (Bethe ) RIP (singly connected) q(x) = Q Q (i j) ~q k(x (i) x (j) ) k [~q k(x (k) )] N(k);1 (38) x (i) x (j) N(k) k (BP) ~q k m BP BP Weiss-Freeman Generalized Belief Propagation(2004) 15

9 Summary

1. Increasing the selection pressure of a GA corresponds to lowering the temperature of a Boltzmann distribution; in the limit only the fittest individuals survive, but the distribution cannot be computed exactly for large L.
2. EDAs estimate that distribution with BN-type factorizations (univariate marginals, Chow-Liu trees, the FDA), and the free-energy view (mean field, Bethe, Kikuchi) minimizes D(q||p) for an arbitrary fitness function f : \{0,1\}^L \to R_+; since exact computation (for example optimal triangulation) is NP-hard, these approximations are the practical route for BN-based GAs.

References

1. Csiszar, I. (1975). I-divergence geometry of probability distributions and minimization problems. Annals of Probability, 3(1):146-158.
2. De Silva, C. and Suzuki, J. (2005). On the stationary distribution of GAs with positive crossover probability. GECCO 2005, Washington DC, June 2005.
3. Chow, C. K. and Liu, C. N. (1968). Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory, IT-14(3):462-467.
4. Muhlenbein, H. and Mahnig, T. (1999). FDA - a scalable evolutionary algorithm for the optimization of additively decomposed functions. Evolutionary Computation, 7(4):353-376.
5. Muhlenbein, H. and Hons, R. (2005). The estimation of distributions and the minimum relative entropy principle. To appear in Evolutionary Computation.
6. Holland, J. (1975). Adaptation in Natural and Artificial Systems. MIT Press.

7. Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems. San Mateo, CA: Morgan Kaufmann.
8. Suzuki, J. (1993). A construction of Bayesian networks from databases based on the MDL principle. Proc. Ninth Conference on Uncertainty in Artificial Intelligence (UAI 1993), 266-273. See also: Learning Bayesian belief networks based on the minimum description length principle: basic properties. IEICE Trans. on Fundamentals, E82-A:2237-2245 (1999).
9. Suzuki, J. (1995). A Markov chain analysis on simple genetic algorithms. IEEE Transactions on Systems, Man and Cybernetics, 25(2):655-659, April 1995.
10. Suzuki, J. (1998). A further result on the Markov chain model of genetic algorithms and its application to a simulated annealing-like strategy. IEEE Transactions on Systems, Man and Cybernetics, 28(1).
11. Yedidia, J. S., Freeman, W. T. and Weiss, Y. (2000). Generalized belief propagation. NIPS 2000.