Convergence analysis of ant colony learning

Size: px

Start display at page:

Download "Convergence analysis of ant colony learning"

Heather Howard
5 years ago
Views:

1 Delft University of Technology Delft Center for Systems and Control Technical report Convergence analysis of ant colony learning J van Ast R Babška and B De Schtter If yo want to cite this report please se the following reference instead: J van Ast R Babška and B De Schtter Convergence analysis of ant colony learning Proceedings of the 18th IFAC World Congress Milan Italy pp Ag Sept 2011 Delft Center for Systems and Control Delft University of Technology Mekelweg CD Delft The etherlands phone: (secretary) fax: URL: This report can also be downloaded viahttp://pbdeschtterinfo/abs/11_012html

2 Convergence Analysis of Ant Colony Learning Jelmer van Ast Robert Babška Bart De Schtter Delft Center for Systems and Control of the Delft University of Technology Mekelweg CD Delft The etherlands ( jmvanast@tdelftnl rbabska@tdelftnl bdeschtter@tdelftnl) Abstract: In this paper we stdy the convergence of the pheromone levels of Ant Colony Learning (ACL) in the setting of discrete state spaces and noiseless state transitions ACL is a mlti-agent approach for learning control policies that combines some of the principles fond in ant colony optimization and reinforcement learning Convergence of the pheromone levels in expected vale is a necessary reqirement for the convergence of the learning process to optimal control policies In this paper we derive pper and lower bonds for the pheromone levels and relate those to the learning parameters and the nmber of ants sed in the algorithm We also derive pper and lower bonds on the expected vale of the pheromone levels Keywords: Evoltionary algorithms in control and identification; Reinforcement learning control 1 ITRODUCTIO Ant Colony Learning (ACL) is an algorithmic framework for the atomatic learning of control policies for non-linear systems with continos or discrete state spaces (van Ast et al ) The framework is based on the Ant Colony Optimization (ACO) class of algorithms which is inspired by the foraging behavior of ants (Dorigo and Blm 2005) In ACL a collection of agents called ants jointly interact with the system at hand in order to find an optimal control policy ie an optimal mapping between states and actions Throgh the stigmergic interaction by means of pheromones the ants are gided by each other s experience towards better control policies In discrete time the control policy will lead to a seqence of state-action pairs starting in a given initial state and terminating with the action that brings the state of the system to the desired vale A seqence of state-action pairs is called a soltion A predefined cost fnction evalates soltions and a soltion is then called optimal if it has the lowest cost compared to the cost of all possible soltion trajectories from the initial state to the goal state A policy is called optimal if for all states in the state space the soltions are optimal In this paper we present a theoretical stdy on the convergence of the pheromone levels of ACL This is a step towards a convergence proof of the algorithm to optimal control policies We derive pper and lower bonds on the pheromone levels as well as on the expected vale on the pheromone levels Frthermore we relate those bonds to the learning parameters and the nmber of ants sed in the algorithm Convergence analysis of ACL is different from the convergence analysis of the ACO gbτmin class of ACO algorithms (Stützle and Dorigo 2002) The convergence proof of Stützle and Dorigo (2002) mainly relies on the global-best pdate rle with which the pheromones are only pdated if they belong to the best soltion fond so far With ACL all soltions fond in a trial are sed to pdate the pheromone levels This reqires a different type of convergence analysis This paper is strctred as follows In Section 2 the ACL framework is presented Section 3 presents a closed expression for the total pheromone pdate that takes place dring a trial of the algorithm In Section 4 pper and lower bonds are derived for the pheromone levels Based on these reslts the pper and lower bonds on the expected vale of the pheromone levels is derived in Section 5 The paper concldes with Section 6 2 AT COLOY LEARIG 21 The Optimal Policy Learning Problem Assme that we have a nonlinear dynamic system characterized by a discrete-valed state vector q = [q 1 q 2 q n T Q where Q has a finite nmber of elements Also assme that the system can be controlled by an inpt U that can only take a finite nmber of vales and that the state can be measred at discrete time steps with a sampling timet s witht the discrete time index The sampled system is denoted as: q(t+1) p(q(t)(t)) with p a probability distribtion fnction over the state-action space The nonlinear mapping fnction h is called the control policy: (t) = h(q(t)) The optimal control problem we consider is to control the system from any given initial state q(0) = q 0 to a desired goal state q(t) = q g in at most t T steps and in an optimal way where optimality is defined by minimizing a certain cost fnction: J(s) = J( qũ) (1) with s a soltion and q = q(1)q(t) and ũ = (0)(T 1) respectively the seqences of states and actions in that soltion The problem is to find a control policy that when applied to the system in q 0 reslts in a seqence of state-action pairs ((0)q(1)) ((1)q(2)) ((T 1)q(T)) that minimizes this cost fnction In or case we aim at finding control policies for non-linear systems which in general cannot be derived analytically from the system description and the cost fnction

3 The cost fnction mst satisfy for any soltion s: 0 < τ 0 ρ J max J (s) J min (2) with J max = maxj(s) and J min = minj(s) respectively the s s largest and smallest possible vale of the cost fnction The parameter τ 0 represents the initial vale of the pheromone levels and ρ denoted the global pheromone decay rate These parameters will be frther explained in Section 22 and Section 25 respectively ote that this reqirement is not at all restrictive since adding a constant τ0 ρ to the cost fnction renders this reqirement satisfied withot changing the optimal soltion Also note that we can trivially extend the optimal control problem that we consider here to inclde a set of goal states denoted byq g In that case we can inclde a single virtal goal state to which all states q g Q g lead with probability one and a cost of zero 22 Otline of the ACL Algorithm The general otline of the ACL algorithm is as follows Initially all M ants are distribted randomly over the state space of the system and the pheromone levels τ q associated with each state-action pair (q) are set to an initial vale τ q (0) = τ 0 In what is called a trial all ants make interaction steps with the system First they decide based on the pheromone levels which action to perform after which they apply this action to their own copy of the system They store the state-action pair to their personal record called their partial soltion s p and the pheromone level at that state-action pair τ q is annealed according to (4) throgh the local pheromone pdate Each copy of the system responds to the inpt by changing its state after which the ants repeat the process by choosing a new action ntil they reach the goal state q g which terminates the trial After all ants have terminated their trial or after a predefined nmber of trials all partial soltions are added to the mltiset S trial This set is sed in the global pheromone pdate step where all soltions in the set are evalated over the cost fnction and the state-action pairs contained in the soltions receive a pheromone pdate accordingly From the otpt of the algorithm which are the pheromone vales the control policy can be derived In the following we explicitly distingish between the steps in the inner loop and the steps in the oter loop of the algorithm In the inner loop the iterations are indexed by t while in the oter loop the iterations are indexed by k In order to nderstand the timing of the pheromone pdates nambigosly the pheromone variables in the inner loop receive the sperscript local : τq local Before starting the inner loop the crrent pheromone levels are copied to the local pheromone levels: τq local (0) = τ q (k) for all stateaction pairs The first step in the inner loop is the selection of the action 23 Action Selection In the action selection step each ant c determines which action to apply to the system in a given state q c There are varios possible forms for the action selection bt the analysis in this paper is based on the ǫ-greedy action selection rle Here the amont of exploration is kept constant de to the inclsion of an explicit exploration probability ǫ: ( ) arg max τq local c = (t) cl with probability 1 ǫ l U qc random(u qc ) with probability ǫ whererandom(u qc ) denotes the random selection of an action from the action setu qc in stateq c sing a niform distribtion 24 Local Pheromone Update After every step each antcperforms a local pheromone pdate for the(q c c )-pair jst visited: τ local q c c (t+1) = (1 γ)τ local q c c (t)+γτ 0 (4) with γ [01) the local pheromone decay rate The prpose of the local pheromone pdate is to stimlate exploration of the state-action space by making it less attractive for an ant to choose the same action in a certain state as its predecessor When all ants have reached the goal or when the inner loop has timed-ot the algorithm contines with the global pheromone step in the oter loop 25 Global Pheromone Update After completion of the trial (which let s assme happens when t = T ) the pheromone levels are pdated according to the following global pheromone pdate step: τ q (k +1) =(1 ρ)τ local q (T)+ρ s S trial (k): (q) s J (s) (q) : s S trial (k) : (q) s (5) with S trial the mltiset of all candidate soltions fond in the trial and ρ (01 the global pheromone decay rate ote that elitism in the global pheromone pdate is not possible since the best soltion wold then always be the soltion starting jst prior to the goal state with the action taking the system to the goal state immediately Since we aim at learning optimal control policies from any initial state we mst also inclde every soltion fond in the pdate The pheromone deposit is eqal to J (s) = J ( qũ) the inverse of the cost fnction over the seqence of discrete state-action pairs in s ote that minimizing the cost corresponds to maximizing the pheromone levels corresponding to the optimal soltion After the global pheromone pdate the algorithm contines fork+1 at the start of the oter loop ntil the maximal nmber of trials have taken place (ie when k = K) 26 Control Policy The control policy can be extracted from the pheromone levels as follows: = h(q) = argmax(τ ql ) (6) l U q in which ties are broken randomly This eqation means that the control policy assigns the action to a given state that maximizes the associated pheromone levels 3 TOTAL PHEROMOE UPDATE Let s now derive an expression for the total effect of the pheromone pdates dring one trial At the start of a trial k the crrent pheromone levels are copied to the local pheromone levels τq local (0) = τ q (k) for all (q)-pairs In the first (3)

4 trial when τ local q (0) = τ q (0) = τ 0 for all (q)-pairs the local pheromone pdate (4) has no effect Only after some pheromone variables have received a pheromone deposit from the global pheromone pdate (5) these pheromone levels can become larger than τ 0 Let s assme that the trial is ended when t = T and that M q (k) ants have visited the (q)-pair in the given trial It is then easy to verify that the pheromone levels have then been pdated according to: τ local q (T) = (1 γ) Mq(k) (τ local q (0) τ 0 )+τ 0 = (1 γ) Mq(k) (τ q (k) τ 0 )+τ 0 (7) The global pheromone pdate is applied toτq local (T) at the end of a trial if (q) is an element of the soltion of one or more ants We can aggregate (7) and (5) to get an expression for the total pheromone pdate at the end of a trial: τ q (k +1) = (1 ρ) {(1 γ) Mq(k) (τ q (k) τ 0 )+τ 0 +ρ J (s) ifm q (k) > 0 (8) s S trial (k): (q) s τ q (k +1) = τ q (k) otherwise (9) withm q (k) the nmber of ants that have visited(q) dring trialk From (8) - (9) we can see that: (1) If a(q) is not visited by any of the ants in a given trial the pheromone level τ q will not be pdated in that trial (2) If a (q) is visited by one or more ants in a given trial τ q will be pdated in that trial and may increase or decrease in vale depending on the pheromone deposits of the ants and the vales ofγ and ρ By introdcingκas the conter of the nmber of trials in which the pheromone level of a state-action pair (q) receives a global pheromone pdate we can write (8) - (9) as: { τq pd (κ+1) = (1 ρ) (1 γ) Mq(κ) (τq pd (κ) τ 0 )+τ 0 +ρ J (s) (10) s S trial (κ); (q) s in which the sperscript pd is sed to avoid confsion between pheromone pdates indexed withk and κ 4 BOUDS O THE PHEROMOE LEVELS We will proceed or analysis by deriving lower and pper bonds for the pheromone levels Starting with the lower bond we prove the following proposition: Proposition 41 The lower bond on the pheromone levels is τ 0 Proof By indction we can show that a pheromone level can never become smaller than τ 0 when sing the pheromone pdate expression from (10): τq pd (0) =τ 0 { q (κ+1) =(1 ρ) (1 γ) Mq(κ) (τq pd (κ) τ 0 )+τ 0 +ρ J (s) s S trial (κ): (q) s (1 ρ) (1 γ)mq(κ) (τq pd (κ) τ 0 ) +τ 0 {{ by indction +ρm q (κ)jmax (1 ρ)τ 0 +ρjmax (1 ρ)τ 0 +τ 0 τ 0 in which we have sed the condition from (2) In order to derive the pper bond on the pheromone levels we se the following lemma: Lemma 42 Consider a first-order scalar difference eqation: y(k +1) = ay(k)+b witha [01)b R and an initial pointy(0) Ify(0) b 1 a then y(k) is non-decreasing with the final vale: lim y(k) = b k 1 a Proof The final vale follows trivially from the final vale theorem of the z-transform (Åström and Wittenmark 1990) We can prove that the seqence y(k) is non-decreasing as follows Using: y(k) = a k y(0)+(a k ++a+1)b y(k +1) = a k+1 y(0)+(a k +a k ++a+1)b we can derive the following expression for the difference between two consective time steps of this difference eqation: y(k +1) y(k) = (a k+1 a k )y(0)+a k b = a k (b+ay(0) y(0)) Since a [01) we have a k 0 So in order to make y(k) non-decreasing we need that: (b +ay(0) y(0)) 0 which is eqal to reqiring that y(0) b 1 a Using this lemma we can prove the following proposition: Proposition 43 For any (q )-pair the pheromone levels are bonded from above: τq pd (κ) β pper +ρmj min 1 α pper with α pper = (1 ρ)(1 γ) β pper = (1 ρ) [ (1 γ) M ( τ 0 )+τ 0 and with J min = min s J(s) the smallest possible vale of the cost fnction ote that for this theoretical analysis it is not necessary to know the vale ofj min Proof Considering a given (q )-pair let s rewrite (10) as follows: τq pd (κ+1) =(1 ρ)(1 γ) Mq(κ) τq pd (κ) +(1 ρ) [(1 γ) Mq(κ) ( τ 0 )+τ 0 +ρ J (s) s S trial (κ): (q) s

5 q q = = * q This eqation is of the form: q (κ+1) = α(κ) q (κ)+β(κ)+δ(κ) Let s now introdce θ q (κ) which satisfies the following difference eqation: θ q (κ+1) = α pper θ q (κ)+β pper +δ pper in which α pper β pper and δ pper are pper bonds of α(κ) β(κ) and δ(κ) respectively: α(κ) α pper = (1 ρ)(1 γ) β(κ) β pper = (1 ρ) [ (1 γ) M ( τ 0 )+τ 0 δ(κ) δ pper = ρmj min The pper bond for α(κ) is obtained by taking M q (κ) = 1 for allκ while the pper bonds forβ(κ) andδ(κ) are obtained for M q (κ) = M for all κ We take the same initial vales for q and θ q so θ q (0) = τ 0 Using Lemma 42 we can now show that θ q (κ) is a non-decreasing fnction of κ We immediately see that α pper 0 and we mst show that: β pper +δ pper 1 α pper θ q (0) = τ 0 This condition is satisfied if (1 ρ) [ (1 γ) M ( τ 0 )+τ 0 +ρmj min τ 0 1 (1 ρ)(1 γ) or (1 ρ) [ (1 γ) M ( τ 0 )+τ 0 +ρmj min τ 0 +(1 ρ)(1 γ)τ 0 0 Recalling the ranges for ρ viz (01 and for γ viz [01) we can show that the latter ineqality holds as follows: (1 ρ) (1 γ) M ( τ 0 )+τ 0 {{ 0 +(1 ρ)(1 γ)τ 0 +ρmj min τ 0 τ 0 Moreover by indction it follows that q (κ) θ q (κ): τq pd (0) = θ q (0) τq pd (κ+1) = α(κ)τq pd (κ)+β(κ)+δ(κ) α pper θ q (κ)+β pper +δ pper = θ q (κ+1) We can se Lemma 42 to get lim θ q(κ) = β pper +δ pper κ 1 α pper Since we have shown that θ q (κ) is non-decreasing we have now arrived at the conclsion that: τq pd (κ) θ q (κ) β pper +ρmj min 1 α pper ote that this pper bond is only tight form = 1 5 BOUDS O THE EXPECTED VALUE OF THE PHEROMOE LEVELS The bonds derived in the previos section are sefl as they give s information abot the range in which the pheromone levels reside Moreover the bonds give the relations between the global and local pheromone decay ratesρandγ the nmber of ants M and the initial vale of the pheromone levels τ 0 However two shortcomings of these bonds prevent s from drawing conclsions abot convergence of the algorithm: (1) Or pper bond is not tight form > 1 (2) Or pper bond is the same for all pheromone levels associated with all (q)-pairs Especially the second isse prevents s from analyzing whether and when the pheromone levels associated with the optimal state-action pairs become larger than the pheromone levels associated with sboptimal state-action pairs In this section or aim is to find expressions for the evoltion of individal pheromone levels However two factors in the algorithm in particlar complicate sch an analysis: (1) The total pheromone pdate from (10) contains M q (κ) the nmber of ants that have visited the(q )-pair in trial κ which is dependent on many nknown factors sch as the other pheromone levels and the exploration process (2) The total pheromone pdate of τq pd (κ) depends on the pheromone deposits J (s) from all soltions fond in trial κ The pdate of a particlar pheromone level ths depends on all state-action pairs prior to (q) and all state-action pairs following (q ) which also depends on many nknown factors sch as the other pheromone levels and the exploration process There are too many ncertainties involved in the algorithm in order to find a closed expression of the final pheromone levels This is inherent to learning algorithms that contain random variables sch as exploration and in which the credit assignment sch as the distribtion of rewards in reinforcement learning (Stton and Barto 1998) or the distribtion of the pheromone deposits in ACL depends on a series of decision variables In the following we eliminate the ncertainty arising from exploration by looking at the expected vale of the pheromone levels We eliminate the ncertainty arising from the stateaction pairs prior to (q ) and the state-action pairs following (q) by considering the sitation depicted in Figre 1 Fig 1 From a state q the action will take the system to another state q from which there are possible actions The action in this state will bring the system to the goal state optimally The action ū does so sb-optimally bt still with a lower cost than the other actions Here we regard the M q (κ) ants to start in state q and all choose the action All ants are taken toq after which they can choose between possible actions Withot loss of generality we assme that the action 1 = then takes an ant to the goal state immediately and with the lowest cost compared to the other available actions It is ths considered to be the optimal action The action 2 = ū is the second-best action It takes an ant to the goal state with a higher cost compared to 1 bt with a lower cost compared to all the other actions Let s also g

6 assme that the inverse of the cost reslting from all possible actions is ordered as follows: J minq = J (q 1 = ) > J (q 2 = ū) = J secondq > > J (q ) = J maxq Here J q is a short-hand expression for the cost that reslts from an action chosen in q and possible other state-action pairs following q The sbscripts min second etc then denote respectively the lowest or next to lowest possible cost to reslting in this manner By definition J (q 1 = ) = J minq and J (q ) = J maxq We will analyze the behavior of the expected vale ofτq pd (κ) Since the cost reslting from the state-action pair (q ) is independent from the action chosen inq any constant cost for J (q) will do for or analysis Withot loss of generality and for the sake of simplicity we takej (q) = 0 althogh formally this is impossible since J (q) Jmax > 0 for any (q)-pair ote that it ths also holds that J J minq minq = J secondq = J secondq etc We assme in this section that the optimal action from q is crrently also associated with the highest pheromone level and is ths also designated to be optimal Dring learning this does not have to be the case since other actions may be associated with higher pheromone levels and is ths not yet known to be the optimal action Compting the expected vale of τq pd (κ) involves taking the expected vale of the nmber of ants M q (κ) in the exponent which severely complicates deriving an analytical expression for the expected vale of τq pd (κ) We mst ths shift or aim by choosing to derive pper and lower bonds on the expected vale of τq pd (κ) instead Proposition 51 For any state-action pair (q ) the expected vale of the pheromone levels is bonded from above: with q (κ) β pper +ρmjexpq (11) 1 α pper E[ α pper = (1 ρ)(1 γ) β pper = (1 ρ) [ (1 γ) M ( τ 0 )+τ 0 Jexpq = (1 ǫ)j minq +ǫj avgq and J avgq the inverse of the average cost expected to reslt when moving fromqto the goal Proof When choosing the pheromone level q is increased the most when: q (κ+1) =(1 ρ)(1 γ) α pper q (κ)+ρmj minq +(1 ρ) [ (1 γ) M ( τ 0 )+τ 0 β pper When choosing ū the pheromone level q is increased the most when: q (κ+1) =(1 ρ)(1 γ) α pper q (κ)+ρmj secondq +(1 ρ) [ (1 γ) M ( τ 0 )+τ 0 β pper The largest increase of τq pd for the other actions follows in a similar manner The probability that the optimal action is chosen is: p( ) = 1 ǫ+ ǫ ( ) = 1 ǫ namely the probability of not exploring pls the probability of selecting that action while exploring (which is niformly distribted) The probability of choosing any of the other actions is p( i ) = ǫ for i The expected vale of τq pd when increasing the most can now be compted as follows: E pper [τq pd (κ+1) [ ( ) = 1 ǫ + (α pper q (κ)+β pper +ρmj minq ) (α pper q (κ)+β pper +ρmj secondq ) (α pper + q (κ)+β pper +ρmj [ ( ) =α pper τq pd (κ)+β pper + 1 ǫ + ρmj secondq ++ {{ terms =α pper τq pd (κ)+β pper +[1 ǫρmj + maxq) ρmj maxq minq ρm(j minq +J secondq ++J maxq) ρmj minq =α pper τq pd (κ)+β pper +ρmjexpq where Jexpq = (1 ǫ)j minq + ǫj avgq is the expected pheromone deposit on τq pd Since E pper [τq pd (κ) E[τq pd (κ) for all κ the following difference eqation describes the evoltion of the pper bond of the expected vale of τq pd (κ): E pper [τq pd (κ+1) =α pper E pper [τq pd (κ)+β pper Since we can show that +ρmj expq E pper [τq pd (0) = τ 0 β pper +ρmjexpq 1 α pper we know from Lemma 42 that E pper [τq pd (κ) is nondecreasing and that: E[τq pd (κ) E pper [τq pd (κ) β pper +ρmjexpq 1 α pper ote that J avgq is generally not known althogh it might be possible to estimate it Similar to the pper bond we can derive a lower bond on the expected vale of pheromone levels Proposition 52 For any state-action pair (q ) in the limit for κ the expected vale of the pheromone levels is bonded from below: with lim κ E[τpd q (κ) β lower +ρj expq 1 α lower (12) α lower = (1 ρ)(1 γ) M β lower = (1 ρ)[(1 γ)( τ 0 )+τ 0 Jexpq = (1 ǫ)j minq +ǫj avgq

7 Proof When choosing q is increased the least when: q (κ+1) =(1 ρ)(1 γ) M q (κ) {{ α lower +(1 ρ)[(1 γ)( τ 0 )+τ 0 {{ β lower +ρjminq since for allκα(κ) = (1 ρ)(1 γ) Mq(κ) is the smallest for M q (κ) = M andβ(κ) = (1 ρ) [ (1 γ) Mq(κ) ( τ 0 )+τ 0 is the smallest for M q (κ) = 1 The rest of the proof follows easily along the same lines as the proof of Proposition 51 The expected vale of the pheromone levels lies between these bonds E lower [τq pd (κ) E[τq pd (κ) E pper [τq pd (κ) for all κ In the limit for κ the expected vale of the pheromone levels lies between the derived pper and lower bonds: β lower +ρjexpq lim 1 α lower κ E[τpd q (κ) β pper +ρmjexpq 1 α pper The expressions for α(κ) α pper α lower β(κ) β pper and β lower are presented in Table 1 Table 1 The expressions for α(κ) and β(κ) and their bonds ICIS project Self-Organizing Moving Agents (grant no BSIK03024) REFERECES Åström KJ and Wittenmark B (1990) Compter Controlled Systems Theory and Design Prentice-Hall Englewood Cliffs J Dorigo M and Blm C (2005) Ant colony optimization theory: a srvey Theoretical Compter Science 344(2-3) Stützle T and Dorigo M (2002) A short convergence proof for a class of ant colony optimization algorithms IEEE Transactions on Evoltionary Comptation 6(4) Stton RS and Barto AG (1998) Reinforcement Learning: An Introdction MIT Press Cambridge MA van Ast JM Babška R and De Schtter B (2009) ovel ant colony optimization approach to optimal control International Jornal of Intelligent Compting and Cybernetics 2(3) van Ast JM Babška R and De Schtter B (2010) Ant colony learning algorithm for optimal control In R Babška and FCA Groen (eds) Interactive Collaborative Information Systems volme 281 of Stdies in Comptational Intelligence Springer Berlin / Heidelberg α(κ) and its pper and lower bond α(κ) = (1 ρ)(1 γ) Mq(κ) α pper = (1 ρ)(1 γ) α lower = (1 ρ)(1 γ) M β(κ) and its pper and lower bond β(κ) = (1 ρ) [ (1 γ) Mq(κ) ( τ 0 )+τ 0 β pper = (1 ρ) [ (1 γ) M ( τ 0 )+τ 0 β lower = (1 ρ)[(1 γ)( τ 0 )+τ 0 = τ 0 γ(1 ρ) 6 COCLUSIOS In this paper we have analyzed the evoltion of the pheromone levels in Ant Colony Learning In particlar we have stdied the sitation in which from a state q an action will take the system to another state q from which there are possible actions There the pheromone level τ q was dependent on the action chosen in the state q frther downstream towards the goal The goal wold be reached after choosing the optimal action in q Becase of the npredictability of the nmber of ants that visit a particlar state we have derived pper and lower bonds on the pheromone levels followed by pper and lower bonds on the expected vale of the pheromone levels In ftre research we will extend the convergence analysis towards a convergence proof of the algorithm to optimal control policies The convergence analysis of the pheromone levels as presented in this paper is an essential step for this Ftre research mst also focs on the applicability of the convergence reslts to real-world control problems where the strctre of the control problem mst be taken into accont ACKOWLEDGMET This research is financially spported by Senter Ministry of Economic Affairs of the etherlands within the BSIK-

A Model-Free Adaptive Control of Pulsed GTAW

A Model-Free Adaptive Control of Plsed GTAW F.L. Lv 1, S.B. Chen 1, and S.W. Dai 1 Institte of Welding Technology, Shanghai Jiao Tong University, Shanghai 00030, P.R. China Department of Atomatic Control,