Developing Communication Strategy for Multi-Agent Systems with Incremental Fuzzy Model


(IJACSA) International Journal of Advanced Computer Science and Applications, www.ijacsa.thesai.org

Sam Hamzeloo, Mansoor Zolghadri Jahromi
Department of Computer Science and Engineering
Shiraz University, Shiraz, Iran

Abstract: Communication can guarantee coordinated behavior in multi-agent systems. However, in many real-world problems, communication may not be available at every time step because of limited bandwidth, a noisy environment or communication cost. In this paper, we introduce an algorithm to develop a communication strategy for cooperative multi-agent systems in which communication is limited. This method employs a fuzzy model to estimate the benefit of communication for each possible situation. This specifies the minimal communication that is necessary for successful joint behavior. An incremental method is also presented to create and tune our fuzzy model, which reduces the high computational complexity of multi-agent systems. We use several standard benchmark problems to assess the performance of our proposed method. Experimental results show that the generated communication strategy can improve the performance as well as a full-communication strategy, while the agents utilize little communication.

Keywords: Multi-agent systems; decentralized partially observable Markov decision process; communication; planning under uncertainty; fuzzy inference systems

I. INTRODUCTION

One of the main goals of artificial intelligence is designing autonomous agents interacting in a domain. A Multi-Agent System (MAS) includes multiple autonomous agents operating in an uncertain environment in order to maximize their utility. In a MAS, each agent independently perceives its local environment and influences the environment by executing its actions. Many artificial intelligence problems can take advantage of MAS design, such as multiple mobile robots, sensor networks, disaster response teams, smart cities and video games.

There are two types of problems in MASs: self-interested and cooperative settings [1]. In a self-interested scenario the agents can have different and even conflicting goals. In a cooperative setting, which we focus on in this work, the agents cooperate to reach a shared target. In this case, each agent individually makes decisions based on its local observation, but the maximum reward is achieved when the individual decisions are coordinated. Communication is an important factor in preserving coordinated behavior.
However, communication is not always available, especially when the agents have a limit on battery usage or the communication channel is noisy or limited. Therefore, one of the main challenges in MASs is to maintain coordination over a long period of time with minimal communication.

Various mathematical models have been used to characterize decision-making problems. In a stochastic, fully observable environment, the Markov Decision Process (MDP) provides a powerful modeling tool. The Partially Observable Markov Decision Process (POMDP) is employed in problems with limited sensing capabilities. The Decentralized Partially Observable Markov Decision Process (Dec-POMDP) is a powerful framework for collaborative multi-agent planning in an uncertain environment [2]. In this paper, we use Dec-POMDPs to model cooperative MAS problems.

The strategies for multi-agent problems fall into two categories: finite-horizon and infinite-horizon Dec-POMDPs. Finite-horizon policies are usually represented by decision trees, and numerous techniques have been proposed to obtain or approximate the optimal policies [3]-[5]. Moreover, a number of methods have been developed to generate decentralized policies with minimal communication usage [1], [6]. On the other side, finite state controllers (FSCs) are a major model for representing infinite-horizon Dec-POMDP policies. Several optimization techniques have been used to approximate the parameters of FSCs, for example linear programming [7], nonlinear programming [8] and expectation-maximization [9], [10]. However, identifying the best situations for communication has not been considered by existing methods. This paper focuses on solving this issue.

Fuzzy systems are powerful function approximators that can approximate any non-linear system to an arbitrary accuracy. A fuzzy system is capable of handling a high level of uncertainty with a compact fuzzy rule-base. Fuzzy models have therefore been used to solve MASs in previous studies [11]. In [12] an incremental fuzzy controller was introduced to find solutions of large MASs.

Our aim in this paper is to present an algorithm that identifies the best situations for communicating in MASs modelled by infinite-horizon Dec-POMDPs. This method develops a strategy that helps the agents to maintain coordination with minimal communication. This communication policy is developed in a centralized manner in a training phase, where communication is not restricted.
The agents use this policy in a decentralized manner in a test environment in which the communication channel is limited. This paper presents an incremental method to estimate the benefits of communication in every possible situation that the agents can encounter. Based on this estimation, the agents can decide when communication has the most impact on the improvement of the final performance. The results show that the performance of the presented communication strategy is almost the same as that of full communication.

The rest of the paper is organized as follows. Section II formally defines the infinite-horizon Dec-POMDP and gives an overview of Dec-POMDP solution methods. Section III presents the details of the proposed method. In Section IV we evaluate the proposed communication strategy on several well-known Dec-POMDP problems. Finally, the conclusions are given in Section V.

II. BACKGROUND AND RELATED WORKS

In this paper, we consider a group of agents that cooperate with each other in an uncertain environment over infinite time steps. At each time step $t$, the agents take a joint action (action $a_i^t$ for the $i$-th agent) that causes the state of the environment to change from $s^t$ to $s^{t+1}$. After that, each agent perceives its observation and receives a global reward from the environment. This cycle repeats over infinite steps. This type of MAS problem is properly modelled by an infinite-horizon Dec-POMDP [12]. Fig. 1 displays the interaction of the agents and the environment.

Fig. 1. Dec-POMDP Setup.

A. Infinite-Horizon Dec-POMDP

In an infinite-horizon Dec-POMDP, a group of agents is considered that operates in an uncertain environment over infinite steps. An infinite-horizon Dec-POMDP is a tuple $\langle I, S, \{A_i\}, \{\Omega_i\}, P, O, R, b^0 \rangle$, where $I$ is a finite set of agents and $S$ is a finite set of states. Each state determines a specific situation of the environment. The number of agents is $N_I$ and $N_S$ is the number of states. $A_i$ and $\Omega_i$ specify the finite sets of actions and observations available to agent $i$. $a = \langle a_1, \ldots, a_{N_I} \rangle$ denotes a joint action ($a \in \times_{i \in I} A_i$) and $o = \langle o_1, \ldots, o_{N_I} \rangle$ denotes a joint observation ($o \in \times_{i \in I} \Omega_i$).

If the agents take joint action $a^t$ in time step $t$, the state of the environment transitions from $s^t$ to $s^{t+1}$ with probability $P(s^{t+1} \mid s^t, a^t)$. The probability of the joint observation $o^{t+1}$ in state $s^{t+1}$ after the agents perform joint action $a^t$ is $O(o^{t+1} \mid s^{t+1}, a^t)$. At the end of each time step, the environment gives the agents the global reward $R(s^t, a^t)$ for taking the joint action $a^t$ in the state $s^t$. The initial state distribution is $b^0$.

The belief vector $b_i^t = \langle b_{i,1}^t, \ldots, b_{i,N_S}^t \rangle$ determines the belief of the $i$-th agent about the state of the environment in time step $t$. In fact, $b_i^t$ is a probability distribution over $S$ such that $b_{i,n}^t$ specifies the belief of the $i$-th agent that the state of the environment is $s_n$. The belief space is an $N_S$-dimensional space defined by the belief vector.
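To make the tuple above concrete, the following is a minimal sketch of how its components could be stored and sanity-checked. The class name `DecPOMDP`, the dictionary layout, the symbol name Omega_i for the observation sets, and the uniform toy model are all assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List, Tuple

Joint = Tuple[int, ...]  # one entry per agent (joint action or joint observation)


@dataclass
class DecPOMDP:
    """Illustrative container for <I, S, {A_i}, {Omega_i}, P, O, R, b0>."""
    n_states: int                                   # N_S
    n_actions: List[int]                            # |A_i| per agent, length N_I
    n_observations: List[int]                       # |Omega_i| per agent
    P: Dict[Tuple[int, Joint], List[float]]         # P[(s, a)][s_next]
    O: Dict[Tuple[int, Joint], Dict[Joint, float]]  # O[(s_next, a)][o]
    R: Dict[Tuple[int, Joint], float]               # R[(s, a)]
    b0: List[float]                                 # initial state distribution

    def joint_actions(self) -> List[Joint]:
        """Cartesian product of the agents' individual action sets."""
        return list(product(*(range(n) for n in self.n_actions)))

    def joint_observations(self) -> List[Joint]:
        """Cartesian product of the agents' individual observation sets."""
        return list(product(*(range(n) for n in self.n_observations)))

    def is_well_formed(self, tol: float = 1e-9) -> bool:
        """Check that b0, every P[(s, a)] and every O[(s', a)] sum to one."""
        if abs(sum(self.b0) - 1.0) > tol:
            return False
        for s in range(self.n_states):
            for a in self.joint_actions():
                if abs(sum(self.P[(s, a)]) - 1.0) > tol:
                    return False
                if abs(sum(self.O[(s, a)].values()) - 1.0) > tol:
                    return False
        return True


def toy_model() -> DecPOMDP:
    """Hypothetical 2-agent, 2-state model with uniform dynamics."""
    joint_a = list(product(range(2), range(2)))
    joint_o = list(product(range(2), range(2)))
    P = {(s, a): [0.5, 0.5] for s in range(2) for a in joint_a}
    O = {(s, a): {o: 0.25 for o in joint_o} for s in range(2) for a in joint_a}
    # Reward 1 when both agents pick the same action (arbitrary choice).
    R = {(s, a): float(a[0] == a[1]) for s in range(2) for a in joint_a}
    return DecPOMDP(2, [2, 2], [2, 2], P, O, R, [0.5, 0.5])


model = toy_model()
print(len(model.joint_actions()), model.is_well_formed())
```

Storing P and O keyed by (state, joint action) keeps the sketch close to the notation P(s' | s, a) and O(o | s', a); a real implementation would more likely use dense arrays.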
For infinite-horizon Dec-POMDP problems with the initial state distribution $b^0$, the solution is a joint policy that maximizes the expected infinite-horizon discounted reward

$$E\left[ \sum_{t=0}^{\infty} \gamma^t R(s^t, a^t) \,\middle|\, b^0 \right],$$

where the discount factor $\gamma$ ($0 \le \gamma < 1$) bounds the summation of rewards over the infinite horizon.

Finding the optimal solution for an infinite-horizon Dec-POMDP may not be practical, because of the unbounded number of steps [13]. Previous research has tried to find suboptimal solutions by using bounded policy representations. The most common policy representation is the finite state controller (FSC). Several approaches have been presented to estimate the parameters of FSCs, such as linear programming [7], nonlinear programming [8] and expectation-maximization [9], [10]. A value function is another way to represent the policy in infinite-horizon Dec-POMDP problems [14]. In our previous work [12] we introduced an incremental method to learn a fuzzy model as the value function. It generates a compact fuzzy rule-base as the solution, which offers scalability for large MAS problems.

As stated before, obtaining the minimal communication needed to coordinate the behavior of the agents is one of the main challenges in cooperative MAS problems. Therefore, several methods have been introduced to determine the communication strategy. Most of these algorithms work for finite-horizon Dec-POMDP cases [15], [6]. F. Wu et al. [1] introduced an online planning approach to reduce the computational complexity. To cope with limited bandwidth, the agents communicate only when a history inconsistency is detected. The method presented in [16] calculates the divergence between the agents' beliefs to evaluate communication. Since this method relies on an imprecise assumption for calculating belief divergence, it cannot accurately estimate the value of communication.

B. Incremental Learning

Incremental learning is a method that creates a model by recursively extracting the required information from a sequence of incoming data. This learning method is able to start learning from scratch. Its parameters and structure are tuned incrementally according to the current information without memorizing previous observations. Thus, the model can be created with low computational complexity and limited memory size.

Evolving fuzzy [17] and neuro-fuzzy [18] systems are the most popular approaches for incremental learning. Shahparast et al.
in [19] proposed two fast methods for adapting the certainty factors of fuzzy rules, based on reinforcement learning and on reward and punishment. In [20] a simple and fast method is proposed that uses gradient descent to tune the structure and parameters of a fuzzy classifier. D. Kangin et al. in [21] and [22] introduced a group of incremental methods called TEDA that can be used for clustering, regression and classification. Incremental methods are also employed to find policies for infinite-horizon Dec-POMDPs. An incremental reinforcement learning algorithm is presented in [12] to create a compact fuzzy model as the solution of large MASs.

III. OUR PROPOSED METHOD

In this paper, we introduce a method to find a communication strategy for cooperative MAS problems in which communication is expensive or limited. This method estimates the benefit of communication by computing the effect of communication on increasing the accumulated reward for each situation. This can be used to obtain the minimal communication that is necessary for successful joint behavior.

In this paper, we extend our previous method presented in [12]. In that method, each agent makes use of an individual fuzzy rule-base to interact with the environment. These rule-bases, which map the belief space to the value of the actions, are created and tuned by an incremental reinforcement learning algorithm according to the experiences of the agents.

Two phases are considered: a learning phase and an execution phase. In the learning phase, communication between the agents is not limited and the algorithm freely shares the information of the agents to tune the communication strategy. In the execution phase, however, bandwidth is limited and the agents use the learned strategy to identify the situations where communication can be beneficial for improving performance.

A. Learning Phase

In this phase, the agents interact with the environment and, in addition to tuning their behavior according to the response of the environment, the communication strategy is adjusted. To do this, each agent has an individual decision-making system to select the best action in every time step, and there is a shared communication rule-base that is used to learn the benefit of communication for each situation.

The benefit of communication, $Q_c$, is computed by comparing the outcomes of two different agent-environment interactions in a particular state of the environment. Since the state of the environment is not available in Dec-POMDPs, we approximate it with the belief vectors of the agents. In each time step, the agents select their actions once without using communication, and once again after sharing information.
The difference between the values of these two selected actions (i.e. immediate reward plus the expected accumulation of future rewards) determines the benefit of communication. Therefore, there is a tuple for each time step that contains two parts: the particular situation of the environment, which is specified by a belief vector, and $Q_c$, the benefit of communication for this situation. We call this tuple an experience. Fig. 2 illustrates the process of producing an experience in time step $t$.

Fig. 2. The Process of Producing an Experience in Time Step t.

Since the agents interact with the environment many times and an experience is obtained for each time step, there is a sequence of experiences (one element for each time step). The communication rule-base is created and tuned using this sequence. The sequence of experiences is theoretically infinite in infinite-horizon Dec-POMDP problems. Therefore, we have introduced an incremental algorithm to develop the communication strategy. We describe the process of producing an experience and the updating mechanism according to an experience in the following two sub-sections.

1) Producing an experience: At each time step, the agents first interact with the environment using only local information. Each agent updates its previous belief vector $b_i^{t-1}$ to a local belief vector $b_i^t$ that is computed based on its previous action $a_i^{t-1}$ and local observation $o_i^t$:

$$\forall s' \in S, \quad b_i^t(s') = \frac{\sum_{a_{-i}^{t-1} \in A_{-i}} \sum_{o_{-i}^t \in O_{-i}} O(o^t \mid s', a^{t-1}) \sum_{s \in S} P(s' \mid s, a^{t-1})\, b_i^{t-1}(s)}{\sum_{s' \in S} \sum_{a_{-i}^{t-1} \in A_{-i}} \sum_{o_{-i}^t \in O_{-i}} O(o^t \mid s', a^{t-1}) \sum_{s \in S} P(s' \mid s, a^{t-1})\, b_i^{t-1}(s)}$$

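Where $o_{-i}^t$ and $a_{-i}^{t-1}$ are the joint observation and the joint action of all agents except agent $i$, respectively. Also, $O_{-i}$ and $A_{-i}$ are all possible joint observations and all possible joint actions for the other agents, $o^t = \langle o_i^t, o_{-i}^t \rangle$ and $a^{t-1} = \langle a_i^{t-1}, a_{-i}^{t-1} \rangle$.

Then, according to $b_i^t$, the agent selects the best action. As stated before, we use our previous work presented in [12] to determine the behavior of the agents. In this method, each agent has an individual fuzzy rule-base to estimate the value of the actions according to its belief vector. In fact, the fuzzy rule-base of the $i$-th agent determines $Q_i(b_i^t, a_m)$, the expected value of action $a_m$. At each time step, the agents estimate the value of their actions and perform the action having the maximum value:

$$a_i^t = \arg\max_{a_m} Q_i(b_i^t, a_m)$$

After obtaining $a^t$, the same process is carried out to determine the appropriate joint action if the agents share their local information. To do this, the algorithm considers $b^{t,comm}$, a global belief vector, and updates it by using the joint action $a^{t-1}$ and the joint observation $o^t$:

$$\forall s' \in S, \quad b^{t,comm}(s') = \frac{O(o^t \mid s', a^{t-1}) \sum_{s \in S} P(s' \mid s, a^{t-1})\, b^{t-1,comm}(s)}{\sum_{s' \in S} O(o^t \mid s', a^{t-1}) \sum_{s \in S} P(s' \mid s, a^{t-1})\, b^{t-1,comm}(s)}$$

Using the global belief vector $b^{t,comm}$, each agent selects the best action, $a_i^{t,comm}$:

$$a_i^{t,comm} = \arg\max_{a_m} Q_i(b^{t,comm}, a_m)$$

Therefore, there are two joint actions in each time step for interacting with the environment: if the agents communicate with each other, $a^{t,comm}$ is selected, and if they make decisions based on local information, $a^t$ is selected. The difference between the outcomes of these two joint actions determines the value of communication in time step $t$.

Assume $r^t$ and $o^{t+1}$ are the global reward and joint observation if the agents take joint action $a^t$, and that if the agents perform joint action $a^{t,comm}$ they receive $r^{t,comm}$ and $o^{t+1,comm}$ from the environment. The difference between the outcomes of these two joint actions is calculated as follows:

$$Q_c^t = \left[ r^{t,comm} + \gamma V(b^{t+1,comm}) \right] - \left[ r^t + \gamma V(b^{t+1}) \right]$$

Where $b^{t+1}$ is updated for each agent from $a^t$, $o^{t+1}$ and $b^t$ using the belief update given earlier, and $b^{t+1,comm}$ is updated from $a^{t,comm}$, $o^{t+1,comm}$ and $b^{t,comm}$ in the same way. $V(\cdot)$ (i.e. $V(b^{t+1})$ or $V(b^{t+1,comm})$) is the estimated value of the incoming situation. In fact, $V(b^{t+1})$ estimates the accumulated reward that will be achieved in future steps. It is easily obtained by one-step lookahead:

$$V(b^{t+1}) = \max_{a \in A_i} Q_i(b^{t+1}, a)$$

In this manner, whenever an agent has the same belief vector as $b^t$, the benefit of communication is $Q_c^t$ (i.e. communication can increase the accumulated reward by $Q_c^t$).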
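The two belief updates and the benefit computation described in this sub-section can be sketched as follows. This is an illustrative reading, not the authors' code: the function names, the dictionary layout of P and O, the two-state toy sensor model, and the use of a discount factor `gamma` inside `communication_benefit` are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Joint = Tuple[int, ...]
Trans = Dict[Tuple[int, Joint], List[float]]         # P[(s, a)][s_next]
ObsFn = Dict[Tuple[int, Joint], Dict[Joint, float]]  # O[(s_next, a)][o]


def merge(own: int, others: Joint, agent: int) -> Joint:
    """Insert the agent's own component into the other agents' joint tuple."""
    return others[:agent] + (own,) + others[agent:]


def _predict(b: Sequence[float], a: Joint, s_next: int, P: Trans) -> float:
    """sum_s P(s_next | s, a) * b(s)."""
    return sum(P[(s, a)][s_next] * b[s] for s in range(len(b)))


def _normalize(weights: List[float]) -> List[float]:
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [w / total for w in weights]


def global_belief_update(b, a: Joint, o: Joint, P: Trans, O: ObsFn) -> List[float]:
    """Belief update when the joint action and joint observation are shared."""
    return _normalize([O[(s2, a)][o] * _predict(b, a, s2, P) for s2 in range(len(b))])


def local_belief_update(b, a_i: int, o_i: int, agent: int,
                        others_actions: Sequence[Joint],
                        others_obs: Sequence[Joint],
                        P: Trans, O: ObsFn) -> List[float]:
    """Belief update from local information only: the unknown actions and
    observations of the other agents are summed out."""
    weights = [0.0] * len(b)
    for a_rest in others_actions:
        a = merge(a_i, a_rest, agent)
        for o_rest in others_obs:
            o = merge(o_i, o_rest, agent)
            for s2 in range(len(b)):
                weights[s2] += O[(s2, a)][o] * _predict(b, a, s2, P)
    return _normalize(weights)


def communication_benefit(r_comm: float, v_next_comm: float,
                          r_local: float, v_next_local: float,
                          gamma: float = 0.9) -> float:
    """Q_c: value with communication minus value without it.
    Discounting the next-step value by gamma is an assumption of this sketch."""
    return (r_comm + gamma * v_next_comm) - (r_local + gamma * v_next_local)


# Hypothetical toy problem: 2 static states, one joint action, and two agents
# whose independent sensors report the true state with probability 0.8.
acc = 0.8
a0 = (0, 0)
P = {(0, a0): [1.0, 0.0], (1, a0): [0.0, 1.0]}
O = {
    (s, a0): {
        (o1, o2): (acc if o1 == s else 1 - acc) * (acc if o2 == s else 1 - acc)
        for o1 in (0, 1) for o2 in (0, 1)
    }
    for s in (0, 1)
}
prior = [0.5, 0.5]
b_local = local_belief_update(prior, 0, 0, 0, [(0,)], [(0,), (1,)], P, O)
b_global = global_belief_update(prior, a0, (0, 0), P, O)
q_c = communication_benefit(1.0, 5.0, 0.0, 4.0)
print(b_local, b_global, q_c)
```

In this toy case the shared belief is sharper than the local one because it uses both agents' observations, which is the effect the benefit estimate is meant to capture.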
Our proposed algorithm uses this tuple $\langle b_i^t, Q_c^t \rangle$ as an experience to tune the communication rule-base.

2) Updating mechanism: In the learning phase, the agents interact with the environment many times and an experience is obtained for each time step. Hence, there is a sequence of experiences that our algorithm uses to create and tune the communication rule-base. The proposed method combines the information of the experiences by clustering similar experiences, where the center of each cluster identifies a communication rule. Since the number of experiences in the learning phase is huge, we introduce an incremental approach to cluster the experiences. In the following, we present the incremental process of tuning the communication strategy according to an experience.

Each rule specifies the benefit of communication for a region of the belief space. The $j$-th rule in the communication rule-base, $R_j$, has the following form:

$$R_j: \text{ if } b \text{ is like } B_j \text{ then } Q_c = Q_j$$

Where $B_j = \langle B_{j,1}, \ldots, B_{j,N_S} \rangle$ is the reference belief vector [12] of rule $j$, which specifies the center of the region, and $Q_j$ represents the expected benefit of communication for this region.

Assume the $i$-th agent has an experience $\langle b_i^t, Q_c^t \rangle$ in time step $t$. The algorithm identifies the reference belief vector most similar to $b_i^t$. To do this, the similarity of $b_i^t$ to the reference belief vectors of all existing rules in the communication rule-base is computed as follows:

$$CosSim(b_i^t, B_j) = \frac{\sum_{k=1}^{N_S} b_{i,k}^t B_{j,k}}{\sqrt{\sum_{k=1}^{N_S} (b_{i,k}^t)^2}\, \sqrt{\sum_{k=1}^{N_S} B_{j,k}^2}}$$

Where $CosSim(b_i^t, B_j)$ is the cosine similarity of these two vectors. If the maximum similarity of $b_i^t$ to the existing rules is less than a threshold $Sim_{min}$, i.e. $b_i^t$ is considerably different from all reference belief vectors, we consider $\langle b_i^t, Q_c^t \rangle$ a new experience. In this case, the proposed method adds a new rule to the communication rule-base according to $\langle b_i^t, Q_c^t \rangle$. The reference belief vector of the new rule is set to $b_i^t$ ($B_{newRule} = b_i^t$) and the consequent part of the new rule is set to $Q_c^t$ ($Q_{newRule} = Q_c^t$). It is noteworthy that if there is no rule in the communication rule-base, the same procedure is used to add the first rule.

Otherwise, if there is a reference belief vector similar to $b_i^t$, the nearest rule to $b_i^t$ is determined:

$$w = \arg\max_j CosSim(b_i^t, B_j)$$

Where $w$ is the index of the most similar rule. Each rule is identified by averaging all similar experiences that the agents have during the learning phase. Since the number of these experiences is huge, we use a recursive formula to calculate the mean of a group of similar experiences. To adjust $R_w$ according to the experience $\langle b_i^t, Q_c^t \rangle$, the antecedent of $R_w$ is updated with respect to $b_i^t$ by the following recursive equation [21]:

$$B_w^{(new)} = \frac{k-1}{k} B_w^{(old)} + \frac{1}{k} b_i^t$$

Where $B_w^{(old)}$ and $B_w^{(new)}$ are the reference belief vectors of $R_w$ before and after updating, respectively, and $k$ counts the experiences merged into $R_w$ so far, including the current one. Similarly, the consequent of $R_w$ is updated as follows:

$$Q_w^{(new)} = \frac{k-1}{k} Q_w^{(old)} + \frac{1}{k} Q_c^t$$

B. Execution Phase

The generated strategy is applied in the execution phase, in which communication is limited. In this phase, the agents estimate the benefit of communication and, if it is recognized as beneficial, the agents share their local information. In each time step, each agent computes the benefit of communication according to its belief vector as follows. Assume the belief vector of the $i$-th agent in time step $t$ is $b_i^t$. The firing strengths of all rules in the communication rule-base are calculated using:

$$\mu_j(b_i^t) = \sum_{n=1}^{N_S} b_{i,n}^t B_{j,n}$$

Where $\mu_j$ is the firing strength of rule $j$. These firing strengths are then used to calculate the benefit of communication:

$$Q_c(b_i^t) = \frac{\sum_{j=1}^{N_r} \mu_j Q_j}{\sum_{j=1}^{N_r} \mu_j}$$

Where $N_r$ is the number of rules in the communication rule-base and $Q_c(b_i^t)$ denotes the benefit of communication from the perspective of the $i$-th agent.
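The incremental rule-base update and the execution-phase estimate can be sketched together as a small class. This is a hedged illustration only: the class name `CommRuleBase`, its attribute names, the default value of `sim_min`, and in particular the dot-product firing strength are assumptions of this sketch rather than details confirmed by the paper.

```python
import math
from typing import List, Sequence


def cos_sim(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two belief vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


class CommRuleBase:
    """Rules R_j: 'if b is like B_j then Q_c = Q_j', built one experience at a time."""

    def __init__(self, sim_min: float = 0.9) -> None:
        self.sim_min = sim_min                # similarity threshold (assumed value)
        self.centers: List[List[float]] = []  # reference belief vectors B_j
        self.benefits: List[float] = []       # consequents Q_j
        self.counts: List[int] = []           # experiences merged into each rule

    def update(self, b: Sequence[float], q_c: float) -> int:
        """Fold one experience <b, q_c> into the rule-base; return the rule index."""
        if self.centers:
            sims = [cos_sim(b, center) for center in self.centers]
            w = max(range(len(sims)), key=sims.__getitem__)
            if sims[w] >= self.sim_min:
                # Recursive mean: new = ((k - 1) * old + sample) / k
                self.counts[w] += 1
                k = self.counts[w]
                self.centers[w] = [((k - 1) * c + x) / k
                                   for c, x in zip(self.centers[w], b)]
                self.benefits[w] = ((k - 1) * self.benefits[w] + q_c) / k
                return w
        # No rule yet, or no rule similar enough: create a new one.
        self.centers.append(list(b))
        self.benefits.append(float(q_c))
        self.counts.append(1)
        return len(self.centers) - 1

    def estimate(self, b: Sequence[float]) -> float:
        """Firing-strength-weighted average of the rule consequents.
        The dot-product firing strength is an assumption of this sketch."""
        strengths = [sum(x * c for x, c in zip(b, center)) for center in self.centers]
        total = sum(strengths)
        if total == 0.0:
            return 0.0
        return sum(m * q for m, q in zip(strengths, self.benefits)) / total

    def should_communicate(self, b: Sequence[float], threshold_c: float) -> bool:
        """Request communication when the estimated benefit exceeds C."""
        return self.estimate(b) > threshold_c


rule_base = CommRuleBase(sim_min=0.9)
rule_base.update([0.9, 0.1], 4.0)   # first experience creates rule 0
rule_base.update([0.8, 0.2], 2.0)   # similar belief: merged into rule 0
rule_base.update([0.1, 0.9], 0.0)   # dissimilar belief: creates rule 1
print(len(rule_base.centers), rule_base.benefits)
print(rule_base.should_communicate([1.0, 0.0], 2.0))
```

Because each rule keeps only a running mean and a count, memory grows with the number of rules rather than with the number of experiences, which is the property the incremental scheme relies on.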
This agent propagates a communication request if the estimated benefit is more than a predefined threshold $C$:

$$Q_c(b_i^t) > C$$

The value of $C$ depends on the characteristics of each problem. In real-world problems, this parameter can be set according to the percentage of access to communication. Also, in an application with a communication cost, this parameter can be used to balance the communication costs against the coordination benefits.

If communication is available, each agent propagates its sequence of actions and observations from the previous communication up to the current time step. By sharing this information, the belief vectors of all agents become equivalent and thus coordinated behaviour is guaranteed. In the absence of communication, the agent postpones its request until communication is allowed. By using this strategy, the behaviour of the agents remains coordinated with little communication.

IV. EXPERIMENTAL RESULTS

We evaluated our proposed algorithm on several benchmark problems that have been widely used to rate multi-agent planning methods. These problems are Broadcast Channel [3], Meeting in a 3×3 Grid [4], Cooperative Box Pushing [5] and Stochastic Mars Rover [23]. We report the accumulated discounted reward (Reward), the percentage of communication (Comm. (%)) and the number of generated rules for different values of $C$. A lower value of $C$ increases communication usage. The discount factor is set to 0.9 and the results are averages over 50 runs.

To the best of our knowledge, this is the first attempt to find the communication behaviour in infinite-horizon Dec-POMDP problems. Therefore, we compare the performance of our communication strategy to the full-communication (Full-Comm.) strategy as an upper bound and the no-communication (No-Comm.) strategy as a lower bound. Since communication is limited in real-world MAS problems, the main purpose of the experiments is to test whether our proposed communication behaviour can help the agents to approach the performance of full communication while using little communication.

A. Broadcast Channel Problem

In the Broadcast Channel problem two agents are connected in a network. In each time step, only one of them can use the connection and send its message. To avoid collisions, each agent has to decide whether to send a message or not. This problem has 4 states, 2 actions and 5 observations. The results in Table I show that the Broadcast Channel problem is simple enough that the agents can easily cooperate. Therefore, the performance for the various percentages of communication is almost the same, and different values of $C$ have no effect on the performance.

TABLE I. BROADCAST CHANNEL RESULTS

Broadcast channel   C            Reward   Comm. (%)   No. of rules
|S| = 4             No-Comm.     9.1      0           -
|A| = 2             0.5          9.11     0.0         2
|O| = 5             0.1          9.18     83.16       1.16
                    Full-Comm.   9.2      100         -

B. Meeting in a Grid Problem

In the Meeting in a Grid problem, there are two agents on a 3×3 grid. They can move up, down, left or right, or stay on the previous square. Each agent can sense whether there are walls around it using noisy sensors with a 0.9 chance of perceiving the right observation. The goal of the agents is to spend as much time as possible on the same square. This problem has 81 states, 7 observations and 5 actions. The results in Table II show that a low percentage of communication cannot significantly improve the accumulated reward; however, the performance of the full-communication strategy can be nearly achieved (5.71 against 5.82) by communicating in 57.99% of time steps.
Since the agents in the Meeting in a Grid problem need future planning information to cooperate, and in our method only the action-observation sequence is transferred, the proposed communication strategy cannot keep the agents coordinated for a long time.

TABLE II. MEETING IN A 3×3 GRID RESULTS

Meeting in 3×3 Grid   C            Reward   Comm. (%)   No. of rules
|S| = 81              No-Comm.     4.19     0           -
|A| = 5               0.7          4.22     14.13       27.62
|O| = 7               0.5          5.71     57.99       27.94
                      Full-Comm.   5.82     100         -

C. Cooperative Box Pushing Problem

In the Cooperative Box Pushing problem, there are three boxes (two small and one large) on a 3×4 grid and two agents that can move the boxes. Each agent can push a small box alone. However, to move the large box, the agents need to cooperate. Whenever one of the boxes reaches the goal area, a trial ends. If it is one of the small boxes, the agents gain a reward of +10, and if the large box moves into the goal area, they get a reward of +100. However, if a box smashes into a wall or the large box is pushed by one agent, a penalty of -5 is received. The Box Pushing problem has 4 actions, 5 observations, 4 goal states and 96 non-goal states (100 states in total).

According to the definition of this problem, communication has a significant impact on the performance. The results reported in Table III show that the proposed communication strategy significantly improved the performance with a low percentage of communication. While the accumulated reward achieved with no communication is 177.11, this value can be increased to 218.97 by communicating in only 6.13% of time steps. Also, the accumulated reward reached 225.19 by communicating in one third of time steps, whereas it is 232.25 for the full-communication case.

Fig. 3 demonstrates the effect of different values of the parameter $C$ on the percentage of communication and the accumulated reward in solving the Cooperative Box Pushing problem. To better illustrate the performance of our method, the values of the accumulated reward are shown between the reward achieved by the No-Comm. strategy as a lower bound and the Full-Comm. strategy as an upper bound. As stated before, the percentage of communication and the accumulated reward are increased by decreasing $C$. Moreover, these figures show that the accumulated reward is significantly increased with a small increase in the percentage of communication.

TABLE III. COOPERATIVE BOX PUSHING RESULTS

Cooperative box pushing   C            Reward   Comm. (%)   No. of rules
|S| = 100                 No-Comm.     177.11   0           -
|A| = 4                   30           198.63   1.82        26.34
|O| = 5                   20           218.97   6.13        26.34
                          10           225.19   33.86       26.56
                          Full-Comm.   232.25   100         -
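One way to read Table III is to express each reward as the fraction of the gap between the No-Comm. and Full-Comm. bounds that it recovers. The short sketch below does this arithmetic on the table's own numbers; the helper name `gap_recovered` and the normalization itself are illustrative choices, not a metric defined in the paper.

```python
def gap_recovered(reward: float, no_comm: float, full_comm: float) -> float:
    """Fraction of the No-Comm. to Full-Comm. reward gap closed by `reward`."""
    return (reward - no_comm) / (full_comm - no_comm)


NO_COMM, FULL_COMM = 177.11, 232.25  # bounds from Table III

# (threshold C, accumulated reward, communication percentage) from Table III
rows = [(30, 198.63, 1.82), (20, 218.97, 6.13), (10, 225.19, 33.86)]

fractions = [gap_recovered(reward, NO_COMM, FULL_COMM) for _, reward, _ in rows]
for (c, reward, comm), frac in zip(rows, fractions):
    print(f"C={c}: comm={comm}% reward={reward} gap recovered={frac:.3f}")
```

Under this reading, communicating in 6.13% of time steps recovers roughly three quarters of the gap, which matches the paper's observation that a small amount of communication yields most of the improvement.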

Fig. 3. The Effect of C on (a) the Percentage of Communication and (b) the Accumulated Reward in Cooperative Box Pushing Problem.

D. Mars Rover Problem

We evaluate the performance of our proposed method on a larger problem, the Mars Rover problem. In this problem, there are two rovers experimenting on a 2×2 grid by independently drilling or sampling each site or moving around. Two of the sites need just one agent to sample, while in the other sites both agents must drill at the same time in order to get the maximum reward. The agents get a large penalty if a site is drilled while it only needs to be sampled. When at least one experiment has been performed at each site, the problem is reset. This problem has 256 states, 6 actions and 8 observations.

As can be seen from Table IV, the proposed communication strategy did very well on the Mars Rover problem as a large MAS problem. The method reaches a reward of 27.09, against 28.77 for full communication, by communicating in less than one fifth of time steps (17.51%).

We have also shown the accumulated rewards and the percentage of communication for different values of $C$ in solving the Mars Rover problem in Fig. 4. Fig. 4(a) illustrates the effect of $C$ on the percentage of communication and Fig. 4(b) shows the effect of this parameter on the accumulated reward. Again, in Fig. 4(b), the values of the accumulated reward are shown between the reward of the No-Comm. strategy and the Full-Comm. strategy as the lower and upper bound, respectively. Fig. 4 clearly shows that with a small increase in the percentage of communication, the accumulated reward is significantly increased.

TABLE IV. MARS ROVER RESULTS

Mars Rover   C            Reward   Comm. (%)   No. of rules
|S| = 256    No-Comm.     23.55    0           -
|A| = 6      3            23.5     0.7         8.06
|O| = 8      2            26.05    12.19       8.02
             1            27.09    17.51       8.24
             Full-Comm.   28.77    100         -

Fig. 4. The Effect of C on (a) the Percentage of Communication and (b) the Accumulated Reward in Mars Rover Problem.

To summarize, our proposed algorithm for developing the communication strategy performed well across the benchmark problems. Using this strategy can heavily reduce the amount of communication necessary for successful coordinated behaviour.

V. CONCLUSION

We introduced an algorithm to develop a communication strategy for cooperative multi-agent systems in which communication is limited. This strategy identifies the best situations for communicating in MASs modelled by infinite-horizon Dec-POMDPs. This communication policy is developed in a centralized manner in a training phase, in which communication is not restricted. The agents use this policy in a decentralized manner in a test environment in which the communication channel is limited. Our method generates a fuzzy model to approximate the benefit of communication for each situation. The agents can use this fuzzy model to obtain the minimal communication that is necessary for coordinated behavior. We also introduced an incremental method to create and tune this fuzzy model. Our incremental method reduces the high computational complexity of multi-agent systems by constructing a compact fuzzy rule-base.

We used several standard benchmark problems to evaluate the performance of our proposed method. Experimental results show that this communication strategy can help the agents to achieve almost the same performance as the full-communication strategy while using little communication. Therefore, in real-world MAS problems where communication is usually limited, our proposed algorithm can heavily reduce the amount of communication necessary for successful coordinated behaviour.

Many AI domains can take advantage of MAS design, such as multiple mobile robots and disaster response teams. Developing a group of intelligent players or agents in video games is another interesting field in AI research. In our future work, we intend to customize our incremental model to create human-like players for real-time strategy games who can act and react intelligently against a virtual environment and even real players.

REFERENCES

[1] F. Wu, S. Zilberstein and X. Chen, "Online Planning for Multi-Agent Systems with Bounded Communication," Artificial Intelligence, vol. 175, no. 2, pp. 487-511, 2011.
[2] D. S. Bernstein, R. Givan, N. Immerman and S. Zilberstein, "The complexity of decentralized control of Markov decision processes," in Mathematics of Operations Research 27, 2002.
[3] D. S. Bernstein, E. A. Hansen and S. Zilberstein, "Bounded policy iteration for decentralized POMDPs," in Proceedings of the 19th international joint conference on Artificial intelligence, 2005.
[4] C. Amato, J. S. Dibangoye and S. Zilberstein, "Incremental Policy Generation for Finite-Horizon DEC-POMDPs," in Proceedings of the 19th International Conference on Automated Planning and Scheduling, Thessaloniki, Greece, 2009.
[5] S. Seuken and S. Zilberstein, "Improved Memory-Bounded Dynamic Programming for Decentralized POMDPs," in Proceedings of the 23rd Conference on Uncertainty in Artificial Intelligence (UAI), Vancouver, British Columbia, 2007.
[6] M. Roth, R. Simmons and M. Veloso, "Reasoning about joint beliefs for execution-time communication decisions," in AAMAS '05 Proceedings of the fourth international joint conference on Autonomous agents and multiagent systems, 2005.
[7] D. S. Bernstein, C. Amato, E. A. Hansen and S. Zilberstein, "Policy Iteration for Decentralized Control of Markov Decision Processes," Journal of AI Research (JAIR), vol. 34, pp. 89-132, 2009.
[8] C. Amato, D. S. Bernstein and S. Zilberstein, "Optimizing fixed-size stochastic controllers for POMDPs and decentralized POMDPs," Journal of Autonomous Agents and Multi-Agent Systems (JAAMAS), vol. 21, no. 3, pp. 293-320, 2010.
[9] J. K. Pajarinen and J. Peltonen, "Periodic Finite State Controllers for Efficient POMDP and DEC-POMDP Planning," in the 25th Annual Conference on Neural Information Processing Systems (NIPS 2011), 2011.
[10] A. Kumar and S. Zilberstein, "Anytime Planning for Decentralized POMDPs using Expectation Maximization," in Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), Catalina Island, California, 2010.
[11] R. Sharma and M. T. J. Spaan, "Bayesian-Game-Based Fuzzy Reinforcement Learning Control for Decentralized POMDPs," IEEE Transactions on Computational Intelligence and AI in Games, vol. 4, no. 4, pp. 309-328, 2012.
[12] S. Hamzeloo and M. Zolghadri Jahromi, "An incremental fuzzy controller for large dec-pomdps," in Artificial Intelligence and Signal Processing Conference (AISP), Shiraz, Iran, 2017.
[13] F. A. Oliehoek and C. Amato, A Concise Introduction to Decentralized POMDPs, Springer International Publishing, 2016.
[14] H. Kurniawati, D. Hsu and W. S. Lee, "Efficient point-based POMDP planning by approximating optimally reachable belief spaces," in Proc. Robotics: Science and Systems, 2008.
[15] R. Emery-Montemerlo, Game-theoretic control for robot teams, Doctoral Dissertation, Robotics Institute, Carnegie Mellon University, August 2005.
[16] S. A. Williamson, E. H. Gerding and N. R. Jennings, "Reward shaping for valuing communications during multi-agent coordination," in AAMAS '09 Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems - Volume 1, Budapest, Hungary, May 10-15, 2009.
[17] P. P. Angelov and X. Zhou, "Evolving Fuzzy-Rule-Based Classifiers From Data Streams," IEEE Transactions on Fuzzy Systems, vol. 16, no. 6, 2008.
[18] S. Schliebs and N. Kasabov, "Evolving spiking neural network - a survey," Evolving Systems, vol. 4, no. 2, pp. 87-98, 2013.
[19] H. Shahparast, S. Hamzeloo and M. Zolghadri Jahromi, "A Self-Tuning Fuzzy Rule-Based Classifier for Data Streams," International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 22, no. 2, 2014.
[20] H. Shahparast and E. G. Mansoori, "An online fuzzy model for classification of data streams with drift," in Artificial Intelligence and Signal Processing Conference (AISP), Shiraz, Iran, 25-27 Oct. 2017.
[21] D. Kangin, P. Angelov and J. A. Iglesias, "Autonomously evolving classifier TEDAClass," Information Sciences, vol. 366, pp. 1-11, 2016.
[22] D. Kangin and P. Angelov, "Evolving clustering, classification and regression with TEDA," in International Joint Conference on Neural Networks (IJCNN), 2015.
[23] C. Amato and S. Zilberstein, "Achieving goals in decentralized POMDPs," in Proceedings of The 8th International Conference on Autonomous Agents and Multiagent Systems, 2009.