A Novel Curiosity-Driven Perception-Action Cognitive Model

International Conference on Artificial Intelligence: Technologies and Applications (ICAITA 2016)

Jing Chen*, Bing Li and Li Li
School of Information Technology Engineering, Tianjin University of Technology and Education, Tianjin, China
*Corresponding author

© 2016. The authors - Published by Atlantis Press

Abstract-Aiming at agents' autonomous cognitive problems in unknown environments, a novel curiosity-driven perception-action cognitive model is proposed. It simulates the intrinsic motivation cognitive mechanism based on curiosity in psychology, and the cognitive process from perception to action is realized by a probabilistic action selection mechanism. Information entropy illustrates that applying the proposed model achieves better cognition and indeed reflects biological cognitive processes. Comparing simulation results using the Skinner pigeon experiment (SPE) shows that this method is effective.

Keywords-perception-action; cognitive model; Skinner pigeon; curiosity

I. INTRODUCTION

With the study of cognitive robots, "cognitive control", which mainly solves the control problem by simulating biological cognition mechanisms, has gradually replaced "intelligent control"[1]. According to Piaget's cognitive development theory, the sensorimotor stage is the primary stage of cognitive development. The human sensorimotor (also called perception-action) function is the result of cognitive development, and is gradually formed, developed and perfected.

Wu Xuan and Xiaogang Ruan presented a psychology model called the Skinner automaton, which implemented the theory of the operant conditioning reflex (OCR). It was used in the equilibrium learning of a self-balancing robot and showed the effectiveness of the psychology model in the incremental learning process[2]. In 2015, Huang Jing put forward a kind of artificial sensorimotor system with OCR function, realized the cognitive process of the mapping relationship between state and action, and verified it by comparison using two experiments from the fields of psychology and cybernetics[4].

Cutsuridis et al. proposed a cognitive control architecture for the perception-action cycle in robots and agents, which is composed of a large number of neural computing mechanisms; this view is strongly supported by evidence from brain experimental research[5]. Within the intrinsic motivation research framework, Baranes et al. proposed an adaptive target generation algorithm based on curiosity as the mechanism of intrinsic motivation, which is used in the autonomous learning of robots[6]. An action-perception loop based on an internal model has been used in research on the exploration mechanism in the cognition process of agents without external reward feedback[7]. Baranes and Oudeyer proposed an adaptive target generation - robust intelligent adaptive curiosity framework as the intrinsic motivation of the target exploration mechanism[8].

Self-learning is a main motivation of behavior making, and the exploration strategy in intrinsic motivation plays an important role in the process of cognition. As one of the main factors that increase intrinsic motivation, curiosity plays an important role in the learning process of the "perception-action" loop. Curiosity mainly includes sensory curiosity (sc) and cognitive curiosity (cc). A new model, named the curiosity-driven perception-action cognitive model (CPACM), is proposed in this paper; it applies the sensory curiosity (curiosity cognitive) in the learning of the perception-action loop. Its effectiveness is verified with the classical Skinner pigeon experiment in the MATLAB simulation environment.

II. CURIOSITY-DRIVEN PERCEPTION-ACTION COGNITIVE MODEL

A. Structure Design

On the basis of operant conditioning cognitive models with probabilistic behavior making, the proposed model adds a curiosity-driven module to the formation process of the "perception-action" mapping. The cognitive structure is shown in Figure I.

FIGURE I. COGNITIVE STRUCTURE OF CPACM

B. Mathematical Description

The proposed curiosity-driven perception-action cognitive model can be described as a 9-tuple computational model, $\mathrm{CPACM} = \langle S, A, f, \varphi, r, V(S, A), C(S), P(t), L \rangle$. Each element is explained as follows.

$S$ is the internal state set of CPACM, $S = \{ s_i \mid i = 1, 2, \ldots, n \}$;

$A$ is the optional action set of CPACM, $A = \{ a_j \mid j = 1, 2, \ldots, m \}$;

$f$ is the state transition of CPACM, $f: S(t) \times a(t) \to S(t+1)$, which is mostly determined by the environment or the system model;

$\varphi$ is the orientation mechanism, $\varphi(t) = \varphi(S(t))$, which denotes the state orientation at time $t$ and can be defined for the specific case; the larger the value, the better the state of the system;

$r$: $r(t) = r[S(t), A(t)]$ is the reward for the transition from state $S(t)$ to $S(t+1)$ after taking the action $A(t)$ at time $t$;

$V(S, A)$ is the prediction of CPACM, a vector $V(S, A) = [v_1, v_2, \ldots, v_m] = [v(S, a_1), v(S, a_2), \ldots, v(S, a_m)]$;

$C(S)$ is the curiosity function, which denotes the state curiosity at time $t$. It is a monotone decreasing function with respect to time: for the same state, curiosity declines over time, in line with the characteristics of biological learning;

$P(t)$ is the probability vector from the state condition to the optional behaviors of CPACM, $P(t) = [p(a_1 \mid S), p(a_2 \mid S), \ldots, p(a_m \mid S)]$. The action selection probability

$$p(a_j \mid S) = \frac{e^{V(S, a_j)/C(S)}}{\sum_{a \in A} e^{V(S, a)/C(S)}}$$

denotes that the agent chooses the action $a_j \in A$ with probability $p(a_j) \in P$ in state $S$ at time $t$, with $0 < p(a \mid s) < 1$ and $\sum_{a \in A} p(a \mid s) = 1$;

$L$ denotes the updating of the "perception-action" mapping of CPACM, $L: P_A(t) \to P_A(t+1)$. This updating mechanism is implemented by changing the prediction critic network, which uses the temporal-difference (TD) method to update the weights $W$ of $V(S, A)$, that is,

$$\Delta W(t) = \alpha \, \mathrm{TD}(t) \, \frac{\partial V(IN, W)}{\partial W}, \qquad \mathrm{TD}(t) = r(t+1) + \gamma V(t+1) - V(t),$$

with step size $\alpha$ and discount factor $\gamma$, after which the action selection probability changes.

The cognition process can be summarized as follows. At time $t$ the state of the agent is $S(t) = s_i \in S$. Based on the initial prediction critic function $V(S, a)$, the curiosity parameter and the $P$ vector, the selection probability of each action is determined. An action $a_k \in A$ is then selected according to these probabilities and applied to the environment, the state transfers to $S(t+1) = s_j \in S$, and the immediate critic information $r$ is obtained. $V(S, A)$ is then updated by the TD algorithm to form a new prediction estimate, the curiosity value of this state is updated, and a new probability vector $P$ is obtained, which completes the first loop of cognition.
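As a minimal sketch of the probabilistic action selection described above, the following Python functions compute $p(a_j \mid S)$ as a softmax of the predictions $V(S, a)$, with the curiosity value $C(S)$ acting as a temperature, and then sample an action. The function names, the example values and the use of Python (the paper's simulation uses MATLAB) are assumptions made for illustration only.

```python
import math
import random

def selection_probabilities(values, curiosity):
    """Softmax of values / curiosity; `values` holds V(S, a) for each action."""
    if curiosity <= 0:
        raise ValueError("curiosity must be positive")
    scaled = [v / curiosity for v in values]
    shift = max(scaled)  # subtract the maximum for numerical stability
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

def select_action(values, curiosity, rng=random):
    """Sample an action index according to the selection probabilities."""
    probs = selection_probabilities(values, curiosity)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]

# Hypothetical predictions for three actions in one state.
v = [1.0, 0.0, -1.0]
curious = selection_probabilities(v, curiosity=5.0)   # high curiosity: close to uniform
familiar = selection_probabilities(v, curiosity=0.2)  # low curiosity: close to greedy
```

A large curiosity value flattens the distribution and so favors exploration, while a small value concentrates the probability on the action with the highest prediction.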
Repeating the above process forms the "perception-action" loop based on curiosity-driven intrinsic motivation. Once the agent has learned the behavior selection mechanism, the cognition process is over.

To characterize the degree of certainty of the system, we use information entropy. Conditional entropy is used here to describe the behavior information entropy in a given state (in bit), which is defined as

$$H(S) = -\sum_{a \in A} p(a \mid S) \log_2 p(a \mid S),$$

where $p(a \mid S)$ represents the probability of selecting behavior $a \in A$ in state $S$ at time $t$ and satisfies $\sum_{a \in A} p(a \mid S) = 1$. The larger the information entropy, the higher the uncertainty of the system. The proposed cognitive model expresses the process from randomness to certainty and reflects gradual learning characteristics similar to biology.

III. EXPERIMENTAL SIMULATION AND ANALYSIS

To verify the behavioral learning ability of the proposed curiosity-driven "perception-action" cognitive model, we carry out a simulation study using a typical experiment in behavioral learning theory, the Skinner pigeon experiment, and make an experimental comparative analysis with operant conditioning models to illustrate the effectiveness of the proposed method.

A. Skinner Pigeon Experiment

The Skinner pigeon experiment (SPE), which is described in detail in the literature[3], is a classic experiment in behavioral learning theory. In this experiment the pigeon is placed in a specially designed box called a Skinner box. Facing three differently colored buttons (red, yellow and blue), the pigeon selects an action by pecking at one of them. Pecking at different buttons produces different responses: pecking at the red button gives the pigeon food as a reward, pecking at the yellow button produces no response, and pecking at the blue button gives the pigeon an electric shock as punishment. At the beginning, the numbers of pecks at the three buttons are almost equal. After a period of cognition, the number of pecks at the red button increases significantly: the pigeon has autonomously learned to peck at the red button to get the food reward. To facilitate simulation of the experiment, the state and behavior are coded and a simplified discrete mathematical model is established.
The state transition diagram is shown in Figure II, in which the circles denote states, the arrows indicate transitions between two states, and the value above each arrow indicates the action taken.
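The simplified discrete model could be coded along the following lines. This is only a sketch: the integer state coding, the rule that red moves the state one level up, yellow leaves it unchanged and blue moves it one level down, and the reward values of +1, 0 and -1 are assumptions chosen to match the verbal description of the experiment. They are not taken from the transitions of Figure II.

```python
STATES = {1: "pain", 2: "desire for food", 3: "satisfied"}  # assumed coding
ACTIONS = ("red", "yellow", "blue")

# Assumed effect of each button on the state level.
_EFFECT = {"red": +1, "yellow": 0, "blue": -1}

def step(state, action):
    """Return (next_state, reward) for one peck under the assumed model."""
    if state not in STATES:
        raise ValueError(f"unknown state: {state}")
    if action not in _EFFECT:
        raise ValueError(f"unknown action: {action}")
    next_state = min(max(state + _EFFECT[action], 1), 3)
    # Assumed reward: sign of the change in the state value.
    if next_state > state:
        reward = 1
    elif next_state < state:
        reward = -1
    else:
        reward = 0
    return next_state, reward
```

For example, `step(2, "red")` moves the pigeon from the desire-for-food state to the satisfied state under these assumptions, while `step(2, "blue")` moves it to the pain state.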

FIGURE II. STATE TRANSITION DIAGRAM OF SPE

In Figure II, $s_1$ is the state of pain, $s_2$ is the state of desire to obtain food, and $s_3$ is the state of being satisfied. The larger the state value, the better the pigeon's state. The orientation of the pigeon is to satisfy its own needs to the maximum extent, so the better the state, the larger the orientation value. The three actions of the pigeon are $a_1$, $a_2$ and $a_3$, which denote pecking at the red, yellow and blue button respectively.

To express the prediction function $V$, a recurrent neural network is used. The input vector is $IN \in R^r$, the output is $Y \in R^1$ and the number of hidden nodes is $h$. The input vector of the hidden layer is $O(t) = [o_1(t), o_2(t), \ldots, o_h(t)]^T \in R^h$, which can be seen as the excitation function of the network's internal state. The output vector of the hidden layer is $H(t) = [h_1(t), h_2(t), \ldots, h_h(t)]^T \in R^h$, which denotes the internal state set of the network. The weights of this recurrent neural network are $W^{(1)} \in R^{r \times h}$, $W^{(2)} \in R^{h \times h}$ and $W^{(3)} \in R^{1 \times h}$, and the activation function of the output layer is a linear weighting function:

$$Y(t) = W^{(3)}(t) H(t).$$

The internal state transition function is

$$H(t) = g(IN(t), H(t-1), W^{(1)}, W^{(2)}) = [h_1(t), h_2(t), \ldots, h_h(t)]^T,$$

and the excitation function of the internal state is

$$O(t) = (W^{(2)})^T H(t-1) + (W^{(1)})^T IN(t),$$

where $h_j(t) = 1 / (1 + \exp(-o_j(t)))$ $(j = 1, 2, \ldots, h)$. The perception state and the action in SPE are both one-dimensional variables, so $IN = [x, a]^T \in R^2$.

To express the state curiosity of the pigeon, we count the occurrences of the same state during the learning process, denoted $\mathrm{Num}(S)$. The state curiosity is a monotone decreasing function with respect to time:

$$C(S) = k_1 e^{-k_2 \mathrm{Num}(S)},$$

where $k_1$ and $k_2$ are the adjustment factors of the curiosity parameter. As $\mathrm{Num}(S) \to \infty$, $C(S) \to 0$, which matches the characteristics of biological cognition.

The reward mechanism is defined on the basis of the orientation, with $\varphi(t) = S(t)$: the reward $r$ is assigned by comparing $\varphi(t+1)$ with $\varphi(t)$, with a separate value for each of the two cases. For the SPE, the cognitive flow based on the proposed model is shown in Figure III, where stepmax is the maximum number of run steps during simulation. The initial state of the pigeon is the state of desire to obtain food.

[Flowchart: initialization of stepmax, the network parameters, $k_1$, $k_2$, $A$ and $\mathrm{Num}(S)$; evaluation of $V(t) = V(IN, W)$ and $C(S)$; update of $\mathrm{Num}(S)$; loop until stepmax.]

FIGURE III. COGNITIVE FLOW BASED ON THE PROPOSED CPACM MODEL.

B. Simulation Results and Analysis

The cognitive process of SPE was simulated using both the CPACM cognitive model proposed in this paper and the OCR model.
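To illustrate how a decaying curiosity value interacts with action selection, the sketch below evaluates an exponentially decaying curiosity for a growing visit count and the resulting behavior information entropy in bits. The exponential form $k_1 e^{-k_2 \mathrm{Num}(S)}$, the factor values, the fixed predictions and all function names are assumptions made for illustration. The simulation reported in the paper was run in MATLAB with a recurrent critic network, which this sketch does not reproduce.

```python
import math

def curiosity(num_visits, k1=2.0, k2=0.1):
    """Assumed decaying curiosity C(S) = k1 * exp(-k2 * Num(S))."""
    return k1 * math.exp(-k2 * num_visits)

def selection_probabilities(values, c):
    """Softmax of V(S, a) / C(S), shifted by the maximum for stability."""
    scaled = [v / c for v in values]
    shift = max(scaled)
    weights = [math.exp(s - shift) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

def entropy_bits(probs):
    """Behavior information entropy H = -sum p * log2(p), skipping p = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)

values = [1.0, 0.0, -1.0]  # hypothetical fixed predictions for red, yellow, blue
history = []
for visits in (0, 10, 20, 40):
    c = curiosity(visits)
    h = entropy_bits(selection_probabilities(values, c))
    history.append((visits, c, h))
    print(f"Num(S)={visits:2d}  C(S)={c:.3f}  H={h:.3f} bit")
```

With the predictions held fixed, the entropy falls as the visit count grows, which mirrors the transition from uncertainty to certainty that the model is meant to express.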
The sampling time is 1 s, and we record the selection probabilities of the three actions, the Skinner pigeon's state and the action taken at each second. The results are compared in Figure IV.

FIGURE IV. SKINNER PIGEON COGNITION RESULTS COMPARISON OF CPACM AND OCRM
a) Action selection probabilities (pr, py, pb) over time using the CPACM model with curiosity
b) Action selection probabilities (pr, py, pb) over time using the OCRM model without curiosity
c) Skinner pigeon's state and action taken over time using the CPACM model with curiosity
d) Skinner pigeon's state and action taken over time using the OCRM model without curiosity
(States: pain, desire to obtain food, satisfied. Actions: red, yellow, blue.)

As can be seen from Figure IV a) and b), the probability of the Skinner pigeon pecking at the red button gradually changes from about 1/3 to 1, i.e. the pigeon eventually selects the behavior of pecking at the red button with probability 1. Due to the presence of curiosity in CPACM, in the initial moments the pigeon has a certain curiosity about the three states, so its behavior selection probabilities fluctuate to a certain degree compared with the OCRM method. Observing Figure IV c) and d), we find that although the probabilities of the CPACM method oscillate, in the later stage the Skinner pigeon's behavior and state reach a more determined state, whereas under the OCRM method sudden small-probability events reduce the certainty of the behavioral choices. Figure IV d) also shows cases of choosing the yellow or blue button: because curiosity does not act in the early stages of learning, cognition under certain states is lacking and the behavior cognition process is affected. Figure V shows the variation of the system's behavior information entropy. From the trend of the entropy towards 0, we can conclude that the cognition of the system is an evolution process from uncertainty to certainty.

FIGURE V. BEHAVIOR INFORMATION ENTROPY CURVE OF SKINNER PIGEON.

IV. CONCLUSION

Aiming at agents' autonomous cognitive problems in unknown environments, this paper proposes a novel curiosity-driven perception-action cognitive model, which simulates the intrinsic motivation cognitive mechanism based on curiosity in psychology; the cognitive process from perception to action is realized by a probabilistic action selection mechanism. To verify the behavioral learning ability of the proposed curiosity-driven "perception-action" cognitive model, we carried out a simulation study using the typical SPE in behavioral learning theory and made an experimental comparative analysis with the OCRM to illustrate the effectiveness of the proposed method. Information entropy illustrates that applying the proposed model can achieve better cognition and indeed reflects biological cognitive processes. Comparison of the simulation results shows that this method is effective.

ACKNOWLEDGMENT

We acknowledge support from the National Natural Science Foundation of China (No. 648), the Tianjin City High School Science & Technology Fund Planning Project (No.87) and the Tianjin University of Technology and Education Project (No. KJY No.KYQD4), and thank the reviewers for their helpful comments, which improved the quality of this paper.

REFERENCES

[1] Bellas F, Caamaño P, Faiña A, et al. Dynamic learning in cognitive robotics through a procedural long term memory[J]. Evolving Systems 4 5():49-6.
[2] Wu X, Ruan X G, Zhang X, et al. Study of Skinner Automaton Implemented on a Two-Wheeled Robot[C]. 3rd International Conference on Electric and Electronics : 5-56.
[3] Ruan XiaoGang, Wu Xuan. The skinner automaton: A psychological model formalizing the theory of operant conditioning[J]. Science China (Technological Sciences):745-76.
[4] Huang Jing, Ruan Xiao-gang, Yu Nai-gong, Zhang Xiao-ping, Wei Ruo-yan, Fan Qing-wu. Artificial sensorimotor system with operant conditioning function[J]. Control theory and application 5(5):674-68.
[5] Cutsuridis V, Taylor J G. A cognitive control architecture for the perception-action cycle in robots and agents[J]. Cognitive Computation 5(): 8-95.
[6] Baranes A, Oudeyer P Y. Intrinsically motivated goal exploration for active motor learning in robots: A case study. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 766-77.
[7] Little D Y, Sommer F T. Learning and exploration in action-perception loops[J]. Frontiers in neural circuits 7: 7-7.
[8] Baranes A, Oudeyer P-Y. Active Learning of Inverse Models with Intrinsically Motivated Goal Exploration in Robots[J]. Robotics and Autonomous Systems 6():49-7.