A Novel Curiosity-Driven Perception-Action Cognitive Model


International Conference on Artificial Intelligence: Technologies and Applications (ICAITA 2016)

Jing Chen*, Bing Li and Li Li
School of Information Technology Engineering, Tianjin University of Technology and Education, Tianjin, China
*Corresponding author

2016. The authors - Published by Atlantis Press

Abstract-Aiming at the autonomous cognition problem of agents in unknown environments, a novel curiosity-driven perception-action cognitive model is proposed. It simulates an intrinsic-motivation cognitive mechanism based on curiosity in psychology, and the cognitive process from perception to action is realized by a probabilistic action selection mechanism. Information entropy illustrates that applying the proposed model achieves better cognition and indeed reflects biological cognitive processes. A comparison of simulation results using the Skinner pigeon experiment (SPE) shows that this method is effective.

Keywords-perception-action; cognitive model; skinner pigeon; curiosity

I. INTRODUCTION

With the study of cognitive robots, "cognitive control", which mainly solves the control problem by simulating biological cognition mechanisms, has gradually replaced "intelligent control" [1]. According to Piaget's theory of cognitive development, the sensorimotor stage is the primary stage of cognitive development. The human sensorimotor (also called perception-action) function is the result of cognitive development, through which it is gradually formed, developed and perfected.

Wu Xuan and Xiaogang Ruan presented a psychological model called the Skinner automaton, which implements the theory of the operant conditioning reflex (OCR). It was used in the equilibrium learning of a self-balancing robot and showed the effectiveness of the psychological model in the incremental learning process [2]. Huang Jing et al. put forward an artificial sensorimotor system with OCR function, realized the cognitive process of the mapping relationship between state and action, and verified it by comparison using two experiments from the fields of psychology and cybernetics [4].

Cutsuridis et al. proposed a cognitive control architecture for the perception-action cycle in robots and agents, which is composed of a large number of neural computing mechanisms; this view is strongly supported by evidence from experimental brain research [5]. Within the intrinsic motivation research framework, Baranes et al. proposed an adaptive target generation algorithm based on curiosity as the mechanism of intrinsic motivation, which is used in the autonomous learning of robots [6]. An action-perception loop based on an internal model has been used to study the exploration mechanism in the cognition process of agents without external reward feedback [7]. Baranes and Oudeyer proposed an adaptive target generation - robust intelligent adaptive curiosity framework as the intrinsic motivation of the target exploration mechanism [8].

Self-learning is a main motivation of behavior making, and the exploration strategy in intrinsic motivation plays an important role in the process of cognition. As one of the main factors that increase intrinsic motivation, curiosity plays an important role in the learning process of the "perception-action" loop. Curiosity mainly includes sensory curiosity (SC) and cognitive curiosity (CC). This paper proposes a new model, named the curiosity-driven perception-action cognitive model (CPACM), which applies sensory curiosity (cognitive curiosity) to the learning of the perception-action loop. Its effectiveness is verified with the classical Skinner pigeon experiment in a MATLAB simulation environment.

II. CURIOSITY-DRIVEN PERCEPTION-ACTION COGNITIVE MODEL

A. Structure Design

On the basis of operant conditioning cognitive models with probabilistic behavior making, the proposed model adds a curiosity-driven module to the formation process of the "perception-action" mapping. The cognitive structure is shown in Figure I.

FIGURE I. COGNITIVE STRUCTURE OF CPACM

B. Mathematical Description

The proposed curiosity-driven perception-action cognitive model can be described as a 9-tuple computational model, $CPACM = \langle S, A, f, \varphi, r, V(S, A), C(S), P(t), L \rangle$, where the orientation mechanism is written $\varphi$ here. The elements are explained as follows.

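$S$ is the internal state set of CPACM, $S = \{ s_i \mid i = 1, 2, \ldots, n \}$;

$A$ is the optional action set of CPACM, $A = \{ a_i \mid i = 1, 2, \ldots, m \}$;

$f$ is the state transition of CPACM, $f: S(t) \times a(t) \to S(t+1)$, which is almost entirely determined by the environment or the system model;

$\varphi$ (the symbol used here for the orientation mechanism), $\varphi(t) = \varphi(S(t))$, denotes the state orientation at time $t$. It can be defined for the specific case, and the larger its value, the better for the system;

$r$: $r(t) = r[S(t), A(t)]$ is the reward for moving from state $S(t)$ to $S(t+1)$ after taking the action $A(t)$ at time $t$;

$V(S, A)$ is the prediction of CPACM, a vector $V(S, A) = [v_1, v_2, \ldots, v_m] = [v(s, a_1), v(s, a_2), \ldots, v(s, a_m)]$;

$C(S)$ is the curiosity function, which denotes the state curiosity at time $t$. It is a monotone decreasing function with respect to time: for the same state, curiosity declines over time, in line with the characteristics of biological learning;

$P(t)$ is the probability vector from the state condition to the optional behaviors of CPACM, $P(t) = [p(a_1 \mid S), p(a_2 \mid S), \ldots, p(a_m \mid S)]$. The action selection probability

$$p_{a_j \mid S}(t) = p(a = a_j \mid S) = \frac{e^{V(S, a_j)/C(S)}}{\sum_{a \in A} e^{V(S, a)/C(S)}}$$

denotes that the agent chooses the action $a_j \in A$ with probability $p(a_j) \in P$ in state $S$ at time $t$, where $0 \le p(a \mid s) \le 1$ and $\sum_{a \in A} p(a \mid s) = 1$;

$L$ denotes the updating of the "perception-action" mapping of CPACM, $L: P_A(t) \to P_A(t+1)$. This updating mechanism is implemented by changing the prediction critic network, which uses the temporal-difference (TD) method to update the weights $W$ of $V(S, A)$: the weight increment $\Delta W(t)$ is proportional to $TD(t)\,\partial V(IN, W)/\partial W$, where, in the standard temporal-difference form with discount factor $\gamma$, $TD(t) = r(t+1) + \gamma V(t+1) - V(t)$. The action selection probability then changes accordingly.

The cognition process can be summarized as follows. At time $t$, the state of the agent is $S(t) = s_i \in S$. Based on the initial prediction critic function $V(S, a)$, the curiosity parameter and the vector $P$, the behavior selection probability of each action is decided. The agent then selects an action $a_k \in A$ according to these probabilities and acts on the environment. The state is transformed to $S(t+1) = s_j \in S$, and the immediate critic information $r$ is obtained. $V(S, A)$ is then updated with the TD algorithm to form a new prediction estimate, the curiosity value of this state is updated, and a new probability vector $P$ is obtained. The first loop of cognition is then complete.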
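One pass of the cognition loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: it swaps the prediction critic network for a plain value table, takes $V(t+1)$ as the best value in the next state, and uses an assumed curiosity schedule `k1 * exp(-k2 * visits)` with a floor `c_min`. The names `toy_spe_step`, `alpha`, `gamma`, `k1`, `k2` and `c_min`, as well as the transitions and reward values, are hypothetical.

```python
import math
import random

def action_probabilities(values, curiosity):
    """Boltzmann rule: p(a_j|S) = exp(V_j / C) / sum_a exp(V_a / C)."""
    scaled = [v / curiosity for v in values]
    shift = max(scaled)  # subtracting the max leaves the ratios unchanged
    weights = [math.exp(x - shift) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]

def td_error(reward, v_next, v_now, gamma=0.9):
    """Standard TD error r + gamma * V(t+1) - V(t); gamma is assumed."""
    return reward + gamma * v_next - v_now

def toy_spe_step(state, action):
    """Hypothetical SPE coding: action 0/1/2 = red/yellow/blue button."""
    next_state = (2, 1, 0)[action]     # assumed: satisfied / desire / pain
    reward = (1.0, 0.0, -1.0)[action]  # assumed reward values
    return next_state, reward

def cognition_step(state, V, visits, rng, k1=1.0, k2=0.1,
                   c_min=1e-3, alpha=0.5, gamma=0.9):
    """One perception-action loop on a value table V[state][action]."""
    visits[state] += 1
    # Assumed curiosity schedule: decays as the state is revisited.
    curiosity = max(c_min, k1 * math.exp(-k2 * visits[state]))
    probs = action_probabilities(V[state], curiosity)
    action = rng.choices(range(len(probs)), weights=probs)[0]
    next_state, reward = toy_spe_step(state, action)
    # Assumption: V(t+1) is taken as the best value in the next state.
    delta = td_error(reward, max(V[next_state]), V[state][action], gamma)
    V[state][action] += alpha * delta
    return next_state, action, probs

rng = random.Random(0)
V = [[0.0] * 3 for _ in range(3)]
visits = [0, 0, 0]
state = 1  # start in the "desire to obtain food" state
for _ in range(200):
    state, action, probs = cognition_step(state, V, visits, rng)
```

Because curiosity acts as the temperature of the selection rule, a large value keeps the three probabilities close together, while a small value concentrates the probability on the action with the highest predicted value.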
Looping through the above process, the "perception-action" loop based on curiosity-driven intrinsic motivation is formed, the agent learns the behavior selection mechanism, and the cognition process ends.

To characterize the degree of certainty of the system, information entropy is used as the measure. Conditional entropy describes the behavior information entropy in a given state (in bits) and is defined as

$$H(A \mid S)(t) = -\sum_{a \in A} p_{a \mid S}(t) \log_2 p_{a \mid S}(t),$$

where $p_{a \mid S}(t)$ represents the probability of selecting behavior $a$ in state $S$ at time $t$ and satisfies $\sum_{a \in A} p_{a \mid S}(t) = 1$. The larger the information entropy, the higher the system's degree of uncertainty. The proposed cognitive model expresses the process from randomness to certainty and reflects gradual learning characteristics similar to those of biology.

III. EXPERIMENTAL SIMULATION AND ANALYSIS

To verify the behavioral learning ability of the proposed curiosity-driven "perception-action" cognitive model, we carry out a simulation study using a typical experiment in behavioral learning theory (the Skinner pigeon experiment) and an experimental comparative analysis with the operant conditioning model to illustrate the effectiveness of the proposed method.

A. Skinner Pigeon Experiment

The Skinner pigeon experiment (SPE), which is described in detail in the literature [3], is a classic experiment in behavioral learning theory. In this experiment, the pigeon is placed in a specially designed box named the Skinner box. Facing three buttons of different colors (red, yellow and blue), the pigeon chooses which of them to peck. Pecking a different button produces a different response: pecking the red button brings food as a reward, pecking the yellow button brings no response, and pecking the blue button brings an electric shock as punishment. At the beginning, the numbers of pecks at the three buttons are almost equal. After a period of cognition, the number of pecks at the red button increases significantly: the pigeon has autonomously learned the behavior of pecking the red button to get the food reward. To facilitate simulation of the experiment, the states and behaviors are coded and a simplified discrete mathematical model is established.
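As a small worked illustration of the behavior information entropy defined before Section III, the sketch below evaluates it for two selection vectors over the three buttons: equal pecking, as at the start of the experiment, and a fully learned preference for the red button. The function name `behavior_entropy` and the two example vectors are assumptions made for this illustration.

```python
import math

def behavior_entropy(probs):
    """Entropy in bits: sum of p * log2(1/p), equal to -sum of p * log2(p).

    Terms with p == 0 are skipped, since they contribute nothing.
    """
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0.0)

BUTTONS = ("red", "yellow", "blue")

# Equal pecking at the three buttons: maximum uncertainty, log2(3) bits.
uniform = [1.0 / len(BUTTONS)] * len(BUTTONS)
h_start = behavior_entropy(uniform)

# A fully learned preference for the red button: no uncertainty left.
learned = [1.0, 0.0, 0.0]
h_end = behavior_entropy(learned)
```

With three actions the entropy can only lie between these two values, so a curve that falls from about 1.585 bits toward zero indicates a move from random pecking to a settled choice.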
The state transition diagram is shown in Figure II, in which the circles denote the states, the arrows indicate the transitions between two states, and the value above each arrow indicates the action taken.

FIGURE II. STATE TRANSITION DIAGRAM OF SPE

In Figure II, the three states denote pain, the desire to obtain food, and satisfaction, respectively. The larger the state value, the better the pigeon's state. The orientation of the pigeon is to satisfy its own needs to the maximum extent, and the better the state, the larger the orientation value. The three actions of the pigeon denote pecking at the red, yellow and blue buttons, respectively.

To express the prediction function $V$, a recurrent neural network is used. The input vector is $IN \in R^{r}$, the output is $Y \in R$, and the number of hidden nodes is $h$. The input vector of the hidden layer is $O(t) = [o_1(t), o_2(t), \ldots, o_h(t)]^T \in R^{h}$, which can be seen as the excitation function of the network's internal state. The output vector of the hidden layer is $H(t) = [h_1(t), h_2(t), \ldots, h_h(t)]^T \in R^{h}$, which denotes the internal state set of the network. The weights of this recurrent neural network are $W^{(1)} \in R^{r \times h}$, $W^{(2)} \in R^{h \times h}$ and $W^{(3)} \in R^{1 \times h}$, and the activation function of the output layer is a linear weighting function:

$$Y(t) = W^{(3)} H(t)$$

The internal state transition function is

$$H(t) = g(IN(t), H(t-1), W^{(1)}, W^{(2)}) = [h_1(t), h_2(t), \ldots, h_h(t)]^T$$

and the excitation function of the internal state is

$$O(t) = (W^{(2)})^T H(t-1) + (W^{(1)})^T IN(t),$$

where $h_j(t) = 1 / (1 + \exp(-o_j(t)))$, $j = 1, 2, \ldots, h$. The perception state and the action in the SPE are both one-dimensional variables, so $IN = [x, a]^T \in R^{2}$.

To express the state curiosity of the pigeon, we count the occurrences of the same state during the learning process, denoted $Num(S)$. The curiosity $C(S)$ is a monotone decreasing function of this count, of exponential form with adjustment factors $k_1$ and $k_2$. The factors are chosen so that the limiting behavior of $C(S)$ as $Num(S)$ grows meets the characteristic of biological cognition.

The reward mechanism is defined on the basis of the orientation $\varphi$: with $\varphi(t) = S(t)$, the reward $r$ takes one value or another depending on how $\varphi(t+1)$ compares with $\varphi(t)$.

For the SPE, the cognitive flow based on the proposed model is shown in Figure III, where stepmax is the maximum number of run steps during simulation. The initial state of the pigeon is the state of desire to obtain food.

FIGURE III. COGNITIVE FLOW BASED ON THE PROPOSED CPACM MODEL

B. Simulation Results and Analysis

The cognitive process of the SPE was simulated using both the CPACM cognitive model proposed in this paper and the OCR model.
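The recurrent prediction network used in these simulations, as defined in Section III.A, can be sketched as a single forward step. This is an illustrative sketch only: the function names `sigmoid` and `rnn_step`, the list-of-lists weight layout (`W1` of shape r x h, `W2` of shape h x h, `W3` of length h) and the example weights are assumptions, and no training is shown.

```python
import math

def sigmoid(x):
    """Logistic activation h_j = 1 / (1 + exp(-o_j))."""
    return 1.0 / (1.0 + math.exp(-x))

def rnn_step(x_in, h_prev, W1, W2, W3):
    """One forward step of the recurrent prediction network.

    O(t) = W2^T H(t-1) + W1^T IN(t), H(t) = sigmoid(O(t)), Y(t) = W3 H(t).
    """
    r, h = len(x_in), len(h_prev)
    o = [
        sum(W2[i][j] * h_prev[i] for i in range(h))
        + sum(W1[i][j] * x_in[i] for i in range(r))
        for j in range(h)
    ]
    h_new = [sigmoid(v) for v in o]
    y = sum(w * hj for w, hj in zip(W3, h_new))
    return y, h_new

# Input IN = [x, a]: a coded state and a coded action (illustrative values).
IN = [1.0, 0.0]
W1 = [[0.0, 0.0], [0.0, 0.0]]   # r x h, here 2 x 2
W2 = [[1.0, 0.0], [0.0, 1.0]]   # h x h
W3 = [1.0, 1.0]                 # output weights, length h
y1, H1 = rnn_step(IN, [0.0, 0.0], W1, W2, W3)
y2, H2 = rnn_step(IN, H1, W1, W2, W3)
```

The hidden vector returned by one step is fed back in as `h_prev` on the next step, which is what lets the predicted value depend on earlier inputs as well as the current one.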
The sampling time is 1 s, and we record the selection probabilities of the three actions, the Skinner pigeon's state, and the action taken at each second. A comparison of the results is shown in Figure IV.

FIGURE IV. SKINNER PIGEON COGNITION RESULTS COMPARISON OF CPACM AND OCRM
a) action selection probability (curves pr, py, pb) over time t/s using the CPACM model with curiosity
b) action selection probability (curves pr, py, pb) over time t/s using the OCRM model without curiosity
c) Skinner pigeon's state and action taken over time t/s using the CPACM model with curiosity
d) Skinner pigeon's state and action taken over time t/s using the OCRM model without curiosity

As can be seen from Figure IV a) and b), the probability of the Skinner pigeon pecking at the red button changes gradually, and the pigeon eventually selects the behavior of pecking at the red button. Due to the presence of curiosity in CPACM, in the initial moments the pigeon has a certain curiosity about the three states, so its behavior selection probabilities fluctuate to some degree compared with the OCRM method. Figure IV c) and d) show that, although the probabilities of the CPACM method oscillate, in the later stage the Skinner pigeon's behavior and state reach a more determined state, whereas with the OCRM method sudden small-probability events reduce the certainty of the behavioral choices. Figure IV d) also shows cases of choosing the yellow or blue button: without curiosity at work in the early stages of learning, cognition under certain states is lacking and the behavior cognition process is affected.

Figure V shows the variation of the system's behavior information entropy. Since the entropy tends to zero, we can conclude that the cognition of the system is an evolution process from uncertainty to certainty.

FIGURE V. BEHAVIOR INFORMATION ENTROPY CURVE OF SKINNER PIGEON (action information entropy over time t/s)

IV. CONCLUSION

In this paper, aiming at the autonomous cognition problem of agents in unknown environments, a novel curiosity-driven perception-action cognitive model is proposed. It simulates an intrinsic-motivation cognitive mechanism based on curiosity in psychology, and the cognitive process from perception to action is realized by a probabilistic action selection mechanism. To verify the behavioral learning ability of the proposed curiosity-driven "perception-action" cognitive model, we carried out a simulation study using the typical SPE in behavioral learning theory and an experimental comparative analysis with the OCRM to illustrate the effectiveness of the proposed method. Information entropy illustrates that applying the proposed model can achieve better cognition and indeed reflects biological cognitive processes. The comparison of simulation results shows that this method is effective.

ACKNOWLEDGMENT

We acknowledge support from the National Natural Science Foundation of China (No. 648), the Tianjin City High School Science & Technology Fund Planning Project (No.87) and the Tianjin University of Technology and Education Project (No. KJY No.KYQD4), and we thank the reviewers for their helpful comments, which improved the quality of this paper.

REFERENCES

[1] Bellas F, Caamaño P, Faiña A, et al. Dynamic learning in cognitive robotics through a procedural long term memory[J]. Evolving Systems, 4 5():49-6.
[2] Wu X, Ruan X G, Zhang X, et al. Study of Skinner Automaton Implemented on a Two-Wheeled Robot[C]. rd International Conference on Electric and Electronics: 5-56.
[3] RUAN XiaoGang, WU Xuan. The skinner automaton: A psychological model formalizing the theory of operant conditioning[J]. Science China (Technological Sciences):745-76.
[4] HUANG Jing, RUAN Xiao-gang, YU Nai-gong, ZHANG Xiao-ping, WEI Ruo-yan, FAN Qing-wu. Artificial sensorimotor system with operant conditioning function[J]. Control theory and application 5(5):674-68.
[5] Cutsuridis V, Taylor J G. A cognitive control architecture for the perception action cycle in robots and agents[J]. Cognitive Computation 5(): 8-95.
[6] Baranes A, Oudeyer P Y. Intrinsically motivated goal exploration for active motor learning in robots: A case study. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 766-77.
[7] Little D. Y., Sommer F. T. Learning and exploration in action-perception loops[J]. Frontiers in neural circuits 7: 7-7.
[8] Baranes A., Oudeyer P-Y. Active Learning of Inverse Models with Intrinsically Motivated Goal Exploration in Robots[J]. Robotics and Autonomous Systems 6():49-7.