Learning Probabilistic Finite State Automata For Opponent Modelling



Master in Artificial Intelligence (UPC-URV-UB)
Master of Science Thesis
Learning Probabilistic Finite State Automata For Opponent Modelling
Toni Cebrián Chuliá
Advisor/s: René Alquézar and Alberto Sanfeliu
January 17th, 2011

A Sandra


Contents

1 Introduction
    Motivation
    Problem definition and goals
    Thesis overview

2 Problem Description
    The Game Environment
    Interacting agents
    Repeated games
    Games. Normal Form
    Opponent Modelling
    Learning a model
    Model assumptions
    Learning the opponent's transducer
    ALERGIA
    MDI
    GIATI algorithm
    Defeating the opponent
    Utility functions
    Example

3 Learning Probabilistic Finite State Automata
    Introduction
    Symbols, Languages and Grammars
    Finite State Automaton
    Quotient Automaton
    Probabilistic Automata
    Algorithms
    Identifying regular languages
    RPNI
    ALERGIA
    MDI

4 Opponent Modelling
    Defeating an Opponent
    The GIATI algorithm
    Some terminology
    Inferring Finite-State Transducers
    Applying GIATI in the repeated games scenario
    Markov Decision Processes
    Link between Probabilistic Mealy Machines and MDPs
    Infinite horizon models
    Finding optimal policies
    Summary

5 Experiments
    The experiments
    Prisoner's Dilemma
    Round length
    Match length
    Alpha
    RoShamBo
    The experiments

6 Conclusions
    Future work

A JSON format for a game
B Automata for the RoShamBo experiments

Chapter 1
Introduction

1.1 Motivation

Artificial Intelligence (AI) is the branch of Computer Science that tries to imbue software systems with intelligent behaviour. In the early years of the field, those systems were limited to big computing units where researchers built expert systems that exhibited some kind of intelligence. But with the advent of different kinds of networks, the most prominent of which is the Internet, the field became interested in Distributed Artificial Intelligence (DAI) as the natural next move. The field thus moved from monolithic software architectures for its AI systems to architectures where several pieces of software tried to solve a problem or had interests of their own. Those pieces of software were called agents, and the architectures that allowed the interoperation of multiple agents were called Multi-Agent Systems (MAS). The agent acts as a metaphor that tries to describe those software systems that are embodied in a given environment and that behave or react intelligently to events in the environment.

The AI mainstream was initially interested in systems that could be taught to behave depending on the input received. However, this rapidly proved ineffective because the human or the expert acted as the knowledge bottleneck for distilling useful and efficient rules. This was the best case; in the worst case the task of enumerating the rules was difficult or plainly not affordable. This sparked the interest in another subfield, Machine Learning, and its counterpart in a MAS, Distributed Machine Learning. If you cannot code all the scenario combinations, code within the agent the rules that allow it to learn from the environment and the actions performed.

With this framework in mind, applications are endless. Agents can be used to trade bonds or other financial derivatives without human intervention, or they can be embedded in robotic hardware and learn unseen map configurations in distant locations like distant planets. Agents are not restricted to interactions with humans or the environment; they can also interact with other agents. For instance, agents can negotiate the quality of service of a channel before establishing a communication, or they can share information about the environment in a cooperative setting like robot soccer players.

But there are some shortcomings that emerge in a MAS architecture. The one related to this thesis is that partitioning the task at hand into agents usually entails that agents have less memory or computing power. It is not economically feasible to replicate the big computing units on each separate agent in our system. Thus we should think about our agents as computationally bounded, that is, they have a limited amount of computing power to learn from the environment. This has serious implications for the algorithms that are commonly used for learning in these settings.

The classical approach for learning in MAS is to use some variation of a Reinforcement Learning (RL) algorithm [BT96, SB98]. The main idea behind those algorithms is that the agent has to maintain a table with the perceived value of each action/state pair and, through multiple iterations, obtain a set of decision rules that allow it to take the best action for a given environment. This approach has several flaws when the current action depends on a single observation seen in the past (for instance, a warning sign that a robot perceives). Several techniques have been proposed to alleviate those shortcomings. For instance, to avoid the combinatorial explosion of states and actions, instead of storing a table with the values of the pairs, an approximating function like a neural network can be used. And for events in the past, we can extend the state definition of the environment by creating dummy states that correspond to the N-tuple $(state_N, state_{N-1}, \ldots, state_{N-t})$.

1.2 Problem definition and goals

Given the problems mentioned above, this thesis studies the effects of using a model-based approach to learn in a competitive agent environment. In order to model the environment, in this case an opposing agent, probabilistic finite state automata will be employed. This will limit the range of possible hypotheses when searching the state space.

For illustrative purposes, the environment where our agents will live is an abstraction of a competitive setting modelled as a game theoretic model. In this thesis we are going to explore the following related issues:

1. Some state-of-the-art algorithms for learning probabilistic automata will be presented and discussed. The learning abilities of each algorithm will be compared.

2. Agents can be seen as highly complicated transducers. That is, they are software artifacts that, after receiving some sensory input, react to the environment by performing some action. How the implicit transducer that an agent hides within its machinery can be inferred will be researched.

3. Finally, when a working hypothesis is obtained, this model must be exploited in order to take advantage in this setup. But our agent has to keep in mind that this hypothesis might not be the underlying transducer and that some inexactitudes may happen, both in the structure of the model and in the current state. How models of different complexity degrade over time will be studied, because we are mainly interested in winning the agent competition.

This Master Thesis tries to explore beyond the point where the thesis of David Carmel ended. In his thesis he tries to infer deterministic automata restricted to an alphabet with only 2 symbols. In this work, the general framework is extended by allowing arbitrary alphabet lengths and by trying to infer stochastic deterministic transducers. We have not been able to find the aforementioned thesis online, but the main lines of research can be learned by reading the references [CM96b, CM96a, CM98, CM99].

1.3 Thesis overview

This thesis is organized as follows:

Chapter 2 outlines the different areas this thesis relies on. A very simple example of the steps and computations is used for illustrative purposes.

Chapter 3 introduces the automata concepts necessary to understand the building blocks of the methodology and presents the algorithms used to learn the PDFA.

Chapter 4 presents the algorithmic theory that allows transducers to be inferred using automata. Once we have our working hypothesis of the model our opponent is using for choosing movements, we should exploit this knowledge. To that end we show how transducers can be translated into Markov Decision Processes.

In Chapter 5 we study how our setup is able to infer opponent strategies, and the rate and effectiveness of the learning process.

Chapter 6 presents the conclusions derived from this study.


Chapter 2
Problem Description

2.1 The Game Environment

In order to test our ideas we need an environment where two competing agents interact with each other. Figure 2.1 illustrates all the elements that are involved in the experimental setup.

Figure 2.1: Two opposing agents, one that learns and the other that has a fixed strategy, are competing playing the RoShamBo game

Let's review the main components:

Agents are playing a repeated game in the sense defined by the Game Theory field. Players play simultaneous games emitting symbols and, after seeing the pair of symbols, an external referee assigns payoffs to each player. In the figure, agents are playing RoShamBo and the symbols they can emit are either Paper, Rock or Scissors. The current movement gives one point to our learning agent and -1 point to our fixed agent because in RoShamBo, Paper beats Rock.

There are two opposing agents. In the figure they are represented by the letters A and B.

The B-agent is the agent we are trying to model. It has a probabilistic Moore Machine as its inner core strategy, shown in the upper-right cloud. That strategy is fixed and will not change during the game, and it is governed by the symbols that the A-agent sends.

The A-agent is our learning agent. After some iterations with the B-agent it creates a model hypothesis, depicted in the upper-left cloud. This automaton translates nicely to a transducer which this agent believes is the strategy of the opposing agent. Once the agent has this hypothesis it devises a counter strategy that is used to play the game until the end.

The following sections address each element in more detail.

2.2 Interacting agents

One of the most interesting areas of the Artificial Intelligence field is Multi-Agent Systems (MAS). According to [Woo02]:

"Multiagent systems are systems composed of multiple interacting computing elements, known as agents. Agents are computer systems with two important capabilities. First, they are at least to some extent capable of autonomous action, of deciding for themselves what they need to do in order to satisfy their design objectives. Second, they are capable of interacting with other agents, not simply by exchanging data, but by engaging in analogues of the kind of social activity that we all engage in every day of our lives: cooperation, coordination, negotiation, and the like."

Another definition of what an agent is can be found in [RN03]:

"An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators."

A rational agent is an agent that does the right thing according to a performance measure that evaluates a given sequence of environment states. Note that the performance measure evaluates environment states, not agent states. We can thus define rationality as dependent on four things:

1. The performance measure that defines the criterion of success
2. The agent's prior knowledge of the environment
3. The actions that the agent can perform
4. The agent's percept sequence to date

This again leads to the definition of a rational agent:

"For each possible percept sequence, a rational agent should select an action that is expected to maximize its performance measure, given the evidence provided by the percept sequence and whatever built-in knowledge the agent has." [RN03]

The built-in knowledge is usually called the agent program. This program guides how the agent picks actions based on the environment. Depending on the procedure used by the agent, agent programs can be classified into four basic kinds:

1. Simple reactive agents
2. Model-based reactive agents
3. Goal-based agents
4. Utility-based agents

In this thesis we are going to explore model-based reactive agents. For a more exhaustive explanation of the different kinds of agent programs, check [PW04]. When an agent can be ascribed to the model-based reactive paradigm, we expect the agent to maintain some sort of internal state that depends on the percept history and thereby reflects at least some of the unobserved aspects of the current environment state. In our case, the model of the world will be an inferred transducer, and the state our agent should maintain is the state our transducer is currently in.

2.3 Repeated games

Games. Normal Form

A game [LR57] is a mathematical structure that comprises three elements:

1. A set of rules with the inner workings of the game
2. A set of available moves for each player, depending on the state of the game
3. A function with the outcome, depending on the state of the game

A game is usually used to represent a conflict between several contending players. This mathematical formulation allows the researcher to employ mathematics to study the outcome of several policies or strategies. In this thesis, the moves of each player do not depend on the game state: all movements are available at any instant of the game. Moreover, outcomes or payoffs are also readily available when both players perform a move. These constraints allow us to use the normal form description of a game in this thesis.

When using the normal form, games are described by a matrix where the columns show the available movements for one player and the rows the movements for the other player. Cell intersections are used to represent the payoff of each player. When the payoff is different for each player, a tuple is used instead, with the row player's payoff as its first element and the column player's payoff as its second element.

The competitive interaction between agents can thus be formalized as a repetition of a simple two-player game, where on each movement agents interact with each other by simultaneously performing an action. The complete set of pairs of actions can be aggregated to denote the game history. Let's see some examples of games.

Prisoner's dilemma

Prisoner's dilemma is a two-player game where each player has two possible actions, $A = \{c, d\}$, and the payoff for each of the players 1 and 2 is described in the payoff matrix of Table 2.1. For the Prisoner's Dilemma the game theoretical strategy is to always defect. This is the optimal strategy for games that only involve one interaction, but for repeated games it is in the interest of the agents to cooperate in the long run because this strategy rewards the agents the most.

Table 2.1: The payoff matrix for the Prisoner's dilemma game

          c        d
    c   (3,3)    (0,5)
    d   (5,0)    (1,1)

RoShamBo

RoShamBo is a popular game that is seemingly used across cultures to settle minor conflicts or choices. Each player can perform one of three actions, $A = \{paper, stone, scissors\}$, where each action beats one of the other two actions and is beaten by the remaining one. This can be explicitly shown in the payoff matrix of Table 2.2. RoShamBo has no stable strategy, neither for the single interaction nor for the repeated game interaction. It is also a zero-sum game, where what one player wins, the other player loses.
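As a minimal sketch of the normal form described above, the two payoff matrices can be stored as dictionaries keyed by the pair (row move, column move). The names `PRISONERS_DILEMMA`, `ROSHAMBO`, `is_zero_sum` and `total_payoff` are illustrative choices made here, not identifiers from the thesis.

```python
# Payoffs are (row player, column player), as in Tables 2.1 and 2.2.
PRISONERS_DILEMMA = {
    ("c", "c"): (3, 3), ("c", "d"): (0, 5),
    ("d", "c"): (5, 0), ("d", "d"): (1, 1),
}

ROSHAMBO = {
    ("paper", "paper"): (0, 0), ("paper", "stone"): (1, -1),
    ("paper", "scissors"): (-1, 1),
    ("stone", "paper"): (-1, 1), ("stone", "stone"): (0, 0),
    ("stone", "scissors"): (1, -1),
    ("scissors", "paper"): (1, -1), ("scissors", "stone"): (-1, 1),
    ("scissors", "scissors"): (0, 0),
}


def is_zero_sum(game):
    """A game is zero-sum when the two payoffs cancel in every cell."""
    return all(row + col == 0 for row, col in game.values())


def total_payoff(game, history):
    """Accumulate both players' payoffs over a history of simultaneous moves."""
    row_total = sum(game[moves][0] for moves in history)
    col_total = sum(game[moves][1] for moves in history)
    return row_total, col_total


example_history = [("c", "c"), ("d", "c"), ("d", "d")]
pd_totals = total_payoff(PRISONERS_DILEMMA, example_history)
```

With this representation, a repeated game is simply a list of move pairs, and the zero-sum property of RoShamBo can be checked cell by cell.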

Table 2.2: The payoff matrix for the RoShamBo game

                 paper     stone    scissors
    paper        (0,0)    (1,-1)    (-1,1)
    stone       (-1,1)     (0,0)    (1,-1)
    scissors    (1,-1)    (-1,1)     (0,0)

2.4 Opponent Modelling

This thesis is mainly about learning the underlying strategy of an opposing agent. In game theoretic terms this is called opponent modelling. We want to infer the most accurate model possible of our opponent because, once we have that picture, we can create explicit strategies specially tailored to counteract the opponent's strategies.

Learning a model

Learning a model of an opponent is a kind of unsupervised learning. We do not have a set of labelled pairs of sensory inputs and opponent states. Instead, what we have is the sensory input, that is, the set of actions exchanged, and the game outcomes that are somehow related to the agent state. In opponent modelling we are trying to infer an abstracted description of the player or of the player's behaviour during the game.

Opponent modelling is widely used in domains where the competitive nature of the agents is central, as in Poker. Opponent modelling techniques are employed there to classify an opponent in an abstract sense as aggressive, rock or calling station¹. Alternatively, opponent modelling can go a step further in granularity and try to predict whether a given player will call, check or raise in a Poker round.

For learning the opponent model, you usually have to employ some kind of Reinforcement Learning (RL) algorithm [SB98]. RL algorithms learn which actions our agent has to perform in a given environment to accomplish a given goal, usually modelled as a numerical reward. This family of algorithms, among which we can find TD-learning, Q-learning and SARSA [BT96], tries to infer a function that goes from the states of the environment to the actions that should be performed. This function can be as simple as a table with state/action pairs or, in cases where the combinatorial explosion of states and actions makes it infeasible to store such a table, it can be approximated by an auxiliary function flexible enough to model those subtleties. Those approximating functions are usually created by training neural networks [Hay08].

¹ For a more comprehensive poker player description, check [Skl99].

Model assumptions

In this thesis, our main assumption is that our opposing agent has an algorithmically encoded strategy to play the game. This algorithm belongs to the family of functions that can be described as Stochastic Finite State Transducers (SFST). Basically, the model we are assuming is that our opponent reacts to our actions by deterministically changing to another state in its automaton. In order for the agent to generate its next action, each state has a discrete probability distribution, and symbols are generated according to that distribution. This is also known as a Stochastic Moore Machine.

Moore machines can be directly translated into Mealy machines and the other way around. For our learning purposes it is more convenient that this Stochastic Moore Machine be described as a Stochastic Mealy Machine, since this translation allows us to use all the already known machinery to deal with Markov Decision Processes. Transforming a Moore machine into its Mealy counterpart can be easily accomplished by appending the symbol emitted in the state to each of the incoming transitions. Since the Moore machine is stochastic, this propagation carries the associated probability and generates in turn a Stochastic Mealy Machine.

Let's see a concrete example. In Figure 2.2 we can see an example of such a Moore machine, representing an agent that almost always emits the same symbol that triggered the transition. After performing the output symbol propagation on each step, we obtain the Stochastic Mealy machine that can be seen in Figure 2.3.

Figure 2.2: Stochastic Moore Machine (two states with output distributions c=0.9, d=0.1 and c=0.1, d=0.9, connected by transitions labelled c and d)

Figure 2.3: Transformed Mealy machine from the original Moore machine (states q0 and q1, each with outgoing transitions c/c (0.45), c/d (0.05), d/c (0.05) and d/d (0.45))

Learning this kind of transducer, which represents the computational strategy of the opposing agent, will be the objective of the learning agent.

2.5 Learning the opponent's transducer

In order to learn the opponent's transducer, rounds of the game will be played and the results of each round will form the training data for our PDFA learning algorithm. After a PDFA has been obtained, a transducer is also obtained using the GIATI algorithm. This transducer is similar to a Markov Decision Process, for which known algorithms for proper control exist. This thesis is concerned with the comparison of different PDFA learning algorithms and their effects on the competitive environment of the proposed games. Two main algorithms are going to be studied, ALERGIA and MDI.

ALERGIA

The ALERGIA algorithm by Carrasco and Oncina [CO94] is an extension of the non-stochastic algorithm RPNI presented in [Onc92].

MDI

The MDI algorithm [Tho00] is conceptually similar to the ALERGIA algorithm but improves results by taking into account global features, whereas ALERGIA is concerned with local ones.

GIATI algorithm

After obtaining an automaton, we use the GIATI algorithm for obtaining the transducer. In order to learn the Stochastic Finite State Transducer (SFST) we employ the GIATI algorithm described in [CVP05]. Given an input alphabet $\Sigma$ and an output alphabet $\Delta$, our training corpus consists of pairs $(s,t) \in \Sigma^* \times \Delta^*$:

1. Each training pair $(s,t)$ is transformed into a string $z$ from an extended alphabet $\Gamma$ (strings of $\Gamma$-symbols), yielding a sample $S$ of strings $z \in \Gamma^*$.
2. A stochastic regular grammar $G$ is inferred from $S$.
3. The $\Gamma$-symbols of the grammar rules are transformed back into pairs of source/target symbols/strings (from $\Sigma^* \times \Delta^*$).

The main problem when using the GIATI algorithm is that the mapping from the training pairs to the new alphabet, the labelling function $L : \Sigma^* \times \Delta^* \to \Gamma^*$, and its inverse function are not specified, and they must make sense in the problem at hand. In our problem we do not have such nuances since, in the games we are studying, both agents perform only one action on each turn and thus no alignment between strings has to be inferred. In Figure 2.4 we can see the Mealy machine presented earlier transformed into its equivalent Probabilistic Deterministic Finite State Machine over the new vocabulary $\Gamma$: $\{cc, cd, dc, dd\} \to \{x, y, w, z\}$.

Figure 2.4: The Mealy Machine has been transformed into a Probabilistic DFA with an extended alphabet (states q0 and q1, each with outgoing transitions x (0.45), y (0.05), w (0.05) and z (0.45))

2.6 Defeating the opponent

Once we have an opponent model hypothesis, our task is to generate the corresponding actions that help us to govern the automaton. This is easily accomplished given the fact that a Mealy Machine, a transducer, can be easily converted to an equivalent Markov Decision Process. So our first step will be to perform that conversion and after that use one of the algorithms for inferring the best policy for the MDP.

Utility functions

In order to solve the best response problem we have to define the utility function that our agent is going to use to evaluate a result. This utility function is related to each of the payoffs on each game stage and to the remaining horizon in the whole game. There are several possible utility functions that drive the search of actions in the MDP; here are a couple for illustrative purposes.

The discounted-sum function:

$$U_i^{ds}(s_1, s_2) = (1 - \gamma_i) \sum_{t=0}^{\infty} \gamma_i^{t}\, u_i\big(s_1(g_{(s_1,s_2)}(t)),\ s_2(g_{(s_1,s_2)}(t))\big)$$

where $U_i$ is the utility function for player $i$ when he is playing according to strategy $s_1$ and his opponent is using strategy $s_2$.
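The Moore-to-Mealy propagation and the relabelling over the extended alphabet described in Sections 2.4 and 2.5 can be sketched as follows. This is only an illustration under stated assumptions: the machine structure is one reading of the flattened Figure 2.2 (the input symbol alone selects the next state), and the factor 0.5 assumes inputs are uniformly distributed, which is what appears to turn the emission probability 0.9 into the 0.45 shown in Figure 2.3. The names `EMISSION`, `NEXT_STATE`, `INPUT_PROB` and `LABEL` are hypothetical.

```python
# Assumed reading of Figure 2.2: q0 mostly emits c, q1 mostly emits d.
EMISSION = {"q0": {"c": 0.9, "d": 0.1}, "q1": {"c": 0.1, "d": 0.9}}
NEXT_STATE = {
    ("q0", "c"): "q0", ("q0", "d"): "q1",
    ("q1", "c"): "q0", ("q1", "d"): "q1",
}
INPUT_PROB = {"c": 0.5, "d": 0.5}  # assumption: uniform input symbols
# Labelling from Section 2.5: {cc, cd, dc, dd} -> {x, y, w, z}
LABEL = {("c", "c"): "x", ("c", "d"): "y", ("d", "c"): "w", ("d", "d"): "z"}


def moore_to_mealy(emission, next_state, input_prob):
    """Attach the symbol emitted in the target state to each incoming transition."""
    mealy = {}
    for (src, inp), dst in next_state.items():
        for out, p_out in emission[dst].items():
            mealy[(src, inp, out)] = (dst, input_prob[inp] * p_out)
    return mealy


def mealy_to_pdfa(mealy, label):
    """Rename every input/output pair to a single symbol of the extended alphabet."""
    return {
        (src, label[(inp, out)]): (dst, prob)
        for (src, inp, out), (dst, prob) in mealy.items()
    }


mealy = moore_to_mealy(EMISSION, NEXT_STATE, INPUT_PROB)
pdfa = mealy_to_pdfa(mealy, LABEL)
```

Under these assumptions the transition `("q0", "c", "c")` carries probability 0.45 and is renamed to `x`, in line with Figures 2.3 and 2.4.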

The limit-of-the-mean function:

$$U_i^{lm}(s_1, s_2) = \liminf_{k \to \infty} \frac{1}{k} \sum_{t=0}^{k} u_i\big(s_1(g_{(s_1,s_2)}(t)),\ s_2(g_{(s_1,s_2)}(t))\big)$$

where again $U_i$ is the utility function for player $i$ when he is playing according to strategy $s_1$ and his opponent is using strategy $s_2$.

2.7 Example

To illustrate all the processes involved, let's develop an example. Each step will be explained in further detail in the next sections; this is a high level summary of what is to come.

Let's start by assuming that our agents are competing in a Prisoner's Dilemma competition. The agent that has the fixed strategy is playing the probabilistic Tit-For-Tat strategy. That means that this agent almost always repeats your last move. This agent has been seen before in Figure 2.2 and we reproduce it here to make the explanation easier to follow (Figure 2.5).

Figure 2.5: Probabilistic transducer (Tit-for-Tat) (states q0 and q1, each with outgoing transitions c/c (0.45), c/d (0.05), d/c (0.05) and d/d (0.45))

Since this is our first iteration, our learning agent has no information about the structure of its opponent, so the best approach is to generate movements at random. The initial structure can be seen in Figure 2.6.

Figure 2.6: Random Automata (a single state q0 with transitions d (0.5) and c (0.5))

Keep in mind that our random automaton is only a concise way to represent a uniform distribution of symbols. For our learning agent the model used to generate movements can be any structure: automata, functions or a translation table.

Let's advance some terminology. Our objective is that our learning agent outperforms its opponent in a game. A game is composed of several rounds. Each round has a fixed number of matches with a fixed number of movements for each match (Figure 2.7), where each agent picks a symbol and a reward is given in answer to this action. For our example, the game will consist of two rounds with 25 moves each, distributed over 5 matches per round.

Figure 2.7: Decomposition of each game in several layers of detail

The history of the first 5 matches happens to be:

    (d-d) (c-d) (c-c) (c-c) (d-d)
    (d-d) (d-d) (c-d) (d-c) (d-d)
    (d-d) (d-d) (c-d) (d-d) (d-d)
    (d-d) (d-d) (c-d) (c-c) (d-c)
    (c-d) (c-c) (d-c) (d-d) (d-d)

where each tuple represents a simultaneous move of the agents. At this moment, the game referee stops the exchange of movements, packs the history obtained so far and notifies it to the learning agent. Our learning agent performs the language transformation of the implicit transducer and groups letters into words (each pair is replaced by one symbol of the extended alphabet; the words below correspond to the labelling dd to z, cd to x, cc to w and dc to y). The resulting word set is:

    zxwwz
    zzxyz
    zzxzz
    zzxwy
    xwyzz

Note that the learning stage occurs over the extended alphabet. When running the ALERGIA algorithm with parameter $\alpha = 0.7$ over this training set we obtain the PDFA in Figure 2.8.

Figure 2.8: PDFA learned after the first 25 interactions (state 0 with transitions z [0.47], x [0.26], w [0.16], y [0.053]; state 1 with transitions z [0.36], y [0.18], w [0.091])

But do not forget that this automaton represents the Mealy machine that can be seen in Figure 2.9.

Figure 2.9: Mealy Machine derived from the learned PDFA (state 0 with transitions d/d [0.47], c/d [0.26], c/c [0.16], d/c [0.053]; state 1 with transitions d/d [0.36], d/c [0.18], c/c [0.091])

The cautious reader should have noticed that the outgoing transition probabilities in this automaton do not add up to 1. This is so because algorithms learning PDFA take into account a transition probability to a hidden sink state used to mark the end of the word. Since in our scenario there is no such concept of word, we must smooth the obtained automaton in order to normalize the outgoing probabilities. The final Mealy machine can be seen in Figure 2.10.

Figure 2.10: Smoothed Mealy Machine derived from the learned PDFA (state 0 with transitions d/d [0.50], c/d [0.28], c/c [0.17], d/c [0.05]; state 1 with transitions d/d [0.57], d/c [0.29], c/c [0.14])

Later in this thesis it is shown that this probabilistic Mealy Machine corresponds to a Markov Decision Process, for which algorithms exist that allow us to govern it optimally. Applying a policy iteration algorithm to infer our decision rules, and using the discounted infinite reward function as the accumulated reward function, we obtain that the rules governing this automaton are the ones in Table 2.3.

Table 2.3: Derived actions of each state given the Mealy Machine of Figure 2.10

    State    Action
    0        c
    1        d
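Two steps of the example above, the grouping of the move history into words and the normalisation of the outgoing probabilities, can be sketched as follows. The labelling (dd to z, cd to x, cc to w, dc to y) is inferred here from the word set and from Figures 2.8 and 2.9; it is an assumption, and it differs from the illustrative mapping given in Section 2.5. Reading "smoothing" as a plain renormalisation, and grouping the transitions per state as below, are also assumptions about the flattened figures.

```python
# Labelling inferred from the example's words (assumption, see lead-in).
LABEL = {("d", "d"): "z", ("c", "d"): "x", ("c", "c"): "w", ("d", "c"): "y"}

# The five matches of the first round, one string of move pairs per match.
HISTORY = [
    "dd cd cc cc dd",
    "dd dd cd dc dd",
    "dd dd cd dd dd",
    "dd dd cd cc dc",
    "cd cc dc dd dd",
]


def match_to_word(match):
    """Replace every simultaneous move pair by its extended-alphabet symbol."""
    return "".join(LABEL[(pair[0], pair[1])] for pair in match.split())


def normalise(outgoing):
    """Rescale outgoing probabilities so that they sum to 1 (no end-of-word mass)."""
    total = sum(outgoing.values())
    return {label: p / total for label, p in outgoing.items()}


words = [match_to_word(match) for match in HISTORY]

# Outgoing probabilities as read from Figure 2.9 (assumed grouping per state).
state0 = {"d/d": 0.47, "c/d": 0.26, "c/c": 0.16, "d/c": 0.053}
state1 = {"d/d": 0.36, "d/c": 0.18, "c/c": 0.091}
smoothed0 = normalise(state0)
smoothed1 = normalise(state1)
```

The resulting `words` list reproduces the word set of the example, and the renormalised values are close to those of Figure 2.10 up to the rounding used in the figures.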

Chapter 3
Learning Probabilistic Finite State Automata

3.1 Introduction

The fundamental mathematical object in this thesis is the Finite State Machine in its different flavours: deterministic, non-deterministic, probabilistic or, in an extended version, the transducer. It is thus necessary to explain some related concepts about languages and automata.

3.2 Symbols, Languages and Grammars

The field of Syntactic Pattern Recognition is concerned with finding structure in a stream of discrete symbols. Symbols are the elements of a finite set. Those elements are usually represented by letters; the set itself is called the alphabet and is usually represented by the symbol $\Sigma$. A word is a finite sequence of symbols from a given alphabet. For convenience the empty word is also defined, and it is usually represented by the Greek symbol $\lambda$.

The set of all the words that can be composed from a given alphabet is called the universal language of $\Sigma$. This language, $\Sigma^*$, also includes the empty word. So, for example, given the alphabet consisting of a single letter, $\Sigma = \{a\}$, the universal language that can be constructed is:

$$\Sigma^* = \{\lambda, a, aa, aaa, \ldots\}$$

We call a language over the alphabet $\Sigma$ any subset of the universal language of $\Sigma$:

$$L \subseteq \Sigma^*$$

We call a probabilistic or stochastic language the pair formed by a language and a function that assigns a probability to each word of the language:

$$\{L, P_L(\cdot)\}$$
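As a small illustration of these definitions, the words of the universal language can be enumerated up to a given length, and a stochastic language can be represented as a mapping from words to probabilities. The function name `universal_language` and the particular distribution $P(a^n) = 0.5^{n+1}$ are choices made here for the sketch, not taken from the thesis.

```python
from itertools import product


def universal_language(alphabet, max_len):
    """Yield every word over `alphabet` of length <= max_len, shortest first.

    The empty word (lambda) is represented by the empty string.
    """
    for length in range(max_len + 1):
        for letters in product(sorted(alphabet), repeat=length):
            yield "".join(letters)


words_over_a = list(universal_language({"a"}, 3))

# A toy stochastic language over {a}: P(a^n) = 0.5 ** (n + 1).
# Summed over all n this gives 1; truncated at length 3 it gives 0.9375.
stochastic_language = {w: 0.5 ** (len(w) + 1) for w in words_over_a}
truncated_mass = sum(stochastic_language.values())
```

Since the universal language is infinite, any enumeration has to be truncated; the generator makes the cut-off explicit through `max_len`.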

3.3 Finite State Automaton

A finite state automaton (FSA) [HMU01, ASMO97] is defined by a five-tuple $(Q, \Sigma, \delta, q_0, F)$ where:

$Q$ is a finite set of states
$\Sigma$ is an alphabet
$q_0$ is the initial state
$F \subseteq Q$ is the set of final states
$\delta : Q \times \Sigma \to 2^Q$ is a partial function

We say that $q'$ is a successor of $q$ if $q' \in \delta(q, a)$ for some $a \in \Sigma$. We call our automaton deterministic if, for all $q \in Q$ and for all $a \in \Sigma$, the transition $\delta(q, a)$ has at most one element.

Quotient Automaton

Let $A$ be a FSA and $\pi$ a partition of $Q$. We denote by $B(q, \pi)$ the only block that contains $q$, and we denote the quotient set $\{B(q, \pi) \mid q \in Q\}$ as $Q/\pi$. Given a FSA $A$ and a partition $\pi$ over $Q$ we define the quotient automaton $A/\pi$ as:

$$A/\pi = (Q/\pi, \Sigma, \delta', B(q_0, \pi), \{B \in Q/\pi \mid B \cap F \neq \emptyset\})$$

where $\delta'$ is defined as:

$$\forall B, B' \in Q/\pi,\ \forall a \in \Sigma:\quad B' \in \delta'(B, a) \ \text{if}\ \exists q, q' \in Q,\ q \in B,\ q' \in B' : q' \in \delta(q, a)$$

Given the automaton $A$ and the partition $\pi$ over $Q$, the language defined by the quotient automaton satisfies:

$$L(A) \subseteq L(A/\pi)$$

Probabilistic Automata

All the concepts explained for simple automata can be extended to the probabilistic case. A stochastic finite automaton (SFA), $A = (\Sigma, Q, P, q_0)$, consists of an alphabet $\Sigma$, a finite set of nodes $Q = \{q_0, q_1, \ldots, q_n\}$ with $q_0$ the initial node, and a set $P$ of probability matrices $p_{ij}(a)$ giving the probability of a transition from node $q_i$ to node $q_j$ led by the symbol $a$ in the alphabet. If we call $p_{if}$ the probability that the string ends at node $q_i$, the following constraint applies:

$$p_{if} + \sum_{q_j \in Q} \sum_{a \in \Sigma} p_{ij}(a) = 1$$
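The quotient construction can be sketched in a few lines. The representation below is an assumption made for the illustration: $\delta$ is a dictionary from (state, symbol) to a set of successor states, and the blocks of the partition are turned into frozensets so that they can serve as states of $A/\pi$. The tiny automaton used as an example (accepting only the word ab) is hypothetical.

```python
def quotient(delta, start, finals, partition):
    """Build the quotient automaton A/pi from A = (delta, start, finals)."""
    block_of = {q: frozenset(block) for block in partition for q in block}
    q_delta = {}
    for (q, a), targets in delta.items():
        # B' is a successor of B on a if some q in B reaches some q' in B'.
        q_delta.setdefault((block_of[q], a), set()).update(
            block_of[t] for t in targets
        )
    q_finals = {block_of[q] for q in finals}  # blocks that intersect F
    return q_delta, block_of[start], q_finals


def accepts(delta, start, finals, word):
    """Non-deterministic acceptance: track the set of reachable states."""
    current = {start}
    for a in word:
        current = set().union(*(delta.get((q, a), set()) for q in current))
    return bool(current & finals)


# Hypothetical automaton accepting exactly "ab": 0 -a-> 1 -b-> 2.
delta = {(0, "a"): {1}, (1, "b"): {2}}
q_delta, q_start, q_finals = quotient(delta, 0, {2}, [{0, 2}, {1}])
```

Merging states 0 and 2 creates a loop, so the quotient accepts ab as before but also abab, which illustrates the inclusion $L(A) \subseteq L(A/\pi)$: merging blocks can only generalise the language.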

The probability $p(w)$ for the string $w$ to be generated by $A$ is defined by:

$$p(w) = \sum_{q_j \in Q} p_{0j}(w)\, p_{jf}$$

$$p_{ij}(w) = \sum_{q_k \in Q} \sum_{a \in \Sigma} p_{ik}(wa^{-1})\, p_{kj}(a)$$

and the language generated by the automaton $A$ is defined as:

$$L = \{w \in \Sigma^* : p(w) \neq 0\}$$

Those languages generated by means of a SFA are called stochastic regular languages. In case the SFA contains no useless nodes¹, it generates a probability distribution for the strings in $\Sigma^*$:

$$\sum_{w \in \Sigma^*} p(w) = 1$$

¹ A node $q_i$ is useless if there are no strings $x, y \in \Sigma^*$ such that $\sum_j p_{0i}(x)\, p_{ij}(y)\, p_{jf} \neq 0$.

The algorithms that we are going to use to infer the probabilistic automata are restricted to learning probabilistic deterministic finite automata (PDFA). This means that for every node $q_i \in Q$ and symbol $a \in \Sigma$ there exists at most one node $q_j$ such that $p_{ij}(a) \neq 0$. In such a case a transition function $q_j = \delta(q_i, a)$ can be defined.

3.4 Algorithms

Identifying regular languages

Here we present some concepts that will be useful later when developing the automata algorithms.

A prefix tree acceptor (PTA) is a tree-like DFA built from the learning sample by taking all the prefixes in the sample as states and constructing the smallest DFA which is a tree. A formal algorithm, buildPrefixTree, can be seen in Algorithm 1. Note that we can also build a PTA from a set of positive strings only. This corresponds to building $PTA(\langle S_+, \emptyset \rangle)$.

Algorithm 1 buildPrefixTree
    Input: a sample $\langle S_+, S_- \rangle$
    Output: $A = PTA(\langle S_+, S_- \rangle) = \langle \Sigma, Q, q_\lambda, F_A, F_R, \delta \rangle$
    $F_A \leftarrow \emptyset$
    $F_R \leftarrow \emptyset$
    $Q \leftarrow \{q_u : u \in PREF(S_+ \cup S_-)\}$
    for $q_{u \cdot a} \in Q$ do
        $\delta(q_u, a) \leftarrow q_{ua}$
    end for
    for $q_u \in Q$ do
        if $u \in S_+$ then
            $F_A \leftarrow F_A \cup \{q_u\}$
        end if
        if $u \in S_-$ then
            $F_R \leftarrow F_R \cup \{q_u\}$
        end if
    end for
    return $A$

The algorithms we are going to study take the PTA as a starting point and try to generalise from it by merging states. In order not to get lost in the process it will be interesting to divide the states into three categories:

1. The RED states, which correspond to states that have been analysed and which will not be revisited; they will be the states of the final automaton.
2. The BLUE states, which are the candidate states: they have not been analysed yet and it should be from this set that a state is drawn in order to consider merging it with a RED state.
3. The WHITE states, which are all the others. They will in turn become BLUE and then RED.

The basic operations that allow the manipulation of the PTA are compatible, merge and promote.

compatible: deciding equivalence between states

The question here is that of deciding if two states are compatible or not, that is, if merging these two states will not result in creating confusion between accepting and rejecting states. Typically the compatibility might be tested by:

$$q \simeq_A q' \iff L_{F_A}(A_q) \cap L_{F_R}(A_{q'}) = \emptyset \ \text{and}\ L_{F_R}(A_q) \cap L_{F_A}(A_{q'}) = \emptyset$$

But it happens that the formula above is not sufficient to merge two states, and there are situations in which more operations, like merging, folding and then testing consistency, are needed.

merge: merging two states

The merging operation takes two states from an automaton and merges them into a single state. It should be noted that the effect of the merge is that a deterministic automaton will probably lose the determinism property through this operation, and thus we will attempt to avoid having to use these automata.
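Algorithm 1 can be sketched in Python as follows. This is a minimal illustration rather than the thesis implementation: each state is represented by its prefix (the empty string stands for $q_\lambda$), and the function name `build_pta` is chosen here. The sample used is the one that appears in the RPNI example later in this chapter, $S_+ = \{011, 101\}$ and $S_- = \{1, 01\}$.

```python
def build_pta(positives, negatives=()):
    """Build a prefix tree acceptor; every state is named after its prefix."""
    positives, negatives = set(positives), set(negatives)
    # Q: all prefixes of all sample words, including the empty prefix.
    states = {""} | {
        word[:i] for word in positives | negatives for i in range(len(word) + 1)
    }
    # delta(q_u, a) = q_ua for every non-empty prefix ua.
    delta = {(u[:-1], u[-1]): u for u in states if u}
    accepting = states & positives  # F_A
    rejecting = states & negatives  # F_R
    return states, delta, accepting, rejecting


states, delta, accepting, rejecting = build_pta({"011", "101"}, {"1", "01"})
```

Naming states by their prefixes keeps the tree structure implicit: the parent of state `u` is always `u[:-1]`, so no separate bookkeeping is needed.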

promote: promoting a state

Promotion is another deterministic and greedy decision. The idea here is that, once it has been decided that a candidate state of the PTA cannot be merged with any of the states of the final automaton, this state should itself become a state of the resulting automaton and should no longer be merged.

RPNI

Although it is not an algorithm to learn probabilistic DFA but regular Deterministic Finite Automata, the RPNI algorithm [Onc92] is the base for ALERGIA (3.4.3) and subsequently for the MDI algorithm (3.4.4). It is thus interesting to briefly review how it works.

Despite its similarities with its probabilistic derivations, this algorithm needs both a sample of positive examples ($S_+$) belonging to the language we are trying to learn and a set of negative examples ($S_-$) that do not belong to the intended language. Obviously, the more examples, both positive and negative, the more of the language is covered and thus the easier it is to learn it exactly.

This algorithm starts by building the prefix tree acceptor of the positive instances of the training sample ($S_+$) and then proceeds by iteratively choosing possible merges, checking if a given merge is correct and is made between two compatible states, making the merge if admissible and promoting the state if no merge is possible.

The algorithm has as its starting point the PTA, which is a deterministic finite automaton. In order to avoid problems with non-determinism, the merge of two states is immediately followed by a folding operation: the merge in RPNI always occurs between a state already selected as final and a state that is considered in the iteration. At the end of the process we expect the obtained automaton to accept the strings present in the training sample and to reject the negative ones.

Algorithm 2 RPNI-PROMOTE
    Input: a DFA $A = \langle \Sigma, Q, q_\lambda, F_A, F_R, \delta \rangle$, sets $Red, Blue \subseteq Q$, $q_u \in Blue$
    Output: $A$, $Red$, $Blue$ updated
    $Red \leftarrow Red \cup \{q_u\}$
    $Blue \leftarrow Blue \cup \{\delta(q_u, a) : a \in \Sigma\}$
    return $A$, $Red$, $Blue$

Example

Here we show how a PTA is built from a set of examples. For this dataset we have that $S_+ = \{011, 101\}$ and $S_- = \{1, 01\}$. The PTA obtained from the set of positive examples can be seen in Figure 3.1.

Algorithm 3 RPNI-COMPATIBLE
    Input: $A$, $S_-$
    Output: a Boolean, indicating if $A$ is consistent with $S_-$
    for $w \in S_-$ do
        if $\delta_A(q_\lambda, w) \in F_A$ then
            return False
        end if
    end for
    return True

Algorithm 4 RPNI-MERGE
    Input: a DFA $A$, states $q \in Red$, $q' \in Blue$
    Output: $A$ updated
    Let $(q_f, a)$ be such that $\delta_A(q_f, a) = q'$
    $\delta_A(q_f, a) \leftarrow q$
    return RPNI-FOLD($A$, $q$, $q'$)

Algorithm 5 RPNI-FOLD
    Input: a DFA $A$, states $q, q' \in Q$, $q'$ being the root of a tree
    Output: $A$ updated, where the subtree in $q'$ is folded into $q$
    if $q' \in F_A$ then
        $F_A \leftarrow F_A \cup \{q\}$
    end if
    for $a \in \Sigma$ do
        if $\delta_A(q', a)$ is defined then
            if $\delta_A(q, a)$ is defined then
                $A \leftarrow$ RPNI-FOLD($A$, $\delta_A(q, a)$, $\delta_A(q', a)$)
            else
                $\delta_A(q, a) \leftarrow \delta_A(q', a)$
            end if
        end if
    end for
    return $A$
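Algorithms 3 to 5 can be sketched on a dictionary-based DFA as follows. The representation is an assumption made for this illustration: states are named after their prefixes, `delta` maps (state, symbol) to a state, and `merge` works on copies so that a rejected merge leaves the original automaton untouched. The PTA written out literally below is the one for $S_+ = \{011, 101\}$, with $S_- = \{1, 01\}$ as negative sample.

```python
def fold(delta, accepting, alphabet, q, q_prime):
    """Fold the subtree rooted at q_prime into q (in place), as in RPNI-FOLD."""
    if q_prime in accepting:
        accepting.add(q)
    for a in alphabet:
        child = delta.get((q_prime, a))
        if child is None:
            continue
        if (q, a) in delta:
            fold(delta, accepting, alphabet, delta[(q, a)], child)
        else:
            delta[(q, a)] = child


def merge(delta, accepting, alphabet, q, q_prime):
    """Redirect the edge entering q_prime to q, then fold. Returns new copies."""
    delta, accepting = dict(delta), set(accepting)
    for (src, a), dst in list(delta.items()):
        if dst == q_prime:
            delta[(src, a)] = q
    fold(delta, accepting, alphabet, q, q_prime)
    return delta, accepting


def accepts(delta, accepting, word, start=""):
    state = start
    for a in word:
        state = delta.get((state, a))
        if state is None:
            return False
    return state in accepting


def compatible(delta, accepting, negatives):
    """The automaton is consistent if it accepts no negative example."""
    return not any(accepts(delta, accepting, w) for w in negatives)


PTA_DELTA = {("", "0"): "0", ("", "1"): "1", ("0", "1"): "01",
             ("01", "1"): "011", ("1", "0"): "10", ("10", "1"): "101"}
PTA_ACCEPTING = {"011", "101"}
NEGATIVES = {"1", "01"}

merged_ok = merge(PTA_DELTA, PTA_ACCEPTING, "01", "", "0")
merged_bad = merge(PTA_DELTA, PTA_ACCEPTING, "01", "", "1")
```

On this sample, merging the root with state 0 stays consistent with the negative examples, while merging the root with state 1 makes the automaton accept 01 and would therefore be rejected by the compatibility check.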

Algorithm 6 RPNI
Input: a sample $S = \langle S_+, S_- \rangle$, functions COMPATIBLE, CHOOSE
Output: a DFA $A = \langle \Sigma, Q, q_\lambda, F_A, F_R, \delta \rangle$
  $A \leftarrow$ BUILD-PTA($S_+$)
  $Red \leftarrow \{q_\lambda\}$
  $Blue \leftarrow \{q_a : a \in \Sigma \cap \mathrm{Pref}(S_+)\}$
  while $Blue \neq \emptyset$ do
    CHOOSE($q_b \in Blue$)
    $Blue \leftarrow Blue \setminus \{q_b\}$
    if $\exists q_r \in Red$ such that RPNI-COMPATIBLE(RPNI-MERGE($A$, $q_r$, $q_b$), $S_-$) then
      $A \leftarrow$ RPNI-MERGE($A$, $q_r$, $q_b$)
      $Blue \leftarrow Blue \cup \{\delta(q, a) : q \in Red \wedge a \in \Sigma \wedge \delta(q, a) \notin Red\}$
    else
      $A \leftarrow$ RPNI-PROMOTE($q_b$, $A$)
    end if
  end while
  for $q_r \in Red$ do
    if $\lambda \in (L(A_{q_r}))^{-1} S_-$ then
      $F_R \leftarrow F_R \cup \{q_r\}$
    end if
  end for
  return $A$

Figure 3.1: Prefix tree of the positive training sample

ALERGIA

The ALERGIA algorithm [CO94] for learning PDFA follows the same principles as the RPNI algorithm seen in the previous section. It begins by building the Prefix Tree Acceptor (PTA) from the training sample and evaluates at every node the relative probabilities of the transitions coming out of the node. Next it tries to merge pairs of nodes following a well-defined order (essentially, that of the levels in the PTA, or lexicographical order). Merging is performed if the resulting automaton is, within statistical uncertainty, equivalent to the PTA. The process ends when further merging is not possible. The

algorithm can be seen in Algorithm 9.

Algorithm 7 ALERGIA-TEST
Input: an FFA $A$, $f_1$, $n_1$, $f_2$, $n_2$, $\alpha > 0$
Output: a Boolean indicating if the frequencies $\frac{f_1}{n_1}$ and $\frac{f_2}{n_2}$ are sufficiently close
  $\gamma \leftarrow \left| \frac{f_1}{n_1} - \frac{f_2}{n_2} \right|$
  return $\gamma < \left( \sqrt{\frac{1}{n_1}} + \sqrt{\frac{1}{n_2}} \right) \cdot \sqrt{\frac{1}{2} \ln \frac{2}{\alpha}}$

The compatibility test makes use of the Hoeffding bound. The algorithm ALERGIA-COMPATIBLE (Algorithm 8) calls ALERGIA-TEST (Algorithm 7) as many times as needed, this number being finite because the recursive calls visit a tree. The basic function CHOOSE is as follows: take the smallest state in an ordering that has been fixed at the beginning (on the PTA). The test used to decide whether two states are to be merged or not (function COMPATIBLE) is based on the Hoeffding test applied to the relative frequencies of the empty string and of each prefix.

Algorithm 8 ALERGIA-COMPATIBLE
Input: an FFA $A$, two states $q_u$, $q_v$, $\alpha > 0$
Output: $q_u$ and $q_v$ compatible?
  $Correct \leftarrow$ True
  if not ALERGIA-TEST($F_P^A(q_u)$, $FREQ_A(q_u)$, $F_P^A(q_v)$, $FREQ_A(q_v)$, $\alpha$) then
    $Correct \leftarrow$ False
  end if
  for $a \in \Sigma$ do
    if not ALERGIA-TEST($\delta_f(q_u, a)$, $FREQ_A(q_u)$, $\delta_f(q_v, a)$, $FREQ_A(q_v)$, $\alpha$) then
      $Correct \leftarrow$ False
    end if
  end for
  return $Correct$

MDI

Algorithm ALERGIA decides upon merging (and thus generalisation) through a local test: substring frequencies are compared and, if it is not unreasonable to merge, merging takes place. A more pragmatic point of view is to merge whenever doing so is going to give us an advantage. The goal is of course to reduce the size of the hypothesis while keeping its predictive qualities (at least with respect to the learning sample) as good as possible. For this we can use the likelihood of each string. The goal is to obtain a good balance between the gain in size and the loss in perplexity.
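The Hoeffding-based test of Algorithm 7 is small enough to write out directly. The following is a minimal sketch, assuming the frequencies are passed as plain counts; the function name `alergia_test` and the sample numbers are illustrative only.

```python
import math


def alergia_test(f1, n1, f2, n2, alpha):
    """Return True if the relative frequencies f1/n1 and f2/n2 are sufficiently close.

    The threshold is the Hoeffding bound used by ALERGIA-TEST.
    """
    gamma = abs(f1 / n1 - f2 / n2)
    bound = (math.sqrt(1 / n1) + math.sqrt(1 / n2)) * math.sqrt(0.5 * math.log(2 / alpha))
    return gamma < bound


# Similar frequencies observed on moderate counts: the test passes.
print(alergia_test(50, 100, 52, 100, alpha=0.05))
# Very different frequencies observed on large counts: the test fails.
print(alergia_test(900, 1000, 100, 1000, alpha=0.05))
```

Note how the bound shrinks as the counts grow: with little data almost any two states look alike, which is why ALERGIA also uses a threshold $t_0$ on the frequency of the states it examines.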

Algorithm 9 ALERGIA
Input: a sample $S$, $\alpha > 0$
Output: an FFA $A$
  Compute $t_0$, the threshold on the size of the multiset needed for the test to be statistically significant
  $A \leftarrow$ PTA($S$)
  $Red \leftarrow \{q_\lambda\}$
  $Blue \leftarrow \{q_a : a \in \Sigma \cap \mathrm{Pref}(S)\}$
  while CHOOSE($q_b$) from $Blue$ such that $FREQ(q_b) \geq t_0$ do
    if $\exists q_r \in Red$ : ALERGIA-COMPATIBLE($A$, $q_r$, $q_b$, $\alpha$) then
      $A \leftarrow$ STOCHASTIC-MERGE($A$, $q_r$, $q_b$)
    else
      $Red \leftarrow Red \cup \{q_b\}$
    end if
    $Blue \leftarrow \{q_{ua} : ua \in \mathrm{Pref}(S) \wedge q_u \in Red\} \setminus Red$
  end while
  return $A$

Attempting to find a good compromise between these two values is the main idea of algorithm MDI (Minimum Divergence Inference).

Algorithm 10 MDI-COMPATIBLE
Input: an FFA $A$, two states $q$ and $q'$, $S$, $\alpha > 0$
Output: a Boolean indicating if $q$ and $q'$ are compatible
  $B \leftarrow$ STOCHASTIC-MERGE($A$, $q$, $q'$)
  return ($score(S, B) < \alpha$)

The key difference is that the recursive merges are made inside Algorithm MDI-COMPATIBLE (3.4.4), before the new score is computed, instead of in the main algorithm.

As in the ALERGIA algorithm, a difficult question is how to set the tuning parameter $\alpha$. If it is set too high, merges will take place early; these may include a wrong merge that prevents later necessary merges, and the result can be bad. On the contrary, a small $\alpha$ will block all merges, including those that should take place, at least until there is little data left. This is the safe option, which in most cases leads to very little generalization.

Algorithm 11 MDI
Input: a sample $S$, $\alpha > 0$
Output: an FFA $A$
  Compute $t_0$, the threshold on the size of the multiset needed for the test to be statistically significant
  $A \leftarrow$ PTA($S$)
  $Red \leftarrow \{q_\lambda\}$
  $Blue \leftarrow \{q_a : a \in \Sigma \cap \mathrm{Pref}(S)\}$
  $current\_score \leftarrow score(S, \mathrm{PTA}(S))$
  while CHOOSE($q_b$) from $Blue$ such that $FREQ(q_b) \geq t_0$ do
    if $\exists q_r \in Red$ : MDI-COMPATIBLE($A$, $q_r$, $q_b$, $S$, $\alpha$) then
      $A \leftarrow$ STOCHASTIC-MERGE($A$, $q_r$, $q_b$)
    else
      $Red \leftarrow Red \cup \{q_b\}$
    end if
    $Blue \leftarrow \{q_{ua} : ua \in \mathrm{Pref}(S) \wedge q_u \in Red\} \setminus Red$
  end while
  return $A$

Chapter 4 Opponent Modelling

4.1 Defeating an Opponent

In Chapter 3 we learned about the algorithms used for inferring PDFA. Those algorithms are used in the context of a repeated game setup where our learning agent is trying to defeat an opponent. This chapter is about the two main steps that must be performed to accomplish this goal:

1. Derive a working hypothesis about the internal strategy that our opponent is using. This will be accomplished using the GIATI algorithm.
2. Derive a counter-strategy that could exploit any weaknesses that strategy could have. This will be accomplished by translating our working hypothesis into a Markov Decision Process and deriving the governing rules for it.

4.2 The GIATI algorithm

In the previous chapter we have seen techniques to infer probabilistic automata from examples, but in this thesis what we are trying to infer is the transducer that our opponent is using to play the game. So we should fill the gap between having algorithms that learn PDFA and converting those models into the target transducer. This chapter is devoted to the techniques necessary to accomplish this task.

Some terminology

A finite-state transducer (FST), $T$, is a tuple $\langle \Sigma, \Delta, Q, q_0, F, \delta \rangle$, in which:

1. $\Sigma$ is a finite set of source symbols
2. $\Delta$ is a finite set of target symbols
3. $Q$ is a finite set of states

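4. $q_0$ is the initial state
5. $F \subseteq Q$ is a set of final states
6. $\delta \subseteq Q \times \Sigma \times \Delta^* \times Q$ is a set of transitions. Note that $\Sigma =$ […].

A translation form $\phi$ of length $I$ in $T$ is defined as the sequence of transitions

$$\phi = (q_0^\phi, s_1^\phi, t_1^\phi, q_1^\phi)(q_1^\phi, s_2^\phi, t_2^\phi, q_2^\phi) \cdots (q_{I-1}^\phi, s_I^\phi, t_I^\phi, q_I^\phi)$$

where $(q_{i-1}^\phi, s_i^\phi, t_i^\phi, q_i^\phi) \in \delta$. A pair $(s, t) \in \Sigma^* \times \Delta^*$ is a translation pair if there is a translation form $\phi$ of length $I$ in $T$ such that $I = |s|$ and $t = t_1^\phi t_2^\phi \cdots t_I^\phi$. A rational translation is the set of all translation pairs of some finite-state transducer $T$.

This definition of a finite-state transducer is similar to the definition of a regular or finite-state grammar. The main difference is that in a finite-state grammar the set of target symbols $\Delta$ does not exist, and the transitions are defined on $Q \times \Sigma \times Q$. A translation form is the transducer counterpart of a derivation in a finite-state grammar, and the concept of rational translation is reminiscent of the concept of (regular) language, defined as the set of strings associated with the derivations in the grammar $G$.

A stochastic finite-state transducer, $T_P$, is defined as a tuple $\langle \Sigma, \Delta, Q, q_0, p, f \rangle$ in which $Q$, $q_0$, $\Sigma$ and $\Delta$ are as in the definition of a finite-state transducer and $p$ and $f$ are two functions:

1. $p : Q \times \Sigma \times \Delta^* \times Q \rightarrow [0, 1]$
2. $f : Q \rightarrow [0, 1]$

that satisfy, for all $q \in Q$,

$$f(q) + \sum_{(a, w, q') \in \Sigma \times \Delta^* \times Q} p(q, a, w, q') = 1$$

The probability of a translation pair $(s, t) \in \Sigma^* \times \Delta^*$ according to $T_P$ is the sum of the probabilities of all the translation forms of $(s, t)$ in $T$:

$$P_{T_P}(s, t) = \sum_{\phi \in d(s, t)} P_{T_P}(\phi)$$

where the probability of a translation form $\phi$ is

$$P_{T_P}(\phi) = \prod_{i=1}^{I} p(q_{i-1}, s_i, t_i, q_i) \cdot f(q_I)$$

that is, the product of the probabilities of all the transitions involved in $\phi$, multiplied by the final-state probability of the last state.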
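As a small numerical illustration of these definitions, the sketch below computes the probability of a translation form and of a translation pair on a toy stochastic transducer. The transducer itself, the dictionary encoding of $p$ and $f$, and the function names are assumptions made for the example; they do not come from the thesis.

```python
def form_probability(form, p, f):
    """Product of the transition probabilities of a translation form times f(q_I)."""
    prob = 1.0
    for transition in form:           # transition = (q_prev, s_i, t_i, q_next)
        prob *= p.get(transition, 0.0)
    last_state = form[-1][3]
    return prob * f.get(last_state, 0.0)


def pair_probability(forms, p, f):
    """Sum over all translation forms that produce the same pair (s, t)."""
    return sum(form_probability(form, p, f) for form in forms)


def is_normalised(p, f, tol=1e-9):
    """Check that f(q) plus the outgoing transition mass equals 1 for every state."""
    total = dict(f)
    for (q, _a, _w, _q_next), prob in p.items():
        total[q] = total.get(q, 0.0) + prob
    return all(abs(value - 1.0) < tol for value in total.values())


# Toy transducer (hypothetical): two states, source symbols a/b, target symbols x/y/z.
p = {("q0", "a", "x", "q1"): 0.5,
     ("q0", "a", "y", "q0"): 0.5,
     ("q1", "b", "z", "q1"): 0.25}
f = {"q0": 0.0, "q1": 0.75}

form = [("q0", "a", "x", "q1"), ("q1", "b", "z", "q1")]   # translates "ab" into "xz"
print(is_normalised(p, f))
print(form_probability(form, p, f))   # 0.5 * 0.25 * 0.75
print(pair_probability([form], p, f))
```

In this toy transducer the pair ("ab", "xz") has a single translation form, so the pair probability equals the form probability.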

There are two main types of finite-state transducers, the Moore machine and the Mealy machine. Since we are interested in their stochastic derivations, we will show the definitions of the probabilistic instances. The definitions of the deterministic entities can be obtained by simply considering degenerate probabilities where only one transition has the full weight, i.e. there is a transition that has probability 1.

Moore machine

A Moore machine is a deterministic automaton with the ability to generate symbols. Like other automata, the Moore machine performs state transitions depending on the input symbols consumed. When the automaton lands in a new state, an output symbol is generated according to an internal formula. Formally, a stochastic Moore machine is a tuple $\langle Q, \Sigma, \Delta, \delta, \lambda, q_0 \rangle$ where:

$Q$ is the set of nodes in the automaton
$\Sigma$ and $\Delta$ are the input and output alphabets
$\delta : Q \times \Sigma \rightarrow P_Q$ is the set of probability distributions over $Q$
$\lambda : Q \rightarrow P_\Delta$ is the probabilistic output function. The Moore machine generates output symbols according to a given probability function $P_\Delta$
$q_0$ is the initial state

Mealy machine

A Mealy machine is also a deterministic automaton, one that generates output symbols during the state transitions. The main difference is that in the Moore machine you can define different probability functions for the transitions and for the output symbols, while in the Mealy machine the probability function is a joint function for the transitions and the output symbols. Mathematically, a Mealy machine is a tuple $\langle Q, \Sigma, \Delta, \delta, q_0 \rangle$ where:

$Q$ is the set of nodes in the automaton
$\Sigma$ and $\Delta$ are the input and output alphabets
$\delta : Q \times \Sigma \rightarrow P_{Q \times \Delta}$ is the set of probability distributions over the set of transitions and output symbols $Q \times \Delta$
$q_0$ is the initial state

The GIATI algorithm relies on the following two theorems. The interested reader should check [Ber09] for the corresponding proofs.

Theorem 4.1. $T \subseteq \Sigma^* \times \Delta^*$ is a rational translation if and only if there exist an alphabet $\Gamma$, a regular language $L \subseteq \Gamma^*$, and two morphisms $h_\Sigma : \Gamma^* \rightarrow \Sigma^*$ and $h_\Delta : \Gamma^* \rightarrow \Delta^*$ such that $T = \{(h_\Sigma(w), h_\Delta(w)) \mid w \in L\}$.

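and

Theorem 4.2. A distribution $P_T : \Sigma^* \times \Delta^* \rightarrow [0, 1]$ is a stochastic rational translation if and only if there exist an alphabet $\Gamma$, two morphisms $h_\Sigma : \Gamma^* \rightarrow \Sigma^*$ and $h_\Delta : \Gamma^* \rightarrow \Delta^*$, and a stochastic regular language $P_L$ such that, for all $(s, t) \in \Sigma^* \times \Delta^*$,

$$P_T(s, t) = \sum_{w \in \Gamma^* : (h_\Sigma(w), h_\Delta(w)) = (s, t)} P_L(w)$$

Inferring Finite-State Transducers

The methodology explained in [CV04] is called grammatical inference and alignments for transducer inference (GIATI). Based on the work of Berstel [Ber09], it is well known that (stochastic) rational translations $T$ can be obtained as homomorphic images of certain (stochastic) regular languages $L$ over an adequate alphabet $\Gamma$. This suggests the following general technique for learning a stochastic finite-state transducer, given a finite sample $I_+$ of string pairs $(s, t) \in \Sigma^* \times \Delta^*$:

1. Each training pair $(s, t)$ from $I_+$ is transformed into a string $z$ from an extended alphabet $\Gamma$ (a string of $\Gamma$-symbols), yielding a sample $S$ of strings, $S \subset \Gamma^*$. Let us call this transformation $L : \Sigma^* \times \Delta^* \rightarrow \Gamma^*$.
2. A (stochastic) regular grammar $G$ is inferred from $S$.
3. The $\Gamma$-symbols of the grammar rules are transformed back into pairs of source/target symbols/strings (from $\Sigma^* \times \Delta^*$).

The inverse labelling function $\Lambda : \Gamma^* \rightarrow \Sigma^* \times \Delta^*$ is one such that $\Lambda(L(I_+)) = I_+$. Following Theorems 4.2 and 4.1, $\Lambda(\cdot)$ consists of a pair of morphisms, $h_\Sigma$ and $h_\Delta$, such that for a string $z \in \Gamma^*$, $\Lambda(z) = (h_\Sigma(z), h_\Delta(z))$. The overall procedure can be seen in Figure 4.1.

Figure 4.1: Commutative diagram of the transformations performed using the GIATI algorithm (training pairs $A \subset \Sigma^* \times \Delta^*$; labelling $L(\cdot)$ gives $S \subset \Gamma^*$; the GI algorithm gives $G$ with $S \subset L(G)$; inverse labelling $\Lambda(\cdot)$ gives $T$ with $A \subset T(T)$)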
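Steps 1 and 3 of this technique can be sketched as follows. The sketch assumes the simplest possible labelling, in which the $i$-th source symbol is paired with the $i$-th target symbol (as happens in a repeated game where every action meets exactly one response); each $\Gamma$-symbol is represented as a Python tuple. The function names and the RoShamBo-style symbols `R`, `P`, `S` are illustrative assumptions.

```python
def label(s, t):
    """Labelling: turn a pair of equal-length strings into a string of Gamma-symbols."""
    if len(s) != len(t):
        raise ValueError("this sketch assumes one target symbol per source symbol")
    return [(a, b) for a, b in zip(s, t)]


def inverse_label(z):
    """Inverse labelling: the two morphisms project each Gamma-symbol onto its components."""
    s = "".join(a for a, _ in z)
    t = "".join(b for _, b in z)
    return s, t


# A short match: our actions and the opponent's responses.
our_moves, their_moves = "RPS", "PSR"
z = label(our_moves, their_moves)
print(z)
print(inverse_label(z) == (our_moves, their_moves))
```

With this pairing, the sample handed to the grammatical inference algorithm in step 2 is simply a list of such $\Gamma$-strings, one per observed match, and the round trip through `label` and `inverse_label` returns the original pair.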

Applying GIATI in the repeated game scenario

When applying the GIATI algorithm, three placeholders have to be filled in order to run the algorithm: the labelling function $L(\cdot)$, the inverse labelling function $\Lambda(\cdot)$ and the algorithm used to learn the probabilistic automaton. For our opponent modelling scenario, defined in Section 2.3, the GIATI elements are defined as follows:

Choosing the labelling function $L$ tends to be a difficult decision. In statistical translation, one of the domains where FSTs are in common use, a previous study of the statistical correlation between the positions of the words in the source language and in the target language, called statistical alignment, is performed. In our case, since the actions and the responses are always paired, there is no need to perform such a statistical analysis of the data: each input symbol is paired with exactly one output symbol. So the labelling function $L$ will be

$$L(s, t) = concat(s, t) = st = z, \quad z \in \Gamma$$

where the symbol $st \in \Gamma$ is obtained from the alphabet $\Gamma$, whose elements are all the combinations of a symbol of $\Sigma$ with a symbol of $\Delta$.

Learning the probabilistic automaton will be performed using one of the aforementioned algorithms, ALERGIA or MDI.

The inverse labelling function just splits the source symbol from the target symbol:

$$\Lambda(z) = \Lambda(st) = (s, t), \quad s \in \Sigma, \; t \in \Delta$$

4.3 Markov Decision Processes

A Markov Decision Process (MDP) [Ber95, Put94] consists of five elements: decision epochs, states, actions, transition probabilities, and rewards.

Decision Epochs and Periods

Decisions are made at points of time referred to as decision epochs. Let $T$ denote the set of decision epochs; in a discrete environment $T$ can be finite or infinite, ranging over the positive real line. In our discrete environment, decisions are made at each epoch to govern the probabilistic system.

States and Action Sets

At each decision epoch, the system occupies a state. We denote the set of possible system states as $S$. If, at some decision epoch, the decision maker observes the system in state $s \in S$, he may choose action $a$ from the set of allowable actions in state $s$, $A_s$. Given this setup, let $A = \bigcup_{s \in S} A_s$. Note that $S$ and $A_s$ do not vary with $t$.

Actions may be chosen either randomly or deterministically. Choosing actions randomly means selecting a probability distribution $q(\cdot) \in P(A_s)$, with $P(A_s)$ being the collection of probability distributions on subsets of $A_s$. When we are dealing with deterministic selection of actions, our model simply uses degenerate probability distributions.

Rewards and Transition Probabilities

As a result of choosing action $a \in A_s$ in state $s$,

1. the decision maker receives a reward, $r_t(s, a)$, and
2. the system state at the next decision epoch is determined by the probability distribution $p_t(\cdot \mid s, a)$

From the perspective of the model, it is immaterial how the reward is accrued during the period. We only require that its value or expected value be known before choosing an action, and that it not be affected by future actions. The reward might be

1. a lump sum received prior to the next decision epoch
2. a random quantity that depends on the system state at the subsequent epoch
3. or a combination of the above

When the reward depends on the state of the system at the next decision epoch, we let $r_t(s, a, j)$ denote the value at time $t$ of the reward received when the state of the system at decision epoch $t$ is $s$, action $a \in A_s$ is selected and the system occupies state $j$ at decision epoch $t + 1$. Its expected value at decision epoch $t$ may be evaluated by computing

$$r_t(s, a) = \sum_{j \in S} r_t(s, a, j) \, p_t(j \mid s, a)$$

We usually assume that

$$\sum_{j \in S} p_t(j \mid s, a) = 1$$

We refer to the collection of objects forming a tuple

$$\langle T, S, A_s, p(\cdot \mid s, a), r(s, a, j) \rangle \tag{4.1}$$

as a Markov Decision Process. The qualifier Markov is used because the transition probabilities and reward functions depend on the past only through the current state of the system and the action selected by the decision maker in that state.

Decision Rules

A decision rule prescribes a procedure for action selection in each state at a specified decision epoch. There is a lot of variety in how decision rules pick their actions, but the primary focus in this thesis will be on deterministic Markovian rules because they are easier to implement and evaluate. Such decision rules are functions $d : S \rightarrow A$, which specify the action choice when the system occupies state $s$, and thus for each $s \in S$ we have $d(s) \in A_s$. This decision rule is said to be Markovian (memoryless) because it depends on previous system states and actions only through the current state of the system, and deterministic because it chooses an action with certainty.

Policies

A policy specifies the decision rule to be used at all decision epochs. We call a policy stationary if $d_t = d$ for all $t \in T$.

Link between Probabilistic Mealy Machines and MDP

The GIATI algorithm provides us with a transducer in the form of a non-deterministic Mealy machine. There is an almost direct translation between this Mealy machine and an MDP, but some considerations should be taken into account. Let us recap the definitions of those mathematical objects. A Mealy machine (Section 4.2.1) is the tuple

$$\langle Q, \Sigma, \Delta, \delta, q_0 \rangle$$

and a Markov Decision Process (Equation 4.1) is defined by

$$\langle T, S, A_s, p(\cdot \mid s, a), r(s, a, s') \rangle$$

So the different transformations can be summarized as follows:

The decision epochs $T$ in an MDP are more general than the step transitions in a transducer, since in an MDP continuous or Borel sets are allowed as epochs. Since we are translating from a Mealy machine to an MDP, the translation is direct: the MDP is fixed to discrete and evenly spaced decision epochs and no further adaptation is needed.

The sets of states $Q$ and $S$ can be used interchangeably between the two models. Here we perform a direct translation.

For the transition probabilities in an MDP, $p(s' \mid s, a)$, some derivation must be performed. In a Mealy machine the transition function $\delta$ specifies probabilities for the pairs of next state and emitted symbol, $Q \times \Delta$, so the basic translation should be

$$p(s' \mid s, a) = \sum_{t \in \Delta} \delta(s, a, s', t)$$

The reward function $r(s, a, s')$ depends on the game we are playing. When a transition happens in a Mealy machine, both the input symbol and the emitted symbol are available. Together they represent a move to which a referee will later assign a value for each player, so in order to provide an expected value for a transition, the rewards are weighted by the probabilities of a given transition happening when receiving an input symbol:

$$r(s, a, s') = \sum_{t \in \Delta} \delta(s, a, s', t) \cdot evalmove(a, t)$$

where the function $evalmove$ is game dependent.

With these simple translations, a given Mealy machine obtained by the GIATI algorithm can be translated into a Markov Decision Process, ready for obtaining the governing decision rules.

Infinite horizon models

When a Markov decision process is run indefinitely, each policy induces a discrete-time reward process. That reward stream has an associated utility to the agent, depending on the function used to value the stream. Several functions can be used to aggregate the stream into a single value, which makes it easier to compare different streams:

1. The expected total reward of policy $\pi$, $v^\pi$, is defined to be

$$v^\pi(s) \equiv \lim_{N \to \infty} E_s^\pi \left\{ \sum_{t=1}^{N} r(X_t, Y_t) \right\} = \lim_{N \to \infty} v_{N+1}^\pi(s) \tag{4.2}$$

Note that the limit in 4.2 may be $+\infty$ or $-\infty$ and consequently this performance measure is not always appropriate.

2. The expected discounted reward of policy $\pi$ is defined to be

$$v_\lambda^\pi(s) \equiv \lim_{N \to \infty} E_s^\pi \left\{ \sum_{t=1}^{N} \lambda^{t-1} r(X_t, Y_t) \right\} \quad \text{for } 0 \leq \lambda < 1$$
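The translation from a stochastic Mealy machine to the transition and reward functions of an MDP, as given by the two sums above, can be sketched in Python. Everything concrete here is an assumption made for illustration: the dictionary encoding of $\delta$ as `(s, a, s_next, t) -> probability`, the toy two-state opponent, and a RoShamBo-style `evalmove` that returns +1 for a win, 0 for a tie and -1 for a loss. The last helper evaluates the discounted sum of a finite reward stream.

```python
from collections import defaultdict

BEATS = {"R": "S", "P": "R", "S": "P"}  # assumed RoShamBo rules: key beats value


def evalmove(a, t):
    """Hypothetical payoff of our action a against the opponent's response t."""
    if a == t:
        return 0
    return 1 if BEATS[a] == t else -1


def mealy_to_mdp(delta, evalmove):
    """Marginalise the emitted symbol t out of delta to obtain p and r."""
    p = defaultdict(float)  # (s, a, s_next) -> transition probability
    r = defaultdict(float)  # (s, a, s_next) -> probability-weighted reward
    for (s, a, s_next, t), prob in delta.items():
        p[(s, a, s_next)] += prob
        r[(s, a, s_next)] += prob * evalmove(a, t)
    return dict(p), dict(r)


def discounted_return(rewards, lam):
    """Sum of lam**(t-1) * r_t over a finite reward stream, t starting at 1."""
    return sum(lam ** k * reward for k, reward in enumerate(rewards))


# Toy opponent model (hypothetical): for each (state, action) the probabilities sum to 1.
delta = {("q0", "R", "q0", "S"): 0.5, ("q0", "R", "q1", "P"): 0.5,
         ("q0", "P", "q0", "R"): 1.0,
         ("q1", "S", "q1", "P"): 0.25, ("q1", "S", "q1", "R"): 0.75}

p, r = mealy_to_mdp(delta, evalmove)
print(p[("q0", "R", "q0")], r[("q0", "R", "q0")])
print(p[("q1", "S", "q1")], r[("q1", "S", "q1")])
print(discounted_return([1, 1, 1], lam=0.5))
```

In the toy model, playing `S` in state `q1` always returns to `q1`, but the opponent answers `R` three times out of four, so the weighted reward of that transition is negative.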
