Jonathan Mugan. July 15, 2013

Size: px

Start display at page:

Download "Jonathan Mugan. July 15, 2013"

Evan Richardson
5 years ago
Views:

1 Jonthn Mugn July 15, 2013

2 Imgine rt in Skinner box. The rt cn see screen of imges, nd dot in the lower-right corner determines if there will be shock. Bottom-up methods my not find this dot, but top-down pproch requires supervisory signl. Our supervisory signl comes from predictions. 2

3 The importnce of top-down bstrction lerning Autonomous development with top-down bstrction lerning Appliction of utonomous development to cyber security Top-down bstrction lerning for cyber security 3

4 The importnce of top-down bstrction lerning Autonomous development with top-down bstrction lerning Appliction of utonomous development to cyber security Top-down bstrction lerning for cyber security 4

5 1. Begin with very brod discretiztion of the environment. 2. Simultneously lern discretiztion nd set of predictive models of the environment. 3. Convert the models into plns, nd form the plns into set of hierrchicl ctions. 4. Use lerned ctions to explore the environment. 5

6 Continuous vribles Discrete vribles Models of the environment Imges time Feedbck from model to discretiztion 6

7 Continuous vribles Imges time 7

8 Continuous vribles Discrete vribles Imges time 8

9 Continuous vribles Discrete vribles Models of the environment Imges time 9

10 Continuous vribles Discrete vribles Models of the environment Imges time Feedbck from model to discretiztion 10

11 () s Model Model in the form of dynmic Byesin network [Den nd Knzw, 1989] Pln Pln in the form of Q-function [Sutton nd Brto, 1998] 11

12 Low-level motor commnds The result is useful bstrct stte representtion nd hierrchy of effective higher-level ctions. 12

13 Qulittive Representtion Lerning Predictive Models From Models to s nd Plns Explortion 13

14 A qulittive representtion encodes the vlues of vribles reltive to known lndmrks [Kuipers, 1994] X Lndmrks bridge the gp between the continuous nd the discrete. The vrible vlue is either less thn, greter thn, or equl to tht lndmrk. 14

15 X lndmrks 15

16 X X x 16

17 X X x 17

18 Vrible X with lndmrks l 1, l 2 hs qulittive vlues Q X l l l l l l {(, ),,(, ),,(, )} 18 18

19 Initilly, vribles hve no lndmrks Q X {(, )} 19

20 Initilly, vribles hve no lndmrks Q X {(, )} But for ech vrible X we define X X X t t t 1 Q X {(,0),0,(0, )} {[ ],[0],[ ]} 20

21 1. Generliztion: different rel vlues mp to the sme qulittive vlue. 2. Focus: the lerner cn focus on importnt events. 21

22 Consider x Q X l1 l1 l1 l2 l2 l2 ( ) {(, ),,(, ),,(, )} event ( t, X, x) 22

23 Consider x Q X l1 l1 l1 l2 l2 l2 ( ) {(, ),,(, ),,(, )} Exmple: event ( t, X, l ) 2 X t 1 l 2 nd X t = l 2 X t l 2 23

24 Consider x Q X l1 l1 l1 l2 l2 l2 ( ) {(, ),,(, ),,(, )} Exmple: event ( t, X, l ) 2 X t 1 l 2 nd X t = l 2 X t l 2 soon( t, X, x) t '[ t t ' t k nd event ( t ', X, x)] soon is time window for n event to occur 24

25 Qulittive Representtion Lerning Predictive Models From Models to s nd Plns Explortion 25

A contingency is pir of events tht occur together in time. E.g., flip switch nd light goes on. Humns hve n innte contingency detection module [Gergely nd Wtson, 1999].

26 A contingency is pir of events tht occur together in time. E.g., flip switch nd light goes on. Humns hve n innte contingency detection module [Gergely nd Wtson, 1999]. Humn infnts cn detect contingencies shortly fter birth [DeCsper nd Crstens, 1981]. Contingencies re: 1. Esy to lern; they only require looking t pirs of events. 2. A nturl representtion for plnning. 26

27 Lern contingency E E 1 2 when E 2 is more likely to soon occur given tht E 1 hs occurred thn otherwise P( soon( E ) E ) P( soon( E )) We look t ll pirs of events. 27

28 consequent event ntecedent event event( t, X, x) soon( t, Y, y) Extrcted contingences become dynmic Byesin networks. 28

29 Context vribles lerned through mrginl ttribution [Drescher, 1991] consequent event ntecedent event event( t, X, x) soon( t, Y, y) V 1 V 1 context vribles V n V 2 Conditionl Probbility Tble 29

30 event( t, X, x) soon( t, Y, y) 30

31 event( t, X, x) soon( t, Y, y) Environment responds with yes or no. 92 Yes Vt () 93 Yes 96 Yes 100 No 103 No 105 No 31

32 event( t, X, x) soon( t, Y, y) Environment responds with yes or no. 92 Yes Vt () 93 Yes 96 Yes 100 No 103 No 105 No Fyyd nd Irni [1993] Entropy: Informtion gin: j 2 j H( S) P( S s )log P( S s ) S S I g H( S) H( S ) H( S ) S S j 32

33 Qulittive Representtion Lerning Predictive Models From Models to s nd Plns Explortion 33

34 Plnning in QLAP combines symbolic nd MDP plnning Symbolic Plnning MDP Plnning Useful when only some sttes nd vribles re relevnt. QLAP uses symbolic plnning to link models together. Useful when you need to model uncertinty. QLAP uses reinforcement lerning within models. d c b Q( s, ) P( s ' s, ) R( s ') mx Q( s ', ') ' s' 34

35 Low-level motor commnds There is pln nd ction for ech discrete motor vlue. Motor ctions directly set effectors. 35

36 Low-level motor commnds Esy to build on top of existing pieces. 36

37 Low-level motor commnds The end product of development. 37

38 38

39 Qulittive Representtion Lerning Predictive Models From Models to s nd Plns Explortion 39

40 40

41 Lerns bstrctions: 1. The force needed to move the hnd. 2. The limits of movement. 3. Hving its hnd be the left or right of the block. the block. 41

42 42

43 43

44 44

45 The importnce of top-down bstrction lerning Autonomous development with top-down bstrction lerning Appliction of utonomous development to cyber security Top-down bstrction lerning for cyber security 45

46 QLAP Cy-QLAP 46

47 We expnded the QLAP developmentl lerning lgorithm into domin-generl system protection lgorithm. Generlized Cy-QLAP Algorithm 1. humn SME specifies set of sttes nd ctions 2. humn SME specifies set of undesirble events tht should be voided 3. Cy-QLAP ctively explores to lern the dynmics of the environment 4. do forever:. Cy-QLAP monitors the system to see if it is possible to formulte pln to bring bout n undesirble event b. If such pln is found, Cy-QLAP tkes proportionl ction to brek link in tht pln 47

48 Windows Registry Cy-QLAP Cy-QLAP remote process Windows logs pcket sniffer sensitive files Linux Ubuntu remote virtul mchine Windows 7 protected end node 48

49 sensitive files Windows 7 protected end node 49

50 Cy-QLAP sensitive files Windows 7 protected end node 50

51 Windows Registry Cy-QLAP Windows logs pcket sniffer sensitive files Windows 7 protected end node 51

52 Windows Registry Cy-QLAP Cy-QLAP remote process Windows logs pcket sniffer sensitive files Linux Ubuntu remote virtul mchine Windows 7 protected end node 52

53 Generlized Cy-QLAP Algorithm 1. humn SME specifies set of sttes nd ctions 2. humn SME specifies set of undesirble events tht should be voided 3. perform Autonomous Explortion nd Lerning 4. do forever:. ctivte the Thret Monitoring Module b. If such pln is found, ctivte the Thret Intervention Module 53

54 Cy-QLAP lerned the importnt dynmics of the environment Cy-QLAP lerned how to open file remotely exfiltrte file open nd close file shre 54

55 Generlized Cy-QLAP Algorithm 1. humn SME specifies set of sttes nd ctions 2. humn SME specifies set of undesirble events tht should be voided 3. perform Autonomous Explortion nd Lerning 4. do forever:. ctivte the Thret Monitoring Module b. If such pln is found, ctivte the Thret Intervention Module 55

56 Cy-QLAP: lerned tht if file shre ws open, sensitive file could be exfiltrted. lso lerned how to close file shres. Cy-QLAP therefore would close file shre s soon s it ws opened. Cy-QLAP lerned to protect the system without being told how. We know of no other cyber defense system tht lerns through explortion. 56

57 The importnce of top-down bstrction lerning Autonomous development with top-down bstrction lerning Appliction of utonomous development to cyber security Top-down bstrction lerning for cyber security 57

58 high-level bstrctions Instlled progrms System logs Processes System clls Register vlues low-level bstrctions 58

59 , X X x 0 100, ,,0 0,100 QLAP noted the rel vlue of ll vribles ech time model ws pplied. 59

60 Approch: define set of bstrction hierrchies nd note the vlue of the current nd next level of ech hierrchy ech time model is pplied Keep going down until the next level is not more relible thn the current level vlue field file vlue directory field Exmple: configurtion file bstrction hierrchy file 60

Jonthn Mugn jmugn@21ct.com www.jonthnmugn.

61 Jonthn Mugn 6011 West Courtyrd Drive Building 5, Suite 300 Austin, TX Phone: Fx:

WE would like to build intelligent agents that can. Autonomous Learning of High-Level States and Actions in Continuous Environments

WE would like to build intelligent agents that can. Autonomous Learning of High-Level States and Actions in Continuous Environments Autonomous Lerning of High-Level Sttes nd s in Continuous Environments Jonthn Mugn nd Benjmin Kuipers, Fellow, IEEE Abstrct How cn n gent bootstrp up from pixel-level representtion to utonomously lern