Branes with Brains. Reinforcement learning in the landscape of intersecting brane worlds. String_Data 2017, Boston 11/30/2017


1 Branes with Brains Reinforcement learning in the landscape of intersecting brane worlds FABIAN RUEHLE (UNIVERSITY OF OXFORD) String_Data 2017, Boston 11/30/2017 Based on [work in progress] with Brent Nelson and Jim Halverson

2 Motivation - ML Three approaches to machine learning: Supervised learning: train the machine by telling it what to do. Unsupervised learning: let the machine train without telling it what to do. Reinforcement learning [Sutton, Barto '98, '17]: based on behavioral psychology; don't tell the machine exactly what to do, but reward good and/or punish bad actions. AI = reinforcement learning + deep learning (neural networks) [Silver '16]

3 Motivation - RL Agents interact with an environment (e.g. the string landscape). Each interaction changes the state of the agent, e.g. the degrees of freedom parameterizing the string vacuum. Each step is either rewarded (the action led to a more realistic vacuum) or punished (the action led to a less realistic vacuum). The agent acts with the aim of maximizing its long-term reward, and repeats actions until it is told to stop (it found a realistic vacuum, or it gives up).

4 Outline String theory setup: intersecting D6-branes on orbifolds of toroidal orientifolds. Implementation in Reinforcement Learning (RL): basic overview, implementing the RL code, modelling the environment. Preliminary results: finding consistent solutions. Conclusion.

5 String Theory 101 Intersecting D6-branes on orbifolds of toroidal orientifolds

6 String Theory 101 Have: IIA string theory in 9D + time with 32 supercharges. Want: a theory in 3D + time with 4 supercharges. Idea: make the extra 6 dimensions so small that we do not see them. How do we do that? 1. Make them compact. 2. Make their diameter so small that our experiments cannot detect them. Reduce the supercharges from 32 to 4: identify some points with their mirror image.

7 String Theory Setup Why this setup? Well studied, comparatively simple [Ibanez, Uranga '12]. Number of (well-defined) solutions known to be finite [Blumenhagen, Gmeiner, Honecker, Lust, Weigand '04, '05; Douglas, Taylor '07, ...]. Use symmetries to relate different vacua [Douglas, Taylor '07]. Combine consistency conditions to rule out combinations. BUT: the number of possibilities is so large that not a single interesting solution could be found despite enormous random scans (odds estimated at 1:10^9). Seems Taylor-made for big data / AI methods.

8 String Theory Compactification How to make a dimension compact? ⇒ Pac-Man


12 String Theory Compactification [Figure: three two-tori with coordinates $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$] Now six compact dimensions, but the resulting space is too simple. Make it a bit more complicated (but just a little bit).

13 String Theory Orbifolds [Figure: $T^2$ and its quotient $T^2/\mathbb{Z}_2$] Mathematically: $(x_1, y_1) \to (-x_1, -y_1)$. The resulting object is called an orbifold. Need to also orientifold: $(x_1, y_1) \to (x_1, -y_1)$ (plus something similar for the string itself).

14 String Theory Winding numbers [Figure: torus cycles with winding numbers $(n, m) = (1, 0)$, $(n, m) = (0, 1)$, $(n, m) = (1, 2)$] Winding numbers $(n, m)$ count how often a brane wraps the two basic cycles of a torus. Note: due to the orientifold, include with each brane $(n, m)$ its image $(n, -m)$.

15 String Theory D6 branes [Figure: a D6-brane spanning our 3D and a line on each of the three tori $T^2 \times T^2 \times T^2$] D6-brane: our 3D + a line on each torus. Can stack multiple D6-branes on top of each other ⇒ brane stacks. Tuple: $(N, n_1, m_1, n_2, m_2, n_3, m_3)$.

16 String Theory Gauge group and particles Observed gauge group: $U(N)$ from $N$ D6-branes on top of each other. Special cases: $SO(2N)$ from $N$ D6-branes parallel to the O6-plane, $Sp(N)$ from $N$ D6-branes orthogonal to the O6-plane. Intersection of an $N$-brane stack and an $M$-brane stack: particles in the bifundamental representation $(N, \bar{M})$. Observed particles in the universe, in terms of $SU(3) \times SU(2) \times U(1)_Y$: $3\,(3,2)_{1} + 3\,(\bar{3},1)_{-4} + 3\,(\bar{3},1)_{2}$ (quarks) and $3\,(1,2)_{-3} + 1\,(1,2)_{3} + 3\,(1,1)_{6}$ (leptons + Higgs).

17 String Theory MSSM [Figure: brane stacks wrapping the three tori $T^2 \times T^2 \times T^2$] The green and yellow stacks intersect in $3 \cdot 1 \cdot 1 = 3$ points. Note: counting intersections on the orbifold is a bit more subtle.
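The $3 \cdot 1 \cdot 1$ counting is a product of per-torus intersection numbers, $I_{ab} = \prod_i (n^a_i m^b_i - m^a_i n^b_i)$ on factorized tori. A minimal sketch with made-up example stacks; the orbifold subtleties mentioned above are ignored:

```python
# Minimal sketch (not the talk's code): chiral intersection number of two
# brane stacks, I_ab = prod_i (n_i^a m_i^b - m_i^a n_i^b) on factorized tori.
# Orbifold subtleties in the counting are ignored here.

def intersection_number(stack_a, stack_b):
    """stack = (N, n1, m1, n2, m2, n3, m3); returns the product of the
    per-torus intersection numbers of the two wrapped cycles."""
    I = 1
    for i in range(3):
        n_a, m_a = stack_a[1 + 2 * i], stack_a[2 + 2 * i]
        n_b, m_b = stack_b[1 + 2 * i], stack_b[2 + 2 * i]
        I *= n_a * m_b - m_a * n_b
    return I

# Hypothetical stacks intersecting 3, 1, and 1 times on the three tori:
green = (3, 1, 0, 1, 0, 1, 0)
yellow = (2, 1, 3, 1, 1, 1, 1)
print(intersection_number(green, yellow))  # 3 * 1 * 1 = 3
```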

18 String Theory Consistency
Tadpole cancellation (balance the energy of the D6-branes against the O6-planes):
$$\sum_{a=1}^{\#\text{stacks}} N_a \begin{pmatrix} n^a_1 n^a_2 n^a_3 \\ n^a_1 m^a_2 m^a_3 \\ m^a_1 n^a_2 m^a_3 \\ m^a_1 m^a_2 n^a_3 \end{pmatrix} = \begin{pmatrix} 8 \\ -8 \\ -8 \\ -8 \end{pmatrix}$$
K-theory (global consistency):
$$\sum_{a=1}^{\#\text{stacks}} N_a \begin{pmatrix} m^a_1 m^a_2 m^a_3 \\ m^a_1 n^a_2 n^a_3 \\ n^a_1 m^a_2 n^a_3 \\ n^a_1 n^a_2 m^a_3 \end{pmatrix} \equiv \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} \pmod{2}$$
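As a code sketch under the conventions reconstructed above (the target charges differ between references), the tadpole check for a list of stacks:

```python
# Sketch: the four tadpole charges summed over all stacks, with stacks
# given as (N, n1, m1, n2, m2, n3, m3) as on slide 15.

def tadpole_sums(stacks):
    totals = [0, 0, 0, 0]
    for (N, n1, m1, n2, m2, n3, m3) in stacks:
        totals[0] += N * n1 * n2 * n3
        totals[1] += N * n1 * m2 * m3
        totals[2] += N * m1 * n2 * m3
        totals[3] += N * m1 * m2 * n3
    return totals

def tadpoles_cancelled(stacks, target=(8, -8, -8, -8)):
    # 'target' encodes the O6-plane charge; sign conventions vary by reference.
    # K-theory additionally requires the four m/n-mirrored sums to be even.
    return tadpole_sums(stacks) == list(target)
```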

19 String Theory Consistency
SUSY (computational control): for torus moduli $j, k, \ell > 0$ and all $a = 1, \ldots, \#\text{stacks}$:
$$m^a_1 m^a_2 m^a_3 - j\, m^a_1 n^a_2 n^a_3 - k\, n^a_1 m^a_2 n^a_3 - \ell\, n^a_1 n^a_2 m^a_3 = 0$$
$$n^a_1 n^a_2 n^a_3 - j\, n^a_1 m^a_2 m^a_3 - k\, m^a_1 n^a_2 m^a_3 - \ell\, m^a_1 m^a_2 n^a_3 > 0$$
Pheno: $SU(3) \times SU(2) \times U(1)$ + particles. $T = (T_1, T_2, \ldots, T_k)$, $k = \# U(N)$ stacks, defines a massless $U(1)$ iff:
$$\begin{pmatrix} 2N_1 m^1_1 & 2N_2 m^2_1 & \cdots & 2N_k m^k_1 \\ 2N_1 m^1_2 & 2N_2 m^2_2 & \cdots & 2N_k m^k_2 \\ 2N_1 m^1_3 & 2N_2 m^2_3 & \cdots & 2N_k m^k_3 \end{pmatrix} \begin{pmatrix} T_1 \\ T_2 \\ \vdots \\ T_k \end{pmatrix} = 0$$
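As code, the per-stack SUSY conditions above could be checked like this (a sketch; the moduli values passed in are hypothetical):

```python
# Sketch: check the reconstructed SUSY conditions for one stack, given
# (hypothetical) torus moduli j, k, l > 0.

def susy_ok(stack, j, k, l, tol=1e-9):
    _, n1, m1, n2, m2, n3, m3 = stack
    equality = m1*m2*m3 - j*m1*n2*n3 - k*n1*m2*n3 - l*n1*n2*m3
    positivity = n1*n2*n3 - j*n1*m2*m3 - k*m1*n2*m3 - l*m1*m2*n3
    return abs(equality) < tol and positivity > 0

# A stack wrapping (1,0) on each torus has all m_i = 0, so the equality
# holds trivially and the positivity condition is 1 > 0 for any moduli:
print(susy_ok((4, 1, 0, 1, 0, 1, 0), j=1.0, k=2.0, l=0.5))  # True
```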

20 String Theory IIA state space The state space is gigantic. Choose a maximal value $w_{max}$ for the winding numbers. Let $N_B$ be the number of possible winding number combinations (up to $w_{max}$) after symmetry reduction, and let $N_S$ be the maximal number of stacks. This allows for $N_B^{N_S}$ combinations. Note: each stack can additionally have $N = 1, 2, 3, \ldots$ branes.
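For concreteness, a toy count of these sizes. It assumes, as one possible symmetry reduction, the identification $(n, m) \sim (-n, -m)$ on each torus; the values of $w_{max}$, $N_S$, $N_{max}$ are made-up examples:

```python
# Toy count of the state-space size sketched above; the identification
# (n, m) ~ (-n, -m) is an assumed example of "symmetry reduction".
from itertools import product

def winding_pairs(w_max):
    reps = set()
    for n, m in product(range(-w_max, w_max + 1), repeat=2):
        reps.add(max((n, m), (-n, -m)))   # one representative per orbit
    return reps

w_max, N_S, N_max = 2, 4, 9
N_B = len(winding_pairs(w_max)) ** 3      # winding combinations for 3 tori
print(N_B, (N_max * N_B) ** N_S)          # stacks also carry N = 1..N_max
```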

21 Reinforcement learning

22 Reinforcement learning - Overview At time $t$, the agent is in state $s_t \in S_{total}$. Select action $a_t$ from the action space $A$ based on the policy $\pi: S_{total} \to A$. Receive reward $r_t \in \mathbb{R}$ for action $a_t$ based on the reward function $R: S_{total} \times A \to \mathbb{R}$. Transition to the next state $s_{t+1}$. Try to maximize the long-term return $G_t = \sum_{k=1}^{\infty} \gamma^k r_{t+k}$, $\gamma \in (0, 1]$. Keep track of the state value $v(s)$ ("how good is the state"). Compute the advantage estimate Adv ("how much better than expected has the action turned out to be").
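A minimal numerical sketch of the discounted return and an advantage estimate of the form $G - v(s)$; the value estimate below is a stand-in for the critic's output:

```python
# Minimal sketch: discounted return over the rest of an episode and an
# advantage estimate Adv = G - v(s), with a stand-in value estimate.

def discounted_return(rewards, gamma=0.99):
    """G = r_1 + gamma*r_2 + gamma^2*r_3 + ... for the remaining rewards."""
    G = 0.0
    for r in reversed(rewards):
        G = r + gamma * G
    return G

future_rewards = [-1.0, -1.0, -1.0, 10.0]   # toy trajectory
v_estimate = 4.0                            # stand-in critic output for s_t
G = discounted_return(future_rewards)
print(G, G - v_estimate)                    # return and advantage
```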

23 Reinforcement Learning - Overview How to maximize the future return? Depends on the policy; several approaches. Tabular (small state/action spaces) [Sutton, Barto '98]: temporal difference learning: SARSA (⇒ my breakout group on Friday), Q-learning. Deep RL (large/infinite state/action spaces): Deep Q-Network [Mnih et al '15], Asynchronous advantage actor-critic (A3C) [Mnih et al '16]. Variations/extensions: Wolpertinger [Dulac-Arnold et al '16], Rainbow [Hessel et al '17].

24 Reinforcement Learning - A3C [Diagram: a global instance with its policy/value network and input; workers 1, 2, ..., n, each with their own policy/value network, input, and environment]

25 Reinforcement Learning - A3C Asynchronous: have n workers explore the environment simultaneously and asynchronously; this improves training stability (the workers' experiences are kept separate) and improves exploration. Advantage: use the advantage to update the policy. Actor-critic: to maximize the return one needs to know state or action values and to optimize the policy. Methods like Q-learning focus on the value function; methods like policy gradient focus on the policy. AC: use the value estimate (the "critic") to update the policy (the "actor").

26 Reinforcement Learning - Implementation OpenAI Gym: interface between the agent (RL) and the environment (string landscape) [Brockman et al '16]. We provide the environment; we use ChainerRL's implementation of A3C for the agent.
Environment side: action space; observation (state) space; step: go to a new state and return (new_state, reward, done, comment); reset: reset the episode and return start_state.
ChainerRL side: make the environment; specify the RL method (A3C, DQN, ...); specify the policy NN architecture (FF, LSTM, ...).
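A sketch of what such an environment could look like with the classic Gym API; the class name, state encoding, and reward details are illustrative stand-ins, not the authors' code:

```python
# Illustrative skeleton only: a Gym-style environment exposing step/reset
# as described above, with one stack's data as the state and +/-1 actions.
import gym
import numpy as np
from gym import spaces

class ToyBraneEnv(gym.Env):
    def __init__(self, target=(8, -8, -8, -8)):
        self.target = target
        self.dim = 7                                  # (N, n1, m1, ..., n3, m3)
        self.action_space = spaces.Discrete(2 * self.dim)
        self.observation_space = spaces.Box(-10, 10, (self.dim,), np.float32)

    def reset(self):
        self.state = np.zeros(self.dim, dtype=np.float32)   # empty stack
        return self.state.copy()

    def _violation(self):
        N, n1, m1, n2, m2, n3, m3 = self.state
        sums = (N*n1*n2*n3, N*n1*m2*m3, N*m1*n2*m3, N*m1*m2*n3)
        return sum(abs(s - t) for s, t in zip(sums, self.target))

    def step(self, action):
        idx, sign = action % self.dim, (1 if action < self.dim else -1)
        self.state[idx] = np.clip(self.state[idx] + sign, -10, 10)
        v = self._violation()
        reward = 1e6 if v == 0 else -v                # cf. the rewards on slide 29
        done = (v == 0)
        return self.state.copy(), reward, done, {}    # (new_state, reward, done, comment)
```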

27 Reinforcement learning - Model the environment State space: $s_t = [(N_1, n^1_1, m^1_1, n^1_2, m^1_2, n^1_3, m^1_3), (N_2, n^2_1, \ldots), \ldots]$, $s_t \in S_{total}$ with $|S_{total}| = (N_{max}\, N_B)^{N_S}$. Action space: $A = \{N_a \to N_a \pm 1,\ \text{add stack}\ (N, n_1, \ldots),\ \text{remove stack}\ (N, n_1, \ldots)\}$. Reward $R$: need a notion of how good a state is. 1. By how much does a set of stacks violate the tadpole conditions? 2. Is a set of stacks fully consistent (tadpole, K-theory, SUSY)? (Note: the latter two are binary, so it is hard to define a distance.) 3. How far is the state from the Standard Model? Missing a group factor of $SU(3) \times SU(2) \times U(1)$? Too few Standard Model particles ($Q, u, d, L, H_u, H_d, e$)? Extra exotics (particles charged under the Standard Model but not observed so far)? Note: this only works if good states are close by in this sense (see the reward sketch below).
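A sketch of the reward shaping just described; the magnitudes mirror the values quoted on slides 29-32, and whether the consistency bonuses stack additively is an assumption made here:

```python
# Hedged sketch of reward shaping: a per-step punishment proportional to
# the tadpole violation, plus large bonuses for consistency conditions.

def shaped_reward(tadpole_sums, target=(8, -8, -8, -8),
                  k_theory_ok=False, susy_ok=False):
    violation = sum(abs(s - t) for s, t in zip(tadpole_sums, target))
    if violation > 0:
        return -violation              # punishment: -sum_i |Tadpole_i(s)|
    reward = 1e7                       # tadpole cancellation bonus (slide 32)
    if k_theory_ok:
        reward += 1e8                  # K-theory bonus (assumed additive)
    if susy_ok:
        reward += 1e9                  # SUSY bonus (assumed additive)
    return reward
```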

28 Preliminary results Parameters: 16 or 32 workers (1 CPU thread each, 2.6 GHz); training time of the order of 10h. Neural networks for value and policy evaluation: a feed-forward NN with 2 hidden softmax layers with 200 nodes each, and an RNN with a linear layer (200 nodes) and an LSTM layer (128 nodes). Initial state: empty stack. Maximal steps per episode: 10,000 to 250,000; evaluation runs every 100,000 steps.

29 Preliminary results - Tadpole cancellation [Plots: mean scores vs. # steps; log(average # steps to solution) vs. log(# steps)] Tadpole cancellation with a maximum of 10,000 steps per episode. Reward for tadpole cancellation: $10^6$. Punishment per step: $-\sum_i |\text{Tadpole}_i(s)|$.

30 Preliminary results - Tadpole cancellation [Plots: mean scores vs. # steps; log(average # steps to solution) vs. log(# steps)] Tadpole cancellation with a maximum of 50,000 steps per episode. Reward for tadpole cancellation: $10^6$. Punishment per step: $-\sum_i |\text{Tadpole}_i(s)|$.

31 Preliminary results - Tadpole cancellation [Plots: mean scores vs. # steps; log(average # steps to solution) vs. log(# steps)] Tadpole cancellation with a maximum of 250,000 steps per episode. Reward for tadpole cancellation: $10^6$. Punishment per step: $-\sum_i |\text{Tadpole}_i(s)|$.

32 Preliminary results - Tadpole+K-Theory+SUSY [Plot: mean scores vs. # steps] Full consistency (TC + K-theory + SUSY) with a maximum of 10,000 steps per episode. Rewards: (TC, K-Th, SUSY) = ($10^7$, $10^8$, $10^9$). Punishment per step: $-\sum_i |\text{Tadpole}_i(s)|$.

33 Conclusion Reinforcement learning is a very promising approach to AI + ML. A3C performs very well across different environments (mostly tested on Atari games). The type II orientifold setup is well-suited for a landscape analysis: the physics is well understood, but the number of configurations is too large to approach with conventional methods ⇒ no Standard Model found so far. Preliminary results: A3C works well for consistency (tadpole, K-theory, SUSY); getting close to the SM.

34 Thank you for your attention!
