The Planning-by-Inference view on decision making and goal-directed behaviour

1 The Planning-by-Inference view on decision making and goal-directed behaviour. Marc Toussaint, Machine Learning & Robotics Lab, FU Berlin. Nov. 14. 1/20

2 pigeons 2/20

3 pigeons Wolfgang Köhler (1917): Intelligenzprüfungen an Menschenaffen (The Mentality of Apes) 2/20

4 Richard E. Bellman (1920–1984)
Bellman's principle of optimality [diagram: an optimal path from A to B, with optimal sub-solutions A_opt, B_opt]
V(s) = max_a [ R(a,s) + γ Σ_{s'} P(s'|a,s) V(s') ]
π(s) = argmax_a [ R(a,s) + γ Σ_{s'} P(s'|a,s) V(s') ]
Dynamic Programming, recursing backward; Reinforcement Learning (Q-learning, etc.)
the paradigm of goal-directed decision making nowadays
value function over the full state space, curse of dimensionality, distinguished role of rewards
3/20
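The Bellman backup above translates directly into tabular value iteration. The following is a minimal sketch (not from the talk); the reward array R[a,s], the transition array P[a,s,s'], and the toy numbers are hypothetical placeholders.

import numpy as np

def value_iteration(R, P, gamma=0.9, tol=1e-8):
    # Iterate the Bellman backup V(s) = max_a [ R(a,s) + gamma * sum_s' P(s'|a,s) V(s') ]
    # R: shape (A, S); P: shape (A, S, S) with P[a, s, s'] = P(s' | a, s).
    A, S = R.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * np.einsum('ast,t->as', P, V)   # Q(a, s)
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    pi = Q.argmax(axis=0)                              # greedy policy pi(s) = argmax_a Q(a, s)
    return V, pi

# toy 2-state, 2-action MDP with made-up numbers
R = np.array([[0.0, 1.0],
              [0.5, 0.0]])                             # R[a, s]
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.1, 0.9], [0.8, 0.2]]])               # P[a, s, s']
print(value_iteration(R, P))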

5 4/20

6 von Neumann (1903–1957), Morgenstern (1902–1977): utility/reward/success is just another (random) state variable. Shachter (Stanford), Cooper (Pittsburgh); Shachter/Cooper/Peot (1988): inference P(behavior | R = 1). I think: the idea lay dormant for 15 years and should have had great influence on thinking about behavior! A group of machine learning researchers independently rediscovered this recently. 5/20

7 Planning by Inference [graphical model: nodes X, Y, Z = things you know / the current state; A = things to do / actions; and things you want to see in the future] We condition on goals/desired observations and infer actions/motion. 6/20
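As a toy instance of "condition on the goal, infer the action": in the small made-up model below (known state X, action A with a uniform prior, next state Y, binary goal variable G), exact enumeration gives the action posterior once the desired observation G=1 is clamped. All tables and names are invented for illustration.

import numpy as np

P_A    = np.array([0.5, 0.5])                        # uniform prior over two actions
P_Y_XA = np.array([[[0.8, 0.2], [0.3, 0.7]],         # P(Y | X=0, A): rows A, cols Y
                   [[0.6, 0.4], [0.1, 0.9]]])        # P(Y | X=1, A)
P_G1_Y = np.array([0.1, 0.9])                        # P(G=1 | Y)

def action_posterior(x):
    # P(A | X=x, G=1): clamp the desired future observation, infer the action
    joint = P_A[:, None] * P_Y_XA[x] * P_G1_Y[None, :]   # unnormalized joint over (A, Y)
    post_A = joint.sum(axis=1)                            # marginalize Y
    return post_A / post_A.sum()

print(action_posterior(x=0))   # ~[0.28, 0.72]: the action that makes the goal state likely is preferred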

8–12 [network diagram: CURRENT STATE, CONSTRAINTS, GOALS, ACTIONS, OBSERVATIONS as coupled variables; slides 8–12 add the points below one by one]
assume a permanent relaxation process
imagining a goal means conditioning a variable; the network relaxes into a new posterior, implying new actions
works for networks of goals, constraints, observations, etc.
doesn't distinguish between sensor/motor, perception/action
we got rid of the notion of state as one big variable
7/20
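One concrete reading of "permanent relaxation process", under the assumption of a sampling-based implementation (my sketch, not the talk's): keep resampling the free variables of the network; clamping the goal variable changes the stationary distribution, and the action marginal drifts towards the new posterior. The model below reuses the made-up X, A, Y, G structure from the previous sketch.

import numpy as np

rng = np.random.default_rng(0)

P_Y_XA = np.array([[[0.8, 0.2], [0.3, 0.7]],     # P(Y | X, A), same invented tables as above
                   [[0.6, 0.4], [0.1, 0.9]]])
P_G1_Y = np.array([0.1, 0.9])                    # P(G=1 | Y)

def gibbs_relax(x=0, clamp_goal=None, steps=20000):
    # Continually resample A and Y (Gibbs sweeps); clamping G corresponds to conditioning on a goal.
    a, y = 0, 0
    counts = np.zeros(2)
    for _ in range(steps):
        w_y = P_Y_XA[x, a].copy()                # P(Y | X, A), times the goal likelihood if clamped
        if clamp_goal == 1:
            w_y *= P_G1_Y
        elif clamp_goal == 0:
            w_y *= 1.0 - P_G1_Y
        y = rng.choice(2, p=w_y / w_y.sum())
        w_a = P_Y_XA[x, :, y]                    # P(A | X, Y) with a uniform action prior
        a = rng.choice(2, p=w_a / w_a.sum())
        counts[a] += 1
    return counts / counts.sum()

print(gibbs_relax(clamp_goal=None))   # no goal imagined: the action marginal stays roughly uniform
print(gibbs_relax(clamp_goal=1))      # goal clamped: the marginal relaxes towards ~[0.28, 0.72]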

13 Markov Decision Processes
[dynamic Bayesian network: actions a_0, a_1, a_2, ...; states s_0, s_1, s_2, ...; rewards r_0, r_1, r_2, ...]
Problem: Find argmax_π V^π = E{ Σ_{t=0}^∞ γ^t r_t ; π },  γ ∈ [0,1]
8/20

14 Markov Decision Processes
[same dynamic Bayesian network as above]
Problem: Find argmax_π V^π = E{ Σ_{t=0}^∞ γ^t r_t ; π },  γ ∈ [0,1]
Planning by Inference: Define another stochastic process (a mixture of finite-time MDPs) s.t.
maximizing the likelihood L^π = P(R = 1; π) in this process  ⇔  maximizing the value V^π in the MDP
(Toussaint & Storkey, ICML 2006)
8/20
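A sketch of the construction behind this equivalence (following the mixture idea, with rewards assumed normalized to r ∈ [0,1]): introduce a random horizon T with prior P(T) = (1-γ) γ^T, let the binary variable R be emitted only at the final time step with P(R=1 | s_T, a_T) = r(s_T, a_T), and marginalize over T:

L^π = P(R=1; π) = Σ_{T=0}^∞ P(T) E{ r(s_T, a_T) ; π } = (1-γ) Σ_{t=0}^∞ γ^t E{ r_t ; π } = (1-γ) V^π,

so maximizing the likelihood in the mixture model maximizes the discounted value, up to the constant factor (1-γ).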

15 Stochastic Optimal Control
[graphical model: states x_0 ... x_T, controls u_0 ... u_T, task/observation variables z_0 ... z_T]
prior:      P(x_{0:T}, u_{0:T}) := P(x_0) ∏_{t=0}^T P(u_t|x_t) ∏_{t=1}^T P(x_t | u_{t-1}, x_{t-1})
controlled: q_π(x_{0:T}, u_{0:T}) := P(x_0) ∏_{t=0}^T δ_{u_t = π_t(x_t)} ∏_{t=1}^T P(x_t | u_{t-1}, x_{t-1})
posterior:  p(x_{0:T}, u_{0:T}) := P(x_0)/P(z=1) ∏_{t=0}^T P(u_t|x_t) ∏_{t=1}^T P(x_t | u_{t-1}, x_{t-1}) ∏_{t=0}^T exp{ -c_t(x_t, u_t) }
For uniform P(u_t|x_t):  D( q_π ‖ p ) = log p(z_{0:T}=1) + E_{q_π(x_{0:T})}{ C(x_{0:T}, π(x_{0:T})) }
("General Kalman Duality"; Rawlik & Toussaint 2010)
9/20
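To see where the identity comes from (a sketch for the discrete case; the constant contributed by the uniform action prior is dropped): substituting the three distributions above into D(q_π ‖ p) = E_{q_π}{ log q_π - log p }, the dynamics factors P(x_t | u_{t-1}, x_{t-1}) cancel, the δ-factors contribute nothing because q_π puts all mass on u_t = π_t(x_t), and what remains is the normalization term log p(z_{0:T}=1) plus the expected accumulated cost E_{q_π}{ Σ_{t=0}^T c_t(x_t, π_t(x_t)) }. Minimizing the KL divergence over policies is therefore the same as minimizing the expected cost, which is the stated duality.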

16 [figure: dynamic Bayesian network formulations of several models]
MDPs (Toussaint, Storkey, ICML 2006)
Network-Distributed POMDP (Kumar, Toussaint, Zilberstein, IJCAI 2011)
Relational RL (Lang, Toussaint, ICML 2009)
hierarchical POMDP (Charlin, Poupart, Toussaint, UAI 2008)
10/20

17 Planning by Inference... An alternative framework when exact Bellman is infeasible; approximate inference can exploit structure.
Literature on reductions to Probabilistic Inference:
Toussaint & Storkey: Probabilistic inference for solving discrete and continuous state Markov Decision Processes (ICML 2006)
Todorov: General duality between optimal control and estimation (IEEE Conf. on Decision and Control, 2008)
Toussaint: Robot Trajectory Optimization using Approximate Inference (ICML 2009)
Kappen, Gómez & Opper: Optimal control as a graphical model inference problem (arXiv:0901.0633, 2009)
Rawlik, Toussaint & Vijayakumar: Approximate Inference and Stochastic Optimal Control (arXiv preprint, 2010)
11/20

18 [videos: random exploration; planning; relational explore-exploit; real-world] 12/20

19 Neural perspective 13/20

20 Neural perspective Goal-directed decision making in prefrontal cortex: A computational framework (NIPS 2008) Matthew Botvinick (Princeton) We take three empirically motivated points as founding premises: (1) Neurons in dorsolateral prefrontal cortex represent action policies, (2) Neurons in orbitofrontal cortex represent rewards, and (3) Neural computation, across domains, can be appropriately understood as performing structured probabilistic inference. [...] essentially proposes Planning by Inference 13/20

21 Karl J. Friston, Jean Daunizeau, James Kilner, Stefan J. Kiebel: Action and behavior: a free-energy formulation, Biol. Cybern. Karl Friston (UCL): minimize sensory prediction error (free energy) through action. Expectation Maximization and Free Energy Minimization are equivalent. 14/20

22 15/20

23 Inference in neural systems Bayesian Brain: Probabilistic Approaches to Neural Coding. K. Doya, S. Ishii, A. Pouget, R.P.N. Rao (editors), MIT Press (2007) Hierarchical Bayesian Inference in Networks of Spiking Neurons. Rajesh P. N. Rao (NIPS 2004) The Neurodynamics of Belief Propagation on Binary Markov Random Fields. T. Ott, R. Stoop (NIPS 2006) current work by Wolfgang Maass... 16/20

24 Johnson & Redish (J. of Neuroscience, 2007): Neural ensembles in CA3 transiently encode paths forward of the animal at a decision point 17/20

25 Theories of thought Rick Grush (Behavioral and Brain Sciences, 2004): The emulation theory of representation: motor control, imagery, and perception. (20 pages + 46 pages commentary & response!) Keywords: Kalman filters, overt & covert actions, imagery 18/20

26 Theories of thought G. Hesslow (Trends in Cog Sciences, 2002): Conscious thought as simulation of behaviour and perception. A simulation theory of cognitive function can be based on three assumptions about brain function. (1) First, behaviour can be simulated by activating motor structures, as during an overt action but suppressing its execution. (2) Second, perception can be simulated by internal activation of sensory cortex, as during normal perception of external stimuli. (3) Third, both overt and covert actions can elicit perceptual simulation of their normal consequences. A large body of evidence supports these assumptions. It is argued that the simulation approach can explain the relations between motor, sensory and cognitive functions and the appearance of an inner world. 19/20

27 Thanks to all collaborators: PhD students: Tobias Lang, Nikolay Jetchev, Stanio Dragiev Pascal Poupart, Laurent Charlin (U Waterloo) Nikos Vlassis (U Crete) Michael Gienger, Christian Goerick (Honda, Offenbach) Klaus-Robert Müller, Manfred Opper (TU Berlin) Amos Storkey, Stefan Harmeling, Sethu Vijayakumar (U Edinburgh) thanks for your attention! Source code: on my webpage: code for AICO, DDP/iLQG, EM-POMDP solver; on Tobias' webpage: code for PRADA 20/20
