Visual meta-learning for planning and control

1 Visual meta-learning for planning and control. Seminar on Current Works in Computer, Chair of Pattern Recognition and Image Processing. Samuel Roth. Winter Semester 2018/19, Albert-Ludwigs-Universität Freiburg.

2 Content: Motivation, Goal. Meta-learning. Concept acquisition through meta-learning (CAML). Implementation of CAML. Using the observation classifier as a reward function. Experimental results.

6 Introduction - Motivation. Example Tasks. Motivation: Reinforcement learning needs rewards for learning. Rewards are often hand-crafted and domain-specific. Learning rewards for a specific task requires a large labeled dataset for training.

8 Introduction - Goal and Approach. Goal: derive a reward function from camera input, using only a few positive examples of success and utilizing information given by other tasks. Meta-learning approach: learn to learn. Train a model to classify success across multiple different tasks and expect it to learn internal representations. When a new task is given, the model then has a good initialization.

9 Introduction - Framework

11 Meta-learning for few-shot objective learning
Concept: Given a dataset of labeled observations over multiple tasks,
$$D_i := \{(o_j, y_j) \mid j = 1, \dots, k;\ y_j \in \{0, 1\}\},$$
a meta-learner $f_L$ and a classifier $f_C$ with
$$f_C = f_L(D_{new}^+; \theta) : \text{Observation} \to [0, 1],$$
the objective is
$$\min_\theta \sum_{i}^{n} \sum_{(o_j, y_j) \in D_i^{train}} L\big(y_j, f_L(D_i^+; \theta)(o_j)\big)$$

15 CAML - Concept Acquisition through meta-learning.
Incorporate gradient descent for parameter adaptation.
Adapt $\theta$ to a new task $\tau$:
$$g(\theta, D_\tau^+) := \theta - \alpha \nabla_\theta \sum_{(o_i, y_i) \in D_\tau^+} L\big(y_i, f_{CAML}(\theta)(o_i)\big) = \theta_\tau$$
The initial $\theta$ is learned in meta-training:
$$\min_\theta \sum_{i}^{n} \sum_{(o_j, y_j) \in D_{\tau_i}^{train}} L\big(y_j, f_{CAML}(g(\theta, D_{\tau_i}^+))(o_j)\big)$$

16 CAML - Algorithm.
Algorithm 1: CAML training phase
Require: $\alpha$, $\beta$: learning-rate hyperparameters.
Require: $k$: number of examples (k-shot).
Randomly initialize $\theta$
while not done do
    Sample batch of tasks of size $k$.
    for all $\tau_i$ in batch do
        Sample $D_i^{train}$ and $D_i^+$ s.t. $D_i^{train} \cap D_i^+ = \emptyset$
        Calculate task parameters with gradient descent:
        $\theta_i = \theta - \alpha \nabla_\theta \sum_{(o_j, y_j) \in D_i^+} L\big(y_j, f_{CAML}(\theta)(o_j)\big)$
    end for
    Update $\theta \leftarrow \theta - \beta \nabla_\theta \sum_{\tau_i \in batch} \sum_{(o_j, y_j) \in D_i^{train}} L\big(y_j, f_{CAML}(\theta_i)(o_j)\big)$
end while
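As a rough illustration of the training phase in Algorithm 1, the sketch below runs the loop on a tiny logistic classifier in plain Python. Everything concrete in it is an assumption made for illustration and does not come from the slides: the logistic model standing in for f_CAML, the cross-entropy loss standing in for L, the two toy tasks with a fixed split into D+ and D_train (the algorithm samples this split), and the central-difference `numerical_grad` that stands in for automatic differentiation through the inner step. The function names are hypothetical.

```python
import math
import random

def predict(theta, obs):
    """Success probability from a logistic model; theta = weights followed by a bias."""
    z = sum(w * x for w, x in zip(theta, obs)) + theta[-1]
    return 1.0 / (1.0 + math.exp(-z))

def loss(theta, examples):
    """Summed cross-entropy over (observation, label) pairs."""
    total = 0.0
    for obs, y in examples:
        p = min(max(predict(theta, obs), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total

def inner_step(theta, d_plus, alpha):
    """Task parameters: theta_i = theta - alpha * gradient of the loss on D+_i."""
    grad = [0.0] * len(theta)
    for obs, y in d_plus:
        err = predict(theta, obs) - y      # d(loss)/d(logit) for cross-entropy
        for i, x in enumerate(obs):
            grad[i] += err * x
        grad[-1] += err                    # bias term
    return [t - alpha * g for t, g in zip(theta, grad)]

def meta_loss(theta, batch, alpha):
    """Loss of each task's adapted parameters on its D_train, summed over the batch."""
    return sum(loss(inner_step(theta, d_plus, alpha), d_train)
               for d_plus, d_train in batch)

def numerical_grad(f, theta, h=1e-5):
    """Central differences; an assumed stand-in for automatic differentiation."""
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
    return grad

def caml_train(theta, tasks, alpha, beta, batch_size, steps, seed=0):
    """Meta-training loop: sample tasks, adapt per task, update the shared theta."""
    rng = random.Random(seed)
    for _ in range(steps):
        batch = rng.sample(tasks, batch_size)
        grad = numerical_grad(lambda t: meta_loss(t, batch, alpha), theta)
        theta = [t - beta * g for t, g in zip(theta, grad)]
    return theta

# Two toy tasks (assumed data), each pre-split into disjoint (D+, D_train).
tasks = [
    ([([1.0, 0.0], 1), ([0.9, 0.1], 1)],
     [([0.8, 0.2], 1), ([0.1, 0.9], 0), ([0.0, 1.0], 0)]),
    ([([0.0, 1.0], 1), ([0.1, 0.9], 1)],
     [([0.2, 0.8], 1), ([0.9, 0.1], 0), ([1.0, 0.0], 0)]),
]
init_rng = random.Random(1)
theta0 = [init_rng.uniform(-0.1, 0.1) for _ in range(3)]  # random initialization
theta = caml_train(theta0, tasks, alpha=0.5, beta=0.05, batch_size=2, steps=200)
print(meta_loss(theta0, tasks, 0.5), meta_loss(theta, tasks, 0.5))
```

Differentiating `meta_loss` with respect to the initial parameters, rather than the adapted ones, is what makes the outer update account for the inner gradient step. In practice an automatic differentiation framework would replace the finite differences used here.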

17 CAML - Algorithm.
Algorithm 2: CAML test phase
Require: $D_{new}^+$: positive examples of success for a new task.
Require: $\theta$: pre-trained meta-learner parameters.
Require: $\alpha$: learning-rate hyperparameter.
Adapt parameters with gradient descent:
    $\theta' = \theta - \alpha \nabla_\theta \sum_{(o_j, y_j) \in D_{new}^+} L\big(y_j, f_{CAML}(\theta)(o_j)\big)$
return $f_{CAML}(\theta')$
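The test phase of Algorithm 2 amounts to one gradient step on the positive examples followed by returning the adapted classifier. A minimal sketch follows, assuming a logistic model in place of f_CAML and cross-entropy in place of L; neither choice is taken from the slides, and `caml_test_phase`, `adapt` and `predict` are hypothetical names.

```python
import math

def predict(theta, obs):
    """Success probability from a logistic model; theta = weights followed by a bias."""
    z = sum(w * x for w, x in zip(theta, obs)) + theta[-1]
    return 1.0 / (1.0 + math.exp(-z))

def adapt(theta, examples, alpha):
    """One gradient step on the summed cross-entropy loss over (o, y) examples."""
    grad = [0.0] * len(theta)
    for obs, y in examples:
        err = predict(theta, obs) - y      # d(loss)/d(logit) for cross-entropy
        for i, x in enumerate(obs):
            grad[i] += err * x
        grad[-1] += err                    # bias term
    return [t - alpha * g for t, g in zip(theta, grad)]

def caml_test_phase(theta, positives, alpha):
    """Adapt theta on the positive examples D+_new and return the classifier."""
    theta_new = adapt(theta, [(obs, 1) for obs in positives], alpha)
    return lambda obs: predict(theta_new, obs)

# Assumed toy data: theta would normally come from meta-training.
theta = [0.0, 0.0, 0.0]
positives = [[1.0, 0.0], [0.9, 0.1]]
classifier = caml_test_phase(theta, positives, alpha=1.0)
print(classifier([1.0, 0.0]), classifier([0.0, 1.0]))
```

Starting from the all-zero θ used here (an arbitrary choice), the step also raises the score of [0.0, 1.0] above 0.5, which hints at why the procedure relies on a meta-learned initialization rather than an arbitrary one.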

18 From observation classifier to rewards. $f_C(o) \in [0, 1]$ is interpreted as the probability that $o$ is a successful execution of the task. The output of $f_C$ is used directly as the reward. To reduce the effect of false positives, use a threshold $t \in (0, 1)$ and set $f_C(o) \leftarrow 0$ if $f_C(o) < t$.
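The thresholding rule can be written in a few lines. This is only a sketch: the function name `classifier_reward` and the example probabilities are made up for illustration.

```python
def classifier_reward(prob, threshold):
    """Map a success probability f_C(o) to a reward.

    Probabilities below the threshold t are set to 0 to suppress
    false positives; all others are passed through unchanged.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie in the open interval (0, 1)")
    return prob if prob >= threshold else 0.0

rewards = [classifier_reward(p, threshold=0.5) for p in (0.1, 0.49, 0.5, 0.93)]
print(rewards)  # [0.0, 0.0, 0.5, 0.93]
```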

19 Experimental results - Single task relative position

20 Experimental results - Multi task relative position

21 Experimental evaluation - Competitors
Pixel distance: Measure the $\ell_2$ distance between the observation and a successful observation.
Latent space distance: Measure the distance in a latent space learned with an autoencoder. Here: "Deep spatial autoencoders for visuomotor learning" by Finn et al.
Oracle: Use the ground truth, if available, to get an upper bound on performance (only applied in the rope manipulation domain).
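The pixel-distance baseline is simple enough to sketch directly. The example assumes images are given as equally sized flat lists of pixel intensities; that representation, the function name `pixel_distance` and the toy images are illustrative choices, not details from the slides.

```python
import math

def pixel_distance(observation, goal):
    """L2 distance between two images given as flat lists of pixel values."""
    if len(observation) != len(goal):
        raise ValueError("images must have the same number of pixels")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(observation, goal)))

goal = [0.0, 1.0, 1.0, 0.0]   # a successful observation
near = [0.0, 1.0, 0.5, 0.0]   # differs in one pixel
far = [1.0, 0.0, 0.0, 1.0]    # differs in every pixel
print(pixel_distance(near, goal), pixel_distance(far, goal))  # 0.5 2.0
```

A smaller distance means the observation looks more like the successful one, so the value acts as a cost rather than a reward.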

22 Experimental results - Relative position

23 Experimental results - Relative position planning costs

24 Experimental results - Rope manipulation

26 Experimental results - 3D navigation

32 Conclusion. A model that can achieve rapid adaptation. An internal representation that is broadly suitable for many tasks. If tasks have distinct visual concepts, it is unlikely that meta-learning will acquire useful knowledge. Experiments showed good performance on vision-based manipulation skills.

33 Thanks!

34 Bibliography
Finn, C., Abbeel, P., and Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:
Xie, A., Singh, A., Levine, S., and Finn, C. (2018). Few-shot goal inference for visuomotor learning and planning. CoRR, abs/
