Few-shot learning with KRR

Size: px

Start display at page:

Download "Few-shot learning with KRR"

Dina Gibbs
6 years ago
Views:

1 Few-shot learning with KRR Prudencio Tossou Groupe de Recherche en Apprentissage Automatique Départment d informatique et de génie logiciel Université Laval April 6, 2018 Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

2 Motivation Plan 1 Motivation 2 Problem statement 3 Our approach: MetaKRR 4 Experiments 5 Future works and conclusion Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

3 Motivation The future of ML Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

4 Motivation What few learning is trying to do? By leveraging past learning experiences Distribution T T Family F = {T 1, T 2,..., T i...} T Meta-train {D 1, D 2,..., D M } Through META-LEARNING T T k Meta-learning algorithm The past gives a strong prior knowledge Training data D train Meta-model or learning algorithm A Input x Model h If one use it, things can be done more efficiently in the present Output h(x) Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

5 Motivation Few-shot regression Recent works focus on classification and reinforcement learning Not much experiments with regression datasets Versus classification: harder to generalize from few examples Applications: drug discovery > Drugs at lower costs recommender systems > Deal with products with few ratings Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

6 Problem statement Plan 1 Motivation 2 Problem statement 3 Our approach: MetaKRR 4 Experiments 5 Future works and conclusion Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

7 Problem statement The objective Distribution T T Family F = {T 1, T 2,..., T i...} T The optimal meta-model Meta-train {D 1, D 2,..., D M } A = argmin A E T T E D train T k E L(A(D train)(x), y) (x,y) T T T k Meta-learning algorithm Few-shot regression Quadratic loss L(y, y ) = ( y y ) 2 Training data D train Meta-model or learning algorithm A D train 20 Input x Model h Output h(x) Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

8 Problem statement In practice (1) Distribution T T Family F = {T 1, T 2,..., T i...} T The meta-datasets Sample a distribution T i from F For each T i sample a D i Split the resulting collection of datasets in 3 partitions: D meta train for training D meta valid for hyper-parameter selection D meta test for unbiased evaluation of the meta-model T T k Training data D train Input x Meta-train {D 1, D 2,..., D M } Meta-learning algorithm Meta-model or learning algorithm A Model h Output h(x) Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

9 Problem statement In practice (2) Episodic training Initialize Θ Loop Distribution T T Family F = {T 1, T 2,..., T i...} T Meta-train {D 1, D 2,..., D M } Sample an D i from D meta train Sample D train and D test of k examples each from D i Compute h := A(D train, Θ) Estimate the loss of h on D test T T k Training data D train Meta-learning algorithm Meta-model or learning algorithm A Update Θ Input x Model h An episode A pair of D train and D test from a D i Output h(x) Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

10 Our approach: MetaKRR Plan 1 Motivation 2 Problem statement 3 Our approach: MetaKRR 4 Experiments 5 Future works and conclusion Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

11 Our approach: MetaKRR The meta-model Tandem combination Feature extractor φ : X K shared by all tasks Regression Algorithm > h(x) = w φ(x), w K Feature extractor Could be anything CNN for images LSTM for sequences FC for vectors, etc Parameters to be found during the episodic training Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

12 Our approach: MetaKRR The model (1) The regression algorithm should aim for generalization (SRM) [4] Given D train, w D train = argmin w The optimal solution is given by KRR[3] (x,y) D train (w φ(x) y) 2 + λ w 2 2, w = k α i φ(x i ), with α = (α 1, α 2,..., α k ) T = (K + λi ) 1 y, i=1 K ij = φ(x i ) φ(x j ), where i = 1... k, j = 1... k Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

13 Our approach: MetaKRR The model (2) Advantages of KRR Closed form Few-shot > solving the dual system is highly advantageous over the primal Drawbacks Fine tuning regularizer and kernel hyper-parameters Cross-validation and validation set: costly and need more data Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

14 Our approach: MetaKRR Selection of regression hyper-parameters Option 1: Episode dependant FC network g to predict the right values inputs = sufficient statistics of the training examples of D train statistics = mean, std, max, min of {y 1, y 2,..., y k } and {φ(x 1 ), φ(x 2 ),..., φ(x k )} For example, for a given D train, the KRR regularizer is given by: a if x < a exp (HardTanh a,b (g(d train ))), with HardTanh a,b (x) = x if a x b b if x > b Option 2: Same for all episodes Associate a parameter to each Hp Find the right value during back propagation Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

15 Our approach: MetaKRR MetaKRR: the training with option 1 Pseudo-code Initialize Θ of φ and Λ of g Loop Sample an D i from D meta train Sample a D train and D test from D i Transform all inputs with φ Compute λ train with g Solve KRR to find w, thus h Compute the quadratic loss of h on D test Back-propagate the loss and update Θ and Λ Other details Train for 20K episodes Use D meta valid to select the best model Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

16 Experiments Plan 1 Motivation 2 Problem statement 3 Our approach: MetaKRR 4 Experiments 5 Future works and conclusion Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

17 Experiments Datasets MHC class II peptides Task: predict the binding energy of a peptide to a protein (MHC II complex). Collection of 14 datasets, one per protein Each dataset has from 500 to 5K examples Input = peptide (string) Output=energy to a MHC protein 14 few-shot regression tasks CNN feature extractor Binding molecules Task: predict the binding affinity of small molecules to a protein Collection of 3741 regression task each related to a protein and an organism Input = molecule SMILES (string) Output = binding affinity Meta-train, meta-valid and meta-test contain 2104, 702 and 935 tasks CNN feature extractor Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

18 Experiments Results on MHC class II peptides MetaKRR-g MetaKRR-u MAML[1] MANN[2] pretrain Test complex DRB1* DRB1* DRB1* DRB1* DRB1* DRB1* DRB1* DRB1* DRB1* DRB1* DRB1* DRB3* DRB4* DRB5* Average Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

19 Experiments Results on BindingDB MetaKRR versus MAML MetaKRR versus MANN Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

20 Experiments Discussion MAML MANN Hard to find the best initialization point. Easy to classify with but harder for regression Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

21 Future works and conclusion Plan 1 Motivation 2 Problem statement 3 Our approach: MetaKRR 4 Experiments 5 Future works and conclusion Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

22 Future works and conclusion Future works Select the right values with the network g within a grid of hyper-parameters Include in our experiments a recommender system dataset : Netflix challenge Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

23 Future works and conclusion Conclusion We have introduced MetaKRR, a few-shot regression algorithm State of the art performances Three key ideas: Leverage past experiences to find the most appropriate mapping function Use the structural risk minimization to enforce generalization Leverage past experiences to choose adequately the trade-off inside the SRM Not new ideas but they graciously combine together to give the MetaKRR Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

24 Future works and conclusion References [1] C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks. arxiv preprint arxiv: , [2] A. Santoro, S. Bartunov, M. Botvinick, D. Wierstra, and T. Lillicrap. Meta-learning with memory-augmented neural networks. In International conference on machine learning, pages , [3] C. Saunders, A. Gammerman, and V. Vovk. Ridge regression learning algorithm in dual variables [4] V. Vapnik. Principles of risk minimization for learning theory. In Advances in neural information processing systems, pages , Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

25 Questions Thanks for your attention Prudencio Tossou (UL) Few-shot learning with KRR April 6, / 25

Reptile: a Scalable Metalearning Algorithm

Reptile: a Scalable Metalearning Algorithm Alex Nichol and John Schulman OpenAI {alex, joschu}@openai.com Abstract This paper considers metalearning problems, where there is a distribution of tasks, and