Machine Learning and Related Disciplines

1 Machine Learning and Related Disciplines
The 15th Winter School of Statistical Physics, POSCO International Center & POSTECH, Pohang (Mon.-Fri.)
Yung-Kyun Noh

2 Machine Learning
Interdisciplinary work drawing on statistics, psychology, neuroscience, physics, and mathematics.
- Used to be a sub-discipline of statistics.
- Neural networks, from psychology and neuroscience (1980s, 2010s).
- Convex optimization (2000s): SVMs.
- Graphical models (2000s): a return to statistics. We have come to know how to interpret state-of-the-art techniques theoretically.
- Embedding and data visualization (2000s).
Chart: number of papers submitted to major conferences over the last 10 years.

3 Universal Law of Generalization
Toward a Universal Law of Generalization for Psychological Science (Shepard, Science 1987): "The tercentenary of the publication, in 1687, of Newton's Principia prompts the question of whether psychological science has any hope of achieving a law that is comparable in generality to Newton's universal law of gravitation. Exploring the direction that currently seems most favorable for an affirmative answer, I outline empirical evidence and a theoretical rationale in support of a tentative candidate for a universal law of generalization. Psychology's first general law should, I suggest, be a law of generalization."

4 The probability that a response learned to any stimulus will generalize to any other is an invariant monotonic function of the distance between them. To a good approximation, this probability of generalization decays exponentially with this distance.
Bayesian extension: Tenenbaum & Griffiths (2001).
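A minimal sketch of the exponential form, assuming the generalization probability is g(d) = exp(-d / s) for a distance d measured in some psychological space. The function name `generalization_probability` and the decay scale `scale` (the s above) are hypothetical; the law as stated fixes only the invariant, exponentially decaying shape, not the rate.

```python
import math

def generalization_probability(distance, scale=1.0):
    """Exponential generalization gradient g(d) = exp(-d / scale).

    `scale` is a hypothetical decay constant: the law fixes the exponential
    shape of the decay, not how fast it decays.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return math.exp(-distance / scale)

# Generalization is certain at zero distance and falls off monotonically.
for d in (0.0, 0.5, 1.0, 2.0):
    print(f"d = {d:.1f}  g(d) = {generalization_probability(d):.3f}")
```

Because the decay is exponential, doubling the distance squares the probability, g(2d) = g(d)^2; a power-law decay such as (1 + d)^(-k) would not behave this way.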

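5 Exemplar Model
Nadaraya-Watson estimator: given data $D = \{(x_i, y_i)\}_{i=1}^N$, predict
$\hat{y}(x) = \dfrac{\sum_{i=1}^N K(x_i, x)\, y_i}{\sum_{i=1}^N K(x_i, x)}$,
where the kernel $K(x_i, x)$ decreases with the distance $\|x_i - x\|$.
Connection: kernel methods.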
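A minimal sketch of the Nadaraya-Watson estimator read as an exemplar model: every stored pair (x_i, y_i) votes for its own target, with a weight K(x_i, x) that shrinks as ||x_i - x|| grows. The Gaussian kernel, the `bandwidth` parameter, the function names, and the toy data are assumptions made for illustration.

```python
import math

def gaussian_kernel(xi, x, bandwidth=1.0):
    """K(x_i, x) = exp(-||x_i - x||^2 / (2 h^2)); depends only on the distance."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))

def nadaraya_watson(data, x, bandwidth=1.0):
    """Predict y at x as the kernel-weighted average of the stored targets y_i."""
    weights = [gaussian_kernel(xi, x, bandwidth) for xi, _ in data]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all kernel weights vanished; increase the bandwidth")
    return sum(w * yi for w, (_, yi) in zip(weights, data)) / total

# Toy D = {(x_i, y_i)}: two-dimensional inputs with scalar targets.
D = [((0.0, 0.0), 0.0), ((1.0, 0.0), 1.0), ((0.0, 1.0), 1.0), ((1.0, 1.0), 2.0)]
print(nadaraya_watson(D, (0.5, 0.5), bandwidth=0.5))
print(nadaraya_watson(D, (1.0, 1.0), bandwidth=0.05))
```

At (0.5, 0.5) all four exemplars are equally distant, so the prediction is the plain average of their targets; shrinking `bandwidth` makes the estimator defer to the nearest exemplar.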

6 Models in Science vs. Models for Prediction
Feynman, R. P. (1998): "The more specific a rule is, the more interesting it is. The more definite the statement, the more interesting it is to test."
Box, G. E. P. (1979): "All models are wrong but some are useful. For such a model there is no need to ask the question 'Is the model true?'. If 'truth' is to be the 'whole truth' the answer must be 'No'. The only question of interest is 'Is the model illuminating and useful?'."

7 Model: Confine the Hypothesis Space H
Estimation with a large number of data. Figure: the optimal solution and the solutions selected from different data realizations.

8 Model: Confine the Hypothesis Space H
Estimation with a small number of data. Figure: the optimal solution and the solutions selected from different data realizations.
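Slides 7 and 8 contrast large and small samples. A minimal simulation sketch, assuming a hypothetical hypothesis space H of lines through the origin, y = a x, with Gaussian noise: each data realization yields its own least-squares slope, and the spread of those slopes around the optimal solution (here the true slope 2.0, also an assumption) shrinks as N grows. Function names, noise level, and seed are illustrative choices.

```python
import random
import statistics

def fit_slope(n, rng, true_slope=2.0, noise=1.0):
    """Least-squares slope of y = a*x (a model in H) from one realization of n points."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_slope * x + rng.gauss(0.0, noise) for x in xs]
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

def spread_of_estimates(n, realizations=200, seed=0):
    """Mean and standard deviation of the fitted slope across independent realizations."""
    rng = random.Random(seed)
    estimates = [fit_slope(n, rng) for _ in range(realizations)]
    return statistics.mean(estimates), statistics.stdev(estimates)

for n in (10, 1000):
    mean, std = spread_of_estimates(n)
    print(f"N = {n:5d}: mean slope = {mean:.3f}, spread = {std:.3f}")
```

With few data the selected solutions scatter widely around the optimum; with many data they cluster tightly, which is the picture the two slides draw.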

9 Science of Learning Algorithms for Prediction
Disentangling the various approaches to supervised learning means answering questions like:
- What are the relative strengths of the different approaches to supervised learning?
- What are the relative weaknesses?
- How do the questions they are addressing differ?
- How do their assumptions differ?
- How much do they duplicate one another?
- What fundamentally distinct ideas and insights do they collectively embody?
- What can be gained by combining techniques from the different approaches?
[David H. Wolpert, 1994]

DIMENSIONALITY IN MACHINE LEARNING

11 Benefits of Using High Dimensionality
Feature 1 and Feature 2 are correlated (scatter plot: Feature 2 vs. Feature 1).

12 Curse of Dimensionality
To achieve the same density as N = 100 samples for one variable, we need $N = 100^D$ samples for D variables. Conversely, 60,000 data points in a 10-dimensional space have the same density as about 3 data points in a 1-dimensional space, since $60{,}000^{1/10} \approx 3$.
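The arithmetic on this slide as a short sketch, assuming density is compared per axis (as if every axis were cut into the same number of cells): matching N = 100 points on one axis takes 100^D points in D dimensions, and running the relation backwards gives the 1-D equivalent n^(1/D). The function names are hypothetical.

```python
def samples_needed(n_per_axis, dim):
    """Samples needed in `dim` dimensions to match n_per_axis samples on a 1-D axis."""
    return n_per_axis ** dim

def equivalent_1d_samples(n, dim):
    """1-D sample size with the same per-axis density as n samples in `dim` dimensions."""
    return n ** (1.0 / dim)

print(samples_needed(100, 1))                      # 100
print(samples_needed(100, 3))                      # 1000000
print(f"{equivalent_1d_samples(60_000, 10):.2f}")  # 3.00
```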

13 The Manifold Ways of Perception [H. Sebastian Seung and Daniel D. Lee, Science, 2000]

14 Data and Manifold

15 Transformation Invariance [Bishop 2007]

16 Transformation Invariance [Bishop 2007]

17 Learning Manifolds with Autoencoders

18 Autoencoder

19 Denoising Autoencoder

20 Trivial vs. Nontrivial Autoencoder
Figure panels: trivial autoencoder; nontrivial autoencoder.

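21 Denoising Autoencoder
With small Gaussian corruption noise of variance $\sigma^2$, output - input $= r(x) - x \approx \sigma^2 \, \partial \log p(x) / \partial x$, where $r$ is the reconstruction function and $p$ is the data density [Guillaume Alain and Yoshua Bengio 2014].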
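The cited result says that, for small Gaussian corruption noise of variance sigma^2, a denoising autoencoder's output minus its input approximates sigma^2 times d/dx log p(x). A closed-form check, under the assumption of one-dimensional Gaussian data x ~ N(0, s^2) (a special case chosen for illustration, not the general setting): the mean-squared-error-optimal denoiser for x_noisy = x + N(0, sigma^2) is E[x | x_noisy] = s^2 / (s^2 + sigma^2) * x_noisy, so r(x) - x = -sigma^2 x / (s^2 + sigma^2), while sigma^2 * d/dx log p(x) = -sigma^2 x / s^2. No autoencoder is trained here, and the function names are hypothetical.

```python
def optimal_denoiser(x_noisy, data_var, noise_var):
    """E[x | x_noisy] for x ~ N(0, data_var) corrupted as x_noisy = x + N(0, noise_var)."""
    return data_var / (data_var + noise_var) * x_noisy

def score(x, data_var):
    """d/dx log p(x) for the data density p = N(0, data_var)."""
    return -x / data_var

data_var, x = 1.0, 1.5
for sigma in (0.5, 0.1, 0.01):
    noise_var = sigma ** 2
    displacement = optimal_denoiser(x, data_var, noise_var) - x  # output - input
    predicted = noise_var * score(x, data_var)                   # sigma^2 * score
    print(f"sigma = {sigma}: r(x) - x = {displacement:.6f}, "
          f"sigma^2 * score = {predicted:.6f}")
```

The ratio of the two quantities is s^2 / (s^2 + sigma^2), which tends to 1 as sigma shrinks, matching the small-noise statement on the slide.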

22 Denoising Autoencoder

GENERATIVE VS. DISCRIMINATIVE METHODS

24 Model + Estimated Parameters
Example: Gaussian model.

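25 Homoscedastic Two-class Gaussians
Common covariance matrix: $p(x \mid c) = \mathcal{N}(x; \mu_c, \Sigma)$, $c = 1, 2$.
The boundary is the solution of $p(c = 1 \mid x) = p(c = 2 \mid x)$, which is linear in $x$: $w^\top x + b = 0$ with $w = \Sigma^{-1}(\mu_1 - \mu_2)$ and $b = -\tfrac{1}{2}(\mu_1 + \mu_2)^\top \Sigma^{-1}(\mu_1 - \mu_2) + \log \frac{p(c = 1)}{p(c = 2)}$.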
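A one-dimensional sketch of the homoscedastic two-class case, assuming scalar inputs, a pooled maximum-likelihood variance sigma^2, and class priors estimated from class frequencies. It shows that the log posterior odds are linear, log p(c=1|x)/p(c=2|x) = w x + b with w = (mu_1 - mu_2) / sigma^2 and b = -(mu_1^2 - mu_2^2) / (2 sigma^2) + log p(c=1)/p(c=2), so the boundary is the single point x = -b / w. Function names and data are hypothetical.

```python
import math

def fit_shared_variance_gaussians(x1, x2):
    """ML estimates for two 1-D Gaussian classes that share one (pooled) variance."""
    n = len(x1) + len(x2)
    mu1 = sum(x1) / len(x1)
    mu2 = sum(x2) / len(x2)
    var = (sum((x - mu1) ** 2 for x in x1) + sum((x - mu2) ** 2 for x in x2)) / n
    prior1 = len(x1) / n
    return mu1, mu2, var, prior1

def log_odds_coefficients(mu1, mu2, var, prior1):
    """(w, b) such that log p(c=1|x) - log p(c=2|x) = w * x + b."""
    w = (mu1 - mu2) / var
    b = -(mu1 ** 2 - mu2 ** 2) / (2.0 * var) + math.log(prior1 / (1.0 - prior1))
    return w, b

def gaussian_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posterior_class1(x, mu1, mu2, var, prior1):
    """p(c=1|x) by Bayes' rule from the class-conditional model."""
    p1 = prior1 * gaussian_pdf(x, mu1, var)
    p2 = (1.0 - prior1) * gaussian_pdf(x, mu2, var)
    return p1 / (p1 + p2)

x1 = [1.0, 2.0, 4.0]          # toy samples from class 1
x2 = [-2.0, -1.0, 0.0, 1.0]   # toy samples from class 2
mu1, mu2, var, prior1 = fit_shared_variance_gaussians(x1, x2)
w, b = log_odds_coefficients(mu1, mu2, var, prior1)
print(f"w = {w:.3f}, b = {b:.3f}, boundary at x = {-b / w:.3f}")

# The Bayes posterior coincides with the logistic function of the linear log-odds.
for xq in (-1.0, 0.3, 2.5):
    logistic = 1.0 / (1.0 + math.exp(-(w * xq + b)))
    print(xq, posterior_class1(xq, mu1, mu2, var, prior1), logistic)
```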

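26 In Terms of the Posterior
Class-conditional model vs. posterior model. For the homoscedastic Gaussian model, the posterior takes the logistic form $p(c = 1 \mid x) = \sigma(w^\top x + b)$, with $\sigma(a) = 1/(1 + e^{-a})$.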

27 Logistic Regression
Starts from the posterior: learn w using the posterior model (instead of the class-conditional model).
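Logistic regression skips the class-conditional densities and fits the posterior p(c=1|x) = sigmoid(w x + b) directly by maximizing the conditional log-likelihood. A minimal one-dimensional sketch using Newton's method (one common optimizer, chosen here for illustration; plain gradient ascent also works), with labels coded 1 for class 1 and 0 for class 2. The names and toy data are hypothetical; the data are deliberately not linearly separable, since otherwise the maximum-likelihood w diverges.

```python
import math

def sigmoid(z):
    """Logistic function 1 / (1 + e^{-z}), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic_1d(xs, ys, iters=50, tol=1e-10):
    """Maximize sum_i log p(y_i | x_i) for p(y=1|x) = sigmoid(w*x + b) by Newton's method."""
    w, b = 0.0, 0.0
    for _ in range(iters):
        ps = [sigmoid(w * x + b) for x in xs]
        # Gradient of the conditional log-likelihood
        gw = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        gb = sum(y - p for y, p in zip(ys, ps))
        # Entries of the negative Hessian [[a, c], [c, d]]
        a = sum(p * (1 - p) * x * x for x, p in zip(xs, ps))
        c = sum(p * (1 - p) * x for x, p in zip(xs, ps))
        d = sum(p * (1 - p) for p in ps)
        det = a * d - c * c
        # Newton step: solve [[a, c], [c, d]] @ [dw, db] = [gw, gb]
        dw = (d * gw - c * gb) / det
        db = (a * gb - c * gw) / det
        w, b = w + dw, b + db
        if abs(dw) + abs(db) < tol:
            break
    return w, b

# Toy data (1 = class 1, 0 = class 2); the classes overlap, so the MLE is finite.
xs = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0, -0.5, 0.5]
ys = [0, 0, 0, 1, 1, 1, 1, 1, 0]
w, b = fit_logistic_1d(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

At the optimum the gradient vanishes, so the fitted probabilities sum to the number of class-1 labels; no class-conditional density is ever estimated.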

28 FDA and Logistic Regression
Are the results the same or not?
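One way to probe this question, sketched under the assumption that the data really are homoscedastic Gaussians (class 1 ~ N(+1, 1), class 2 ~ N(-1, 1), equal priors, so the true log-odds are 2x): fit the generative plug-in estimate (the shared-variance Gaussian model, whose multivariate direction Sigma^{-1}(mu_1 - mu_2) is the FDA direction) and logistic regression on the same sample. Both target the same linear log-odds, but they maximize different likelihoods (joint vs. conditional), so on a finite sample their estimates generally differ, and when the Gaussian assumption fails they need not converge to the same classifier. Names, seed, and sample size are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def generative_fit(xs, ys):
    """Plug-in (w, b) from shared-variance Gaussian class-conditionals (1-D FDA/LDA analogue)."""
    x1 = [x for x, y in zip(xs, ys) if y == 1]
    x0 = [x for x, y in zip(xs, ys) if y == 0]
    m1, m0 = sum(x1) / len(x1), sum(x0) / len(x0)
    var = (sum((x - m1) ** 2 for x in x1) + sum((x - m0) ** 2 for x in x0)) / len(xs)
    w = (m1 - m0) / var
    b = -(m1 ** 2 - m0 ** 2) / (2.0 * var) + math.log(len(x1) / len(x0))
    return w, b

def discriminative_fit(xs, ys, iters=25):
    """Logistic-regression (w, b) by Newton's method on the conditional log-likelihood."""
    w = b = 0.0
    for _ in range(iters):
        ps = [sigmoid(w * x + b) for x in xs]
        gw = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        gb = sum(y - p for y, p in zip(ys, ps))
        a = sum(p * (1 - p) * x * x for x, p in zip(xs, ps))
        c = sum(p * (1 - p) * x for x, p in zip(xs, ps))
        d = sum(p * (1 - p) for p in ps)
        det = a * d - c * c
        w, b = w + (d * gw - c * gb) / det, b + (a * gb - c * gw) / det
    return w, b

# Hypothetical sample drawn from the model itself (1 = class 1, 0 = class 2).
rng = random.Random(1)
xs, ys = [], []
for _ in range(2000):
    xs += [rng.gauss(1.0, 1.0), rng.gauss(-1.0, 1.0)]
    ys += [1, 0]

print("generative     (w, b):", generative_fit(xs, ys))
print("discriminative (w, b):", discriminative_fit(xs, ys))
```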

29 Generative vs. Discriminative Methods
Generative methods:
- Gaussian class-conditional model
- Graphical models
- Restricted Boltzmann Machines
Discriminative methods:
- Support Vector Machines (SVMs)
- Logistic regression
- Artificial Neural Networks

30 Comparative Study (1/2)
Generative & discriminative pair: same number of parameters, same form of h(x). Example pair: logistic regression (discriminative) and naïve Bayes (generative).
Figure: risk upper bound vs. sample size for the two classifiers.
Hybrid?
- S. Lacoste-Julien et al. (2009), The generative and discriminative learning interface, NIPS Workshop.
- A. Y. Ng & M. I. Jordan (2001), On discriminative vs. generative classifiers: a comparison of logistic regression and naïve Bayes, NIPS.

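31 Comparative Study (2/2)
The discriminative analog of naïve Bayes (or FDA) is logistic regression.
- The error of logistic regression, $\epsilon(h_{\mathrm{Dis}})$, converges to its asymptotic value $\epsilon(h_{\mathrm{Dis},\infty})$ with $m = O(n)$ training examples ($n$: input dimension), and $\epsilon(h_{\mathrm{Dis},\infty})$ is no worse than the error of the linear classifier picked by naïve Bayes (or FDA).
- With $m = O(\log n)$ examples, the parameters of $h_{\mathrm{Gen}}$ are uniformly close to their asymptotic values.
- This parameter convergence implies that $\epsilon(h_{\mathrm{Gen}})$ approaches $\epsilon(h_{\mathrm{Gen},\infty})$.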

NEURAL INFORMATION PROCESSING SYSTEMS (NIPS) CONFERENCE

34 Neural Information Processing Systems (NIPS) Growth

35 Neural Information Processing Systems (NIPS 2015)
- Oral talks: 15; spotlights: 37; accepted papers: 403.
- Single session: more than 3,000 participants listen to a single presentation.
- Poster session every day from 7 pm to 12 am (5 hours).
Photo: what the poster session looks like (from Neil Lawrence's blog).

36 Reviewer Bias Elimination for NIPS Reviews

37 Toronto Paper Matching System

38 Poster Arrangement (Topic Model)
Figure labels: deep learning, neural networks, kernel methods, object recognition, natural language processing.

39 Books
- Introduction to Graphical Models (Michael I. Jordan & Christopher Bishop), unpublished
- Pattern Recognition and Machine Learning (Information Science and Statistics) (Christopher Bishop, 2007)
- Machine Learning (Tom M. Mitchell, 1997)
- Pattern Classification (Richard O. Duda, Peter E. Hart, David G. Stork, 2000)
- Probabilistic Graphical Models: Principles and Techniques (Daphne Koller, Nir Friedman, 2009)
- Machine Learning: A Probabilistic Perspective (Adaptive Computation and Machine Learning series) (Kevin P. Murphy, 2012)

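40 Generative Models
Statistical Machine Learning: Pattern Recognition Based on Generative Models, by Masashi Sugiyama; Korean translation by Yung-Kyun Noh, Hyunha Nam, and Eunsol Kim. Seoul National University Press (SNU Press).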

41 THANK YOU
Yung-Kyun Noh