Do you like to be successful? Able to see the big picture


1 Do you like to be successful? Able to see the big picture 1

2 Are you able to recognise a scientific GEM?

3 How to recognise good work? Suggestions please:
item #1: first of its kind
item #2: solves a problem
item #3: learning process, effort put in, value added

4 All these works resulted in Nobel Prizes. Description, and reason for rejection:
Enrico Fermi's seminal paper on weak interaction, 1933: it contained speculations too remote from reality to be of interest to the reader.
Hans Krebs' paper on the citric acid cycle, AKA the Krebs cycle, 1937: Nature had a huge publishing backlog.
Murray Gell-Mann's work on classifying the elementary particles, 1953: he did not use a name the editor liked for his discovery.
The invention of the radioimmunoassay, 1955: reviewers were skeptical that humans could make antibodies small enough to bind to things like insulin.
The discovery of quasicrystals, 1984: rejected on the grounds that it would not interest physicists.
Paper outlining nuclear magnetic resonance (NMR) spectroscopy, 1966: rejected twice by the J. Chem. Phys. before finally being accepted and published in the Review of Scientific Instruments.

5 Is your idea so radical that your professor thinks you are out of your mind? Do you discover nature? Are you trying to break conventional wisdom? If you are doing deep learning for healthcare, are you listening to good doctors? Engineers building a tennis racket must ask the tennis players.

6 Generative Adversarial Networks: "This (GAN) ... is the most interesting idea in the last 10 years in ML, in my opinion." Yann LeCun

7 Generative 7

8 Generative: sampled noise input → Generator Network → generated image

9 Adversarial 9

10 When the generator network is not trained well, the Generator Network produces nonsense instead of a nice picture.

11 Training the generator via an adversary, the discriminator network: Discriminator Network vs Generator Network

12 Training the generator via an adversary, the discriminator network: Discriminator Network vs Generator Network

13 The original GAN diagram: noise z → Generator Network → x = g(z); real data x or generated x → Discriminator Network → prediction y

14 Classical Supervised Learning: the passive learner. Training data x+ and x- are fed to SVM, Random Forest, CNN, MLP, etc. for training.

15 Classical Supervised Learning: the passive learner. It eats whatever you give it. No data = dead learner. Bad data = bad learner. It has totally no control over the data, and must be retrained when the data change a little bit. Is this an intelligent machine?

16 Humans: the active learner. We select what to pay attention to and what to ignore. We are creative and perform thought experiments. We generalise well; just a little bit of data is OK. We perform experiments to generate data. We learn by analogy (举一反三, inferring three things from one example).

17 Passive Learner vs Humans, the Active Learner: where does GAN fit? The active learner selects what to pay attention to and what to ignore, is creative and performs thought experiments, generalises well from just a little bit of data, and performs experiments to generate data.

18 Generative models and their problems 18

19 All generative models boil down to sampling data from a distribution, e.g. $(x_1, x_2) \sim a_1 \mathcal{N}(\mu_1, \Sigma_1) + a_2 \mathcal{N}(\mu_2, \Sigma_2)$
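
To make the sampling concrete, here is a minimal NumPy sketch (not from the slides) that draws from a two-component Gaussian mixture; the weights, means and covariances are made-up values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up mixture parameters: weights a_k, means mu_k, covariances Sigma_k.
a = np.array([0.3, 0.7])
mu = [np.array([0.0, 0.0]), np.array([3.0, 3.0])]
Sigma = [np.eye(2), 0.5 * np.eye(2)]

def sample_mixture(n):
    # Pick a component for each sample according to the weights a,
    # then draw from the corresponding Gaussian.
    k = rng.choice(len(a), size=n, p=a)
    return np.stack([rng.multivariate_normal(mu[ki], Sigma[ki]) for ki in k])

samples = sample_mixture(1000)   # array of shape (1000, 2)
```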

20 Sampling black/white pixels: $(x_1, x_2, \dots, x_N) \sim \exp(-E(x_1, \dots, x_N)/T)\,/\,Z$, with $x_i = \pm 1$, $Z = \sum_{x_1, \dots, x_N} \exp(-E(x_1, \dots, x_N)/T)$ (a sum over $2^N$ configurations), and $E(x_1, \dots, x_N) = -\,$(number of neighbouring $x_i$ with the same sign). (Figure: a grid of pixels $x_1, \dots, x_{30}$.)

21 Sampling black/white pixels: $(x_1, x_2, \dots, x_N) \sim \exp(-E(x_1, \dots, x_N)/T)\,/\,Z$, with $x_i = \pm 1$, $E(x_1, \dots, x_N) = -\,$(number of neighbouring $x_i$ with the same sign), and $Z = \sum_{x_1, \dots, x_N} \exp(-E(x_1, \dots, x_N)/T)$ (a sum over $2^N$ configurations).

22 How to sample is the key problem in generative models. This one is easy: $(x_1, x_2) \sim a_1 \mathcal{N}(\mu_1, \Sigma_1) + a_2 \mathcal{N}(\mu_2, \Sigma_2)$. This one is harder but can be done with advanced methods: $(x_1, x_2, \dots, x_N) \sim \exp(-E(x_1, \dots, x_N)/T)\,/\,Z$. In general, if we can build a distribution (theoretical or empirical will do), we can find a way to sample.
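
One of the "advanced methods" alluded to here is Markov chain Monte Carlo. Below is a minimal Metropolis sketch for the black/white pixel model above; the grid size and temperature are arbitrary choices, and this is an illustration rather than the slides' own code:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, T = 30, 30, 2.0                      # arbitrary grid size and temperature
x = rng.choice([-1, 1], size=(H, W))       # random initial black/white image

def same_sign_neighbours(x, i, j):
    # Count how many of pixel (i, j)'s neighbours have the same sign,
    # and how many neighbours it has in total.
    n_same, n_total = 0, 0
    for di, dj in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
        ni, nj = i + di, j + dj
        if 0 <= ni < H and 0 <= nj < W:
            n_total += 1
            n_same += int(x[i, j] == x[ni, nj])
    return n_same, n_total

for step in range(100_000):
    i, j = rng.integers(H), rng.integers(W)
    n_same, n_total = same_sign_neighbours(x, i, j)
    # E = -(number of same-sign neighbour pairs): flipping (i, j) turns its
    # n_same aligned neighbours into misaligned ones and vice versa.
    dE = 2 * n_same - n_total
    if dE <= 0 or rng.random() < np.exp(-dE / T):
        x[i, j] *= -1                      # accept the flip
```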

23 The difficulty is building the distribution. We don't have the slightest idea what the distribution looks like. How can a distribution be built in a high-dimensional space where data are very sparse? E.g. one-megapixel greyscale images with values in [0, 255] are embedded in a volume of roughly $256^{1{,}000{,}000}$ possible images.

24 Image data are certainly always very, very sparse; they are believed to lie on some very complex manifold. Many other kinds of real-world data likewise occupy an infinitesimal volume of feature space and form very complex manifolds.

25 Given a data set $\{x_i\}$, a distribution can be constructed by finding a function $f(x)$ that is large 'near' data points and small elsewhere, by minimising a cost function over $f(x)$:
$C = -\int_{x' \text{ near some } x_i} f(x')\,dx' + \int_{x' \text{ everywhere else}} f(x')\,dx'$

26 $C = -\int_{x' \text{ near some } x_i} f(x')\,dx' + \int_{x' \text{ everywhere else}} f(x')\,dx'$ (Figure: $f(x_1, x_2)$ peaked near the data points.)

27 Given a data set $\{x_i\}$, a distribution can be constructed by finding a function $f(x)$ that is large 'near' data points and small elsewhere, by minimising a cost function over $f(x)$:
$C = -\int_{x' \text{ near some } x_i} f(x')\,dx' + \int_{x' \text{ everywhere else}} f(x')\,dx'$
Two difficulties here: defining 'near' (a regularisation issue), and high-dimensional integrals (1,000,000 dimensions for image data).

28 Neural networks offer a pseudo-solution 28

29 Let's start by reviewing classical supervised learning on two classes: x+ → Classifier → a high value (e.g. close to 1)

30 Let's start by reviewing classical supervised learning on two classes: x- → Classifier → a low value (e.g. close to 0)

31 $x_i$ → Classifier (weights $w$) → $f_w(x_i)$. Minimise a cost function with respect to $w$:
$C(w) = -\frac{1}{N_+}\sum_i f_w(x_i) + \frac{1}{N_-}\sum_j f_w(x_j)$
(the first sum is over positive examples, the second over negative examples). An example:
$C(w) = -\left(\frac{1}{N_+}\sum_i \log(D_w(x_i)) + \frac{1}{N_-}\sum_j \log(1 - D_w(x_j))\right)$, with $D_w(x_i) \in (0, 1)$.
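
The example cost is just binary cross-entropy. A direct NumPy transcription, where `D_pos` and `D_neg` are hypothetical arrays of classifier outputs on the positive and negative examples:

```python
import numpy as np

def cross_entropy_cost(D_pos, D_neg, eps=1e-12):
    # C(w) = -( mean_i log D_w(x_i+) + mean_j log(1 - D_w(x_j-)) ), with D_w(x) in (0, 1).
    return -(np.mean(np.log(D_pos + eps)) + np.mean(np.log(1.0 - D_neg + eps)))
```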

32 (Figure: the learned $f(x_1, x_2)$ plotted as a surface over the $(x_1, x_2)$ plane.)

33 (Figure: the learned $f(x_1, x_2)$ surface and the corresponding regions in the $(x_1, x_2)$ plane.)

34 Idea of GAN (Generative Adversarial Networks): given a set of real example data, generate fake data that looks like real data. The generator is a neural network $g_\theta$ parametrised by $\theta$. Sample a set of random noise $z_i$, $i = 1, \dots, N$, and generate a set of fake data from the sampled noise: $g_\theta(z_i)$, $i = 1, \dots, N$.

35 Generator Network: $z_1, z_2, \dots, z_N$ → $g_\theta(z_1), g_\theta(z_2), \dots, g_\theta(z_N)$. If we adjust $\theta$ well, $g_\theta(z_i)$ will look like this (figure: realistic samples).

36 Training the Generator network using a Discriminator network. The Discriminator network needs positive and negative training examples: $x_i$ → Discriminator (weights $w$) → $f_w(x_i)$.
$C(w) = -\frac{1}{N_+}\sum_i f_w(x_i) + \frac{1}{N_-}\sum_j f_w(x_j)$
(put real data examples in the first sum, fake generated data in the second).

37 First train the Discriminator network:
$C(w, \theta) = -\frac{1}{N_+}\sum_i f_w(x_i) + \frac{1}{N_-}\sum_j f_w(g_\theta(z_j))$
(real data examples in the first sum, fake generated data in the second)
$w^* = \arg\min_w C(w, \theta)$, keeping $\theta$ constant.
The Discriminator network is trained to grade generated data as negative; it criticises any imperfection of the generator.

38 Discriminator : x -> 10 -> 10 -> 10 -> out 38

39 Next train the Generator network:
$C(w, \theta) = -\frac{1}{N_+}\sum_i f_w(x_i) + \frac{1}{N_-}\sum_j f_w(g_\theta(z_j))$
(real data examples in the first sum, fake generated data in the second)
$w^* = \arg\min_w C(w, \theta)$, keeping $\theta$ constant
$\theta^* = \arg\max_\theta C(w, \theta)$, keeping $w$ constant

40 Generator : x -> 5 -> 5 -> 5 -> out 40
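
In PyTorch, the small architectures on these slides could be written roughly as below; the 2-D input/output dimensions are assumptions, since the toy data is not specified:

```python
import torch.nn as nn

# Assumed 2-D toy data and 2-D noise; the hidden sizes follow the slides.
discriminator = nn.Sequential(
    nn.Linear(2, 10), nn.ReLU(),
    nn.Linear(10, 10), nn.ReLU(),
    nn.Linear(10, 10), nn.ReLU(),
    nn.Linear(10, 1), nn.Sigmoid(),    # D_w(x) in (0, 1)
)

generator = nn.Sequential(
    nn.Linear(2, 5), nn.ReLU(),
    nn.Linear(5, 5), nn.ReLU(),
    nn.Linear(5, 5), nn.ReLU(),
    nn.Linear(5, 2),                   # g_theta(z), same dimension as the data
)
```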

41 Cost function used in the original GAN paper:
$C(w, \theta) = -\left(\frac{1}{N_+}\sum_i \log(D_w(x_i)) + \frac{1}{N_-}\sum_j \log(1 - D_w(g_\theta(z_j)))\right)$
$w^* = \arg\min_w C(w, \theta)$, keeping $\theta$ constant
$\theta^* = \arg\max_\theta C(w, \theta)$, keeping $w$ constant
1. Randomly initialise $w$ and $\theta$
2. Use $g_\theta(z_i)$ to generate fake data
3. Keep $\theta$ constant and optimise $w$ for some n iterations
4. Keep $w$ constant and optimise $\theta$ for some k iterations
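
A minimal PyTorch sketch of the alternating procedure above; the networks, optimisers and noise dimension are hypothetical, and it uses the common non-saturating generator loss rather than maximising $C$ directly:

```python
import torch

bce = torch.nn.BCELoss()

def gan_step(discriminator, generator, real_batch, opt_d, opt_g,
             noise_dim=2, n_d=1, n_g=1):
    n = real_batch.size(0)
    ones, zeros = torch.ones(n, 1), torch.zeros(n, 1)

    # Steps 2-3: keep theta fixed, optimise w (real -> 1, fake -> 0).
    for _ in range(n_d):
        fake = generator(torch.randn(n, noise_dim)).detach()   # no gradient into the generator
        loss_d = bce(discriminator(real_batch), ones) + bce(discriminator(fake), zeros)
        opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Step 4: keep w fixed, optimise theta (make the discriminator grade fakes as real).
    for _ in range(n_g):
        loss_g = bce(discriminator(generator(torch.randn(n, noise_dim))), ones)
        opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```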

42 Discriminator : x -> 10 -> 10 -> 10 -> out Generator : x -> 5 -> 5 -> 5 -> out 42

43 Discriminator : x -> 10 -> 10 -> 10 -> out Generator : x -> 5 -> 5 -> 5 -> out 43

44 Shortcomings of GAN 44

45 When two adversaries of equal strength compete, both can improve.

46 Discriminator Generator 46

47 Discriminator fails. Discriminator : x -> 10 -> 10 -> 10 -> out Generator : x -> 5 -> 5 -> 5 -> out

48 Generator fails due to zero gradient

49 Generator fails due to zero gradient (surrounded)

50 Mode collapse 50

51 Improved Techniques for Training GANs, T. Salimans, I. Goodfellow et al.

52 1. Feature matching for training the generator. The Discriminator Network provides an intermediate feature vector $\vec{f}_w(x)$. Cost function for the generator:
$C_g(w, \theta) = \left\|\frac{1}{N}\sum_i \vec{f}_w(x_i) - \frac{1}{M}\sum_j \vec{f}_w(g_\theta(z_j))\right\|_2^2$
$\theta^* = \arg\min_\theta C_g(w, \theta)$
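
A sketch of the feature-matching loss, assuming a hypothetical `features(x)` that returns the intermediate discriminator activations $\vec{f}_w(x)$ for a batch:

```python
import torch

def feature_matching_loss(features, real_batch, fake_batch):
    # || (1/N) sum_i f_w(x_i) - (1/M) sum_j f_w(g_theta(z_j)) ||_2^2
    real_mean = features(real_batch).mean(dim=0)
    fake_mean = features(fake_batch).mean(dim=0)
    return torch.sum((real_mean - fake_mean) ** 2)
```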

53 Discriminator cost function used in the original GAN paper:
$C(w, \theta) = -\left(\frac{1}{N_+}\sum_i \log(D_w(x_i)) + \frac{1}{N_-}\sum_j \log(1 - D_w(g_\theta(z_j)))\right)$
$\theta^* = \arg\max_\theta C(w, \theta)$, keeping $w$ constant

54 2. Minibatch discrimination for training the discriminator. The Discriminator Network produces a feature vector $\vec{f}(x_i)$; from it, compute $\vec{o}(x_i)$ via a learned tensor $T$ (read the paper for details). $\vec{o}$ contains information on how this data point clusters with the other data points in the minibatch. Concatenate the features to create a longer feature $[\vec{f}(x_i), \vec{o}(x_i)]$ and feed this new feature into the subsequent layers of the discriminator.

55 3. Historical averaging. Add a term to the cost function: $\left\|\theta - \frac{1}{t}\sum_{i=1}^{t} \theta_i\right\|^2$. Effect: at every step, $\theta$ does not change too much in magnitude and direction.
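
A sketch of how the historical-averaging penalty could be maintained with a running mean of the flattened parameters; the class name and update scheme are assumptions, not the paper's code:

```python
import torch

class HistoricalAverage:
    """Penalty || theta - (1/t) sum_{i<=t} theta_i ||^2 over the parameter history."""

    def __init__(self):
        self.avg, self.t = None, 0

    def penalty(self, params):
        theta = torch.cat([p.detach().flatten() for p in params])
        self.t += 1
        if self.avg is None:
            self.avg = theta.clone()
        else:
            self.avg += (theta - self.avg) / self.t     # running mean of past parameters
        current = torch.cat([p.flatten() for p in params])
        return torch.sum((current - self.avg) ** 2)     # add this term to the cost
```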

56 4. One-sided label smoothing. Change in discriminator training: instead of asking the discriminator to output 1 for real data, ask it to output a softer value like 0.9. Fake data still have label 0. This encourages a softer decision boundary, e.g. there is no need to saturate the sigmoid and make the gradient zero.
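
In the discriminator loss, the only change is the target used for real data; a minimal sketch:

```python
import torch

bce = torch.nn.BCELoss()

def smoothed_discriminator_loss(d_real, d_fake, real_label=0.9):
    # One-sided label smoothing: real examples target 0.9 instead of 1.0, fakes stay at 0.
    return bce(d_real, torch.full_like(d_real, real_label)) + \
           bce(d_fake, torch.zeros_like(d_fake))
```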

57 5. Virtual batch normalisation. Instead of normalising differently for each minibatch, use a reference batch so that normalisation is consistent across minibatches. What I would do is use one set of statistics for normalisation across all data, e.g. compute a z-score using one mean and one variance. What the paper proposes is to recalculate the mean and variance for each data point, which increases computation time a lot.

58 Wasserstein GAN 58

59 Earth Mover's Distance (figure: an example of moving piles of earth; cost = ...)

60 Earth Mover's Distance (figure: piles A(2), B(2), C(3), D(3), E(4) at positions A to E; transport cost = 14)

61 Earth Mover's Distance (figure: piles A(2), C(1), E(1), B(3), D(5) at positions A to E; transport cost = 12)

62 Earth Mover's Distance (figure: piles at positions A to I, with A(6), B(3), C(5), D(3), E(2), F(3), G(0), H(0), I(0); transport cost = 22)
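
For one-dimensional piles like these, the optimal transport cost can be checked numerically with SciPy. The positions and weights below are made up; `wasserstein_distance` returns the cost per unit of mass, so it is multiplied by the total mass to get a total cost:

```python
from scipy.stats import wasserstein_distance

# Hypothetical piles of earth at positions 0..4 (think A..E) and a target configuration.
positions = [0, 1, 2, 3, 4]
supply = [2, 2, 3, 3, 4]
demand = [4, 3, 3, 2, 2]

avg_cost = wasserstein_distance(positions, positions, supply, demand)
total_cost = avg_cost * sum(supply)   # total amount of earth moved times distance
print(total_cost)
```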

63 Distributions of real and fake data: $P_r(x)$ is the distribution of real data; it is constant. $P_\theta(g_\theta(z))$ is the distribution of generated data; it changes when the generator changes. $W(P_r(x), P_\theta(g_\theta(z)))$ is the Earth Mover's Distance between the real and generated data.

64 Idea of Wasserstein GAN: no discriminator (in the paper they use the word 'critic'). Given $P_r(x)$, initialize $g_\theta$ and generate $P_\theta(g_\theta(z))$, then adjust the generator such that $W(P_r(x), P_\theta(g_\theta(z)))$ is minimised.

65 Kantorovich-Rubinstein duality (I did not prove it):
$W(P_r, P_\theta) = \sup_{f} \left(\frac{1}{N}\sum_i f(x_i) - \frac{1}{M}\sum_j f(g_\theta(z_j))\right)$
over 1-Lipschitz functions $f$ (loosely speaking, the gradient does not exceed one). The first sum is over samples from $P_r(x)$, the second over samples from $P_\theta(g_\theta(z))$.

66 Kantorovich-Rubinstein duality (I did not prove it):
$W(P_r, P_\theta) = \sup_{f} \left(\frac{1}{N}\sum_i f(x_i) - \frac{1}{M}\sum_j f(g_\theta(z_j))\right)$
The first sum is over samples from $P_r(x)$, the second over samples from $P_\theta(g_\theta(z))$. This is a constrained optimisation problem!

67 Representation of a 1-Lipschitz function: one way to parameterise a function is to use a neural network. By adjusting the weights, we create different functions. When the network is small, the class of functions obtained is limited; when the network is deep, we can represent more functions.

68 $W(P_r, P_\theta) = \sup_{\|f\|_L \le 1} \left(\frac{1}{N}\sum_i f(x_i) - \frac{1}{M}\sum_j f(g_\theta(z_j))\right)$
Represent $f(\cdot)$ by a neural network $f_w(\cdot)$, and use gradient ascent to find $W(P_r(x), P_\theta(g_\theta(z)))$.

69 Algorithm: initialize $w$ and $\theta$, and generate $P_\theta(g_\theta(z))$.
(1) Maximize by adjusting $w$ (subject to $\|f_w\|_L \le 1$): $W(P_r, P_\theta) = \max_w \left(\frac{1}{N}\sum_i f_w(x_i) - \frac{1}{M}\sum_j f_w(g_\theta(z_j))\right)$
(2) Minimize the Earth Mover's distance: $\theta^* = \arg\min_\theta W(P_r, P_\theta)$
(3) Repeat (1) & (2)
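
A minimal PyTorch sketch of steps (1) to (3), using weight clipping to keep the critic approximately 1-Lipschitz as in the WGAN paper; the noise dimension, clip value and number of critic steps are assumptions:

```python
import torch

def wgan_step(critic, generator, real_batch, opt_c, opt_g, n_critic=5, clip=0.01):
    n = real_batch.size(0)

    # (1) Maximise the duality expression over w (implemented as minimising its negative).
    for _ in range(n_critic):
        fake = generator(torch.randn(n, 2)).detach()
        loss_c = -(critic(real_batch).mean() - critic(fake).mean())
        opt_c.zero_grad(); loss_c.backward(); opt_c.step()
        for p in critic.parameters():           # weight clipping enforces the Lipschitz constraint
            p.data.clamp_(-clip, clip)

    # (2) Minimise the estimated Earth Mover's distance over theta.
    loss_g = -critic(generator(torch.randn(n, 2))).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    # (3) Repeat (1) and (2).
```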

70

71 Assignment #3: There is something strange about this algorithm. What is it?

72 Deep Convolutional GAN: A. Radford, L. Metz, S. Chintala, Unsupervised Representation Learning with Deep Convolutional GAN, ICLR 2016

73 one epoch 73

74 5 epochs

75 one epoch vs 5 epochs

76 Generator 76

77 77

78 Every 10 iterations Final generated digits 78

79 Every 10 iterations Final generated digits 79

80 Every 10 iterations Final generated digits 80

81 Every 10 iterations Final generated digits 81

82 BiGAN Bidirectional GAN 82

83 (Figure: noise z → Generator Network → x = g(z); x → Discriminator Network → prediction y.)

84 Given z, generate x: noise z → Generator Network → x = g(z). How about given x, generate z? x → Encoder Network → z = E(x).

85 (Figure: noise z → Generator Network → x = g(z); x → Discriminator Network → prediction y.)

86 (Figure: x → Encoder Network → z = E(x); noise z → Generator Network → x = g(z); both feed the Discriminator Network → prediction y.)

87 (Figure: the Discriminator Network sees joint pairs (x, z), either (x, E(x)) from the encoder or (g(z), z) from the generator, and outputs prediction y.)

88 Cost function:
$C(w, \theta, \phi) = -\left(\frac{1}{N_+}\sum_i \log\{D_w(x_i, E_\phi(x_i))\} + \frac{1}{N_-}\sum_j \log\{1 - D_w(g_\theta(z_j), z_j)\}\right)$
$\max_{\theta, \phi} \min_w C(w, \theta, \phi)$
Adjust the generator and the encoder such that all fake data look like real ones: $g_\theta(z_j) = x' \to x$ and $E_\phi(x_i) = z' \to z$. By this procedure, $g_\theta(E_\phi(x_i)) \to x_i$, i.e. $E_\phi \approx g_\theta^{-1}$.
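
A sketch of the BiGAN losses on joint pairs $(x, z)$, assuming a hypothetical `discriminator` that takes the concatenation of $x$ and $z$ as input:

```python
import torch

bce = torch.nn.BCELoss()

def bigan_losses(discriminator, generator, encoder, real_x, z):
    # Real pair (x, E(x)) is labelled 1, fake pair (g(z), z) is labelled 0.
    real_pair = torch.cat([real_x, encoder(real_x)], dim=1)
    fake_pair = torch.cat([generator(z), z], dim=1)
    d_real, d_fake = discriminator(real_pair), discriminator(fake_pair)
    ones, zeros = torch.ones_like(d_real), torch.zeros_like(d_fake)
    loss_d = bce(d_real, ones) + bce(d_fake, zeros)    # discriminator minimises this
    loss_ge = bce(d_real, zeros) + bce(d_fake, ones)   # generator and encoder minimise this
    return loss_d, loss_ge
```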

89 (Figure: pairs of original images x and their reconstructions g(E(x)).)

90 (Figure: more pairs of original images x and their reconstructions g(E(x)).)

91 Assignment #3 we will announce this weekend 91

92 92
