Do you like to be successful? Able to see the big picture
Slide 1: Do you like to be successful? Able to see the big picture?
Slide 2: Are you able to recognise a scientific GEM?
Slide 3: How to recognise good work? Suggestions please:
- Item 1: first of its kind
- Item 2: solves a problem
- Item 3: learning process, effort put in, value added
Slide 4: All these works resulted in Nobel Prizes.
- Enrico Fermi's seminal paper on weak interaction, 1933. Reason for rejection: "it contained speculations too remote from reality to be of interest to the reader".
- Hans Krebs' paper on the citric acid cycle, AKA the Krebs cycle, 1937. Reason for rejection: Nature had a huge publishing backlog.
- Murray Gell-Mann's work on classifying the elementary particles, 1953. Reason for rejection: he did not use a name the editor liked for his discovery.
- The invention of the radioimmunoassay, 1955. Reason for rejection: reviewers were skeptical that humans could make antibodies small enough to bind to things like insulin.
- The discovery of quasicrystals, 1984. Reason for rejection: it would not interest physicists.
- Paper outlining nuclear magnetic resonance (NMR) spectroscopy, 1966. Rejected twice by the J. Chem. Phys. before finally being accepted and published in the Review of Scientific Instruments.
Slide 5: Is your idea so radical that your professor thinks you are out of your mind? Are you discovering nature? Are you trying to break conventional wisdom? If you are doing deep learning for healthcare, are you listening to good doctors? Engineers building a tennis racket must ask the tennis players.
Slide 6: Generative Adversarial Networks. "This (GAN) ... is the most interesting idea in the last 10 years in ML, in my opinion." (Yann LeCun)
Slide 7: Generative
Slide 8: Generative. [diagram: sampled noise input -> Generator Network -> generated image]
Slide 9: Adversarial
Slide 10: When the generator network is not trained well, the Generator Network produces nonsense instead of a nice picture.
Slides 11-12: Training the generator via an adversary, the discriminator network. [diagram: Generator Network -> Discriminator Network]
Slide 13: The original GAN diagram. [noise z -> Generator Network -> x = g(z); real data x or generated x -> Discriminator Network -> prediction y]
Slide 14: Classical supervised learning: the passive learner. [x+, x- -> SVM, RandForest, CNN, MLP, etc. -> Training...]
Slide 15: Classical supervised learning: the passive learner.
- It eats what you give it
- No data = dead learner; bad data = bad learner
- Has totally no control over data
- Must be retrained when the data change a little bit
- Is this an intelligent machine?
Slide 16: Humans: the active learner.
- Select to pay attention or not pay attention
- Creative, and perform thought experiments
- Generalise well; just a little bit of data is OK
- We perform experiments to generate data
- Learn by analogy (举一反三, "from one example, infer three")
Slide 17: Passive learner vs humans, the active learner: where does GAN sit? Unlike the passive learner, humans:
- select to pay attention or not pay attention
- are creative and perform thought experiments
- generalise well; just a little bit of data is OK
- perform experiments to generate data
Slide 18: Generative models and their problems
Slide 19: All generative models boil down to sampling data from a distribution, e.g. $(x_1, x_2) \sim a_1 N(\mu_1, \sigma_1) + a_2 N(\mu_2, \sigma_2)$.
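As a concrete illustration, here is a minimal numpy sketch of sampling from such a two-component mixture (all parameter values are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative mixture parameters: weights a1, a2, component means and
# covariances for a 2-D Gaussian mixture.
a = np.array([0.3, 0.7])
mus = [np.array([0.0, 0.0]), np.array([4.0, 4.0])]
covs = [np.eye(2), 0.5 * np.eye(2)]

def sample_mixture(n):
    """Draw n points from a1*N(mu1, S1) + a2*N(mu2, S2)."""
    # Pick a component per point, then sample from that Gaussian.
    comps = rng.choice(len(a), size=n, p=a)
    return np.stack([rng.multivariate_normal(mus[c], covs[c]) for c in comps])

x = sample_mixture(1000)   # shape (1000, 2)
```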
Slides 20-21: Sampling black / white pixels. $(x_1, x_2, \dots, x_N) \sim \exp(-E(x_1, \dots, x_N)/T)\,/\,Z$, with $x_i = \pm 1$, $E(x_1, \dots, x_N) = -\sum (\text{number of neighbouring } x_i \text{ with the same sign})$, and partition function $Z = \sum_{x_1, \dots, x_N} \exp(-E(x_1, \dots, x_N)/T)$ summed over all $2^N$ configurations. [figure: a 5 x 6 grid of pixels $x_1 \dots x_{30}$]
Slide 22: How to sample is the key problem in generative models. This one is easy: $(x_1, x_2) \sim a_1 N(\mu_1, \sigma_1) + a_2 N(\mu_2, \sigma_2)$. This one is harder, but can be done with advanced methods: $(x_1, \dots, x_N) \sim \exp(-E(x_1, \dots, x_N)/T)\,/\,Z$. In general, if we can build a distribution (a theoretical or empirical distribution will do), we can find a way to sample.
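One family of "advanced methods" for the harder case is Markov chain Monte Carlo. A minimal Metropolis sketch for the pixel distribution of slides 20-21 (grid size and temperature are illustrative) might look like:

```python
import numpy as np

rng = np.random.default_rng(0)
H, W, T = 5, 6, 2.0                      # 5 x 6 grid as on the slide
x = rng.choice([-1, 1], size=(H, W))     # random initial pixel state

def flip_delta(x, i, j):
    """Energy change dE from flipping pixel (i, j), where
    E = -(number of neighbouring pairs with the same sign)."""
    dE = 0
    for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        ni, nj = i + di, j + dj
        if 0 <= ni < H and 0 <= nj < W:
            # a same-sign pair (contributing -1 to E) becomes
            # different-sign (contributing 0), and vice versa
            dE += 1 if x[i, j] == x[ni, nj] else -1
    return dE

for _ in range(10000):                   # Metropolis single-pixel flips
    i, j = rng.integers(H), rng.integers(W)
    dE = flip_delta(x, i, j)
    if dE <= 0 or rng.random() < np.exp(-dE / T):   # accept with min(1, e^{-dE/T})
        x[i, j] *= -1
```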
Slide 23: The difficulty is building the distribution. We don't have the slightest idea what the distribution looks like. How can a distribution be built in a high-dimensional space where data is very sparse? E.g. a one-megapixel greyscale image, with pixel values in [0, 255], is embedded in a volume of $256^{1{,}000{,}000}$ possible images.
Slide 24: Image data are certainly very sparse; they are believed to lie on some very complex manifold. Many other kinds of real-world data likewise occupy an infinitesimal volume of feature space and form very complex manifolds.
Slide 25: Given a data set $\{x_i\}$, a distribution can be constructed by finding a function $f(x)$ that is large near the data points and small elsewhere, by minimising a cost function over $f(x)$: $C = -\int_{x' \text{ near } x_i} f(x')\,dx' + \int_{x' \text{ everywhere else}} f(x')\,dx'$.
Slide 26: [plot: the same cost $C$, with the learned $f(x_1, x_2)$ shown as a surface peaked near the data points]
Slide 27: Given a data set $\{x_i\}$, a distribution can be constructed by finding $f(x)$ that is large near the data points and small elsewhere, minimising $C = -\int_{\text{near } x_i} f(x')\,dx' + \int_{\text{everywhere else}} f(x')\,dx'$. Two difficulties here: defining "near" (a regularisation issue), and high-dimensional integrals (1,000,000 dimensions for image data).
Slide 28: Neural networks offer a pseudo-solution
Slide 29: Let's start by reviewing classical supervised learning on two classes: x+ -> Classifier -> high value.
Slide 30: x- -> Classifier -> low value.
Slide 31: $x_i$ -> Classifier ($w$) -> $f_w(x_i)$. Minimise a cost function with respect to $w$:
$C(w) = -\frac{1}{N_+}\sum_i f_w(x_i^+) + \frac{1}{N_-}\sum_j f_w(x_j^-)$
An example:
$C(w) = -\left(\frac{1}{N_+}\sum_i \log(D_w(x_i)) + \frac{1}{N_-}\sum_j \log(1 - D_w(x_j))\right)$, with $D_w(x_i) \in (0, 1)$.
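The example cost is exactly the binary cross-entropy, so a minimal PyTorch sketch of this training signal (layer sizes are illustrative, in the spirit of the toy networks on later slides) is:

```python
import torch
import torch.nn as nn

# Illustrative classifier D_w: outputs a probability in (0, 1).
D = nn.Sequential(nn.Linear(2, 10), nn.ReLU(),
                  nn.Linear(10, 10), nn.ReLU(),
                  nn.Linear(10, 1), nn.Sigmoid())

bce = nn.BCELoss()   # -(y log D(x) + (1-y) log(1 - D(x))), averaged

x_pos = torch.randn(64, 2) + 3.0   # stand-in for positive examples
x_neg = torch.randn(64, 2)         # stand-in for negative examples

# C(w) = -(1/N+) sum log D(x+) - (1/N-) sum log(1 - D(x-))
loss = bce(D(x_pos), torch.ones(64, 1)) + bce(D(x_neg), torch.zeros(64, 1))
loss.backward()
```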
Slides 32-33: [plots: the learned $f(x_1, x_2)$ surface and its contours over the two classes]
Slide 34: Idea of GAN (Generative Adversarial Networks). Given a set of example real data, generate fake data that looks like the real data. The generator is a neural network parametrised by $\theta$. Sample a set of random noise $z_i$, $i = 1, \dots, N$, and generate a set of fake data from the sampled noise: $g_\theta(z_i)$, $i = 1, \dots, N$.
Slide 35: Generator Network: $z_1, z_2, \dots, z_N$ -> $g_\theta(z_1), g_\theta(z_2), \dots, g_\theta(z_N)$. If we adjust $\theta$ well, $g_\theta(z_i)$ will look like this [image of realistic samples].
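A minimal PyTorch sketch of such a generator $g_\theta$, in the spirit of the toy 5-unit layers on slide 40 (dimensions are illustrative):

```python
import torch
import torch.nn as nn

# Illustrative generator g_theta: maps 1-D noise z to 2-D "data".
G = nn.Sequential(nn.Linear(1, 5), nn.ReLU(),
                  nn.Linear(5, 5), nn.ReLU(),
                  nn.Linear(5, 2))

z = torch.randn(64, 1)   # sampled noise z_1 ... z_N
fake = G(z)              # g_theta(z_i): fake data, shape (64, 2)
```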
Slide 36: Training the Generator network using the Discriminator network. The discriminator network needs positive and negative training examples: $x_i$ -> Discriminator ($w$) -> $f_w(x_i)$.
$C(w) = -\frac{1}{N_+}\sum_i f_w(x_i) + \frac{1}{N_-}\sum_j f_w(x_j)$
Put real data examples in the first (positive) term; put fake generated data in the second (negative) term.
Slide 37: First train the Discriminator network:
$C(w, \theta) = -\frac{1}{N_+}\sum_i f_w(x_i) + \frac{1}{N_-}\sum_j f_w(g_\theta(z_j))$
(real data examples in the first term, fake generated data in the second)
$w^* = \arg\min_w C(w, \theta)$, keeping $\theta$ constant. The discriminator network is trained to grade generated data as negative: it criticises any imperfections of the generator.
Slide 38: Discriminator: x -> 10 -> 10 -> 10 -> out
Slide 39: Next train the Generator network:
$C(w, \theta) = -\frac{1}{N_+}\sum_i f_w(x_i) + \frac{1}{N_-}\sum_j f_w(g_\theta(z_j))$
(real data examples in the first term, fake generated data in the second)
$w^* = \arg\min_w C(w, \theta)$, keeping $\theta$ constant; $\theta^* = \arg\max_\theta C(w, \theta)$, keeping $w$ constant.
Slide 40: Generator: x -> 5 -> 5 -> 5 -> out
Slide 41: Cost function used in the original GAN paper:
$C(w, \theta) = -\left(\frac{1}{N_+}\sum_i \log(D_w(x_i)) + \frac{1}{N_-}\sum_j \log(1 - D_w(g_\theta(z_j)))\right)$
$w^* = \arg\min_w C(w, \theta)$ keeping $\theta$ constant; $\theta^* = \arg\max_\theta C(w, \theta)$ keeping $w$ constant.
1. Randomly initialise $w$ and $\theta$.
2. Use $g_\theta(z_i)$ to generate fake data.
3. Keep $\theta$ constant and optimise $w$ for some $n$ iterations.
4. Keep $w$ constant and optimise $\theta$ for some $k$ iterations.
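Putting slides 36-41 together, a minimal PyTorch sketch of the alternating loop (all shapes are illustrative toy values; note that the generator step below uses the common non-saturating loss, maximising $\log D_w(g_\theta(z))$, rather than literally maximising $C$):

```python
import torch
import torch.nn as nn

# Toy networks in the spirit of the slides (all shapes illustrative).
D = nn.Sequential(nn.Linear(2, 10), nn.ReLU(), nn.Linear(10, 10), nn.ReLU(),
                  nn.Linear(10, 1), nn.Sigmoid())
G = nn.Sequential(nn.Linear(1, 5), nn.ReLU(), nn.Linear(5, 5), nn.ReLU(),
                  nn.Linear(5, 2))
opt_D = torch.optim.Adam(D.parameters(), lr=2e-4)
opt_G = torch.optim.Adam(G.parameters(), lr=2e-4)
bce = nn.BCELoss()
real_batch = lambda n: torch.randn(n, 2) + 3.0   # stand-in for real data

for step in range(1000):
    # Step 3: keep theta constant, optimise w (the discriminator).
    x, z = real_batch(64), torch.randn(64, 1)
    loss_D = (bce(D(x), torch.ones(64, 1)) +
              bce(D(G(z).detach()), torch.zeros(64, 1)))
    opt_D.zero_grad(); loss_D.backward(); opt_D.step()

    # Step 4: keep w constant, optimise theta (the generator).
    # Non-saturating variant: maximise log D(g(z)).
    z = torch.randn(64, 1)
    loss_G = bce(D(G(z)), torch.ones(64, 1))
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```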
Slides 42-43: Discriminator: x -> 10 -> 10 -> 10 -> out; Generator: x -> 5 -> 5 -> 5 -> out.
Slide 44: Shortcomings of GAN
Slide 45: When two adversaries of equal strength compete, both can improve.
Slide 46: [figure: Discriminator vs Generator]
Slide 47: Discriminator failure. Discriminator: x -> 10 -> 10 -> 10 -> out; Generator: x -> 5 -> 5 -> 5 -> out.
Slide 48: Generator failure due to zero gradient.
Slide 49: Generator failure due to zero gradient (surrounded).
Slide 50: Mode collapse
Slide 51: "Improved Techniques for Training GANs", T. Salimans, I. Goodfellow et al.
Slide 52: 1. Feature matching for training the generator. The Discriminator Network exposes an intermediate feature vector $\vec f(x)$. Cost function for the generator:
$C_g(w, \theta) = \left\| \frac{1}{N}\sum_i \vec f_w(x_i) - \frac{1}{M}\sum_j \vec f_w(g_\theta(z_j)) \right\|_2^2$, $\quad \theta^* = \arg\min_\theta C_g(w, \theta)$
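A sketch of this generator cost, assuming we split the discriminator so that an intermediate feature layer $\vec f_w$ is readable (names and shapes are illustrative):

```python
import torch
import torch.nn as nn

# Split the discriminator so an intermediate feature vector f_w(x) is
# readable: D(x) = head(features(x)). Shapes are illustrative.
features = nn.Sequential(nn.Linear(2, 10), nn.ReLU(),
                         nn.Linear(10, 10), nn.ReLU())
head = nn.Sequential(nn.Linear(10, 1), nn.Sigmoid())

def feature_matching_loss(x_real, x_fake):
    """C_g = || mean_i f_w(x_i) - mean_j f_w(g(z_j)) ||_2^2 over minibatches."""
    return (features(x_real).mean(dim=0) -
            features(x_fake).mean(dim=0)).pow(2).sum()

loss = feature_matching_loss(torch.randn(64, 2), torch.randn(64, 2))
```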
Slide 53: Discriminator cost function used in the original GAN paper:
$C(w, \theta) = -\left(\frac{1}{N_+}\sum_i \log(D_w(x_i)) + \frac{1}{N_-}\sum_j \log(1 - D_w(g_\theta(z_j)))\right)$
$\theta^* = \arg\max_\theta C(w, \theta)$, keeping $w$ constant.
Slide 54: 2. Minibatch discrimination for training the discriminator. The Discriminator Network produces features $\vec f(x_i)$; a tensor $T$ maps them to $\vec o(x_i)$, which contains information on how this data point clusters with the other data points in the minibatch (read the paper for details). Concatenate the features to create a longer feature $[\vec f(x_i), \vec o(x_i)]$, and put this new feature back into the subsequent layers of the discriminator.
Slide 55: 3. Historical averaging. Add a term to the cost function: $\left\| \theta - \frac{1}{t}\sum_i \theta_i \right\|^2$. Effect: at every step, theta does not change too much in magnitude and direction.
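A sketch of this added term, assuming we keep one flattened parameter snapshot per past step (the bookkeeping is our assumption, not specified on the slide):

```python
import torch

def historical_averaging_penalty(params, history):
    """Extra cost term || theta - (1/t) sum_i theta_i ||^2.

    `history` is a list of detached, flattened parameter snapshots
    from past training steps.
    """
    theta = torch.cat([p.flatten() for p in params])   # current parameters
    past_mean = torch.stack(history).mean(dim=0)       # (1/t) sum_i theta_i
    return (theta - past_mean).pow(2).sum()
```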
Slide 56: 4. One-sided label smoothing. A change in discriminator training: instead of asking the discriminator to output 1 for real data, ask it to output a softer value like 0.9. Fake data still has label 0. This encourages a softer decision boundary, e.g. the sigmoid does not need to saturate and drive the gradient to zero.
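The change amounts to one line in the discriminator loss; a self-contained sketch:

```python
import torch

def discriminator_loss_smoothed(d_real, d_fake, smooth=0.9):
    """BCE with one-sided label smoothing: real target 0.9, fake target 0."""
    bce = torch.nn.BCELoss()
    return (bce(d_real, torch.full_like(d_real, smooth)) +
            bce(d_fake, torch.zeros_like(d_fake)))
```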
Slide 57: 5. Virtual batch normalisation. Instead of normalising differently for each minibatch, use a reference batch so that normalisation is consistent across minibatches. (What I would do is use one set of statistics for normalisation across all data, e.g. compute z-scores using one mean and one variance. What the paper proposes is to recalculate the mean and variance for each data point, which increases computational time a lot.)
Slide 58: Wasserstein GAN
Slide 59: Earth Mover's Distance. [figure: an example transport of piles of earth and its cost]
Slide 60: Earth Mover's Distance. [figure: piles at positions A-E moved as A(2), B(2), C(3), D(3), E(4); cost = 14]
Slide 61: Earth Mover's Distance. [figure: piles moved as A(2), C(1), E(1), B(3), D(5); cost = 12]
Slide 62: Earth Mover's Distance. [figure: piles at positions A-I moved as B(3), D(3), C(5), G(0), H(0), F(3), E(2), I(0), A(6); cost = 22]
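For 1-D piles like these, SciPy's `scipy.stats.wasserstein_distance` computes the same quantity. Note that it normalises the pile weights to probability distributions, so multiply by the total mass to get a total-cost figure like the slides'; the positions and heights below are illustrative, not the slides' exact examples:

```python
import numpy as np
from scipy.stats import wasserstein_distance

# Two piles-of-earth distributions over positions A..E (coded as 0..4).
pos = np.arange(5.0)
u = np.array([2.0, 2.0, 3.0, 3.0, 4.0])   # source pile heights
v = np.array([4.0, 3.0, 3.0, 2.0, 2.0])   # target pile heights

# SciPy returns the cost per unit of earth moved;
# multiply by the total mass for the total transport cost.
d = wasserstein_distance(pos, pos, u_weights=u, v_weights=v)
total_cost = d * u.sum()
print(total_cost)
```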
Slide 63: Distributions of real and fake data. $P_r(x)$ is the distribution of the real data; it is constant. $P_\theta(g_\theta(z))$ is the distribution of the generated data; it changes when the generator changes. $W(P_r(x), P_\theta(g_\theta(z)))$ is the Earth Mover's Distance between the real and generated data.
Slide 64: Idea of Wasserstein GAN. There is no discriminator (in the paper they use the word "critic"). Given $P_r(x)$: initialise $\theta$, generate $g_\theta(z_i)$ and hence $P_\theta(g_\theta(z))$, then adjust the generator such that $W(P_r(x), P_\theta(g_\theta(z)))$ is minimised.
Slide 65: Kantorovich-Rubinstein duality (I did not prove it):
$W(P_r, P_\theta) = \sup_{\|f\|_L \le 1} \left( \frac{1}{N}\sum_i f(x_i) - \frac{1}{M}\sum_j f(g(z_j)) \right)$
where the supremum is over 1-Lipschitz functions (loosely speaking, the gradient does not exceed one), the $x_i$ are sampled from $P_r(x)$, and the $g(z_j)$ from $P_\theta(g(z))$.
Slide 66: This is a constrained optimisation problem!
Slide 67: Representation of a 1-Lipschitz function. One way to parameterise a function is to use a neural network; by adjusting the weights, we create different functions. When the network is small, the class of functions obtained is limited. When the network is deep, we can represent more functions.
68 W (P r,p )= sup kfk<1 1 N X i f(x i ) 1 M X j 1 f(g(z j )) A Represent f( ) by a neural network f w ( ) Use gradient ascend to find W (P r (x),p (g (z))) 68
Slide 69: Algorithm. Initialise $w$ and $\theta$; generate $P_\theta(g_\theta(z))$.
(1) Maximise by adjusting $w$: $W(P_r, P_\theta) = \sup_{\|f\|_L \le 1} \left( \frac{1}{N}\sum_i f_w(x_i) - \frac{1}{M}\sum_j f_w(g(z_j)) \right)$
(2) Minimise the Earth Mover's distance: $\theta^* = \arg\min_\theta W(P_r, P_\theta)$
(3) Repeat (1) and (2).
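A minimal PyTorch sketch of this loop. The slides do not say how the 1-Lipschitz constraint is enforced; the original WGAN paper does it by clipping the critic's weights, which is what this sketch does (all shapes and hyperparameters are illustrative):

```python
import torch
import torch.nn as nn

# Toy critic f_w (no sigmoid) and generator g_theta (shapes illustrative).
f = nn.Sequential(nn.Linear(2, 10), nn.ReLU(), nn.Linear(10, 1))
G = nn.Sequential(nn.Linear(1, 5), nn.ReLU(), nn.Linear(5, 2))
opt_f = torch.optim.RMSprop(f.parameters(), lr=5e-5)
opt_G = torch.optim.RMSprop(G.parameters(), lr=5e-5)
real_batch = lambda n: torch.randn(n, 2) + 3.0   # stand-in for P_r

for step in range(1000):
    for _ in range(5):                       # (1) maximise over w
        x, z = real_batch(64), torch.randn(64, 1)
        w_dist = f(x).mean() - f(G(z).detach()).mean()
        loss_f = -w_dist                     # ascend by descending the negative
        opt_f.zero_grad(); loss_f.backward(); opt_f.step()
        for p in f.parameters():             # crude Lipschitz proxy: clip weights
            p.data.clamp_(-0.01, 0.01)

    z = torch.randn(64, 1)                   # (2) minimise W w.r.t. theta
    loss_G = -f(G(z)).mean()
    opt_G.zero_grad(); loss_G.backward(); opt_G.step()
```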
Slide 71: Assignment #3: There is something strange about this algorithm. What is it?
Slide 72: Deep Convolutional GAN. A. Radford, L. Metz, S. Chintala, "Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks", ICLR 2016.
Slide 73: [generated samples after one epoch]
Slide 74: [generated samples after 5 epochs]
Slide 75: [comparison: one epoch vs 5 epochs]
Slide 76: Generator. [architecture figure]
Slides 78-81: [snapshots of generated digits every 10 iterations, and the final generated digits]
Slide 82: BiGAN: Bidirectional GAN
Slide 83: [diagram: noise z -> Generator Network -> x = g(z); x -> Discriminator Network -> y]
Slide 84: Given z, generate x: noise z -> Generator Network -> x = g(z). How about: given x, generate z? x -> Encoder Network -> z = E(x).
Slide 85: [diagram: the ordinary GAN, as on slide 83]
Slide 86: [diagram: an Encoder Network producing z = E(x) is added alongside the Generator Network producing x = g(z)]
Slide 87: [diagram: the Discriminator Network now receives pairs (x, z): real pairs (x, E(x)) and fake pairs (g(z), z), and outputs a prediction y]
Slide 88: Cost function:
$C(w, \theta, \phi) = -\left( \frac{1}{N_+}\sum_i \log D_w(x_i, E_\phi(x_i)) + \frac{1}{N_-}\sum_j \log(1 - D_w(g_\theta(z_j), z_j)) \right)$
$\max_{\theta, \phi} \min_w C(w, \theta, \phi)$
Adjust the generator and encoder such that all fake data looks like real data: $g_\theta(z_j) = x' \to x$ and $E_\phi(x_i) = z' \to z$. By this procedure, $g(E_\phi(x_i)) \to x_i$, i.e. $E \approx g^{-1}$.
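A minimal PyTorch sketch of the BiGAN discriminator update, with the discriminator scoring (x, z) pairs jointly (all shapes illustrative):

```python
import torch
import torch.nn as nn

# Toy BiGAN pieces: generator g_theta, encoder E_phi, and a discriminator
# that scores (x, z) pairs jointly.
G = nn.Sequential(nn.Linear(1, 5), nn.ReLU(), nn.Linear(5, 2))
E = nn.Sequential(nn.Linear(2, 5), nn.ReLU(), nn.Linear(5, 1))
D = nn.Sequential(nn.Linear(2 + 1, 10), nn.ReLU(),
                  nn.Linear(10, 1), nn.Sigmoid())
bce = nn.BCELoss()

x = torch.randn(64, 2) + 3.0      # stand-in for real data
z = torch.randn(64, 1)            # sampled noise

# D sees real pairs (x, E(x)) and fake pairs (g(z), z).
d_real = D(torch.cat([x, E(x)], dim=1))
d_fake = D(torch.cat([G(z), z], dim=1))
loss = (bce(d_real, torch.ones_like(d_real)) +
        bce(d_fake, torch.zeros_like(d_fake)))
```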
Slides 89-90: [figures: real images x next to their reconstructions g(E(x))]
Slide 91: Assignment #3: we will announce this weekend.