Administrative Issues. CS5242 mirror site: https://web.bii.a-star.edu.sg/~leehk/cs5424.html


Administrative Issues CS5242 mirror site: https://web.bii.a-star.edu.sg/~leehk/cs5424.html 1

Assignments (10% x 4 = 40%). Each assignment has multiple sub-tasks, including coding tasks. You need to submit the answers, code, and execution results. 2

Grading: Assignments (10% x 4 = 40%), Quiz (20%), Final Project (40%). 3

Final Projects (40%). Projects will be held on Kaggle-in-class; register using your NUS email (https://inclass.kaggle.com/). The project webpages will be announced once groups are finalized (an announcement will be made in IVLE once Kaggle is ready). No data disclosure and no additional data are allowed. Each group has at most 2 students, and each group will be randomly assigned to either a computer vision project or a natural language processing project. Assessment is based on the report (15%) and the ranking (25%). Top-ranked groups of each project will be invited to give a presentation in week 13. 4

Final Project (40%). Report, following the template (15%): problem definition; highlights (new algorithms, insights from the experiments); dataset pre-processing description; training and testing procedure; experimental study (clarity, model understanding, and highlights are important for the assessment). Ranking -> score (25%): rank top 10%, score ~25 (max score); top 20%, score ~21; top 40%, score ~17; top 70%, score ~14; others, score ~13; prediction scores less than 10% better than a random guess, score ~5. The final rank-to-grade distribution may be adjusted depending on overall class performance. 5

Final projects timeline summary. 27 August 18:00: register your project group @ IVLE (2 or fewer students per group). 04 November 18:00: results submission deadline. 05 November 18:00: report deadline. 11 November 18:00: selection and notification of projects chosen for presentation. 12 November 18:00: selected project groups send slides to cs5242@comp.nus.edu.sg. 12 November 18:00: make an appointment to meet Hwee Kuan for a presentation rehearsal. 16 November 18:30: project presentations by students. 6

GPU machines/clusters. NSCC (National Supercomputing Centre): free for NUS students; 128 GPU nodes, each with an NVIDIA Tesla K40 (12 GB); follow the manual and test program on IVLE workbin/misc/gpu to familiarize yourself with it. Google Cloud Platform: 300 USD credit for new registrations; an NVIDIA Tesla K80 (2x12 GB) is available in the Taiwan region. Amazon EC2 (g2.8xlarge): 100 USD credit for students; NVIDIA Grid GPU, 1536 CUDA cores, 4 GB memory; available in the Singapore region. Tips: STOP/TERMINATE the instance immediately after your program terminates; check the usage status frequently to avoid high outstanding bills; use Amazon or Google cloud platforms for debugging and use NSCC for the actual training and tuning. 7

Lecture 1: Introduction; Lecture 2: Neural Network Basics; Lecture 3: Training the Neural Network; Lectures 4-6: Convolutional Neural Networks; Lectures 7-9: Recurrent Neural Networks; Lecture 10: Advanced Topic: Generative Adversarial Networks; Lecture 11: Advanced Topic: One-shot Learning; Lecture 12: Advanced Topic: Distributed Training; Lecture 13: Project Demonstrations. 8

Forward Propagation 9

The network and information flow: data is fed into the first layer, and the final layer presents the output of the network. (In the figure, each layer acts as a receiver for the layer before it and a transmitter to the layer after it.) 10

Notation. Let $x \in \mathbb{R}^d$ be the input. Let $y \in \mathbb{R}$ or $y \in \mathbb{N}$ be the label. Let $o \in \mathbb{R}$ be the output of the neural network. 11

[Figure: a single neuron with inputs $x_1, x_2$ and output $o$, together with the output surface over the $(x_1, x_2)$ plane.] 12

$x = (x_1, x_2) \in \mathbb{R}^2$, $o = \sigma(w_1 x_1 + w_2 x_2 + b)$. [Figure: a point $A = (x_1, x_2)$ in the input plane is mapped to the point $(x_1, x_2, o)$ on the output surface.] 13

$x = (x_1, x_2) \in \mathbb{R}^2$, $o = \sigma(w_1 x_1 + w_2 x_2 + b)$. [Figure: the output surface $o$ over the $(x_1, x_2)$ plane, with the point $A$ marked.] 14

$x = (x_1, x_2) \in \mathbb{R}^2$, $o = \sigma(w_1 x_1 + w_2 x_2 + b)$. [Figure: the output surface $o$ over the $(x_1, x_2)$ plane.] 15

[Figure: the output surface $o$ over the $(x_1, x_2)$ plane; its level sets form straight lines.] 16

Concept of level sets 17

Next to simplest: $x = (x_1, x_2) \in \mathbb{R}^2$, $z = w_1 x_1 + w_2 x_2 + b$, $o = \sigma(z) = \sigma(w_1 x_1 + w_2 x_2 + b)$. [Figure: the output surface over the $(x_1, x_2)$ plane; the lines of constant $z$ are straight lines.] 18
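To make the single-neuron picture concrete, here is a minimal NumPy sketch (not from the lecture; the values w1 = 1.0, w2 = -2.0, b = 0.5 are arbitrary) that evaluates $o = \sigma(w_1 x_1 + w_2 x_2 + b)$ on a grid and checks that points sharing a value of $z$ lie on a straight line:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary illustrative parameters (not taken from the lecture).
w1, w2, b = 1.0, -2.0, 0.5

# Evaluate the neuron on a grid over the (x1, x2) plane: this is the output surface.
x1, x2 = np.meshgrid(np.linspace(-3, 3, 7), np.linspace(-3, 3, 7))
o = sigmoid(w1 * x1 + w2 * x2 + b)
print(o.shape)

# A level set o = sigmoid(c) corresponds to z = c, i.e. the straight line
# w1*x1 + w2*x2 + b = c.  Check this for one value of c:
c = 0.5
x1_line = np.linspace(-3, 3, 5)
x2_line = (c - b - w1 * x1_line) / w2          # points lying exactly on z = c
print(sigmoid(c), sigmoid(w1 * x1_line + w2 * x2_line + b))  # all equal
```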

[Figure: three single-neuron output surfaces over the $(x_1, x_2)$ plane.] These three surfaces add to form a triangular decision boundary! 19

Adding functions: $(f+g)(x) = f(x) + g(x)$. [Figure: two step functions evaluated at a point $A$; their heights ($a$ and $b$, or $h$ and $h$) add to give the height of the sum ($a+b$, or $2h$).] 20

[Figure: a network with hidden units $h_1, h_2, h_3$ taking inputs $x_1, x_2$ and producing output $o$, together with the resulting surface over the $(x_1, x_2)$ plane.] 21

[Figure: the three hidden-unit surfaces $h_1(x_1,x_2)$, $h_2(x_1,x_2)$, $h_3(x_1,x_2)$ are added to give the combined surface] $w_1 h_1(x_1,x_2) + w_2 h_2(x_1,x_2) + w_3 h_3(x_1,x_2) + b$. 22
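As a sketch of this construction (the weights below are arbitrary illustrative choices, not values from the lecture), three steep sigmoid "step" surfaces are added so that their weighted sum is positive only inside a triangular region:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Three hidden units, each a steep sigmoid "step" across a different line.
# The parameters below are illustrative, not taken from the lecture.
params = [
    ( 10.0,   0.0,  5.0),   # h1: step across the line x1 = -0.5
    (  0.0,  10.0,  5.0),   # h2: step across the line x2 = -0.5
    (-10.0, -10.0,  5.0),   # h3: step across the line x1 + x2 = 0.5
]

def hidden(x1, x2):
    return [sigmoid(w1 * x1 + w2 * x2 + b) for (w1, w2, b) in params]

def combined(x1, x2, w=(1.0, 1.0, 1.0), b=-2.5):
    h1, h2, h3 = hidden(x1, x2)
    return w[0] * h1 + w[1] * h2 + w[2] * h3 + b

# Inside the triangle all three steps are "on", so the sum is near +0.5;
# outside, at least one step is "off" and the sum drops below 0.
print(combined(0.0, 0.0))    # inside  -> positive
print(combined(2.0, 2.0))    # outside -> negative
```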

If you add up enough step surfaces, can you form any function? 23

Neural network finger activities (hands-on paper-tearing exercises) 24

Activity one. Take out a piece of paper and draw the pattern shown [figure: a group of X's and a group of O's]. Task: cut (tear) along one straight line to completely separate the X's and the O's. 25

Activity two (slides 26-27). [Figure: O's clustered among surrounding X's.] Cut (tear) with one straight line!

Activity three (slides 28-29). [Figure: a band of O's between two bands of X's.] Cut (tear) with one straight line!

Activity four. [Figure: X's and O's interleaved across the page.] Cut (tear) with one straight line! 30

Manifold view of a neural network: data is fed into the first layer, and the final layer presents the output of the network. [Figure: data -> hidden layer -> output; image from https://www.pinterest.com/pin/414260865705737484/] 31

Function view of a neural network. [Figure: the hidden-unit surfaces $h_1(x_1,x_2)$, $h_2(x_1,x_2)$, $h_3(x_1,x_2)$ are added to produce the network's output surface over the $(x_1, x_2)$ plane.] 32

Follow the data point. [Figure: a hidden unit $h^{(1)}$ with weights $w^{(1)}_1, w^{(1)}_2$ and bias $b^{(1)}$ maps the point $A = (x_1, x_2)$ to $(x_1, x_2, h_1)$; a second hidden unit $h^{(2)}$ with weights $w^{(2)}_1, w^{(2)}_2$ and bias $b^{(2)}$ maps it to $(x_1, x_2, h_2)$. Together, $(x^A_1, x^A_2) \mapsto (h^A_1, h^A_2)$.] 33

[Figure: the two hidden units $h^{(1)}, h^{(2)}$ map the input plane to the hidden plane: the point $A = (x^A_1, x^A_2)$ in $(x_1, x_2)$-space is sent to $(h^A_1, h^A_2)$ in $(h_1, h_2)$-space.] $(x^A_1, x^A_2) \mapsto (h^A_1, h^A_2)$. 34

[Figure: the same mapping from $(x_1, x_2)$-space to $(h_1, h_2)$-space.] The neighbourhood relationship is conserved. 35

Continuous mapping: $(x^A_1, x^A_2) \mapsto (h^A_1, h^A_2)$. [Figure: the $(x_1, x_2)$ plane is mapped continuously into the $(h_1, h_2)$ plane.] 36

[Figure: with three hidden units $h_1, h_2, h_3$, the point $A = (x^A_1, x^A_2)$ in the input plane is mapped to $(h^A_1, h^A_2, h^A_3)$ in three-dimensional hidden space.] 37

[Figure: the network with inputs $x_1, x_2$ and hidden units $h_1, h_2, h_3$, and the image of the input plane in hidden space.] 38

[Figure: a 2-2-1 network with inputs $x_1, x_2$, hidden units $h^{(1)}, h^{(2)}$, and output $o$.]
$h^{(1)} = \sigma(w^{(1)}_1 x_1 + w^{(1)}_2 x_2 + b^{(1)})$
$h^{(2)} = \sigma(w^{(2)}_1 x_1 + w^{(2)}_2 x_2 + b^{(2)})$
$o = \sigma(w^{(3)}_1 h^{(1)} + w^{(3)}_2 h^{(2)} + b^{(3)})$ 39
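A minimal NumPy sketch of this 2-2-1 forward pass, assuming sigmoid activations for $\sigma$ and using arbitrary illustrative weights (the lecture does not give numerical values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Arbitrary illustrative parameters; the lecture does not specify values.
w1 = np.array([ 1.0, -1.0]); b1 = 0.0    # hidden unit h(1)
w2 = np.array([-1.0,  1.0]); b2 = 0.0    # hidden unit h(2)
w3 = np.array([ 2.0,  2.0]); b3 = -1.0   # output unit o

def forward(x):
    """Forward propagation through the 2-2-1 network of slide 39."""
    h1 = sigmoid(w1 @ x + b1)            # h(1) = sigma(w1 . x + b1)
    h2 = sigmoid(w2 @ x + b2)            # h(2) = sigma(w2 . x + b2)
    o  = sigmoid(w3 @ np.array([h1, h2]) + b3)
    return o

print(forward(np.array([0.5, -0.5])))    # output for one example input
```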

Some real-life examples, courtesy of our TA Connie Kou. The source code will be available online. 40

Demo datasets: The Angle Data, The XOR Data, The Ring Data. 41

The Angle Data - ReLU (slides 42-52, figures only).

The Angle Data - Sigmoid (slides 53-60, figures only; slide 58 is labelled "3 hidden nodes").

The XOR Data - ReLU (slides 61-67, figures only).

The XOR Data - Sigmoid (slides 68-70, figures only).

The Ring Data - Sigmoid: how is the manifold being folded? 71

The Ring Data - Sigmoid 72

Slides 73-81: figures only.

Spiral 82

How about this network? Fully connected 83

Feed backward 84

A general objective: tweak the output $o$ of the neural network to be as similar to the label $y$ as possible. How to tweak? Adjust the weights. 85

Given an example $(x, y)$, the network maps $x$ to an output $o$. If $o \approx y$, the neural network behaves well for this example. 86

After the first example, give another example $(x', ?)$ whose label is unknown; the network maps $x'$ to an output $o'$. For a well-behaved neural network, we expect $o' = y'$, the true label. 87

How good is the trained network? [Figure: the network maps $x'$ to $o'$.] 88

Square error. Given data points $(x_i, y_i),\ i = 1, \dots, n$, define $l(y_i, o_i) = \|y_i - o_i\|^2$ and the cost $C = \frac{1}{n} \sum_i l(y_i, o_i)$. [Figure: the inputs $x_1, x_2, \dots, x_n$ are fed through the network to produce outputs $o_1, o_2, \dots, o_n$, compared with labels $y_1, y_2, \dots, y_n$.] Adjust the weights until $C = 0$ (or close to zero). 89

Cross-entropy cost function. Two-class example: $y_i \in \{0, 1\}$ and $C = -\frac{1}{n} \sum_i \big( y_i \log[o(x_i)] + (1 - y_i) \log[1 - o(x_i)] \big)$. Three-class example: $y_i \in \{(1,0,0), (0,1,0), (0,0,1)\}$; with output scores $v_1, v_2, v_3$, let $m = \exp(v_1) + \exp(v_2) + \exp(v_3)$ and $o_i = \big( \frac{\exp(v_1)}{m}, \frac{\exp(v_2)}{m}, \frac{\exp(v_3)}{m} \big)$ (the softmax); then $l(y_i, o_i) = -y_i \cdot \log(o_i)$ and $C = \frac{1}{n} \sum_i l(y_i, o_i)$. 90
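A small NumPy sketch of the three-class case (the score values below are made up for illustration): softmax of the output scores $v$, followed by the cross-entropy loss against a one-hot label.

```python
import numpy as np

def softmax(v):
    e = np.exp(v - np.max(v))      # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(y_onehot, o):
    return -np.sum(y_onehot * np.log(o))

v = np.array([2.0, 0.5, -1.0])     # illustrative output scores v1, v2, v3
y = np.array([1.0, 0.0, 0.0])      # true class is class 1, one-hot encoded

o = softmax(v)                     # (exp(v1)/m, exp(v2)/m, exp(v3)/m)
print(o, cross_entropy(y, o))      # probabilities and the loss for this example
```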

How to adjust the weights? Conceptually, this would work: try all possible weight combinations, calculate the cost for each, and pick the combination with the lowest cost. Practically, there are far too many weight combinations to try. 91

Gradient descent. [Figure: a chain network $x \to v_1 \to v_2 \to v_3$ with weights $w_1, w_2, w_3$.] $v_1 = \sigma(z_1) = \sigma(w_1 x + b_1)$, $v_2 = \sigma(z_2) = \sigma(w_2 v_1 + b_2)$, $v_3 = \sigma(z_3) = \sigma(w_3 v_2 + b_3)$. At a minimum of the cost, $\frac{\partial C}{\partial w_j} = 0$ and $\frac{\partial C}{\partial b_j} = 0$. 92

We want $\frac{\partial C}{\partial w_j} = 0$ and $\frac{\partial C}{\partial b_j} = 0$. Expanding the squared-error cost, $\frac{\partial C}{\partial w_j} = \frac{1}{n} \sum_i \frac{\partial (y_i - o_i)^2}{\partial w_j} = -\frac{2}{n} \sum_i (y_i - o_i) \frac{\partial o_i}{\partial w_j}$, so we just need $\frac{\partial o_i}{\partial w_j}$. Update rule: $w_j(t+1) = w_j(t) - \eta \frac{\partial C}{\partial w_j}$, where $\eta$ is the learning rate. 93
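A minimal sketch of this update rule in NumPy, applied to the squared-error cost of a single sigmoid neuron; the one-dimensional dataset and the learning rate eta are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny illustrative dataset (made up for this sketch): scalar inputs and 0/1 labels.
x = np.array([-2.0, -1.0, 1.0, 2.0])
y = np.array([ 0.0,  0.0, 1.0, 1.0])

w, b, eta = 0.0, 0.0, 0.5            # initial weights and learning rate (illustrative)

for step in range(2000):
    z = w * x + b
    o = sigmoid(z)                   # forward pass
    # dC/dw and dC/db for C = (1/n) * sum (y - o)^2, using sigma'(z) = o*(1-o)
    dC_dw = (-2.0 / len(x)) * np.sum((y - o) * o * (1 - o) * x)
    dC_db = (-2.0 / len(x)) * np.sum((y - o) * o * (1 - o))
    w, b = w - eta * dC_dw, b - eta * dC_db   # w(t+1) = w(t) - eta * dC/dw

print(w, b, np.mean((y - sigmoid(w * x + b)) ** 2))   # the cost is now much smaller
```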

[Figure: the chain network $x \to v_1 \to v_2 \to v_3 = o$.] With $v_1 = \sigma(z_1) = \sigma(w_1 x + b_1)$, $v_2 = \sigma(z_2) = \sigma(w_2 v_1 + b_2)$, $v_3 = \sigma(z_3) = \sigma(w_3 v_2 + b_3)$:
$\frac{\partial v_3}{\partial w_3} = \frac{\partial v_3}{\partial z_3} \frac{\partial z_3}{\partial w_3} = \sigma'(z_3)\, v_2$
$\frac{\partial v_3}{\partial w_2} = \frac{\partial v_3}{\partial z_3} \frac{\partial z_3}{\partial w_2} = \sigma'(z_3)\, w_3 \frac{\partial v_2}{\partial w_2} = \sigma'(z_3)\, w_3\, \sigma'(z_2)\, v_1$
$\frac{\partial v_3}{\partial w_1} = \frac{\partial v_3}{\partial z_3} \frac{\partial z_3}{\partial w_1} = \sigma'(z_3)\, w_3 \frac{\partial v_2}{\partial w_1} = \sigma'(z_3)\, w_3\, \sigma'(z_2)\, w_2 \frac{\partial v_1}{\partial w_1} = \sigma'(z_3)\, w_3\, \sigma'(z_2)\, w_2\, \sigma'(z_1)\, x$ 94

In summary: $\frac{\partial v_3}{\partial w_3} = \sigma'(z_3)\, v_2$, $\frac{\partial v_3}{\partial w_2} = \sigma'(z_3)\, w_3\, \sigma'(z_2)\, v_1$, $\frac{\partial v_3}{\partial w_1} = \sigma'(z_3)\, w_3\, \sigma'(z_2)\, w_2\, \sigma'(z_1)\, x$. Problem: such long mathematical expressions lead to a large computational time for a deep network. 95

Compute-and-store strategy. [Figure: the chain network $x \to v_1 \to v_2 \to v_3 = o$, with the stored quantities $\frac{\partial v_3}{\partial z_3}, \frac{\partial v_3}{\partial z_2}, \frac{\partial v_3}{\partial z_1}$ attached to the nodes.] Working backwards:
$\frac{\partial v_3}{\partial z_3} = \sigma'(z_3)$
$\frac{\partial v_3}{\partial z_2} = \frac{\partial v_3}{\partial z_3} \frac{\partial z_3}{\partial z_2} = \frac{\partial v_3}{\partial z_3}\, w_3\, \sigma'(z_2)$
$\frac{\partial v_3}{\partial z_1} = \frac{\partial v_3}{\partial z_2} \frac{\partial z_2}{\partial z_1} = \frac{\partial v_3}{\partial z_2}\, w_2\, \sigma'(z_1)$ 96

Compute-and-store strategy (continued). Reusing the stored values:
$\frac{\partial v_3}{\partial w_3} = \frac{\partial v_3}{\partial z_3} \frac{\partial z_3}{\partial w_3} = \frac{\partial v_3}{\partial z_3}\, v_2$
$\frac{\partial v_3}{\partial w_2} = \frac{\partial v_3}{\partial z_2} \frac{\partial z_2}{\partial w_2} = \frac{\partial v_3}{\partial z_2}\, v_1$
$\frac{\partial v_3}{\partial w_1} = \frac{\partial v_3}{\partial z_1} \frac{\partial z_1}{\partial w_1} = \frac{\partial v_3}{\partial z_1}\, x$
What about $\frac{\partial v_3}{\partial b_j}$? Homework question. 97
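A compact NumPy sketch of the compute-and-store idea for the chain $x \to v_1 \to v_2 \to v_3$ (sigmoid activation assumed; the weights, biases, and input are arbitrary illustrative values): the forward pass caches $z_j$ and $v_j$, the backward pass stores $\frac{\partial v_3}{\partial z_j}$, and the weight gradients are then read off directly.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dsigmoid(z):
    s = sigmoid(z)
    return s * (1 - s)

# Arbitrary illustrative parameters and input for the chain x -> v1 -> v2 -> v3.
w = {1: 0.7, 2: -1.3, 3: 0.9}
b = {1: 0.1, 2: 0.2, 3: -0.3}
x = 0.5

# Forward pass: compute and store z_j and v_j.
z, v = {}, {0: x}
for j in (1, 2, 3):
    z[j] = w[j] * v[j - 1] + b[j]
    v[j] = sigmoid(z[j])

# Backward pass: store dv3/dz_j, reusing the previously stored value at each step.
dv3_dz = {3: dsigmoid(z[3])}
for j in (2, 1):
    dv3_dz[j] = dv3_dz[j + 1] * w[j + 1] * dsigmoid(z[j])

# Weight gradients come directly from the stored values: dv3/dw_j = dv3/dz_j * v_{j-1}.
dv3_dw = {j: dv3_dz[j] * v[j - 1] for j in (1, 2, 3)}
print(dv3_dw)

# Sanity check against the long chain-rule expression for w1 (slide 94):
print(dsigmoid(z[3]) * w[3] * dsigmoid(z[2]) * w[2] * dsigmoid(z[1]) * x)
```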

[Figure: a network with input $x$, hidden units $v_1, v_2$ (weights $w_{01}, w_{02}$), hidden units $v_3, v_4$ (weights $w_{13}, w_{14}, w_{23}, w_{24}$), and output $v_5 = o$ (weights $w_{35}, w_{45}$); the stored quantities $\frac{\partial v_5}{\partial z_j}$ are attached to the nodes.]
$\frac{\partial v_5}{\partial z_4} = \frac{\partial v_5}{\partial z_5}\, \sigma'(z_4)\, w_{45}$ 98

$\frac{\partial v_5}{\partial z_2} = \frac{\partial v_5}{\partial z_4}\, \sigma'(z_2)\, w_{24} + \frac{\partial v_5}{\partial z_3}\, \sigma'(z_2)\, w_{23}$ (the contributions from the two paths through $v_3$ and $v_4$ are added). 99

[Figure: the same network, now with all of $\frac{\partial v_5}{\partial z_1}, \dots, \frac{\partial v_5}{\partial z_5}$ computed and stored.] This is called the back-propagation algorithm. 100

Slides 101-108: figures only.

Cost function is a surface 109

[Figure: a single neuron with inputs $x_1, x_2$, weights $w_1, w_2$, bias $b = 0$, and output $o$; next to it, the cost plotted as a surface over $(w_1, w_2)$.] With $l(y_i, o_i) = \|y_i - o_i\|^2$, the cost is $C(w_1, w_2) = \frac{1}{n} \sum_i l(y_i, o_i) + 0.2\,(w_1^2 + w_2^2)$. 110
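A small sketch of this "cost as a surface" view (the dataset is made up and the neuron is assumed to be a sigmoid unit with $b = 0$, as on the slide): the regularized cost is evaluated on a grid of $(w_1, w_2)$ values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up dataset for illustration: inputs in R^2 and 0/1 labels.
X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [-1.0, -1.0]])
y = np.array([1.0, 0.0, 1.0, 0.0])

def cost(w1, w2):
    o = sigmoid(w1 * X[:, 0] + w2 * X[:, 1])   # bias fixed to 0, as on the slide
    return np.mean((y - o) ** 2) + 0.2 * (w1 ** 2 + w2 ** 2)

# Evaluate C(w1, w2) over a grid of weights: the values form a surface.
w1_grid, w2_grid = np.meshgrid(np.linspace(-3, 3, 61), np.linspace(-3, 3, 61))
C = np.vectorize(cost)(w1_grid, w2_grid)
print(C.shape, C.min())   # the lowest point of the surface is what training seeks
```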

[Figure: the cost surface over $(w_1, w_2)$, with gradient descent stepping downhill.] $w_j(t+1) = w_j(t) - \eta \frac{\partial C}{\partial w_j}$. 111

How to use the data set? (1) Train a network. (2) Keep it for future use. (3) When previously unseen data comes in without labels, use the network to predict. There is a problem here: how can we know that the network we keep at step (2) is good enough to fulfil the task of step (3)? We have to find some way to test it before we use the network. 112

Training and validation sets. Given an annotated data set $(x_i, y_i),\ i = 1, \dots, n$, randomly sample x% of it and keep it aside. Train the network on the remaining (100 - x)%. Validate the network on the x% of the data that you have not yet shown to the network. The network is ready to use if the validation results are good; else, start troubleshooting. 113
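A minimal sketch of such a random split in NumPy (the 20% validation fraction and the toy data are illustrative):

```python
import numpy as np

def train_val_split(X, y, val_fraction=0.2, seed=0):
    """Randomly hold out val_fraction of the annotated data for validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_val = int(round(val_fraction * len(X)))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return X[train_idx], y[train_idx], X[val_idx], y[val_idx]

# Illustrative data: 100 random 2-D points with 0/1 labels.
X = np.random.randn(100, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(float)

X_train, y_train, X_val, y_val = train_val_split(X, y, val_fraction=0.2)
print(len(X_train), len(X_val))   # 80 training examples, 20 kept aside for validation
```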