Administrative Issues. CS5242 mirror site:

Size: px

Start display at page:

Download "Administrative Issues. CS5242 mirror site: https://web.bii.a-star.edu.sg/~leehk/cs5424.html"

Nelson Harrington
6 years ago
Views:

1 Administrative Issues CS5242 mirror site: 1

2 Assignments (10% * 4 = 40%) For each assignment It has multiple sub-tasks, including coding tasks You need to submit the answers, code and execution results 2

3 Assignments (10% * 4 = 40%) Quiz (20%) Final Project (40%) 3

4 Final Projects (40%) Projects will be held on Kaggle-in-class Register using your nus ( The project webpages will be announced once groups are finalized No data disclosure No additional data Each group <=2 students Each group will be randomly assigned to A computer vision project A natural language processing project Assessment is based on the report (15%) and ranking (25%) Top ranked groups of each project will be invited to do presentation at week 13. Will announce in IVLE once Kaggle is ready 4

5 Final Project (40%) Report template (15%) Problem definition Highlights (new algorithms, insights from the experiments) Dataset pre-processing description Training and testing procedure Experimental study (clarity, model understanding, and highlights are important for the assessment) Ranking -> score (25%) Rank top 10%, score ~25 (max score) Rank top 20%, score ~21 Rank top 40%, score ~17 Rank top 70%, score ~14 Rank others, score ~13 Prediction scores less than 10% better than random guess, score ~5 Final rank to grade distribution may be adjusted depending on overall class performance 5

6 Final projects timeline summary Date 27 August 18:00 04 November 18:00 Event Register your project IVLE 2 or less students per group Results submission deadline 05 November 18:00 11 November 18:00 Report deadline 12 November 18:00 12 November 18:00 Selection and notification of selected projects for presentation Selected project group send slides to cs5242@comp.nus.edu.sg Make appointment to meet Hwee Kuan to do presentation rehearsal 16 November 18:30 Project presentations by students 6

7 GPU Machine/Cluster NSCC (National Super-Computing Center) Free for NUS students 128 GPU nodes, each with a NVIDIA Tesla K40 (12 GB) Follow the manual and test program on IVLE workbin/misc/gpu to get yourself familiar with it. Google Cloud Platform 300 USD credit for new register NVIDIA Tesla K80 (2x12GB) is available for Region Taiwan Amazon EC2 (g2.8xlarge) 100 USD credit for students NVIDIA Grid GPU, 1536 CUDA cores, 4GB memory Available for Singapore region Tips: STOP/TERMINATE the instance immediately after your program terminates Check the usage status frequently to avoid high outstanding bills Use Amazon or Google cloud platform for debugging and use NSCC for actual training and tuning 7

8 Lecture 1: Introduction Lecture 2: Neural Network Basics Lecture 3: Training the Neural Network Lecture 4: Convolutional Neural Network Lecture 5: Convolutional Neural Network Lecture 6: Convolutional Neural Network Lecture 7: Recurrent Neural Network Lecture 8: Recurrent Neural Network Lecture 9: Recurrent Neural Network Lecture 10: Advanced Topic: Generative Adversarial Network Lecture 11: Advanced Topic: One-shot learning Lecture 12: Advanced Topic: Distributed training Lecture 13: Project demonstrations 8

9 Forward Propagation 9

10 The network and information flow feed in data into first layer transmitter receiver final layer presents the output of network transmitter receiver 10

11 Notation Let x 2 R d be the input space y 2 R y 2 N Let or be the label Let o 2 R be the output of the neural network 11

12 o x 1 x 2 x 2 x 1 12

13 x =(x 1,x 2 ) 2 R 2 o = (w 1 x 1 + w 2 x 2 + b) x 1 x 2 A o (x1,x2,o) A x 2 o (x1,x2) x 1 x 2 A (x1,x2) A = (x1, x2, o) x 1 13

14 x =(x 1,x 2 ) 2 R 2 o = (w 1 x 1 + w 2 x 2 + b) x 1 x 2 o x 2 A o A x 1 x 1 14

15 x =(x 1,x 2 ) 2 R 2 o = (w 1 x 1 + w 2 x 2 + b) x 1 x 2 o x 2 o x 1 x 1 x 2 15

16 x 2 o x 1 x 2 x 2 x 1 level sets form straight lines 16 x 1

17 Concept of level sets 17

18 Next to simplest x =(x 1,x 2 ) 2 R 2 z = w 1 x 1 + w 2 x 2 + b o = (w 1 x 1 + w 2 x 2 + b) x 1 x 2 o = (z) x 1 o x 2 lines of constant z x 1 x 2 18

19 o x 1 x 2 These three surfaces add to form a triangular decision boundary! 19

20 Adding functions f(x) function 1 + function 2 (f+g)(x) function 1 a h a+b 2h + A x = g(x) function 2 A x b h A x 20

21 h1 h2 h3 x 2 o x 1 x 2 x 2 x 1 21 x 1

22 h1 h2 h3 h 1 (x 1,x 2 ) h 2 (x 1,x 2 ) h 3 (x 1,x 2 ) + + x 1 x 2 x 1 x 2 x 1 x 2 w 1 h 1 (x 1,x 2 )+w 2 h 2 (x 1,x 2 )+w 3 h 3 (x 1,x 2 )+b x 2 22 x 1

23 If you add up enough step surfaces, are you able to form any functions? 23

24 neural network fingers activities 24

25 Activity one Take out a piece of paper, draw patterns as shown X X X X X o o o o o o Task: Cut (tear) with one straight line to completely separate the X and O 25

26 X X X X X Activity two X X X X X X o o o o X X X XX X X X X X X X o oo o oo oo o o o o X X X X X X X X X X X X X X XX X X X X X Cut (tear) with one straight line! 26

27 X X X X X Activity two X X X X X X o o o o X X X XX X X X X X X X o oo o oo oo o o o o X X X X X X X X X X X X X X XX X X X X X Cut (tear) with one straight line! 27

28 Activity three X X X X X X X X X X X X X X X X o o o o o o oo o oo oo o o o o o o o o o o o oo oo o o o o X X X X X X X X X X X X X X Cut (tear) with one straight line! 28

29 Activity three X X X X X X X X X X X X X X X X o o o o o o oo o oo oo o o o o o o o o o o o oo oo o o o o X X X X X X X X X X X X X X Cut (tear) with one straight line! 29

30 Activity four X o X X o X X X X X o oo o o o X X X X X X X X X X X o o o o o X X X X X X X X XX X X Cut (tear) with one straight line! 30

31 Manifold view of neural network feed in data into first layer final layer presents the output of network data hidden layer output 31

32 Function view of neural network h 1 (x 1,x 2 ) h 2 (x 1,x 2 ) h 3 (x 1,x 2 ) + + x 1 x 2 x 2 x 1 x 1 x 2 x 2 x 1 32

33 x 1 x 2 w (1) 1 w (1) 2 b (1) h (1) Follow the data point x 2 A (x1,x2) (x1,x2,h1) A x 2 A (x1,x2) x 1 x 1 x 2 w (2) 1 w (2) 2 b (2) h (2) x 1 (x1,x2,h2) A x 2 A (x1,x2) (x A 1,x A 2 ) 7! (h A 1,h A 2 ) x 1 33

34 x 1 w (2) 1 w (1) 1 w (1) 2 h (1) b (2) x w (2) 2 2 h (2) b (1) x 2 A (x A 1,x A 2 ) h 2 A (h A 1,h A 2 ) x 1 h 1 (x A 1,x A 2 ) 7! (h A 1,h A 2 ) 34

35 x 1 w (2) 1 w (1) 1 w (1) 2 h (1) b (2) w (2) x 2 2 h (2) b (1) x 2 A h 2 A x 1 Neighbourhood relationship is conserved h 1 35

36 continuous mapping x 2 h 2 x 1 h 1 (x A 1,x A 2 ) 7! (h A 1,h A 2 ) 36

37 x 1 h 1 h 2 x 2 h 3 x 2 A h 3 A (x A 1,x A 2 ) (h A 1,h A 2,h A 3 ) h 2 x 1 h 1 37

38 x 1 h 1 h 2 x 2 h 3 x 2 x 1 38

39 x 1 w (2) 1 w (1) 1 b (2) x 2 h (2) w (2) 2 w (1) 2 b (1) h (1) w (3) 1 w (3) 2 x 2 o x 1 h 2 h 1 h (1) = (w (1) 1 x 1 + w (1) 2 x 2 + b (1) ) h (2) = (w (2) 1 x 1 + w (2) 2 x 2 + b (2) ) o = (w (3) 1 h(1) + w (3) 2 h(2) + b (3) ) 39 o

40 Some real life examples courtesy of our TA Connie Kou source code will be available online 40

41 The Angle Data The XOR Data The Ring Data 41

42 The Angle Data - ReLu 42

43 The Angle Data - ReLu 43

44 The Angle Data - ReLu 44

45 The Angle Data - ReLu 45

46 The Angle Data - ReLu 46

47 The Angle Data - ReLu 47

48 The Angle Data - ReLu 48

49 The Angle Data - ReLu 49

50 The Angle Data - ReLu 50

51 The Angle Data - ReLu 51

52 The Angle Data - ReLu 52

53 The Angle Data - Sigmoid 53

54 The Angle Data - Sigmoid 54

55 The Angle Data - Sigmoid 55

56 The Angle Data - Sigmoid 56

57 The Angle Data - Sigmoid 57

58 The Angle Data - Sigmoid 3 hidden nodes 58

59 The Angle Data - Sigmoid 59

60 The Angle Data - Sigmoid 60

61 The XOR Data - ReLU 61

62 The XOR Data - ReLU 62

63 The XOR Data - ReLU 63

64 The XOR Data - ReLU 64

65 The XOR Data - ReLU 65

66 The XOR Data - ReLU 66

67 The XOR Data - ReLU 67

68 The XOR Data - Sigmoid 68

69 The XOR Data - Sigmoid 69

70 The XOR Data - Sigmoid 70

71 The Ring Data - Sigmoid how the manifold is being fold? 71

72 The Ring Data - Sigmoid 72

73 73

74 74

75 75

76 76

77 77

78 78

79 79

80 80

81 81

82 Spiral 82

83 How about this network? Fully connected 83

84 Feed backward 84

85 A general objective Tweak the output of neural network to be as similar to the label as possible. o y How to tweak? Adjust the weights 85

86 Given an example (x, y) x o o! y For this example, the neural network behaves well 86

87 After the first example, give another example (x 0,?) x 0 o 0 Now we say for a well behaved neural network o 0 = y 0 87

88 How good is the trained network? x 0 o 0 88

89 Square error Given data points (x i,y i ),i=1, n l(y i,o i )=ky i o i k 2 C = 1 n X l(y i,o i ) i o 1,y 1 o 2,y 2... x n,x n 1, x 2,x 1 o n,y n Adjust the weights until C = 0 (or close to zero) 89

90 Cross entropy cost function Two class example y i 2 {0, 1} C = 1 n P i y i log[o(x i )] + (1 y i ) log[1 o(x i )] Three class example y i 2 {(1, 0, 0), (0, 1, 0), (0, 0, 1)} m =exp(v 1 )+exp(v 2 )+exp(v 3 ) o i =( exp(v 1) m, exp(v 2) m, exp(v 3) m ) l(y i,o i )= y i log(o i )... output v 1 v2 v 3 C = 1 n X i l(y i,o i ) 90

91 How to adjust the weights? Conceptually, this will work: Try all possible weights combinations, for all weights combinations, calculate the cost. Then pick the weight combination that has the lowest cost. Practically, there are too many weights combinations to try. 91

92 Gradient descend x w1 w2 w2 w3 v1 v2 v3 v 1 = (z 1 )= (w 1 x + b 1 ) v 2 = (z 2 )= (w 2 v 1 + b 2 ) v 3 = (z 3 )= (w 3 v 2 + b j =0 92

93 j = 1 n = 2 n X i X i o i ) j (y i o i j We just j w j (t + 1) = w j 93

94 x w1 w2 w2 w3 v1 v2 v3 = o v 1 = (z 1 )= (w 1 x + b 1 ) v 2 = (z 2 )= (w 2 v 1 + b 2 ) v 3 = (z 3 )= (w 3 v 2 + b = 0 (z 3 )v = 0 (z 3 )w 2 = 0 (z 3 )w 3 0 (z 2 )v = 0 (z 3 )w 2 = 0 (z 3 )w 3 0 (z 2 )w 1 = 0 (z 3 )w 3 0 (z 2 )w 2 0 (z

95 x w1 w2 w2 w3 v1 v2 v3 = = 0 (z 3 )v = 0 (z 3 )w 2 = 0 (z 3 )w 3 0 (z 2 )v = 0 (z 3 )w 2 = 0 (z 3 )w 3 0 (z 2 )w 1 = 0 (z 3 )w 3 0 (z 2 )w 2 0 (z 1 1 problem!! : long mathematical expression leads to large computational time for deep network 95

96 Compute and store strategy x w1 w2 w2 w3 v1 v2 v3 = 3 = 0 (z w 3 0 (z w 2 0 (z 1 ) 96

97 Compute and store strategy x w1 w2 w2 w3 v1 v2 v3 = v v x j? homework question

98 w2 3 v1 v3 w35 w01 x w14 w02 v2 w23 w24 v4 w45 v5 = (z 4 )w 45 98

99 w2 3 v1 v3 w35 w01 x w14 w02 v2 w23 w24 v4 w45 v5 = (z 2 )w (z 2 )w 23 99

100 @v 3 w2 w13 v1 v3 w35 w01 x w14 w02 v2 w23 w24 v4 w45 v5 = 4 This is call the back propagation algorithm 100

101 101

102 102

103 103

104 104

105 105

106 106

107 107

108 108

109 Cost function is a surface 109

110 x 1 w1 b=0 o l(y i,o i )=ky i o i k 2 x 2 w2 C(w 1,w 2 )= 1 n X i l(y i,o i )+0.2(w w 2 2) x 2 cost x w1 w2

111 cost w2 w1 w j (t + 1) = w j 111

112 How to use data set? 1.Train a network 2.Keep it for future use 3.When previously unseen data comes in without labels, use the network to predict There is a problem here! How can we know when we keep the network at step (2), it is good enough to fulfil the task of step (3)? We got to find some ways to test it before we use the network 112

113 Training and Validation set Given an annotated data set, (x i,y i ),i=1, n randomly sample x% of it and keep aside. Train the network on the remaining (100-x)% Validate the network with the x% of the data that you have not yet show to the network Network is ready to use if validation results is good. Else, start troubleshoot. 113

Machine Learning for Large-Scale Data Analysis and Decision Making A. Neural Networks Week #6

Machine Learning for Large-Scale Data Analysis and Decision Making 80-629-17A Neural Networks Week #6 Today Neural Networks A. Modeling B. Fitting C. Deep neural networks Today s material is (adapted)