>TensorFlow and deep learning_


>TensorFlow and deep learning_ without a PhD deep Science! #Tensorflow deep Code... @martin_gorner

Hello World: handwritten digit classification - MNIST. MNIST = Mixed National Institute of Standards and Technology. Download the dataset at http://yann.lecun.com/exdb/mnist/
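The snippets further down use an mnist object with train and test splits (mnist.train.next_batch, mnist.test.images) that is never defined on the slides. A minimal way to obtain it with the TensorFlow of that era (an assumption, not part of the original deck) would be:

from tensorflow.examples.tutorials.mnist import input_data

# one_hot=True returns labels as 10-element one-hot vectors,
# reshape=False keeps images as 28x28x1 instead of flat 784-element vectors
mnist = input_data.read_data_sets("data", one_hot=True, reshape=False)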

Very simple model: softmax classification. Each 28x28 image (784 pixels) feeds 10 output neurons (digits 0, 1, 2 ... 9); each neuron computes a weighted sum of all pixels plus a bias, and softmax is applied to the 10 neuron outputs.
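Written as a formula (L is an assumed name here for the vector of ten weighted sums, the "logits"), softmax exponentiates each weighted sum and normalises so the ten outputs sum to 1:

$\mathrm{softmax}(L)_i = \dfrac{e^{L_i}}{\sum_{j=0}^{9} e^{L_j}}$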

In matrix notation, processing 100 images at a time: X is the image matrix (100 images, one per line, each flattened to 784 pixels, so 100 lines x 784 columns), W is the weight matrix (784 lines x 10 columns, one column per output neuron), and the bias vector b = (b0 ... b9) is broadcast, i.e. the same biases are added on all 100 lines. The result is the matrix of weighted sums L = X.W + b, with 100 lines and 10 columns (L0,0 ... L99,9).

Softmax, on a batch of images: Predictions = softmax(Images . Weights + Biases), i.e. Y[100, 10] = softmax(X[100, 784] . W[784, 10] + b[10]), applied line by line: a matrix multiply, with the biases broadcast on all lines (tensor shapes in [ ]).

Now in TensorFlow (Python), with tensor shapes X[100, 784], W[784, 10], b[10]: Y = tf.nn.softmax(tf.matmul(X, W) + b) - the matrix multiply, with b broadcast on all lines.

Success? Compare the computed probabilities with the actual probabilities, one-hot encoded. For example "this is a 6" is encoded over digits 0-9 as [0, 0, 0, 0, 0, 0, 1, 0, 0, 0], while the computed probabilities might be [0.1, 0.2, 0.1, 0.3, 0.2, 0.1, 0.9, 0.2, 0.1, 0.1]; cross-entropy measures the distance between the two.
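As a formula (with Y'_i the one-hot actual probabilities and Y_i the computed ones, matching the -tf.reduce_sum(Y_ * tf.log(Y)) loss used in the code below):

$\text{cross-entropy} = -\sum_{i=0}^{9} Y'_i \, \log(Y_i)$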

Demo

92%

TensorFlow - initialisation

import tensorflow as tf

# 28 x 28 grayscale images; None will become the batch size, 100
X = tf.placeholder(tf.float32, [None, 28, 28, 1])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

init = tf.initialize_all_variables()

Training = computing variables W and b

TensorFlow - success metrics

# model (flattening the images)
Y = tf.nn.softmax(tf.matmul(tf.reshape(X, [-1, 784]), W) + b)

# placeholder for correct answers (one-hot encoded)
Y_ = tf.placeholder(tf.float32, [None, 10])

# loss function
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))

# % of correct answers found in batch (one-hot decoding)
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

TensorFlow - training

optimizer = tf.train.GradientDescentOptimizer(0.003)  # 0.003 is the learning rate
train_step = optimizer.minimize(cross_entropy)        # cross_entropy is the loss function

TensorFlow - run!

sess = tf.Session()
sess.run(init)

for i in range(1000):
    # load batch of images and correct answers
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}

    # train: running a TensorFlow computation, feeding placeholders
    sess.run(train_step, feed_dict=train_data)

    # success? Tip: do this every 100 iterations
    a, c = sess.run([accuracy, cross_entropy], feed_dict=train_data)

    # success on test data?
    test_data = {X: mnist.test.images, Y_: mnist.test.labels}
    a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)

TensorFlow - full python code

import tensorflow as tf

# initialisation
X = tf.placeholder(tf.float32, [None, 28, 28, 1])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
init = tf.initialize_all_variables()

# model
Y = tf.nn.softmax(tf.matmul(tf.reshape(X, [-1, 784]), W) + b)

# placeholder for correct answers
Y_ = tf.placeholder(tf.float32, [None, 10])

# loss function
cross_entropy = -tf.reduce_sum(Y_ * tf.log(Y))

# % of correct answers found in batch
is_correct = tf.equal(tf.argmax(Y, 1), tf.argmax(Y_, 1))
accuracy = tf.reduce_mean(tf.cast(is_correct, tf.float32))

# training step
optimizer = tf.train.GradientDescentOptimizer(0.003)
train_step = optimizer.minimize(cross_entropy)

# run
sess = tf.Session()
sess.run(init)

for i in range(10000):
    # load batch of images and correct answers
    batch_X, batch_Y = mnist.train.next_batch(100)
    train_data = {X: batch_X, Y_: batch_Y}

    # train
    sess.run(train_step, feed_dict=train_data)

    # success? add code to print it
    a, c = sess.run([accuracy, cross_entropy], feed_dict=train_data)

    # success on test data? add code to print it
    test_data = {X: mnist.test.images, Y_: mnist.test.labels}
    a, c = sess.run([accuracy, cross_entropy], feed_dict=test_data)

ML Cookbook: Softmax, Cross-entropy, Mini-batch

Go deep!

Let's try 5 fully-connected layers! ;-) (overkill) Layer sizes 784 -> 200 -> 100 -> 60 -> 30 -> 10, with the sigmoid function on the hidden layers and softmax on the 10 outputs (digits 0, 1, 2 ... 9).

TensorFlow - initialisation

K = 200
L = 100
M = 60
N = 30

# weights initialised with random values, biases with zeros
W1 = tf.Variable(tf.truncated_normal([28*28, K], stddev=0.1))
B1 = tf.Variable(tf.zeros([K]))
W2 = tf.Variable(tf.truncated_normal([K, L], stddev=0.1))
B2 = tf.Variable(tf.zeros([L]))
W3 = tf.Variable(tf.truncated_normal([L, M], stddev=0.1))
B3 = tf.Variable(tf.zeros([M]))
W4 = tf.Variable(tf.truncated_normal([M, N], stddev=0.1))
B4 = tf.Variable(tf.zeros([N]))
W5 = tf.Variable(tf.truncated_normal([N, 10], stddev=0.1))
B5 = tf.Variable(tf.zeros([10]))

TensorFlow - the model (weights and biases)

X = tf.reshape(X, [-1, 28*28])

Y1 = tf.nn.sigmoid(tf.matmul(X, W1) + B1)
Y2 = tf.nn.sigmoid(tf.matmul(Y1, W2) + B2)
Y3 = tf.nn.sigmoid(tf.matmul(Y2, W3) + B3)
Y4 = tf.nn.sigmoid(tf.matmul(Y3, W4) + B4)
Y = tf.nn.softmax(tf.matmul(Y4, W5) + B5)

Demo - slow start? 20

Relu!

RELU = Rectified Linear Unit: Y = tf.nn.relu(tf.matmul(X, W) + b)

RELU 23

Demo - noisy accuracy curve? yuck!

Slow down... Learning rate decay

Learning rate decay: learning rate 0.003 at the start, then dropping exponentially to 0.0001
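One possible way to wire this up (a minimal sketch, assuming the training loop shown earlier; the decay formula is the one given in the workshop section at the end, and the name lr is introduced here): feed the learning rate through a placeholder and recompute it at every iteration.

import math

lr = tf.placeholder(tf.float32)   # variable learning rate, fed at each step
train_step = tf.train.GradientDescentOptimizer(lr).minimize(cross_entropy)

lrmax, lrmin = 0.003, 0.0001
for i in range(10000):
    batch_X, batch_Y = mnist.train.next_batch(100)
    # exponential decay from 0.003 towards 0.0001
    learning_rate = lrmin + (lrmax - lrmin) * math.exp(-i / 2000)
    sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y, lr: learning_rate})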

Demo - dying neurons (Dying...)

Dropout

Dropout

pkeep = tf.placeholder(tf.float32)   # TRAINING: pkeep=0.75, EVALUATION: pkeep=1

Yf = tf.nn.relu(tf.matmul(X, W) + B)
Y = tf.nn.dropout(Yf, pkeep)
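In the training loop this could be fed as follows (a sketch, assuming the feed dictionaries from the run slide above): 0.75 while training, 1.0 when measuring accuracy.

# train with dropout active
sess.run(train_step, feed_dict={X: batch_X, Y_: batch_Y, pkeep: 0.75})

# evaluate with dropout switched off
a, c = sess.run([accuracy, cross_entropy],
                feed_dict={X: mnist.test.images, Y_: mnist.test.labels, pkeep: 1.0})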

Dropout - the "Dying... Dead" curve, now with dropout

Demo

98% 32

All the party tricks: sigmoid with learning rate 0.003; RELU with learning rate 0.003; decaying learning rate 0.003 -> 0.0001 and dropout 0.75 - reaching a 98.2% peak and 97.9% sustained accuracy.

Overfitting - visible on the cross-entropy loss curves.

Overfitting?!? Too many neurons, a BAD network, or not enough DATA.

Convolutional layer: convolutions with padding, then convolutional subsampling via the stride, repeated. Weights W[4, 4, 3, 2]: filter size 4x4, 3 input channels, 2 output channels (one 4x4x3 filter per output channel: W1[4, 4, 3], W2[4, 4, 3]).
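As a shape check only (a hypothetical sketch with made-up names, not taken from the slides): a W[4, 4, 3, 2] filter bank applied with tf.nn.conv2d turns a 3-channel image batch into a 2-channel one, keeping height and width when the stride is 1 and padding is 'SAME'.

# hypothetical batch of 100 three-channel 28x28 images
images = tf.placeholder(tf.float32, [100, 28, 28, 3])

# 4x4 filters, 3 input channels, 2 output channels
filters = tf.Variable(tf.truncated_normal([4, 4, 3, 2], stddev=0.1))

# stride 1 and 'SAME' padding keep the 28x28 size: output shape is [100, 28, 28, 2]
out = tf.nn.conv2d(images, filters, strides=[1, 1, 1, 1], padding='SAME')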

Hacker's tip: ALL convolutional

Convolutional neural network (+ biases on all layers):
28x28x1 -> 28x28x4: convolutional layer, 4 channels, W1[5, 5, 1, 4], stride 1
28x28x4 -> 14x14x8: convolutional layer, 8 channels, W2[4, 4, 4, 8], stride 2
14x14x8 -> 7x7x12: convolutional layer, 12 channels, W3[4, 4, 8, 12], stride 2
7x7x12 -> 200: fully connected layer, W4[7x7x12, 200]
200 -> 10: softmax readout layer, W5[200, 10]

Tensorflow - initialisation

K = 4    # output channels, first convolutional layer
L = 8
M = 12
N = 200  # fully connected layer

# weights initialised with random values;
# shapes are [filter size, filter size, input channels, output channels]
W1 = tf.Variable(tf.truncated_normal([5, 5, 1, K], stddev=0.1))
B1 = tf.Variable(tf.ones([K])/10)
W2 = tf.Variable(tf.truncated_normal([5, 5, K, L], stddev=0.1))
B2 = tf.Variable(tf.ones([L])/10)
W3 = tf.Variable(tf.truncated_normal([4, 4, L, M], stddev=0.1))
B3 = tf.Variable(tf.ones([M])/10)
W4 = tf.Variable(tf.truncated_normal([7*7*M, N], stddev=0.1))
B4 = tf.Variable(tf.ones([N])/10)
W5 = tf.Variable(tf.truncated_normal([N, 10], stddev=0.1))
B5 = tf.Variable(tf.zeros([10]))

Tensorflow - the model

# X is the input image batch [100, 28, 28, 1]; each layer: weights, strides, biases
Y1 = tf.nn.relu(tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME') + B1)
Y2 = tf.nn.relu(tf.nn.conv2d(Y1, W2, strides=[1, 2, 2, 1], padding='SAME') + B2)
Y3 = tf.nn.relu(tf.nn.conv2d(Y2, W3, strides=[1, 2, 2, 1], padding='SAME') + B3)

# flatten all values for the fully connected layer: Y3 [100, 7, 7, 12] -> YY [100, 7x7x12]
YY = tf.reshape(Y3, shape=[-1, 7 * 7 * M])

Y4 = tf.nn.relu(tf.matmul(YY, W4) + B4)
Y = tf.nn.softmax(tf.matmul(Y4, W5) + B5)

Demo

98.9% 42

WTFH?????? 43

Bigger convolutional network + dropout (+ biases on all layers):
28x28x1 -> 28x28x6: convolutional layer, 6 channels, W1[6, 6, 1, 6], stride 1
28x28x6 -> 14x14x12: convolutional layer, 12 channels, W2[5, 5, 6, 12], stride 2
14x14x12 -> 7x7x24: convolutional layer, 24 channels, W3[4, 4, 12, 24], stride 2
7x7x24 -> 200: fully connected layer, W4[7x7x24, 200], +DROPOUT p=0.75
200 -> 10: softmax readout layer, W5[200, 10]
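A sketch of this bigger model, assuming the same naming and initialisation conventions as the smaller convolutional network above (this code is not on the slide itself; only the weight shapes and the dropout on the fully connected layer change):

K, L, M, N = 6, 12, 24, 200   # channels per layer and fully connected size

W1 = tf.Variable(tf.truncated_normal([6, 6, 1, K], stddev=0.1)); B1 = tf.Variable(tf.ones([K])/10)
W2 = tf.Variable(tf.truncated_normal([5, 5, K, L], stddev=0.1)); B2 = tf.Variable(tf.ones([L])/10)
W3 = tf.Variable(tf.truncated_normal([4, 4, L, M], stddev=0.1)); B3 = tf.Variable(tf.ones([M])/10)
W4 = tf.Variable(tf.truncated_normal([7*7*M, N], stddev=0.1));   B4 = tf.Variable(tf.ones([N])/10)
W5 = tf.Variable(tf.truncated_normal([N, 10], stddev=0.1));      B5 = tf.Variable(tf.zeros([10]))

Y1 = tf.nn.relu(tf.nn.conv2d(X, W1, strides=[1, 1, 1, 1], padding='SAME') + B1)
Y2 = tf.nn.relu(tf.nn.conv2d(Y1, W2, strides=[1, 2, 2, 1], padding='SAME') + B2)
Y3 = tf.nn.relu(tf.nn.conv2d(Y2, W3, strides=[1, 2, 2, 1], padding='SAME') + B3)
YY = tf.reshape(Y3, shape=[-1, 7 * 7 * M])

Y4 = tf.nn.relu(tf.matmul(YY, W4) + B4)
Y4d = tf.nn.dropout(Y4, pkeep)   # dropout on the fully connected layer, pkeep=0.75 when training
Y = tf.nn.softmax(tf.matmul(Y4d, W5) + B5)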

Demo

99.3% 46

YEAH! with dropout 47

Learning rate decay, Relu!, Softmax, Cross-entropy, Mini-batch, Go deep, Dropout, ALL Convolutional, Overfitting?!? (too many neurons, a BAD network, not enough DATA). Cartoon images copyright: alexpokusay / 123RF stock photos

Have fun! - cloud.google.com: Cloud ML ALPHA, your TensorFlow models trained in Google's cloud, fast. Pre-trained models: Cloud Vision API, Cloud Speech API ALPHA, Google Translate API. tensorflow.org. Martin Görner, Google Developer relations - plus.google.com/+martingorner - @martin_gorner. This presentation: goo.gl/phexe7. All code snippets are on GitHub: github.com/martin-gorner/tensorflow-mnist-tutorial. That's all folks...

Workshop - keyboard shortcuts for the visualisation GUI:
1 ... display 1st graph only
2 ... display 2nd graph only
3 ... display 3rd graph only
4 ... display 4th graph only
5 ... display 5th graph only
6 ... display 6th graph only
7 ... display graphs 1 and 2
8 ... display graphs 4 and 5
9 ... display graphs 3 and 6
ESC or 0 ... back to displaying all graphs
SPACE ... pause/resume
O ... box zoom mode (then use mouse)
H ... reset all zooms
Ctrl-S ... save current image

Starter code and solutions: github.com/martin-gorner/tensorflow-mnist-tutorial

Workshop
1. Theory (sit back and listen): Softmax classifier, mini-batch, cross-entropy and how to implement them in Tensorflow (slides 1-14)
2. Practice: Open file mnist_1.0_softmax.py. Run it, play with the visualisations (see instructions on the previous slide), read and understand the code as well as the basic structure of a Tensorflow program.
3. Theory (sit back and listen): Hidden layers, sigmoid activation function (slides 16-19)
4. Practice: Start from the file you have and add one or two hidden layers. Use cross_entropy_with_logits to avoid numerical instabilities with log(0) (see the sketch after this list). Solution in: mnist_2.0_five_layers_sigmoid.py
5. Theory (sit back and listen): The neural network toolbox: RELUs, learning rate decay, dropout, overfitting (slides 20-35)
6. Practice: Replace all your sigmoids with RELUs. Test. Then add learning rate decay from 0.003 to 0.0001 using the formula lr = lrmin + (lrmax - lrmin) * exp(-i/2000). Solution in: mnist_2.1_five_layers_relu_lrdecay.py
7. Practice (if time allows): Add dropout on all layers using a value between 0.5 and 0.8 for pkeep. Solution in: mnist_2.2_five_layers_relu_lrdecay_dropout.py
8. Theory (sit back and listen): Convolutional networks (slides 36-42)
9. Practice: Replace your model with a convolutional network, without dropout. Solution in: mnist_3.0_convolutional.py
10. Practice (if time allows): Try a bigger neural network (good hyperparameters on slide 44) and add dropout on the last layer. Solution in: mnist_3.0_convolutional_bigger_dropout.py
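A possible shape of the cross_entropy_with_logits change mentioned in step 4 (a sketch, assuming the 5-layer model's variable names): tf.nn.softmax_cross_entropy_with_logits takes the raw weighted sums (the "logits") and computes the softmax and the log together in a numerically stable way.

Ylogits = tf.matmul(Y4, W5) + B5   # last layer before softmax: the logits
Y = tf.nn.softmax(Ylogits)         # still available for accuracy / display

# stable replacement for -tf.reduce_sum(Y_ * tf.log(Y))
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=Ylogits, labels=Y_)
cross_entropy = tf.reduce_sum(cross_entropy)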