(Artificial) Neural Networks in TensorFlow


By Prof. Seungchul Lee, Industrial AI Lab (http://isystems.unist.ac.kr/), POSTECH

Table of Contents
1. Recall Supervised Learning Setup
2. Artificial Neural Networks
   2.1. Perceptron for h(θ) or h(ω)
   2.2. Multi-layer Perceptron = Artificial Neural Networks (ANN)
3. Training Neural Networks
   3.1. Optimization
   3.2. Loss Function
   3.3. Learning
   3.4. Deep Learning Libraries
4. TensorFlow
   4.1. Import Library
   4.2. Load MNIST Data
   4.3. Build a Model
   4.4. Define the ANN's Shape
   4.5. Define Weights, Biases and Network
   4.6. Define Cost, Initializer and Optimizer
   4.7. Summary of Model
   4.8. Define Configuration
   4.9. Optimization
   4.10. Test

1. Recall Supervised Learning Setup

- Input features $x^{(i)} \in \mathbb{R}^n$
- Output $y^{(i)} \in \mathcal{Y}$
- Model parameters $\theta \in \mathbb{R}^k$
- Hypothesis function $h_\theta : \mathbb{R}^n \to \mathcal{Y}$
- Loss function $\ell : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_+$

Machine learning optimization problem:

$\min_\theta \sum_{i=1}^{m} \ell\left(h_\theta(x^{(i)}), y^{(i)}\right)$

(possibly plus some additional regularization)

But many specialized domains require highly engineered, special features.

2. Artificial Neural Networks

2.1. Perceptron for h(θ) or h(ω)

- Neurons compute the weighted sum of their inputs
- A neuron is activated or fired when the sum $a$ is positive:

  $a = \omega_0 + \omega_1 x_1 + \omega_2 x_2$
  $o = \sigma(\omega_0 + \omega_1 x_1 + \omega_2 x_2)$

- A step function is not differentiable
- One layer is often not enough
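As a minimal illustration (not from the original notes), the weighted sum and step activation of a single perceptron can be sketched in NumPy; the weights and inputs below are hypothetical.

import numpy as np

def perceptron(x, w):
    # w[0] is the bias term omega_0; w[1:] are the input weights
    a = w[0] + np.dot(w[1:], x)   # weighted sum of the inputs
    return 1 if a > 0 else 0      # step activation: fires when the sum is positive

x = np.array([2.0, -1.0])         # hypothetical input
w = np.array([0.5, 1.0, 1.0])     # [omega_0, omega_1, omega_2]
print(perceptron(x, w))           # 1, since 0.5 + 2.0 - 1.0 > 0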

2.2. Multi-layer Perceptron = Artificial Neural Networks (ANN)

- multiple neurons
- differentiable activation function

In a compact representation: the multi-layer perceptron.

Transformation

- Affine (or linear) transformation and nonlinear activation layer (notations are mixed: $g = \sigma$, $\omega = \theta$, $\omega_0 = b$):

  $o(x) = g\left(\theta^T x + b\right)$

- Nonlinear activation functions ($g = \sigma$):

  $g(x) = \dfrac{1}{1 + e^{-x}}$ (sigmoid)
  $g(x) = \tanh(x)$
  $g(x) = \max(0, x)$ (ReLU)

Structure

- A single layer is not enough to represent a complex relationship between input and output, so we use perceptrons with many layers and units (see the sketch below):

  $o_2 = \sigma_2\left(\theta_2^T o_1 + b_2\right) = \sigma_2\left(\theta_2^T \sigma_1\left(\theta_1^T x + b_1\right) + b_2\right)$
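The two-layer forward pass above can be written directly in NumPy. This is an illustrative sketch only, with hypothetical layer sizes and randomly initialized weights; ReLU and sigmoid are assumed as the two activation functions.

import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

n_in, n_hidden, n_out = 4, 3, 1                   # hypothetical layer sizes
theta1 = 0.1 * np.random.randn(n_in, n_hidden)    # first-layer weights
b1 = np.zeros(n_hidden)                           # first-layer biases
theta2 = 0.1 * np.random.randn(n_hidden, n_out)   # second-layer weights
b2 = np.zeros(n_out)                              # second-layer biases

x = np.random.randn(n_in)                         # example input
o1 = relu(theta1.T @ x + b1)                      # o1 = sigma1(theta1^T x + b1)
o2 = sigmoid(theta2.T @ o1 + b2)                  # o2 = sigma2(theta2^T o1 + b2)
print(o2)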

Linear Classifier

- The perceptron tries to separate the two classes of data by dividing them with a line

Neural Networks

- The hidden layer learns a representation so that the data is linearly separable

3. Training Neural Networks

= Learning or estimating the weights and biases of a multi-layer perceptron from training data

3.1. Optimization

3 key components:

1. objective function $f(\cdot)$
2. decision variable or unknown $\theta$
3. constraints $g(\cdot)$

In mathematical expression:

$\min_\theta \; f(\theta) \quad \text{subject to} \quad g_i(\theta) \le 0, \; i = 1, \dots, m$

3.2. Loss Function

Measures the error between target values and predictions:

$\min_\theta \sum_{i=1}^{m} \ell\left(h_\theta(x^{(i)}), y^{(i)}\right)$

Example (see the sketch below):

- Squared loss (for regression): $\dfrac{1}{N} \sum_{i=1}^{N} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$
- Cross entropy (for classification): $-\dfrac{1}{N} \sum_{i=1}^{N} y^{(i)} \log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right)$
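As a quick illustration (made-up targets and predictions, not from the original notes), both example losses can be computed in a few lines of NumPy.

import numpy as np

y_true = np.array([1.0, 0.0, 1.0, 1.0])   # hypothetical targets y
y_pred = np.array([0.9, 0.2, 0.7, 0.6])   # hypothetical predictions h_theta(x)

# Squared loss: (1/N) * sum (h_theta(x) - y)^2
squared_loss = np.mean((y_pred - y_true) ** 2)

# Cross entropy: -(1/N) * sum [ y*log(h_theta(x)) + (1-y)*log(1 - h_theta(x)) ]
cross_entropy = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

print(squared_loss, cross_entropy)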

3.3. Learning

Backpropagation

- Forward propagation: the initial information propagates up to the hidden units at each layer and finally produces output
- Backpropagation: allows the information from the cost to flow backwards through the network in order to compute the gradients

Chain Rule

Computing the derivative of a composition of functions:

$\left(f(g(x))\right)' = f'(g(x)) \, g'(x)$

$\dfrac{dz}{dx} = \dfrac{dz}{dy} \cdot \dfrac{dy}{dx}$

$\dfrac{dz}{dw} = \left(\dfrac{dz}{dy} \cdot \dfrac{dy}{dx}\right) \cdot \dfrac{dx}{dw}$

$\dfrac{dz}{du} = \left(\dfrac{dz}{dy} \cdot \dfrac{dy}{dx} \cdot \dfrac{dx}{dw}\right) \cdot \dfrac{dw}{du}$

Backpropagation

- Update weights recursively
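The chain rule can be checked numerically. The sketch below uses an illustrative composition z = sin(y) with y = x^2 (these functions are not from the notes) and compares the analytic derivative f'(g(x)) g'(x) with a central finite difference.

import numpy as np

def z_of_x(x):
    y = x ** 2            # y = g(x)
    return np.sin(y)      # z = f(y)

x = 1.3
analytic = np.cos(x ** 2) * 2 * x                          # f'(g(x)) * g'(x)
eps = 1e-6
numeric = (z_of_x(x + eps) - z_of_x(x - eps)) / (2 * eps)  # central finite difference
print(analytic, numeric)                                   # the two values agree closely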

(Stochastic) Gradient Descent

- The negative gradient points directly downhill of the cost function
- We can decrease the cost by moving in the direction of the negative gradient, where $\alpha$ is a learning rate (see the sketch below):

  $\theta := \theta - \alpha \nabla_\theta \ell\left(h_\theta(x^{(i)}), y^{(i)}\right)$
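A minimal gradient descent sketch on a single squared-loss example (hypothetical data, with h_theta(x) = theta^T x) shows the update rule in action.

import numpy as np

x_i = np.array([1.0, 2.0])     # one training input x^(i) (hypothetical)
y_i = 3.0                      # its target y^(i)
theta = np.zeros(2)            # model parameters
alpha = 0.1                    # learning rate

for _ in range(100):
    error = theta @ x_i - y_i          # h_theta(x^(i)) - y^(i)
    grad = 2 * error * x_i             # gradient of the squared loss w.r.t. theta
    theta = theta - alpha * grad       # theta := theta - alpha * gradient

print(theta, theta @ x_i)              # the prediction approaches y_i = 3.0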

Optimization procedure

- It is not easy to numerically compute gradients in a network in general.
- The good news: people have already done all the "hard work" of developing numerical solvers (or libraries)
- There is a wide range of tools

Summary

- Learning weights and biases from data using gradient descent

3.4. Deep Learning Libraries

Library     | Platform                | Written in   | Interface
Caffe       | Linux, Mac OS, Windows  | C++          | Python, MATLAB
Theano      | Cross-platform          | Python       | Python
TensorFlow  | Linux, Mac OS, Windows  | C++, Python  | Python, C/C++, Java, Go, R

4. TensorFlow

TensorFlow (https://www.tensorflow.org) is an open-source software library for deep learning.

Computational Graph

- tf.constant
- tf.Variable
- tf.placeholder

In [1]:
import tensorflow as tf

a = tf.constant([1, 2, 3])
b = tf.constant([4, 5, 6])

A = a + b
B = a * b

In [2]:
A
Out[2]: <tf.Tensor 'add:0' shape=(3,) dtype=int32>

In [3]:
B
Out[3]: <tf.Tensor 'mul:0' shape=(3,) dtype=int32>

To run any of the operations defined above, we need to create a session for that graph. The session will also allocate memory to store the current value of the variable.

In [4]:
sess = tf.Session()
sess.run(A)
Out[4]: array([5, 7, 9], dtype=int32)

In [5]:
sess.run(B)
Out[5]: array([ 4, 10, 18], dtype=int32)

tf.Variable is regarded as the decision variable in optimization. We should initialize variables before using tf.Variable.

In [6]:
w = tf.Variable([1, 1])

In [7]:
init = tf.global_variables_initializer()
sess.run(init)

In [8]:
sess.run(w)
Out[8]: array([1, 1], dtype=int32)

The value of tf.placeholder must be fed using the feed_dict optional argument to Session.run().

In [9]:
x = tf.placeholder(tf.float32, [2, 2])

In [10]:
sess.run(x, feed_dict={x: [[1, 2], [3, 4]]})
Out[10]: array([[ 1.,  2.],
       [ 3.,  4.]], dtype=float32)

ANN with TensorFlow

MNIST (Mixed National Institute of Standards and Technology) database

- Handwritten digit database
- 28 × 28 gray-scale images
- Flatten each matrix into a vector of 28 × 28 = 784 entries
- Feed the gray image to the ANN
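The flattening step can be sketched with a stand-in image (a random 28 × 28 array, since no real MNIST image is loaded yet at this point).

import numpy as np

img = np.random.rand(28, 28)   # stand-in for one 28 x 28 gray-scale image
vec = img.reshape(-1)          # flatten to a 784-dimensional vector
print(vec.shape)               # (784,)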

Our network model:

$\min_\theta \; f(\theta) \quad \text{subject to} \quad g_i(\theta) \le 0$

$\theta := \theta - \alpha \nabla_\theta \ell\left(h_\theta(x^{(i)}), y^{(i)}\right)$

4.1. Import Library

In [11]:
# Import Library
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

4.2. Load MNIST Data

Download the MNIST data from the TensorFlow tutorial example.

In [12]:
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz

In [13]:
train_x, train_y = mnist.train.next_batch(10)
img = train_x[3, :].reshape(28, 28)

plt.figure(figsize=(5, 3))
plt.imshow(img, 'gray')
plt.title("label : {}".format(np.argmax(train_y[3])))
plt.xticks([])
plt.yticks([])
plt.show()

One-hot encoding

In [14]:
print('Train labels : {}'.format(train_y[3, :]))

Train labels : [ 0.  0.  0.  0.  0.  0.  1.  0.  0.  0.]
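The one-hot labels printed above can be reproduced with a small helper (this one_hot function is illustrative and not part of the original notebook): class index 6 maps to a 10-dimensional vector with a 1 in position 6.

import numpy as np

def one_hot(label, n_classes=10):
    # row `label` of the identity matrix is the one-hot vector for that class
    return np.eye(n_classes)[label]

print(one_hot(6))   # [0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]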

4.3. Build a Model

First, the layer performs several matrix multiplications to produce a set of linear activations:

$y_j = \left(\sum_i \omega_{ij} x_i\right) + b_j, \qquad y = \omega^T x + b$

# hidden1 = tf.matmul(x, weights['hidden1']) + biases['hidden1']
hidden1 = tf.add(tf.matmul(x, weights['hidden1']), biases['hidden1'])

Second, each linear activation is run through a nonlinear activation function:

hidden1 = tf.nn.relu(hidden1)

Third, predict values with an affine transformation:

# output = tf.matmul(hidden1, weights['output']) + biases['output']
output = tf.add(tf.matmul(hidden1, weights['output']), biases['output'])

4.4. Define the ANN's Shape

- Input size
- Hidden layer size
- The number of classes

In [15]:
n_input = 28*28
n_hidden1 = 100
n_output = 10

4.5. Define Weights, Biases and Network

- Define parameters based on the predefined layer sizes
- Initialize with a normal distribution with $\mu = 0$ and $\sigma = 0.1$

In [16]:
weights = {
    'hidden1' : tf.Variable(tf.random_normal([n_input, n_hidden1], stddev = 0.1)),
    'output'  : tf.Variable(tf.random_normal([n_hidden1, n_output], stddev = 0.1)),
}
biases = {
    'hidden1' : tf.Variable(tf.random_normal([n_hidden1], stddev = 0.1)),
    'output'  : tf.Variable(tf.random_normal([n_output], stddev = 0.1)),
}

x = tf.placeholder(tf.float32, [None, n_input])
y = tf.placeholder(tf.float32, [None, n_output])

In [17]:
# Define Network
def build_model(x, weights, biases):
    # first hidden layer
    hidden1 = tf.add(tf.matmul(x, weights['hidden1']), biases['hidden1'])
    # nonlinear activation function
    hidden1 = tf.nn.relu(hidden1)
    # output layer with linear activation
    output = tf.add(tf.matmul(hidden1, weights['output']), biases['output'])
    return output

4.6. Define Cost, Initializer and Optimizer

Loss

- Classification: cross entropy
- Equivalent to applying logistic regression:

  $-\dfrac{1}{N} \sum_{i=1}^{N} y^{(i)} \log\left(h_\theta(x^{(i)})\right) + \left(1 - y^{(i)}\right) \log\left(1 - h_\theta(x^{(i)})\right)$

Initializer

- Initialize all the empty variables

Optimizer

- AdamOptimizer: the most popular optimizer

In [18]:
# Define Cost
pred = build_model(x, weights, biases)
loss = tf.nn.softmax_cross_entropy_with_logits(logits=pred, labels=y)
loss = tf.reduce_mean(loss)

# optimizer = tf.train.GradientDescentOptimizer(learning_rate).minimize(cost)
LR = 0.0001
optm = tf.train.AdamOptimizer(LR).minimize(loss)

init = tf.global_variables_initializer()

4.7. Summary of Model

4.8. Define Configuration

Define parameters for training the ANN:

- n_batch: batch size for stochastic gradient descent
- n_iter: the number of learning steps
- n_prt: check the loss every n_prt iterations

In [19]:
n_batch = 50    # Batch Size
n_iter = 2500   # Learning Iteration
n_prt = 250     # Print Cycle

4.9. Optimization

In [20]:
# Run initialize
# config = tf.ConfigProto(allow_soft_placement=True)  # GPU Allocating policy
# sess = tf.Session(config=config)
sess = tf.Session()
sess.run(init)

# Training cycle
for epoch in range(n_iter):
    train_x, train_y = mnist.train.next_batch(n_batch)
    sess.run(optm, feed_dict={x: train_x, y: train_y})

    if epoch % n_prt == 0:
        c = sess.run(loss, feed_dict={x: train_x, y: train_y})
        print("Iter : {}".format(epoch))
        print("Cost : {}".format(c))

Iter : 0
Cost : 2.49772047996521
Iter : 250
Cost : 1.4581751823425293
Iter : 500
Cost : 0.7719646692276001
Iter : 750
Cost : 0.5604625344276428
Iter : 1000
Cost : 0.4211314022541046
Iter : 1250
Cost : 0.5066109299659729
Iter : 1500
Cost : 0.32460322976112366
Iter : 1750
Cost : 0.412553608417511
Iter : 2000
Cost : 0.26275649666786194
Iter : 2250
Cost : 0.481355220079422

4.10. Test

In [21]:
test_x, test_y = mnist.test.next_batch(100)

my_pred = sess.run(pred, feed_dict={x : test_x})
my_pred = np.argmax(my_pred, axis=1)

labels = np.argmax(test_y, axis=1)

accr = np.mean(np.equal(my_pred, labels))
print("Accuracy : {}%".format(accr*100))

Accuracy : 96.0%

In [22]:
test_x, test_y = mnist.test.next_batch(1)

logits = sess.run(tf.nn.softmax(pred), feed_dict={x : test_x})
predict = np.argmax(logits)

plt.imshow(test_x.reshape(28, 28), 'gray')
plt.xticks([])
plt.yticks([])
plt.show()

print('Prediction : {}'.format(predict))

np.set_printoptions(precision=2, suppress=True)
print('Probability : {}'.format(logits.ravel()))

Prediction : 6
Probability : [ 0.01  0.01  0.07  0.    0.01  0.01  0.89  0.    0.01  0.  ]

In [23]:
%%javascript
$.getScript('https://kmahelona.github.io/ipython_notebook_goodies/ipython_notebook_toc.js')