TensorFlow Presentation references material from https://www.tensorflow.org/get_started/get_started and Data Science From Scratch by Joel Grus, 2015, O'Reilly, Ch. 18 Dan Evans
TensorFlow www.tensorflow.org An open-source software library for machine learning A system for building and training neural networks to detect and decipher patterns and correlations, analogous to (but not the same as) human learning and reasoning Used for both research and production at Google, often replacing its closed-source predecessor, DistBelief Developed by the Google Brain team for internal Google use, it was released under the Apache 2.0 open source license in November 2015 https://en.wikipedia.org/wiki/TensorFlow
TensorFlow APIs The core API is Python (also considered the most flexible) Additional supported APIs for C++, Java, and Go Community APIs for C#, Haskell, Julia, Ruby, Rust, and Scala
TensorFlow Installation Before TensorFlow installation, install (1) Windows: Python 3.5.x or 3.6.x (TF GPU support available) (2) Mac: Python 2.7 or 3.3+ (3) Ubuntu: Python 2.7 or 3.x (TF GPU support available) Pick an installation method: virtualenv (2,3), native pip (1,2,3), Docker (2,3), or Anaconda (the Python data science platform) (1,2,3) Follow the simple command-line install instructions on the web My Mac install used virtualenv
Why TensorFlow? Perceptrons A perceptron is the simplest neural network It takes n inputs, computes a weighted sum, and fires if the sum is greater than or equal to 0: p = w1*i1 + w2*i2 + ... + wn*in + b*i(n+1), output = 1 if p >= 0 else 0 (known as a step function) b, the bias, is a normalizing constant which keeps 0 as the threshold; the input to b (i(n+1)) is always implicitly 1 p is the dot product of the vectors [w1, w2, ..., wn, b] and [i1, i2, ..., in, 1]
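A minimal plain-Python sketch of this definition (the function name is ours, not from the slides):

def perceptron_output(weights, inputs):
    # weights = [w1, ..., wn, b]; inputs = [i1, ..., in, 1]
    p = sum(w * i for w, i in zip(weights, inputs))  # dot product
    return 1.0 if p >= 0 else 0.0                    # step function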
Perceptrons(2) Consider the three perceptrons pa = [2, 2, -3] (AND), po = [2, 2, -1] (OR), and p~ab = [-2, 2, -1] ((not a) and b) The table shows the dot product and threshold output for each combination of 0 and 1 inputs

Input [a,b]   pa = [2,2,-3]    po = [2,2,-1]    p~ab = [-2,2,-1]
[1,1]         2+2-3 = 1        2+2-1 = 3        -2+2-1 = -1
[1,0]         2+0-3 = -1       2+0-1 = 1        -2+0-1 = -3
[0,1]         0+2-3 = -1       0+2-1 = 1        0+2-1 = 1
[0,0]         0+0-3 = -3       0+0-1 = -1       0+0-1 = -1
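The table can be checked by reusing perceptron_output from the previous sketch:

for a, b in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    x = [a, b, 1]  # the trailing 1 is the implicit bias input
    print([a, b],
          perceptron_output([2, 2, -3], x),   # pa
          perceptron_output([2, 2, -1], x),   # po
          perceptron_output([-2, 2, -1], x))  # p~ab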
AND-OR Decision Space [Figure: the four inputs [0,0], [0,1], [1,0], [1,1] plotted on the input 1 / input 2 plane, with the AND boundary and the OR boundary drawn as lines separating the points where each perceptron fires]
Training Perceptrons Start with estimated weights and calculate the results from the training set of inputs Use the error outputs to reestimate the weights Make successive passes until the weights converge to produce the correct output for the training set A good training algorithm will converge rapidly Example passes for pa:

Pass  Weights         [1,1]          [1,0]           [0,1]           [0,0]
1     [1,1,0]         1+1-0 = 2      1+0-0 = 1       0+1-0 = 1       0+0-0 = 0
2     [1.5,1.5,-1]    1.5+1.5-1 = 2  1.5+0-1 = 0.5   0+1.5-1 = 0.5   0+0-1 = -1
3     [2,2,-2.5]      2+2-2.5 = 1.5  2+0-2.5 = -0.5  0+2-2.5 = -0.5  0+0-2.5 = -2.5
4     [2.5,2.5,-3]    2.5+2.5-3 = 2  2.5+0-3 = -0.5  0+2.5-3 = -0.5  0+0-3 = -3
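The slides do not spell out the update rule; one common choice is the classic perceptron rule, sketched here reusing perceptron_output from the earlier sketch (the learning rate and pass count are our assumptions):

def train_pass(weights, samples, rate=0.5):
    # one pass: nudge each weight by rate * error * input
    for inputs, target in samples:
        error = target - perceptron_output(weights, inputs)
        for k in range(len(weights)):
            weights[k] += rate * error * inputs[k]

and_set = [([1, 1, 1], 1), ([1, 0, 1], 0), ([0, 1, 1], 0), ([0, 0, 1], 0)]
w = [1.0, 1.0, 0.0]     # the pass-1 estimate from the table above
for _ in range(10):     # AND converges within a handful of passes
    train_pass(w, and_set)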
Layers More complicated neural networks take the output of one layer of perceptrons as input to the next (hidden) layer Deep learning uses many-layered neural networks Consider exclusive-or (xor), which is true if exactly one of its two operands is true Logically, a xor b = (not (a and b)) and (a or b), i.e., a xor b = (not AND) and (OR)
Layers(2) Graph [Figure: a two-layer graph; the inputs a and b feed the layer-1 perceptrons pa and po, and their outputs feed the layer-2 perceptron p~ao, whose output is a xor b]
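A sketch of the two-layer evaluation in the graph above, again reusing perceptron_output:

def xor_output(a, b):
    out_a = perceptron_output([2, 2, -3], [a, b, 1])   # pa: a AND b
    out_o = perceptron_output([2, 2, -1], [a, b, 1])   # po: a OR b
    # layer 2: p~ao fires when pa did not fire but po did
    return perceptron_output([-2, 2, -1], [out_a, out_o, 1])

for a, b in [(1, 1), (1, 0), (0, 1), (0, 0)]:
    print(a, b, xor_output(a, b))   # 0.0, 1.0, 1.0, 0.0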
TensorFlow - Tensors The central unit of data in TensorFlow is the tensor (perceptron weights) A tensor is a set of primitive values shaped into an array of any number of dimensions. A tensor's rank is its number of dimensions [rows, columns, layers, ...] 3 # rank 0 tensor; this is a scalar with shape [] [1., 2., 3.] # rank 1 tensor - a vector with shape [3] [[1., 2., 3.], [4., 5., 6.]] # rank 2 tensor - a matrix with shape [2,3] [[[1., 2., 3.]], [[7., 8., 9.]]] # rank 3 tensor with shape [2,1,3]
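The shapes can be confirmed directly, for example:

import tensorflow as tf
t = tf.constant([[[1., 2., 3.]], [[7., 8., 9.]]])
print(t.shape)   # (2, 1, 3) - a rank 3 tensor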
A [2,1,3] Tensor [Figure: the rank 3 tensor [[[1.,2.,3.]], [[7.,8.,9.]]] drawn as a block with 2 rows, 1 column, and 3 layers]
Computational Graph A series of TensorFlow operations arranged into a graph of nodes and edges Each node takes zero or more tensors as inputs and produces a tensor as an output A constant node takes no inputs, and outputs a value (tensor) it stores internally Constant tensors with floating-point values are created with the tf.constant() method
A Two-node Computational Graph import tensorflow as tf node1 = tf.constant(3.0, dtype=tf.float32) node2 = tf.constant(4.0) # also tf.float32 implicitly print(node1, node2) The final print statement displays the two 0-dimensional nodes as objects and produces Tensor("Const:0", shape=(), dtype=float32) Tensor("Const_1:0", shape=(), dtype=float32)
Sessions To evaluate the nodes, run the computational graph within a session A session encapsulates the control and state of the TensorFlow runtime Create a Session object and invoke its run method to evaluate the computational graph's nodes, node1 and node2 sess = tf.Session() print(sess.run([node1, node2])) When the graph is evaluated, the result is a new [2] tensor: [3.0, 4.0]
Operations Nodes can be combined using operations, producing a new node Add the two constant nodes to produce a third node (and a new graph): node3 = tf.add(node1, node2) print("node3:", node3) print("sess.run(node3):", sess.run(node3)) The last two print statements produce node3: Tensor("Add:0", shape=(), dtype=float32) sess.run(node3): 7.0
TensorBoard TensorFlow provides a utility called TensorBoard that can display a picture of the computational graph TensorBoard visualizes the node3 graph as: [Figure: TensorBoard rendering showing the two constant nodes feeding an Add node]
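One way to produce this picture, assuming a TensorFlow 1.x install (the log directory name here is arbitrary):

writer = tf.summary.FileWriter('/tmp/tf_graph', sess.graph)  # write the graph definition
writer.close()
# then, from the command line: tensorboard --logdir /tmp/tf_graph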
Placeholders The node3 graph always produces a constant result, but a graph can be parameterized to accept external inputs, known as placeholders a = tf.placeholder(tf.float32) b = tf.placeholder(tf.float32) adder_node = a + b # + provides a shortcut for tf.add(a, b) adder_node acts like a function (or a lambda) which takes two input parameters (a and b) and performs an operation on them The graph can be evaluated multiple times, for example using dictionary literals to define the placeholders by name print(sess.run(adder_node, {a: 3, b: 4.5})) # a and b are shape [] tensors print(sess.run(adder_node, {a: [1, 3], b: [2, 4]})) # a and b are [2] tensors Resulting output 7.5 [ 3. 7.]
adder_node in TensorBoard
Enhance the Graph Make a computational graph more complex by adding another operation add_and_triple = adder_node * 3. print(sess.run(add_and_triple, {a: 3, b: 4.5})) This code produces the output 22.5 Note that a and b are parameters to adder_node
Variables Variables are defined with initial values and types W = tf.Variable([2, 2, -3], dtype=tf.float32) W is defined but is not yet initialized After all global variables have been defined, get the initialization function and execute it using the run method of the session init = tf.global_variables_initializer() sess.run(init)
AND Perceptron Define the input parameter x = tf.placeholder(tf.float32) Define the and node and_node = tf.to_float(tf.less_equal(0., tf.reduce_sum(W*x, 1))) Compute the vector dot product of W and each row of the parameter x (each row the same shape [3] as W) Compare the results element-wise to zero, producing True or False (True when the sum is >= 0) Convert True or False to a float 1 or 0
Evaluation The perceptron requires three inputs, the two operands and the bias input, which is always 1 Run the model with the four cases print(sess.run(and_node, {x: [[1,1,1], [1,0,1], [0,1,1], [0,0,1]]})) Resulting output [ 1. 0. 0. 0.]
Training Start with estimated weights and calculate the results from a training set of inputs (inputs with known outputs) Use the errors (known as the deltas) to determine new weights that will reduce the deltas in the training set Make successive passes, modifying the weights each time until they converge to produce the correct output for the training set
Training(2) Training operates in the realm of calculus (continuous functions), where one of the most effective tools for weight convergence is the gradient If you are standing on the side of a hill (a continuous two-dimensional surface, a function of latitude and longitude), the gradient is the direction of the steepest ascent (or descent) from your position Taking the gradient of the error function provides a guideline for a guess at the next set of weights
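A toy illustration of the idea, descending the one-dimensional "hill" f(w) = (w - 3)^2 (our example, not from the slides):

w, rate = 0.0, 0.1
for _ in range(100):
    gradient = 2 * (w - 3)   # derivative of (w - 3)**2
    w -= rate * gradient     # step in the downhill direction
print(w)                     # approaches 3.0, the minimum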
Training - Converting the Step Function The step function used in the and_node is not continuous and does not have a derivative Instead, we use the sigmoid (S-shaped) logistic function, sigmoid(t) = 1 / (1 + e^-t), to give a fuzzy 0 (less than .5) or 1 (greater than .5) and_node = tf.sigmoid(tf.reduce_sum(W*x, 1))
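A quick plain-Python check of that fuzzy behavior (a sketch, not the TensorFlow implementation):

import math

def sigmoid(t):
    # smooth, differentiable replacement for the step function
    return 1.0 / (1.0 + math.exp(-t))

print(sigmoid(-1.0))   # about 0.27 - a fuzzy 0
print(sigmoid(1.0))    # about 0.73 - a fuzzy 1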
Training(3) Create an error function that computes the sum of the squared errors from each of the training set outputs - this is the function to be minimized during the training y = tf.placeholder(tf.float32) diff = tf.reduce_sum(tf.square(and_node - y)) Get a gradient optimizer - the parameter is the rate of movement along the gradient for each step opt = tf.train.GradientDescentOptimizer(0.1) Get a function from the optimizer that minimizes the error function train = opt.minimize(diff)
Train the Perceptron Assign arbitrary values to the weights, then run the training for 200 passes - x is the training set, y is the expected output of each member of the training set sess.run(tf.assign(W, [1, 1, -1])) for i in range(200): sess.run(train, {x: [[1,1,1],[1,0,1],[0,1,1],[0,0,1]], y: [1,0,0,0]}) Evaluate and display the trained weights print(sess.run(W)) [ 1.9068336 1.9068336 -3.0122287] print(sess.run(and_node, {x: [[1,1,1],[1,0,1],[0,1,1],[0,0,1]]})) [ 0.6902664 0.24862868 0.24862868 0.046882 ]
More Complicated Neural Networks [Figure: 5x5 images of digits drawn as @ and . pixels] Each 5x5 digit image can provide 25 simple inputs to 26-dimensional perceptrons Each of the 10 can provide an input to each of 10 11-dimensional perceptrons in the next layer The output might ultimately be an 11-dimensional vector [0,0,0,1,0,0,0,0,0,0,0] (e.g. a 3) (the 11th position indicates unclassifiable) There are interesting questions about how many neurons (perceptrons) in a layer are needed and how many layers are useful. Reductions in computational requirements without compromising classification are important. [Figure: several 5x5 pixel variations that should all be recognized as a 3]
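A rough plain-Python sketch of the feed-forward pass for a network of this shape (the random untrained weights, sigmoid activation, and exact layer sizes here are our illustrative assumptions):

import math
import random

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def layer_outputs(weight_rows, inputs):
    # each row of weights is one perceptron; inputs already include the bias 1
    return [sigmoid(sum(w * i for w, i in zip(row, inputs))) for row in weight_rows]

random.seed(0)
hidden = [[random.uniform(-1, 1) for _ in range(26)] for _ in range(10)]  # 10 neurons, 25 pixels + bias
output = [[random.uniform(-1, 1) for _ in range(11)] for _ in range(11)]  # 11 neurons, 10 inputs + bias

image = [0.0] * 25                         # a flattened 5x5 image
h = layer_outputs(hidden, image + [1.0])   # layer 1, with the bias input appended
y = layer_outputs(output, h + [1.0])       # an 11-dimensional output vector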
TensorFlow Conclusions Provides an extensive platform for machine learning Provides operations that match the concepts of neural networks Suppresses the multi-dimensional computational detail in a natural way Easy to install and use on Windows, Mac, or Linux