Theano: A Few Examples


1 October 21, 2015

2 Theano in a Nutshell
Python library for creating, optimizing and evaluating mathematical expressions. Designed for machine-learning applications by the LISA Lab in Montreal, Canada, one of the world leaders in neural network research. Particularly good with multi-dimensional arrays, the basis of most neural-net representations. Automatic derivation of gradients (derivatives), a central aspect of many core neural-net computations. Seamless exploitation of GPUs for up to 140x performance improvements. NOT a drag-and-drop construction kit for neural nets, but a programming paradigm (that takes a little getting used to). The source of info:

3 Theano Variables
Theano variables all have the same Python type, TensorVariable. Each variable's type slot houses its Theano type, of which there are many. Theano variables have (optional) names that serve no important purpose in Theano itself but are useful for the user.

    import theano
    import theano.tensor as T

    >>> x = T.dscalar('x')
    >>> type(x)
    <class 'theano.tensor.var.TensorVariable'>
    >>> x.type
    TensorType(float64, scalar)
    >>> m = T.dmatrix('MyMatrix')
    >>> type(m)
    <class 'theano.tensor.var.TensorVariable'>
    >>> m.type
    TensorType(float64, matrix)
    >>> m.name
    'MyMatrix'
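There are constructors for many more Theano types than dscalar and dmatrix. A brief illustrative sketch (not from the slides); the 'd', 'i' and 'f' prefixes select float64, int32 and float32 element types:

    import theano.tensor as T

    v  = T.dvector('v')      # 1-d array of float64
    iv = T.ivector('iv')     # 1-d array of int32
    t3 = T.dtensor3('t3')    # 3-d array of float64
    fm = T.fmatrix('fm')     # 2-d array of float32

    print(v.type)            # TensorType(float64, vector)
    print(iv.type)           # TensorType(int32, vector)
    print(fm.type)           # TensorType(float32, matrix)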

4 Theano Functions

    import theano
    import theano.tensor as T

    def theg1():
        w = T.dscalar('w')
        x = T.dscalar('x')
        y = T.dscalar('y')
        z = w * x + y
        f = theano.function([w, x, y], z)
        return f

z is a TensorVariable of Theano type dscalar (double-precision floating-point scalar), built from the expression w * x + y. theano.function compiles the entire expression connecting the inputs [w, x, y] to the output z.

    >>> f = theg1()
    >>> type(f)
    <class 'theano.compile.function_module.Function'>
    >>> f(1, 2, 3)
    array(5.0)

5 A Theano Expression Graph for z
[Expression-graph figure: w and x feed an Elemwise{mul,no_inplace} node; its output and y feed an Elemwise{add,no_inplace} node that produces z. Every node is TensorType(float64, scalar).]
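Graphs like this one, and those on the following slides, can be generated with Theano's printing utilities. A minimal sketch, assuming the theg1 function from the previous slide is in scope; the output file name is arbitrary:

    import theano
    import theano.printing

    f = theg1()   # compiled Theano function from slide 4

    # Text rendering of the optimized expression graph
    theano.printing.debugprint(f)

    # Graphical rendering (requires pydot and graphviz); writes a PNG file
    theano.printing.pydotprint(f, outfile='theg1_graph.png')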

6 Expression Graph for Compiled Function f
[Expression-graph figure: after compilation the multiply and add are fused into a single Elemwise{Composite{((i0 * i1) + i2)}} node whose three inputs are w, x and y; all types are TensorType(float64, scalar).]

7 Calculating Derivatives

    def theg2():
        w = T.dscalar('w')
        x = T.dscalar('x')
        y = T.dscalar('y')
        z = 7 * w * x + y
        f = theano.function([w, x, y], z)
        dz = T.grad(z, x)
        g = theano.function([w, x, y], dz)
        return g

T.grad calculates the derivative of z with respect to x and stores it in dz, a scalar Theano variable. g is the function object that computes dz, given w, x and y.
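One way to see what T.grad provides is to compare the compiled symbolic derivative with a numerical finite-difference estimate. A small self-contained sketch (not from the slides), rebuilding the same expression:

    import theano
    import theano.tensor as T

    w, x, y = T.dscalars('w', 'x', 'y')
    z = 7 * w * x + y
    f = theano.function([w, x, y], z)             # computes z itself
    g = theano.function([w, x, y], T.grad(z, x))  # computes dz/dx = 7*w

    eps = 1e-6
    numeric  = (f(5, 1 + eps, 20) - f(5, 1, 20)) / eps
    symbolic = g(5, 1, 20)
    print(numeric, symbolic)   # both approximately 35.0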

8 Graphing the variable, dz
[Expression-graph figure for the unoptimized gradient variable dz: it contains the original 7 * w * x + y subgraph plus the extra Elemwise{second} and Elemwise{mul} nodes introduced by T.grad; the constant 7 enters as TensorType(int8, scalar), everything else is TensorType(float64, scalar).]

9 Graphing the Derivative Function, g
[Expression-graph figure for the compiled gradient function g: the constant 7.0 and w feed a single Elemwise{mul,no_inplace} node, so the compiled graph reduces to 7 * w; x and y no longer appear in it.]

    >>> g = theg2()
    >>> g(5, 1, 20)
    array(35.0)   # dz/dx = 7*w = 7*5

10 Working with Vectors and Matrices

    def theg3():
        w = T.dmatrix('weights')
        v = T.dvector('upstream_activations')
        b = T.dvector('biases')
        x = T.dot(v, w) + b        # T.dot = dot product
        x.name = 'integrated_signals'
        f = theano.function([v, w, b], x)
        return f

    >>> f = theg3()
    >>> v = [1, 1]
    >>> b = [0.5, -0.3]
    >>> w = [[2, 4], [3, 5]]
    >>> f(v, w, b)
    array([ 5.5,  8.7])
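As a sanity check (not on the slide), the same computation can be done directly in NumPy, and the compiled Theano function should agree:

    import numpy as np

    v = [1, 1]
    w = [[2, 4], [3, 5]]
    b = [0.5, -0.3]
    print(np.dot(v, w) + b)   # [ 5.5  8.7 ], matching f(v, w, b) above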

11 Graphing the variable, x
[Expression-graph figure: upstream activations and weights feed a dot node; its output and the biases feed an Elemwise{add,no_inplace} node that produces the variable named integrated signals.]

12 Graphing the compiled function, f
[Expression-graph figure for the compiled function: the weight matrix passes through an InplaceDimShuffle{1,0} node (producing weights.T), and the dot product plus bias addition is fused into a single CGemv{no_inplace} node producing integrated signals.]
This graph is compressed and simplified (due to compilation) but also complicated by an additional operation, DimShuffle, on the weight matrix, where {1,0} means transpose.
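DimShuffle is Theano's general axis-permutation operation; with pattern {1,0} on a matrix it is just a transpose. A small illustrative sketch (not from the slides):

    import theano
    import theano.tensor as T

    m  = T.dmatrix('m')
    mt = m.dimshuffle(1, 0)        # swap the two axes: same result as m.T
    f  = theano.function([m], [mt, m.T])

    a, b = f([[1, 2], [3, 4]])
    print(a)   # [[ 1.  3.]
               #  [ 2.  4.]]
    print(b)   # identical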

13 Matrix Operations with Memory
Create a shared accumulator variable: s = theano.shared(...). Update it as a side effect of each function call: ...updates=[(s, s+x)].

    def theg4(n=10):
        w = T.dmatrix('weights')
        v = T.dvector('upstream_activations')
        b = T.dvector('biases')
        s = theano.shared(np.zeros(2))
        x = T.dot(v, w) + b
        x.name = 'integrated_signals'
        f = theano.function([v, w, b], x, updates=[(s, s + x)])
        w0 = np.random.uniform(-0.1, 0.1, size=(2, 2))
        b0 = [1, 1]
        for i in range(n):            # Call f many times
            f([1 + i / n, 1 - i / n], w0, b0)
        return (f, s)
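A small usage sketch (not from the slides), assuming theg4 and its imports are in scope; get_value() is the standard way to read the current contents of a shared variable:

    f, s = theg4(n=10)
    print(s.get_value())   # sum of the ten integrated-signal vectors accumulated by the updates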

14 Expression Graph including Update
[Expression-graph figure: the same compiled CGemv{no_inplace} graph as on slide 12, plus an Elemwise{Add}[(0, 0)] node that computes the update s + x for the shared variable.]
Note the addition of the update, lower left.

15 Including an Activation Function and Error Term
Use Theano's NNET module for activation functions.

    import theano.tensor.nnet as Tann

    def theg5(target=[1, 1]):
        w = theano.shared(np.random.uniform(-0.1, 0.1, size=(2, 2)))
        v = T.dvector('V')
        b = theano.shared(np.ones(2))
        x = Tann.sigmoid(T.dot(v, w) + b)
        w.name = 'w'; x.name = 'x'
        error = T.sum((target - x) ** 2)
        de = T.grad(error, w)
        return (x, de)
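theg5 returns symbolic variables rather than callable functions. The following sketch (not from the slides) rebuilds the same expressions locally and compiles the error and its gradient so they can be evaluated on a concrete input:

    import numpy as np
    import theano
    import theano.tensor as T
    import theano.tensor.nnet as Tann

    target = [1, 1]
    w = theano.shared(np.random.uniform(-0.1, 0.1, size=(2, 2)))
    b = theano.shared(np.ones(2))
    v = T.dvector('V')
    x = Tann.sigmoid(T.dot(v, w) + b)
    error = T.sum((target - x) ** 2)
    de = T.grad(error, w)

    f_err  = theano.function([v], error)
    f_grad = theano.function([v], de)
    print(f_err([1.0, 0.0]))    # scalar error for this input
    print(f_grad([1.0, 0.0]))   # 2x2 gradient of the error w.r.t. the weights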

16 Expression Graph of x
[Expression-graph figure: V and w feed a dot node, the bias is added via Elemwise{add,no_inplace}, and the result passes through a sigmoid node to produce x.]

17 Expression Graph of error
[Expression-graph figure: the sigmoid graph for x from slide 16, followed by Elemwise{sub,no_inplace} against the target vector [1 1], Elemwise{pow,no_inplace} with exponent 2, and a Sum{acc_dtype=float64} node producing the scalar error.]

18 Expression Graph of d(error)/d(weight)
[Expression-graph figure: the backpropagation graph that T.grad builds for d(error)/dw. It reuses the forward sub/pow/Sum subgraph and adds Elemwise{scalar_sigmoid}, Elemwise{neg}, several Elemwise{mul} and DimShuffle nodes, and a final dot with v.T, yielding a float64 matrix of weight gradients.]

19 An Autoencoder Neural Network
The hidden-layer activation pattern is an encoding of the input. Target output = input: the network must reproduce its input at the output, but the signal has to pass through the hidden layer, so the hidden activation pattern acts as a compression of the input.

20 The Autoencoder Class
nb = number of bits = number of input nodes; nh = number of hidden nodes; lr = learning rate.

    def gen_all_bitcases(num_bits):
        def bits(n):
            s = bin(n)[2:]
            return [int(b) for b in '0' * (num_bits - len(s)) + s]
        return [bits(i) for i in range(2 ** num_bits)]

    class autoencoder():
        def __init__(self, nb=3, nh=2, lr=0.1):
            self.cases = gen_all_bitcases(nb)
            self.lrate = lr
            self.build_ann(nb, nh, lr)
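As a quick illustration (not on the slide), gen_all_bitcases enumerates every bit vector of the given length; these vectors serve as both inputs and targets:

    >>> gen_all_bitcases(2)
    [[0, 0], [0, 1], [1, 0], [1, 1]]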

21 The Core Theano Build

    def build_ann(self, nb, nh, lr):
        w1 = theano.shared(np.random.uniform(-0.1, 0.1, size=(nb, nh)))
        w2 = theano.shared(np.random.uniform(-0.1, 0.1, size=(nh, nb)))
        input = T.dvector('input')
        b1 = theano.shared(np.random.uniform(-0.1, 0.1, size=nh))
        b2 = theano.shared(np.random.uniform(-0.1, 0.1, size=nb))
        x1 = Tann.sigmoid(T.dot(input, w1) + b1)
        x2 = Tann.sigmoid(T.dot(x1, w2) + b2)
        error = T.sum((input - x2) ** 2)
        params = [w1, b1, w2, b2]
        gradients = T.grad(error, params)
        # Gradient-descent updates: each parameter p moves to p - lrate * dE/dp
        backprop_acts = [(p, p - self.lrate * g) for p, g in zip(params, gradients)]
        self.predictor = theano.function([input], [x2, x1])
        self.trainer = theano.function([input], error, updates=backprop_acts)

22 Training the Autoencoder

    def do_training(self, epochs=100):
        errors = []
        for i in range(epochs):
            error = 0
            for c in self.cases:
                error += self.trainer(c)
            errors.append(error)
        return errors

23 Testing the Autoencoder
For this example, the main purpose of testing is to find the hidden-node activation patterns for each input case. Ideally, they should be well separated in 2-d space.

    def do_testing(self):
        hidden_activations = []
        for c in self.cases:
            _, hact = self.predictor(c)
            hidden_activations.append(hact)
        return hidden_activations

[Figures: Evolving Separation of Hidden-Layer Patterns; Final Hidden-Layer Patterns.]
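Putting the pieces together, here is a small usage sketch (not from the slides), assuming the autoencoder class and the imports from the previous slides are in scope; the epoch count of 300 is an arbitrary choice:

    a = autoencoder(nb=3, nh=2, lr=0.1)   # 8 bit patterns, 2 hidden nodes
    errors = a.do_training(epochs=300)    # summed error per epoch; should decrease
    hidden = a.do_testing()               # one 2-d hidden-activation vector per input case

    print(errors[0], errors[-1])
    for case, h in zip(a.cases, hidden):
        print(case, h)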
