Hands-on Lab: Deep Learning with the Theano Python Library

Size: px

Start display at page:

Download "Hands-on Lab: Deep Learning with the Theano Python Library"

Rosa Palmer
6 years ago
Views:

1 Hands-on Lab: Deep Learning with the Python Library Frédéric Bastien Montreal Institute for Learning Algorithms Université de Montréal Montréal, Canada Presentation prepared with Pierre Luc Carrier and Arnaud Bergeron GTC 2016

2 Slides PDF of the slides: github repo of this presentation 2 / 58

3 Introduction Logistic Regression Convolution 3 / 58

4 High level Python <- {NumPy/SciPy/libgpuarray} <- <- {...} Python: OO coding language Numpy: n-dimensional array object and scientic computing toolbox SciPy: sparse matrix objects and more scientic computing functionality libgpuarray: n-dimensional array object in C for CUDA and OpenCL : compiler/symbolic graph manipulation (Not a machine learning framework/software) {...}: Many libraries built on top of 1 / 58

5 What provides Lazy evaluation for performance support Symbolic dierentiation Automatic speed and stability optimization 2 / 58

6 High level Many [machine learning] library build on top of Keras blocks lasagne sklearn-theano PyMC 3 theano-rnn Morb... 3 / 58

7 Goal of the stack Fast to develop Fast to run 4 / 58

8 Some models build with Some models that have been build with. Neural Networks Convolutional Neural Networks, AlexNet, OverFeat, GoogLeNet RNN, CTC, LSTM, GRU NADE, RNADE, MADE Autoencoders Generative Adversarial Nets SVMs many variations of above models and more 5 / 58

9 Project status? Mature: has been developed and used since January 2008 (8 yrs old) Driven hundreds of research papers Good user documentation Active mailing list with worldwide participants Core technology for Silicon-Valley start-ups Many contributors (some from outside our institute) Used to teach many university classes Has been used for research at big compagnies 0.8 released 21th of March, 2016 : deeplearning.net/software/theano/ Deep Learning Tutorials: deeplearning.net/tutorial/ 6 / 58

10 Python General-purpose high-level OO interpreted language Emphasizes code readability Comprehensive standard library Dynamic type and memory management Easily extensible with C Slow execution Popular in web development and scientic communities 7 / 58

11 NumPy/SciPy NumPy provides an n-dimensional numeric array in Python Perfect for high-performance computing Slices of arrays are views (no copying) NumPy provides Elementwise computations Linear algebra, Fourier transforms Pseudorandom number generators (many distributions) SciPy provides lots more, including Sparse matrices More linear algebra Solvers and optimization algorithms Matlab-compatible I/O I/O and signal processing for images and audio 8 / 58

12 Introduction Logistic Regression Convolution 9 / 58

13 Description High-level domain-specic language for numeric computation. Syntax as close to NumPy as possible Compiles most common expressions to C for CPU and/or Limited expressivity means more opportunities for optimizations Strongly typed -> compiles to C Array oriented -> easy parallelism Support for looping and branching in expressions No subroutines -> global optimization Automatic speed and numerical stability optimizations 10 / 58

14 Description (2) Symbolic dierentiation and R op (Hessian Free Optimization) Can reuse other technologies for best performance CUDA, CuBLAS, CuDNN, BLAS, SciPy, PyCUDA, Cython, Numba,... Sparse matrices (CPU only) Extensive unit-testing and self-verication Extensible (You can create new operations as needed) Works on Linux, OS X and Windows Multi- New back-end: Float16 new back-end (need cuda 7.5) Multi dtypes Multi- support in the same process 11 / 58

15 Simple example import theano # d e c l a r e s y m b o l i c v a r i a b l e a = theano. t e n s o r. v e c t o r ( "a" ) # b u i l d s y m b o l i c e x p r e s s i o n b = a + a 10 # c o m p i l e f u n c t i o n f = theano. f u n c t i o n ( [ a ], b ) # E x e c u t e w i t h n u m e r i c a l v a l u e p r i n t f ( [ 0, 1, 2 ] ) # p r i n t s ` a r r a y ( [ 0, 2, ] ) ` 12 / 58

16 Simple example 13 / 58

17 Overview of library is many things Language Compiler Python library 14 / 58

18 Scalar math Some example of scalar operations: import theano from theano import t e n s o r as T x = T. s c a l a r ( ) y = T. s c a l a r ( ) z = x + y w = z x a = T. s q r t (w) b = T. exp ( a ) c = a b d = T. l o g ( c ) 15 / 58

19 Vector math from theano import t e n s o r as T x = T. v e c t o r ( ) y = T. v e c t o r ( ) # S c a l a r math a p p l i e d e l e m e n t w i s e a = x y # V e c t o r dot p r o d u c t b = T. dot ( x, y ) # B r o a d c a s t i n g ( a s NumPy, v e r y p o w e r f u l ) c = a + b 16 / 58

20 Matrix math from theano import t e n s o r as T x = T. m a t r i x ( ) y = T. m a t r i x ( ) a = T. v e c t o r ( ) # Matrix m a t r i x p r o d u c t b = T. dot ( x, y ) # Matrix v e c t o r p r o d u c t c = T. dot ( x, a ) 17 / 58

21 Tensors Using : Dimensionality dened by length of broadcastable argument Can add (or do other elemwise op) two tensors with same dimensionality Duplicate tensors along broadcastable axes to make size match from theano import t e n s o r as T t e n s o r 3 = T. TensorType ( b r o a d c a s t a b l e =( F a l s e, F a l s e, F a l s e ), d t y p e= ' f l o a t 3 2 ' ) x = T. t e n s o r 3 ( ) 18 / 58

22 Reductions from theano i m p o r t t e n s o r as T t e n s o r 3 = T. TensorType ( b r o a d c a s t a b l e =( F a l s e, F a l s e, F a l s e ), d t y p e= ' f l o a t 3 2 ' ) x = t e n s o r 3 ( ) t o t a l = x. sum ( ) m a r g i n a l s = x. sum ( a x i s =(0, 2 ) ) mx = x. max ( a x i s =1) 19 / 58

23 Dimshue from theano i m p o r t t e n s o r as T t e n s o r 3 = T. TensorType ( b r o a d c a s t a b l e =( F a l s e, F a l s e, F a l s e ) ) x = t e n s o r 3 ( ) y = x. d i m s h u f f l e ( ( 2, 1, 0 ) ) a = T. m a t r i x ( ) b = a. T # Same a s b c = a. d i m s h u f f l e ( ( 0, 1 ) ) # Adding t o l a r g e r t e n s o r d = a. d i m s h u f f l e ( ( 0, 1, ' x ' ) ) e = a + d 20 / 58

24 Indexing As NumPy! This mean slices and index selection return view # r e t u r n v i e w s, s u p p o r t e d on a_tensor [ i n t ] a_tensor [ int, i n t ] a_tensor [ s t a r t : s t o p : step, s t a r t : s t o p : s t e p ] a_tensor [ : : 1 ] # r e v e r s e t h e f i r s t d i m e n s i o n # Advanced i n d e x i n g, r e t u r n copy a_tensor [ an_index_vector ] # S u p p o r t e d on a_tensor [ an_index_vector, an_index_vector ] a_tensor [ int, an_index_vector ] a_tensor [ an_index_tensor,... ] 21 / 58

25 Compiling and running expression theano.function shared variables and updates compilation modes 22 / 58

26 theano.function >>> from theano import t e n s o r as T >>> x = T. s c a l a r ( ) >>> y = T. s c a l a r ( ) >>> from theano import f u n c t i o n >>> # f i r s t a r g i s l i s t o f SYMBOLIC i n p u t s >>> # s e c o n d a r g i s SYMBOLIC o u t p u t >>> f = f u n c t i o n ( [ x, y ], x + y ) >>> # C a l l i t w i t h NUMERICAL v a l u e s >>> # Get a NUMERICAL o u t p u t >>> f ( 1., 2. ) a r r a y ( 3. 0 ) 23 / 58

27 Shared variables It's hard to do much with purely functional programming shared variables add just a little bit of imperative programming A shared variable is a buer that stores a numerical value for a variable Can write to as many shared variables as you want, once each, at the end of the function Can modify value outside of function with get_value() and set_value() methods. 24 / 58

28 Shared variable example >>> from theano import s h a r e d >>> x = s h a r e d ( 0. ) >>> u p d a t e s = [ ( x, x + 1 ) ] >>> f = f u n c t i o n ( [ ], u p d a t e s=u p d a t e s ) >>> f ( ) >>> x. g e t _ v a l u e ( ) 1. 0 >>> x. s e t _ v a l u e ( ) >>> f ( ) >>> x. g e t _ v a l u e ( ) / 58

29 Compilation modes Can compile in dierent modes to get dierent kinds of programs Can specify these modes very precisely with arguments to theano.function Can use a few quick presets with environment variable ags 26 / 58

30 Example preset compilation modes FAST_RUN: default. Fastest execution, slowest compilation FAST_COMPILE: Fastest compilation, slowest execution. No C code. DEBUG_MODE: Adds lots of checks. Raises error messages in situations other modes regard as ne. optimizer=fast_compile: as mode=fast_compile, but with C code. theano.function(..., mode=fast_compile) THEANO_FLAGS=mode=FAST_COMPILE python script.py 27 / 58

31 There are macro that automatically build bigger graph for you. theano.grad Others Those functions can get called many times, for example to get the 2nd derivative. 28 / 58

32 The grad method >>> x = T. s c a l a r ( ' x ' ) >>> y = 2. x >>> g = T. grad ( y, x ) # P r i n t t h e not o p t i m i z e d g r a p h >>> theano. p r i n t i n g. p y d o t p r i n t ( g ) 29 / 58

33 The grad method >>> x = T. s c a l a r ( ' x ' ) >>> y = 2. x >>> g = T. grad ( y, x ) # P r i n t t h e o p t i m i z e d g r a p h >>> f = theano. f u n c t i o n ( [ x ], g ) >>> theano. p r i n t i n g. p y d o t p r i n t ( f ) 30 / 58

34 Others R_op, L_op for Hessian Free Optimization hessian jacobian clone the graph with replacement you can navigate the graph if you need (go from the result of computation to its input, recursively) 31 / 58

35 Enabling 's current back-end only supports 32 bit on libgpuarray (new-backend) supports all dtype CUDA supports oat64, but it is slow on gamer s 32 / 58

36 : ags ags allow to congure. Can be set via a conguration le or an environment variable. To enable : Set device=gpu (or a specic gpu, like gpu0) Set oatx=oat32 Optional: warn_oat64={'ignore', 'warn', 'raise', 'pdb'} Instead of ags, user can call theano.sandbox.cuda.cuda('gpu0') 33 / 58

37 oatx Allow to change the dtype between oat32 and oat64. T.fscalar, T.fvector, T.fmatrix are all 32 bit T.dscalar, T.dvector, T.dmatrix are all 64 bit T.scalar, T.vector, T.matrix resolve to oatx oatx is oat64 by default, set it to oat32 for 34 / 58

38 CuDNN V3 and V4 is supported. It is enabled automatically if available. ag to get an error if can't be used: dnn.enabled=true ag to disable it: dnn.enabled=false 35 / 58

39 DebugMode, NanGuardMode Error message theano.printing.debugprint 36 / 58

40 Error message: code import numpy a s np import theano import theano. t e n s o r as T x = T. v e c t o r ( ) y = T. v e c t o r ( ) z = x + x z = z + y f = theano. f u n c t i o n ( [ x, y ], z ) f ( np. ones ( ( 2, ) ), np. ones ( ( 3, ) ) ) 37 / 58

41 Error message: 1st part T r a c e b a c k ( most r e c e n t c a l l l a s t ) : [... ] V a l u e E r r o r : I n p u t d i m e n s i o n mis match. ( i n p u t [ 0 ]. s h a p e [ 0 ] = 3, i n p u t [ 1 ]. s h a p e [ 0 ] = 2 ) A p p l y node t h a t c a u s e d t h e e r r o r : E l e m w i s e { add, n o _ i n p l a c e }(< TensorType ( f l o a t 6 4, v e c t o r ) >, <TensorType ( f l o a t 6 4, v e c t o r ) >, <TensorType ( f l o a t 6 4, v e c t o r )>) I n p u t s t y p e s : [ TensorType ( f l o a t 6 4, v e c t o r ), TensorType ( f l o a t 6 4, v e c t o r ), TensorType ( f l o a t 6 4, v e c t o r ) ] I n p u t s s h a p e s : [ ( 3, ), ( 2, ), ( 2, ) ] I n p u t s s t r i d e s : [ ( 8, ), ( 8, ), ( 8, ) ] I n p u t s v a l u e s : [ a r r a y ( [ 1., 1., 1. ] ), a r r a y ( [ 1., 1. ] ), a r r a y ( [ 1., 1. ] ) ] O u t p u t s c l i e n t s : [ [ ' o u t p u t ' ] ] 38 / 58

42 Error message: 2st part HINT: Re-running with most optimization disabled could give you a back-traces when this node was created. This can be done with by setting the ags optimizer=fast_compile. If that does not work, optimizations can be disabled with optimizer=none. HINT: Use the ag exception_verbosity=high for a debugprint of this apply node. 39 / 58

43 Error message: traceback Traceback ( most r e c e n t c a l l l a s t ) : F i l e " t e s t. py ", l i n e 9, i n <module> f ( np. o n e s ( ( 2, ) ), np. o n e s ( ( 3, ) ) ) F i l e "/u/ b a s t i e n f / r e p o s / t h e a n o / c o m p i l e / f u n c t i o n _ m o d u l e. py ", l i n e 589, i n call s e l f. f n. t h u n k s [ s e l f. f n. p o s i t i o n _ o f _ e r r o r ] ) F i l e "/u/ b a s t i e n f / r e p o s / t h e a n o / c o m p i l e / f u n c t i o n _ m o d u l e. py ", l i n e 579, i n call o u t p u t s = s e l f. f n ( ) 40 / 58

44 Error message: optimizer=fast_compile B a c k t r a c e when t h e node i s c r e a t e d : F i l e " t e s t. py ", l i n e 7, in <module> z = z + y 41 / 58

45 debugprint >>> from theano. p r i n t i n g import d e b u g p r i n t >>> d e b u g p r i n t ( a ) Elemwise {mul, n o _ i n p l a c e } [ id A ] ' ' T e n s o r C o n s t a n t { 2. 0 } [ id B ] Elemwise {add, n o _ i n p l a c e } [ id C ] ' z ' < TensorType ( f l o a t 6 4, s c a l a r )> [ id D] < TensorType ( f l o a t 6 4, s c a l a r )> [ id E ] 42 / 58

46 Logistic Regression Convolution Introduction Logistic Regression Convolution 43 / 58

47 Logistic Regression Convolution Inputs # Load from d i s k and put i n s h a r e d v a r i a b l e. d a t a s e t s = load_data ( d a t a s e t ) train_set_x, t r a i n _ s e t _ y = d a t a s e t s [ 0 ] valid_set_x, v a l i d _ s e t _ y = d a t a s e t s [ 1 ] # a l l o c a t e s y m b o l i c v a r i a b l e s f o r t h e d a t a i n d e x = T. l s c a l a r ( ) # i n d e x t o a [ m i n i ] b a t c h # g e n e r a t e s y m b o l i c v a r i a b l e s f o r i n p u t m i n i b a t c h x = T. m a t r i x ( ' x ' ) # data, 1 row p e r image y = T. i v e c t o r ( ' y ' ) # l a b e l s 44 / 58

48 Logistic Regression Convolution Model n_in = n_out = 10 # w e i g h t s W = theano. s h a r e d ( numpy. z e r o s ( ( n_in, n_out ), d t y p e=theano. c o n f i g. f l o a t X ) ) # b i a s b = theano. s h a r e d ( numpy. z e r o s ( ( n_out, ), d t y p e=theano. c o n f i g. f l o a t X ) ) 45 / 58

49 Logistic Regression Convolution Computation # t h e f o r w a r d p a s s p_y_given_x = T. nnet. softmax (T. dot ( input, W) + b ) # c o s t we m i n i m i z e : t h e n e g a t i v e l o g l i k e l i h o o d l = T. l o g ( p_y_given_x ) c o s t = T. mean ( l [T. a r a n g e ( y. shape [ 0 ] ), y ] ) # t h e e r r o r y_pred = T. argmax ( p_y_given_x, a x i s =1) e r r = T. mean (T. neq ( y_pred, y ) ) 46 / 58

50 Logistic Regression Convolution Gradient and updates # compute t h e g r a d i e n t o f c o s t g_w, g_b = T. grad ( c o s t=c o s t, wrt=(w, b ) ) # model p a r a m e t e r s u p d a t e s r u l e s u p d a t e s = [ (W, W l e a r n i n g _ r a t e g_w), ( b, b l e a r n i n g _ r a t e g_b ) ] 47 / 58

51 Logistic Regression Convolution Training function # c o m p i l e a f u n c t i o n t h a t t r a i n t h e model train_model = theano. f u n c t i o n ( i n p u t s =[ i n d e x ], o u t p u t s =( c o s t, e r r ), u p d a t e s=updates, g i v e n s ={ x : t r a i n _ s e t _ x [ i n d e x b a t c h _ s i z e : ( i n d e x + 1) b a t c h _ s i z e ], y : t r a i n _ s e t _ y [ i n d e x b a t c h _ s i z e : ( i n d e x + 1) b a t c h _ s i z e ] } ) 48 / 58

52 Logistic Regression Convolution Introduction Logistic Regression Convolution 49 / 58

53 Logistic Regression Convolution Inputs # Load from d i s k and put i n s h a r e d v a r i a b l e. d a t a s e t s = load_data ( d a t a s e t ) train_set_x, t r a i n _ s e t _ y = d a t a s e t s [ 0 ] valid_set_x, v a l i d _ s e t _ y = d a t a s e t s [ 1 ] # a l l o c a t e s y m b o l i c v a r i a b l e s f o r t h e d a t a i n d e x = T. l s c a l a r ( ) # i n d e x t o a [ m i n i ] b a t c h x = T. m a t r i x ( ' x ' ) # t h e data, 1 row p e r image y = T. i v e c t o r ( ' y ' ) # l a b e l s # Reshape m a t r i x o f s h a p e ( b a t c h _ s i z e, 28 28) # t o a 4D t e n s o r, c o m p a t i b l e f o r c o n v o l u t i o n l a y e r 0 _ i n p u t = x. r e s h a p e ( ( b a t c h _ size, 1, 28, 2 8 ) ) 50 / 58

54 Logistic Regression Convolution Model image_shape=( batch_size, 1, 28, 28) f i l t e r _ s h a p e =( n k e r n s [ 0 ], 1, 5, 5) W_bound =... W = theano. s h a r e d ( numpy. a s a r r a y ( rng. u n i f o r m ( low= W_bound, h i g h=w_bound, s i z e=f i l t e r _ s h a p e ), d t y p e=theano. c o n f i g. f l o a t X ) ) # t h e b i a s i s a 1D t e n s o r # one b i a s p e r o u t p u t f e a t u r e map b_values = numpy. z e r o s ( ( f i l t e r _ s h a p e [ 0 ], ), d t y p e =... b = theano. s h a r e d ( b_values ) 51 / 58

55 Logistic Regression Convolution Computation # c o n v o l v e i n p u t f e a t u r e maps w i t h f i l t e r s conv_out = conv. conv2d ( input=x, f i l t e r s =W) # downsample each f e a t u r e map i n d i v i d u a l l y, # u s i n g m a x p o o l i n g pooled_out = downsample. max_pool_2d ( input=conv_out, ds =(2, 2 ), // p o o l s i z e i g n o r e _ b o r d e r=true ) o u t p u t = T. tanh ( pooled_out + b. d i m s h u f f l e ( ' x ', 0, ' x ', ' x ' ) ) 52 / 58

56 Introduction Logistic Regression Convolution 53 / 58

57 ipython notebook Introduction ( only exercises) lenet (small CNN model to quickly try it) 54 / 58

58 Connection instructions Navigate to nvlabs.qwiklab.com Login or create a new account Select the Instructor-Led Hands-on Labs class Find the lab called and click Start After a short wait, lab instance connection information will be shown Please ask Lab Assistants for help! 55 / 58

59 Further hands-on training Check out the Self-Paced labs at the conference: Deep Learning, CUDA, OpenACC, Tools and more! Just grab a seat and take any available lab Located in the lower-level outside of LL20C You will also receive Credits to take additional labs at nvidia.qwiklab.com Log in using the same account you used in this lab 56 / 58

60 Where to learn more GTC2016 sessions: S at a Glance: A Framework for Machine Learning Deep Learning Tutorials with : deeplearning.net/tutorial tutorial: deeplearning.net/software/tutorial website: deeplearning.net/software You can also see frameworks on top of like Blocks, Keras, Lasagne, / 58

61 Questions, acknowledgments Questions? Acknowledgments All people working or having worked at the LISA lab/mila institute All users/contributors Compute Canada, RQCHP, NSERC, NVIDIA, and Canada Research Chairs for providing funds, access to computing resources, hardware or libraries. 58 / 58

ECE521 W17 Tutorial 1. Renjie Liao & Min Bai

ECE521 W17 Tutorial 1. Renjie Liao & Min Bai ECE521 W17 Tutorial 1 Renjie Liao & Min Bai Schedule Linear Algebra Review Matrices, vectors Basic operations Introduction to TensorFlow NumPy Computational Graphs Basic Examples Linear Algebra Review