AMLD Deep Learning in PyTorch. 2. PyTorch tensors


AMLD Deep Learning in PyTorch
2. PyTorch tensors

François Fleuret, February 10, 2018
École Polytechnique Fédérale de Lausanne

PyTorch's tensors

A tensor is a generalized matrix: a finite table of numerical values indexed along several discrete dimensions.

- A 1d tensor is a vector (e.g. a sound sample),
- a 2d tensor is a matrix (e.g. a grayscale image),
- a 3d tensor is a vector of identically sized matrices (e.g. a multi-channel image),
- a 4d tensor is a matrix of identically sized matrices (e.g. a sequence of multi-channel images),
- etc.

Tensors are used to encode the signal to process, but also the internal states and parameters of neural networks. Manipulating data through this constrained structure makes it possible to use CPUs and GPUs at peak performance. Compound data structures can represent more diverse data types.

PyTorch is a Python library built on top of torch's THNN computational backend. Its main features are:

- efficient tensor operations on CPU/GPU,
- automatic on-the-fly differentiation (autograd),
- optimizers,
- data I/O.

Efficient tensor operations encompass both standard linear algebra and, as we will see later, deep-learning specific operations (convolution, pooling, etc.).

A key specificity of PyTorch is the central role of autograd: tensor operations are specified dynamically as Python operations. We will come back to this.
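As a quick illustration of these dimensionalities, the snippet below builds tensors of increasing rank and inspects them. This is a minimal sketch of my own, assuming the 0.3-era API used throughout these slides (torch.Tensor, dim(), size()); the sizes are made up.

import torch

sound = torch.Tensor(16000)            # 1d: one second of sound at 16kHz
grayscale = torch.Tensor(28, 28)       # 2d: a 28x28 grayscale image
rgb = torch.Tensor(3, 32, 32)          # 3d: a 3-channel 32x32 image
video = torch.Tensor(24, 3, 32, 32)    # 4d: a sequence of 24 RGB frames

for t in (sound, grayscale, rgb, video):
    print(t.dim(), t.size())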

>>> from torch import Tensor
>>> x = Tensor(5)
>>> x.size()
torch.Size([5])
>>> x.fill_(1.125)
 1.1250
 1.1250
 1.1250
 1.1250
 1.1250
[torch.FloatTensor of size 5]
>>> x.sum()
5.625
>>> x.mean()
1.125
>>> x.std()
0.0

The default tensor type is torch.FloatTensor, but there are others with greater/lesser precision and on CPU/GPU. torch.Tensor is an alias for it and can be changed.

In-place operations are suffixed with an underscore.

torch.Tensor.narrow creates a new tensor which is a sub-part of an existing tensor, obtained by constraining one of the indexes.

>>> a = Tensor(4, 5).zero_()
>>> a
 0  0  0  0  0
 0  0  0  0  0
 0  0  0  0  0
 0  0  0  0  0
[torch.FloatTensor of size 4x5]
>>> a.narrow(1, 2, 2).fill_(1.0)
 1  1
 1  1
 1  1
 1  1
[torch.FloatTensor of size 4x2]
>>> a
 0  0  1  1  0
 0  0  1  1  0
 0  0  1  1  0
 0  0  1  1  0
[torch.FloatTensor of size 4x5]

As in numpy, the : symbol allows taking slices of tensors.

>>> x = Tensor(3, 2).random_(10)
>>> x[0]
 4
 8
[torch.FloatTensor of size 2]
>>> x[0, :]
 4
 8
[torch.FloatTensor of size 2]
>>> x[:, 0]
[torch.FloatTensor of size 3]

PyTorch provides interfaces to standard linear-algebra operations, such as linear system solving or eigen-decomposition.

>>> y = Tensor(3).normal_()
>>> m = Tensor(3, 3).normal_()
>>> q, _ = torch.gels(y, m)
>>> torch.mm(m, q)
[torch.FloatTensor of size 3x1]

Example: linear regression

Given a list of points (x_n, y_n) ∈ R × R, n = 1, ..., N, can we find the "best" line f(x; a, b) = ax + b going through the points, e.g. minimizing the mean square error

$$\operatorname*{argmin}_{a,b} \; \frac{1}{N} \sum_{n=1}^{N} \big( \underbrace{a x_n + b}_{f(x_n;\, a, b)} - y_n \big)^2 .$$

Such a model would allow us to predict the y associated with a new x, simply by computing f(x; a, b).
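Before solving for a and b, it can help to see the objective itself in code. The function below evaluates the mean square error of a candidate (a, b) on a toy set of points; a minimal sketch of my own with made-up data, not part of the original slides.

import torch

def mse(a, b, x, y):
    # Mean of (f(x_n; a, b) - y_n)^2 over all points
    return (a * x + b - y).pow(2).mean()

x = torch.Tensor([0.0, 1.0, 2.0, 3.0])
y = torch.Tensor([1.0, 3.0, 5.0, 7.0])   # lies exactly on y = 2x + 1

print(mse(2.0, 1.0, x, y))   # 0.0, the perfect fit
print(mse(1.0, 0.0, x, y))   # strictly larger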

bash> cat systolic-blood-pressure-vs-age.dat

[Figure: scatter plot of the data, systolic blood pressure (mmHg) vs. age (years). The file contents did not survive extraction.]

We arrange the data and the model parameters as

$$\underbrace{\begin{pmatrix} x_1 & y_1 \\ x_2 & y_2 \\ \vdots & \vdots \\ x_N & y_N \end{pmatrix}}_{\text{data} \,\in\, \mathbb{R}^{N \times 2}} \qquad \underbrace{\begin{pmatrix} x_1 & 1.0 \\ x_2 & 1.0 \\ \vdots & \vdots \\ x_N & 1.0 \end{pmatrix}}_{x \,\in\, \mathbb{R}^{N \times 2}} \qquad \underbrace{\begin{pmatrix} a \\ b \end{pmatrix}}_{\alpha \,\in\, \mathbb{R}^{2 \times 1}} \qquad \underbrace{\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}}_{y \,\in\, \mathbb{R}^{N \times 1}}$$

import torch, numpy

data = torch.from_numpy(numpy.loadtxt('systolic-blood-pressure-vs-age.dat')).float()

nb = data.size(0)
x, y = torch.Tensor(nb, 2), torch.Tensor(nb, 1)
x[:, 0] = data[:, 0]
x[:, 1] = 1
y[:, 0] = data[:, 1]

alpha, _ = torch.gels(y, x)
a, b = alpha[0, 0], alpha[1, 0]

[Figure: the resulting model f(x; a, b) plotted over the data, systolic blood pressure (mmHg) vs. age (years).]
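With a and b in hand, prediction is just evaluating f. A short follow-up of my own, continuing the snippet above (the age 45 is an arbitrary example value), which also sanity-checks the fit by reporting the mean square error of the solution:

# Predict the blood pressure of a (hypothetical) 45-year-old
print(a * 45 + b)

# Mean square error of the fitted line over the training points
residual = torch.mm(x, alpha) - y
print(residual.pow(2).mean())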

Manipulating high-dimension signals

Data type             CPU tensor           GPU tensor
32-bit float          torch.FloatTensor    torch.cuda.FloatTensor
64-bit float          torch.DoubleTensor   torch.cuda.DoubleTensor
8-bit int (unsigned)  torch.ByteTensor     torch.cuda.ByteTensor
64-bit int (signed)   torch.LongTensor     torch.cuda.LongTensor

>>> x = torch.LongTensor(12)
>>> type(x)
<class 'torch.LongTensor'>
>>> x = x.float()
>>> type(x)
<class 'torch.FloatTensor'>
>>> x = x.cuda()
>>> type(x)
<class 'torch.cuda.FloatTensor'>

Tensors of the torch.cuda types are physically in the GPU memory, and operations on them are done by the GPU. We will come back to that later.

The default tensor type can be set with torch.set_default_tensor_type and used through torch.Tensor.

>>> x = torch.Tensor()
>>> x
[torch.FloatTensor with no dimension]
>>> torch.set_default_tensor_type(torch.LongTensor)
>>> x = torch.Tensor()
>>> x
[torch.LongTensor with no dimension]

For concision we often start our code with

from torch import Tensor

and use Tensor in place of torch.Tensor.

Also, the new() method of a tensor creates one of the same type:

>>> y = torch.ByteTensor(10)
>>> u = y.new(3).fill_(123)
>>> u
 123
 123
 123
[torch.ByteTensor of size 3]

This is key to writing functions able to handle any tensor type.

[Figure: indexing diagrams. A 2d tensor (e.g. a grayscale image) is indexed as [ , ]; a 3d tensor (e.g. an RGB image) as [ , , ]; a 4d tensor (e.g. a sequence of RGB images) as [ , , , ].]

Here are some examples from the vast library of tensor operations:

Creation
- torch.Tensor()
- torch.Tensor(size)
- torch.Tensor(sequence)
- torch.eye(n)
- torch.from_numpy(ndarray)

Indexing, slicing, joining, mutating
- torch.Tensor.view(*args)
- torch.Tensor.expand(*sizes)
- torch.cat(inputs, dimension=0)
- torch.chunk(tensor, chunks, dim=0)
- torch.index_select(input, dim, index, out=None)
- torch.t(input, out=None)
- torch.transpose(input, dim0, dim1, out=None)

Filling
- Tensor.fill_(value)
- torch.bernoulli(input, out=None)
- torch.normal()

Pointwise math
- torch.abs(input, out=None)
- torch.add()
- torch.cos(input, out=None)
- torch.sigmoid(input, out=None)
(+ many operators)

Math reduction
- torch.dist(input, other, p=2, out=None)
- torch.mean()
- torch.norm()
- torch.std()
- torch.sum()

BLAS and LAPACK operations
- torch.eig(a, eigenvectors=False, out=None)
- torch.gels(B, A, out=None)
- torch.inverse(input, out=None)
- torch.mm(mat1, mat2, out=None)
- torch.mv(mat, vec, out=None)
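A few of these combined; a minimal sketch of my own (not from the slides) exercising cat, index_select and a reduction:

import torch

a = torch.eye(2)                      # 2x2 identity
b = torch.Tensor(2, 2).fill_(7.0)

c = torch.cat((a, b), 0)              # stack along dim 0 -> 4x2
rows = torch.index_select(c, 0, torch.LongTensor([0, 3]))

print(c.size())        # torch.Size([4, 2])
print(rows)            # first and last rows of c
print(c.sum())         # 2 + 4*7 = 30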

Given

x = torch.LongTensor([[1, 3, 0],
                      [2, 4, 6]])

the following expressions create new tensors from it:

x.t()
x.view(-1)
x.view(3, -1)
x.narrow(1, 1, 2)
x.view(1, 2, 3).expand(3, 2, 3)

And given the 3d tensor

x = torch.LongTensor([[[1, 2, 1], [2, 1, 2]],
                      [[3, 0, 3], [0, 3, 0]]])

x.narrow(0, 0, 1)
x.narrow(2, 0, 2)
x.transpose(0, 1)
x.transpose(0, 2)
x.transpose(1, 2)
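The slides originally showed the resulting tensors graphically; as a substitute you can print a few of them yourself. A quick check of my own on the 2d example:

import torch

x = torch.LongTensor([[1, 3, 0], [2, 4, 6]])

print(x.t())             # 3x2: rows and columns swapped
print(x.view(-1))        # flattened: 1 3 0 2 4 6
print(x.narrow(1, 1, 2)) # columns 1 and 2: [[3, 0], [4, 6]]

# expand replicates without copying: the expanded dim has stride 0
e = x.view(1, 2, 3).expand(3, 2, 3)
print(e.size(), e.stride())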

PyTorch offers simple interfaces to standard image databases.

import torch
import torchvision

# Get the CIFAR10 train images, download if necessary
cifar = torchvision.datasets.CIFAR10('./data/cifar10/', train=True, download=True)

# Convert the numpy tensor into a PyTorch one
x = torch.from_numpy(cifar.train_data).transpose(1, 3).transpose(2, 3)

# Print out some info
print(str(type(x)), x.size(), x.min(), x.max())

prints

Files already downloaded and verified
<class 'torch.ByteTensor'> torch.Size([50000, 3, 32, 32]) 0 255

# Narrow to the first 48 images, make the tensor Float, and move the
# values to [0, 1]
x = x.narrow(0, 0, 48).float().div(255)

# Save these samples as a single image
torchvision.utils.save_image(x, 'images-cifar-4x12.png', nrow = 12)
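A common next step before feeding such a batch to a network is to normalize it. A hedged sketch of my own, continuing the snippet above in the same 0.3-style API, computing per-channel statistics and then centering and reducing the whole batch:

# Per-channel mean and standard deviation over images and pixels
for c in range(3):
    channel = x.narrow(1, c, 1)
    print(c, channel.mean(), channel.std())

# Center and reduce the whole batch in place
x.sub_(x.mean()).div_(x.std())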

# Switch the row and column indexes
x.transpose_(2, 3)

torchvision.utils.save_image(x, 'images-cifar-4x12-rotated.png', nrow = 12)

# Kill the green (1) and blue (2) channels
x.narrow(1, 1, 2).fill_(-1)

torchvision.utils.save_image(x, 'images-cifar-4x12-rotated-and-red.png', nrow = 12)

Broadcasting

Broadcasting automagically expands dimensions of size 1 by replicating coefficients, when it is necessary to perform operations.

>>> A = Tensor([[1], [2], [3], [4]])
>>> A
 1
 2
 3
 4
[torch.FloatTensor of size 4x1]
>>> B = Tensor([[5, -5, 5, -5, 5]])
>>> B
 5 -5  5 -5  5
[torch.FloatTensor of size 1x5]
>>> C = A + B
>>> C
 6 -4  6 -4  6
 7 -3  7 -3  7
 8 -2  8 -2  8
 9 -1  9 -1  9
[torch.FloatTensor of size 4x5]

A = Tensor([[1], [2], [3], [4]])
B = Tensor([[5, -5, 5, -5, 5]])
C = A + B

[Figure: A (4x1) and B (1x5) are each virtually replicated to 4x5 before the addition producing C.]

Precisely, broadcasting proceeds as follows:

1. If one of the tensors has fewer dimensions than the other, it is reshaped by adding as many dimensions of size 1 as necessary in the front; then
2. for every dimension whose sizes mismatch, if one of the two sizes is one, that tensor is expanded along this axis by replicating coefficients.

If the sizes mismatch on a dimension and neither of them is one, the operation fails.
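Broadcasting is handy for computing all pairwise quantities at once. A minimal sketch of my own, computing the full matrix of squared distances between two sets of 1d points using exactly rule 2 above:

import torch

u = torch.Tensor([0.0, 1.0, 4.0])          # 3 points
v = torch.Tensor([1.0, 2.0])               # 2 points

# u.view(-1, 1) is 3x1, v.view(1, -1) is 1x2; both broadcast to 3x2
d2 = (u.view(-1, 1) - v.view(1, -1)).pow(2)
print(d2)    # d2[i, j] = (u[i] - v[j])^2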

>>> x = Tensor([1, 2, 3, 4, 5])
>>> y = Tensor(3, 5).fill_(2.0)
>>> z = x + y
>>> z
 3  4  5  6  7
 3  4  5  6  7
 3  4  5  6  7
[torch.FloatTensor of size 3x5]

>>> a = Tensor(3, 1, 5).fill_(1.0)
>>> b = Tensor(1, 3, 5).fill_(2.0)
>>> c = a * b + a
>>> c
(0 ,.,.) =
  3  3  3  3  3
  3  3  3  3  3
  3  3  3  3  3

(1 ,.,.) =
  3  3  3  3  3
  3  3  3  3  3
  3  3  3  3  3

(2 ,.,.) =
  3  3  3  3  3
  3  3  3  3  3
  3  3  3  3  3
[torch.FloatTensor of size 3x3x5]

Tensor internals

A tensor is a view of a storage, which is a low-level 1d vector.

>>> q = Tensor(2, 4).zero_()
>>> q.storage()
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
[torch.FloatStorage of size 8]
>>> s = q.storage()
>>> s[4] = 1.0
>>> q
 0  0  0  0
 1  0  0  0
[torch.FloatTensor of size 2x4]

Multiple tensors can share the same storage. It happens when using operations such as view(), expand() or transpose().

>>> r = q.view(2, 2, 2)
>>> r
(0 ,.,.) =
  0  0
  0  0

(1 ,.,.) =
  1  0
  0  0
[torch.FloatTensor of size 2x2x2]
>>> r[1, 1, 0] = 1.0
>>> q
 0  0  0  0
 1  0  1  0
[torch.FloatTensor of size 2x4]
>>> r.narrow(0, 1, 1).fill_(3.0)
(0 ,.,.) =
  3  3
  3  3
[torch.FloatTensor of size 1x2x2]
>>> q
 0  0  0  0
 3  3  3  3
[torch.FloatTensor of size 2x4]

The first coefficient of a tensor is the one at storage_offset() in storage(), and to increment index k by 1 you have to move by stride(k) elements in the storage.

>>> q = torch.arange(0, 20).storage()
>>> x = torch.Tensor().set_(q, storage_offset = 5, size = (3, 2), stride = (4, 1))
>>> x
  5   6
  9  10
 13  14
[torch.FloatTensor of size 3x2]

[Figure: memory layout. Starting at the offset, x[0,0] and x[0,1] sit stride(1) = 1 apart in the storage, while consecutive rows sit stride(0) = 4 apart.]

We can explicitly create different views of the same storage:

>>> n = torch.linspace(1, 4, 4)
>>> n
 1
 2
 3
 4
[torch.FloatTensor of size 4]
>>> Tensor().set_(n.storage(), 1, (3, 3), (0, 1))
 2  3  4
 2  3  4
 2  3  4
[torch.FloatTensor of size 3x3]
>>> Tensor().set_(n.storage(), 1, (2, 4), (1, 0))
 2  2  2  2
 3  3  3  3
[torch.FloatTensor of size 2x4]

This is in particular how transpositions and broadcasting are implemented.
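The offset/stride arithmetic can be checked directly. A small sketch of my own, continuing the example above, verifying that x[i, j] lives at storage_offset() + i * stride(0) + j * stride(1):

s = x.storage()
for i in range(x.size(0)):
    for j in range(x.size(1)):
        # Position of x[i, j] inside the shared storage
        k = x.storage_offset() + i * x.stride(0) + j * x.stride(1)
        assert x[i, j] == s[k]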

This organization explains the following (maybe surprising) error:

>>> x = Tensor(100, 100)
>>> y = x.t()
>>> y.view(-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: input is not contiguous at /home/fleuret/misc/git/pytorch/torch/lib/TH/generic/THTensor.c:231
>>> y.stride()
(1, 100)

t() creates a tensor that shares the storage with the original tensor. It cannot be flattened into a 1d contiguous view without a memory copy.

Using GPUs
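The standard fix (my addition, not in the original slides) is to pay for that memory copy explicitly: contiguous() returns a compact row-major copy when needed, after which the view succeeds.

>>> y.is_contiguous()
False
>>> z = y.contiguous()
>>> z.view(-1).size()
torch.Size([10000])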

The size of current state-of-the-art networks makes computation a critical issue, in particular for training and optimizing meta-parameters.

Although GPUs were historically developed for mass-market real-time CGI, their massively parallel architecture is an extremely good fit for signal processing and high-dimension linear algebra. Their use is instrumental in the success of deep learning.

[Figure: architecture diagram. The CPU cores access the CPU RAM, the disk and the network, while each GPU (GPU1, GPU2) has its own RAM and cores, connected to the rest of the machine through the CPU.]

A standard NVIDIA GTX 1080 has 2,560 single-precision computing cores clocked at 1.6GHz, and delivers a peak performance of 9 TFlops.

The precise structure of a GPU memory, and how its cores communicate with it, is a complicated topic that we will not cover here.

[Tables 7-9 of (Shi et al., 2016): comparative time per mini-batch of Caffe, CNTK, TensorFlow, MXNet and Torch for several networks (FCN-S, AlexNet-S, ResNet-50, FCN-R, AlexNet-R, ResNet-56, LSTM) on desktop and server CPUs with varying thread counts, on single GPUs (GTX 980, GTX 1080, K80), and on multiple GPUs. The numerical values did not survive extraction.]

The current standard to program a GPU is through the CUDA ("Compute Unified Device Architecture") model, defined by NVIDIA. Alternatives are OpenCL, backed by many CPU/GPU manufacturers, and more recently AMD's HIP ("Heterogeneous-compute Interface for Portability").

Google developed its own processor for deep learning, dubbed the TPU ("Tensor Processing Unit"), for in-house use. It is targeted at TensorFlow and offers excellent flops/watt performance.

In practice, as of today, NVIDIA hardware remains the default choice for deep learning, and CUDA is the reference framework in use.

From a practical perspective, libraries interface the framework (e.g. PyTorch) with the computational backend (e.g. CPU or GPU):

- BLAS ("Basic Linear Algebra Subprograms"): vector/matrix products, with the cuBLAS implementation for NVIDIA GPUs,
- LAPACK ("Linear Algebra Package"): linear system solving, eigen-decomposition, etc.,
- cuDNN ("NVIDIA CUDA Deep Neural Network library"): computations specific to deep learning on NVIDIA GPUs.

The use of the GPUs in PyTorch is done by using the relevant tensor types. Tensors of torch.cuda types are in the GPU memory; operations on them are done by the GPU, and the resulting tensors are stored in its memory.

Data type             CPU tensor           GPU tensor
8-bit int (unsigned)  torch.ByteTensor     torch.cuda.ByteTensor
64-bit int (signed)   torch.LongTensor     torch.cuda.LongTensor
16-bit float          torch.HalfTensor     torch.cuda.HalfTensor
32-bit float          torch.FloatTensor    torch.cuda.FloatTensor
64-bit float          torch.DoubleTensor   torch.cuda.DoubleTensor

Apart from copy_(), operations cannot mix different tensor types (CPU vs. GPU, or different numerical types):

>>> x = torch.FloatTensor(3, 5).normal_()
>>> y = torch.cuda.FloatTensor(3, 5).normal_()
>>> x.copy_(y)
[torch.FloatTensor of size 3x5]
>>> x + y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/fleuret/misc/anaconda3/lib/python3.5/site-packages/torch/tensor.py", line 293, in __add__
    return self.add(other)
TypeError: add received an invalid combination of arguments - got (torch.cuda.FloatTensor), but expected one of:
 * (float value) didn't match because some of the arguments have invalid types: (torch.cuda.FloatTensor)
 * (torch.FloatTensor other) didn't match because some of the arguments have invalid types: (torch.cuda.FloatTensor)
 * (torch.SparseFloatTensor other) didn't match because some of the arguments have invalid types: (torch.cuda.FloatTensor)
 * (float value, torch.FloatTensor other)
 * (float value, torch.SparseFloatTensor other)

Operations maintain the type of the tensors, so you generally do not need to worry about making your code generic regarding the tensor types. However, if you have to explicitly create a new tensor, the best is to use the tensor's new() method.

>>> def the_same_full_of_zeros_please(x):
...     return x.new(x.size()).zero_()
...
>>> u = torch.FloatTensor(3, 5).normal_()
>>> the_same_full_of_zeros_please(u)
 0  0  0  0  0
 0  0  0  0  0
 0  0  0  0  0
[torch.FloatTensor of size 3x5]
>>> v = torch.cuda.DoubleTensor(5, 2).fill_(1.0)
>>> the_same_full_of_zeros_please(v)
 0  0
 0  0
 0  0
 0  0
 0  0
[torch.cuda.DoubleTensor of size 5x2 (GPU 0)]

Moving data between the CPU and the GPU memories is far slower than moving it inside the GPU memory.

The method torch.cuda.is_available() returns a Boolean indicating whether a GPU is available.

The tensor method cuda() returns a clone on the GPU if the tensor is not already there, or returns the tensor itself if it already is, keeping the bit precision. Conversely, the method cpu() makes a clone on the CPU if needed. Both leave the original tensor unchanged.

If multiple GPUs are available, cross-GPU operations are not allowed by default, with the exception of copy_(). An operation between tensors on the same GPU produces a result on that same GPU.

Each GPU has a numerical id, and torch.cuda.device(id) specifies on which GPU tensors should be created by cuda(). An explicit GPU id can also be passed to cuda() directly. torch.cuda.device_of(obj) selects the device of the specified tensor or storage.
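Putting these pieces together, a common pattern is to write code once and move tensors to the GPU only when one is present. A minimal hedged sketch of my own, in the same 0.3-era style as the rest of these slides:

import torch

x = torch.FloatTensor(1000, 1000).normal_()

# Run on the GPU if we have one, otherwise stay on the CPU
if torch.cuda.is_available():
    x = x.cuda()

y = torch.mm(x, x)      # computed wherever x lives

result = y.cpu()        # bring the result back, e.g. for saving
print(type(result))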

References

S. Shi, Q. Wang, P. Xu, and X. Chu. Benchmarking state-of-the-art deep learning software tools. CoRR, abs/1608.07249, 2016.
