AMLD Deep Learning in PyTorch. 2. PyTorch tensors


AMLD Deep Learning in PyTorch
2. PyTorch tensors

François Fleuret, February 10, 2018
École Polytechnique Fédérale de Lausanne

PyTorch's tensors

A tensor is a generalized matrix: a finite table of numerical values indexed along several discrete dimensions.

- A 1d tensor is a vector (e.g. a sound sample),
- a 2d tensor is a matrix (e.g. a grayscale image),
- a 3d tensor is a vector of identically sized matrices (e.g. a multi-channel image),
- a 4d tensor is a matrix of identically sized matrices (e.g. a sequence of multi-channel images),
- etc.

Tensors are used to encode the signal to process, but also the internal states and parameters of neural networks. Manipulating data through this constrained structure makes it possible to use CPUs and GPUs at peak performance. Compound data structures can represent more diverse data types.

PyTorch is a Python library built on top of torch's THNN computational backend. Its main features are:

- efficient tensor operations on CPU/GPU,
- automatic on-the-fly differentiation (autograd),
- optimizers,
- data I/O.

Efficient tensor operations encompass both standard linear algebra and, as we will see later, deep-learning specific operations (convolution, pooling, etc.).

A key specificity of PyTorch is the central role of autograd: tensor operations are specified dynamically as Python operations. We will come back to this.
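As a quick illustration of these dimensionalities, the snippet below builds tensors of increasing rank and inspects them. This is a minimal sketch of my own, assuming the 0.3-era API used throughout these slides (torch.Tensor, dim(), size()); the sizes are made up.

import torch

sound = torch.Tensor(16000)            # 1d: one second of sound at 16kHz
grayscale = torch.Tensor(28, 28)       # 2d: a 28x28 grayscale image
rgb = torch.Tensor(3, 32, 32)          # 3d: a 3-channel 32x32 image
video = torch.Tensor(24, 3, 32, 32)    # 4d: a sequence of 24 RGB frames

for t in (sound, grayscale, rgb, video):
    print(t.dim(), t.size())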

>>> from torch import Tensor
>>> x = Tensor(5)
>>> x.size()
torch.Size([5])
>>> x.fill_(1.125)
 1.1250
 1.1250
 1.1250
 1.1250
 1.1250
[torch.FloatTensor of size 5]
>>> x.sum()
5.625
>>> x.mean()
1.125
>>> x.std()
0.0

The default tensor type is torch.FloatTensor, but there are others with greater/lesser precision and on CPU/GPU. torch.Tensor is an alias for it and can be changed.

In-place operations are suffixed with an underscore.

torch.Tensor.narrow creates a new tensor which is a sub-part of an existing tensor, obtained by constraining one of the indexes.

>>> a = Tensor(4, 5).zero_()
>>> a
 0  0  0  0  0
 0  0  0  0  0
 0  0  0  0  0
 0  0  0  0  0
[torch.FloatTensor of size 4x5]
>>> a.narrow(1, 2, 2).fill_(1.0)
 1  1
 1  1
 1  1
 1  1
[torch.FloatTensor of size 4x2]
>>> a
 0  0  1  1  0
 0  0  1  1  0
 0  0  1  1  0
 0  0  1  1  0
[torch.FloatTensor of size 4x5]

As in numpy, the : symbol allows taking slices of tensors.

>>> x = Tensor(3, 2).random_(10)
>>> x[0]
 4
 8
[torch.FloatTensor of size 2]
>>> x[0, :]
 4
 8
[torch.FloatTensor of size 2]
>>> x[:, 0]
[torch.FloatTensor of size 3]

PyTorch provides interfaces to standard linear-algebra operations, such as linear system solving or eigen-decomposition.

>>> y = Tensor(3).normal_()
>>> m = Tensor(3, 3).normal_()
>>> q, _ = torch.gels(y, m)
>>> torch.mm(m, q)
[torch.FloatTensor of size 3x1]

Example: linear regression

Given a list of points (x_n, y_n) ∈ R × R, n = 1, ..., N, can we find the "best" line f(x; a, b) = ax + b going through the points, e.g. minimizing the mean square error

$$\operatorname*{argmin}_{a,b} \; \frac{1}{N} \sum_{n=1}^{N} \big( \underbrace{a x_n + b}_{f(x_n;\, a, b)} - y_n \big)^2 .$$

Such a model would allow us to predict the y associated with a new x, simply by computing f(x; a, b).
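Before solving for a and b, it can help to see the objective itself in code. The function below evaluates the mean square error of a candidate (a, b) on a toy set of points; a minimal sketch of my own with made-up data, not part of the original slides.

import torch

def mse(a, b, x, y):
    # Mean of (f(x_n; a, b) - y_n)^2 over all points
    return (a * x + b - y).pow(2).mean()

x = torch.Tensor([0.0, 1.0, 2.0, 3.0])
y = torch.Tensor([1.0, 3.0, 5.0, 7.0])   # lies exactly on y = 2x + 1

print(mse(2.0, 1.0, x, y))   # 0.0, the perfect fit
print(mse(1.0, 0.0, x, y))   # strictly larger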

bash> cat systolic-blood-pressure-vs-age.dat

[Figure: scatter plot of the data, systolic blood pressure (mmHg) vs. age (years). The file contents did not survive extraction.]

We arrange the data and the model parameters as

$$\underbrace{\begin{pmatrix} x_1 & y_1 \\ x_2 & y_2 \\ \vdots & \vdots \\ x_N & y_N \end{pmatrix}}_{\text{data} \,\in\, \mathbb{R}^{N \times 2}} \qquad \underbrace{\begin{pmatrix} x_1 & 1.0 \\ x_2 & 1.0 \\ \vdots & \vdots \\ x_N & 1.0 \end{pmatrix}}_{x \,\in\, \mathbb{R}^{N \times 2}} \qquad \underbrace{\begin{pmatrix} a \\ b \end{pmatrix}}_{\alpha \,\in\, \mathbb{R}^{2 \times 1}} \qquad \underbrace{\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{pmatrix}}_{y \,\in\, \mathbb{R}^{N \times 1}}$$

import torch, numpy

data = torch.from_numpy(numpy.loadtxt('systolic-blood-pressure-vs-age.dat')).float()

nb = data.size(0)
x, y = torch.Tensor(nb, 2), torch.Tensor(nb, 1)
x[:, 0] = data[:, 0]
x[:, 1] = 1
y[:, 0] = data[:, 1]

alpha, _ = torch.gels(y, x)
a, b = alpha[0, 0], alpha[1, 0]

[Figure: the resulting model f(x; a, b) plotted over the data, systolic blood pressure (mmHg) vs. age (years).]
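With a and b in hand, prediction is just evaluating f. A short follow-up of my own, continuing the snippet above (the age 45 is an arbitrary example value), which also sanity-checks the fit by reporting the mean square error of the solution:

# Predict the blood pressure of a (hypothetical) 45-year-old
print(a * 45 + b)

# Mean square error of the fitted line over the training points
residual = torch.mm(x, alpha) - y
print(residual.pow(2).mean())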

Manipulating high-dimension signals

Data type             CPU tensor           GPU tensor
32-bit float          torch.FloatTensor    torch.cuda.FloatTensor
64-bit float          torch.DoubleTensor   torch.cuda.DoubleTensor
8-bit int (unsigned)  torch.ByteTensor     torch.cuda.ByteTensor
64-bit int (signed)   torch.LongTensor     torch.cuda.LongTensor

>>> x = torch.LongTensor(12)
>>> type(x)
<class 'torch.LongTensor'>
>>> x = x.float()
>>> type(x)
<class 'torch.FloatTensor'>
>>> x = x.cuda()
>>> type(x)
<class 'torch.cuda.FloatTensor'>

Tensors of the torch.cuda types are physically in the GPU memory, and operations on them are done by the GPU. We will come back to that later.

The default tensor type can be set with torch.set_default_tensor_type and used through torch.Tensor.

>>> x = torch.Tensor()
>>> x
[torch.FloatTensor with no dimension]
>>> torch.set_default_tensor_type(torch.LongTensor)
>>> x = torch.Tensor()
>>> x
[torch.LongTensor with no dimension]

For concision we often start our code with

from torch import Tensor

and use Tensor in place of torch.Tensor.

Also, the new() method of a tensor creates one of the same type:

>>> y = torch.ByteTensor(10)
>>> u = y.new(3).fill_(123)
>>> u
 123
 123
 123
[torch.ByteTensor of size 3]

This is key to writing functions able to handle any tensor type.

[Figure: indexing diagrams. A 2d tensor (e.g. a grayscale image) is indexed as [ , ]; a 3d tensor (e.g. an RGB image) as [ , , ]; a 4d tensor (e.g. a sequence of RGB images) as [ , , , ].]

Here are some examples from the vast library of tensor operations:

Creation
- torch.Tensor()
- torch.Tensor(size)
- torch.Tensor(sequence)
- torch.eye(n)
- torch.from_numpy(ndarray)

Indexing, slicing, joining, mutating
- torch.Tensor.view(*args)
- torch.Tensor.expand(*sizes)
- torch.cat(inputs, dimension=0)
- torch.chunk(tensor, chunks, dim=0)
- torch.index_select(input, dim, index, out=None)
- torch.t(input, out=None)
- torch.transpose(input, dim0, dim1, out=None)

Filling
- Tensor.fill_(value)
- torch.bernoulli(input, out=None)
- torch.normal()

Pointwise math
- torch.abs(input, out=None)
- torch.add()
- torch.cos(input, out=None)
- torch.sigmoid(input, out=None)
(+ many operators)

Math reduction
- torch.dist(input, other, p=2, out=None)
- torch.mean()
- torch.norm()
- torch.std()
- torch.sum()

BLAS and LAPACK operations
- torch.eig(a, eigenvectors=False, out=None)
- torch.gels(B, A, out=None)
- torch.inverse(input, out=None)
- torch.mm(mat1, mat2, out=None)
- torch.mv(mat, vec, out=None)
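A few of these combined; a minimal sketch of my own (not from the slides) exercising cat, index_select and a reduction:

import torch

a = torch.eye(2)                      # 2x2 identity
b = torch.Tensor(2, 2).fill_(7.0)

c = torch.cat((a, b), 0)              # stack along dim 0 -> 4x2
rows = torch.index_select(c, 0, torch.LongTensor([0, 3]))

print(c.size())        # torch.Size([4, 2])
print(rows)            # first and last rows of c
print(c.sum())         # 2 + 4*7 = 30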

Given

x = torch.LongTensor([[1, 3, 0],
                      [2, 4, 6]])

the following expressions create new tensors from it:

x.t()
x.view(-1)
x.view(3, -1)
x.narrow(1, 1, 2)
x.view(1, 2, 3).expand(3, 2, 3)

And given the 3d tensor

x = torch.LongTensor([[[1, 2, 1], [2, 1, 2]],
                      [[3, 0, 3], [0, 3, 0]]])

x.narrow(0, 0, 1)
x.narrow(2, 0, 2)
x.transpose(0, 1)
x.transpose(0, 2)
x.transpose(1, 2)
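The slides originally showed the resulting tensors graphically; as a substitute you can print a few of them yourself. A quick check of my own on the 2d example:

import torch

x = torch.LongTensor([[1, 3, 0], [2, 4, 6]])

print(x.t())             # 3x2: rows and columns swapped
print(x.view(-1))        # flattened: 1 3 0 2 4 6
print(x.narrow(1, 1, 2)) # columns 1 and 2: [[3, 0], [4, 6]]

# expand replicates without copying: the expanded dim has stride 0
e = x.view(1, 2, 3).expand(3, 2, 3)
print(e.size(), e.stride())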

PyTorch offers simple interfaces to standard image databases.

import torch
import torchvision

# Get the CIFAR10 train images, download if necessary
cifar = torchvision.datasets.CIFAR10('./data/cifar10/', train=True, download=True)

# Convert the numpy tensor into a PyTorch one
x = torch.from_numpy(cifar.train_data).transpose(1, 3).transpose(2, 3)

# Print out some info
print(str(type(x)), x.size(), x.min(), x.max())

prints

Files already downloaded and verified
<class 'torch.ByteTensor'> torch.Size([50000, 3, 32, 32]) 0 255

# Narrow to the first 48 images, make the tensor Float, and move the
# values to [0, 1]
x = x.narrow(0, 0, 48).float().div(255)

# Save these samples as a single image
torchvision.utils.save_image(x, 'images-cifar-4x12.png', nrow = 12)
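A common next step before feeding such a batch to a network is to normalize it. A hedged sketch of my own, continuing the snippet above in the same 0.3-style API, computing per-channel statistics and then centering and reducing the whole batch:

# Per-channel mean and standard deviation over images and pixels
for c in range(3):
    channel = x.narrow(1, c, 1)
    print(c, channel.mean(), channel.std())

# Center and reduce the whole batch in place
x.sub_(x.mean()).div_(x.std())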

# Switch the row and column indexes
x.transpose_(2, 3)

torchvision.utils.save_image(x, 'images-cifar-4x12-rotated.png', nrow = 12)

# Kill the green (1) and blue (2) channels
x.narrow(1, 1, 2).fill_(-1)

torchvision.utils.save_image(x, 'images-cifar-4x12-rotated-and-red.png', nrow = 12)

Broadcasting

Broadcasting automagically expands dimensions of size 1 by replicating coefficients, when it is necessary to perform operations.

>>> A = Tensor([[1], [2], [3], [4]])
>>> A
 1
 2
 3
 4
[torch.FloatTensor of size 4x1]
>>> B = Tensor([[5, -5, 5, -5, 5]])
>>> B
 5 -5  5 -5  5
[torch.FloatTensor of size 1x5]
>>> C = A + B
>>> C
 6 -4  6 -4  6
 7 -3  7 -3  7
 8 -2  8 -2  8
 9 -1  9 -1  9
[torch.FloatTensor of size 4x5]

A = Tensor([[1], [2], [3], [4]])
B = Tensor([[5, -5, 5, -5, 5]])
C = A + B

[Figure: A (4x1) and B (1x5) are each virtually replicated to 4x5 before the addition producing C.]

Precisely, broadcasting proceeds as follows:

1. If one of the tensors has fewer dimensions than the other, it is reshaped by adding as many dimensions of size 1 as necessary in the front; then
2. for every dimension whose sizes mismatch, if one of the two sizes is one, that tensor is expanded along this axis by replicating coefficients.

If the sizes mismatch on a dimension and neither of them is one, the operation fails.
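Broadcasting is handy for computing all pairwise quantities at once. A minimal sketch of my own, computing the full matrix of squared distances between two sets of 1d points using exactly rule 2 above:

import torch

u = torch.Tensor([0.0, 1.0, 4.0])          # 3 points
v = torch.Tensor([1.0, 2.0])               # 2 points

# u.view(-1, 1) is 3x1, v.view(1, -1) is 1x2; both broadcast to 3x2
d2 = (u.view(-1, 1) - v.view(1, -1)).pow(2)
print(d2)    # d2[i, j] = (u[i] - v[j])^2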

>>> x = Tensor([1, 2, 3, 4, 5])
>>> y = Tensor(3, 5).fill_(2.0)
>>> z = x + y
>>> z
 3  4  5  6  7
 3  4  5  6  7
 3  4  5  6  7
[torch.FloatTensor of size 3x5]

>>> a = Tensor(3, 1, 5).fill_(1.0)
>>> b = Tensor(1, 3, 5).fill_(2.0)
>>> c = a * b + a
>>> c
(0 ,.,.) =
  3  3  3  3  3
  3  3  3  3  3
  3  3  3  3  3

(1 ,.,.) =
  3  3  3  3  3
  3  3  3  3  3
  3  3  3  3  3

(2 ,.,.) =
  3  3  3  3  3
  3  3  3  3  3
  3  3  3  3  3
[torch.FloatTensor of size 3x3x5]

Tensor internals

A tensor is a view of a storage, which is a low-level 1d vector.

>>> q = Tensor(2, 4).zero_()
>>> q.storage()
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
 0.0
[torch.FloatStorage of size 8]
>>> s = q.storage()
>>> s[4] = 1.0
>>> q
 0  0  0  0
 1  0  0  0
[torch.FloatTensor of size 2x4]

Multiple tensors can share the same storage. It happens when using operations such as view(), expand() or transpose().

>>> r = q.view(2, 2, 2)
>>> r
(0 ,.,.) =
  0  0
  0  0

(1 ,.,.) =
  1  0
  0  0
[torch.FloatTensor of size 2x2x2]
>>> r[1, 1, 0] = 1.0
>>> q
 0  0  0  0
 1  0  1  0
[torch.FloatTensor of size 2x4]
>>> r.narrow(0, 1, 1).fill_(3.0)
(0 ,.,.) =
  3  3
  3  3
[torch.FloatTensor of size 1x2x2]
>>> q
 0  0  0  0
 3  3  3  3
[torch.FloatTensor of size 2x4]

The first coefficient of a tensor is the one at storage_offset() in storage(), and to increment index k by 1 you have to move by stride(k) elements in the storage.

>>> q = torch.arange(0, 20).storage()
>>> x = torch.Tensor().set_(q, storage_offset = 5, size = (3, 2), stride = (4, 1))
>>> x
  5   6
  9  10
 13  14
[torch.FloatTensor of size 3x2]

[Figure: memory layout. Starting at the offset, x[0,0] and x[0,1] sit stride(1) = 1 apart in the storage, while consecutive rows sit stride(0) = 4 apart.]

We can explicitly create different views of the same storage:

>>> n = torch.linspace(1, 4, 4)
>>> n
 1
 2
 3
 4
[torch.FloatTensor of size 4]
>>> Tensor().set_(n.storage(), 1, (3, 3), (0, 1))
 2  3  4
 2  3  4
 2  3  4
[torch.FloatTensor of size 3x3]
>>> Tensor().set_(n.storage(), 1, (2, 4), (1, 0))
 2  2  2  2
 3  3  3  3
[torch.FloatTensor of size 2x4]

This is in particular how transpositions and broadcasting are implemented.
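The offset/stride arithmetic can be checked directly. A small sketch of my own, continuing the example above, verifying that x[i, j] lives at storage_offset() + i * stride(0) + j * stride(1):

s = x.storage()
for i in range(x.size(0)):
    for j in range(x.size(1)):
        # Position of x[i, j] inside the shared storage
        k = x.storage_offset() + i * x.stride(0) + j * x.stride(1)
        assert x[i, j] == s[k]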

This organization explains the following (maybe surprising) error:

>>> x = Tensor(100, 100)
>>> y = x.t()
>>> y.view(-1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: input is not contiguous at /home/fleuret/misc/git/pytorch/torch/lib/TH/generic/THTensor.c:231
>>> y.stride()
(1, 100)

t() creates a tensor that shares the storage with the original tensor. It cannot be flattened into a 1d contiguous view without a memory copy.

Using GPUs
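The standard fix (my addition, not in the original slides) is to pay for that memory copy explicitly: contiguous() returns a compact row-major copy when needed, after which the view succeeds.

>>> y.is_contiguous()
False
>>> z = y.contiguous()
>>> z.view(-1).size()
torch.Size([10000])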

The size of current state-of-the-art networks makes computation a critical issue, in particular for training and optimizing meta-parameters.

Although GPUs were historically developed for mass-market real-time CGI, their massively parallel architecture is an extremely good fit for signal processing and high-dimension linear algebra. Their use is instrumental in the success of deep learning.

[Figure: architecture diagram. The CPU cores access the CPU RAM, the disk and the network, while each GPU (GPU1, GPU2) has its own RAM and cores, connected to the rest of the machine through the CPU.]

A standard NVIDIA GTX 1080 has 2,560 single-precision computing cores clocked at 1.6GHz, and delivers a peak performance of 9 TFlops.

The precise structure of a GPU memory, and how its cores communicate with it, is a complicated topic that we will not cover here.

[Tables 7-9 of (Shi et al., 2016): comparative time per mini-batch of Caffe, CNTK, TensorFlow, MXNet and Torch for several networks (FCN-S, AlexNet-S, ResNet-50, FCN-R, AlexNet-R, ResNet-56, LSTM) on desktop and server CPUs with varying thread counts, on single GPUs (GTX 980, GTX 1080, K80), and on multiple GPUs. The numerical values did not survive extraction.]

The current standard to program a GPU is through the CUDA ("Compute Unified Device Architecture") model, defined by NVIDIA. Alternatives are OpenCL, backed by many CPU/GPU manufacturers, and more recently AMD's HIP ("Heterogeneous-compute Interface for Portability").

Google developed its own processor for deep learning, dubbed the TPU ("Tensor Processing Unit"), for in-house use. It is targeted at TensorFlow and offers excellent flops/watt performance.

In practice, as of today, NVIDIA hardware remains the default choice for deep learning, and CUDA is the reference framework in use.

From a practical perspective, libraries interface the framework (e.g. PyTorch) with the computational backend (e.g. CPU or GPU):

- BLAS ("Basic Linear Algebra Subprograms"): vector/matrix products, with the cuBLAS implementation for NVIDIA GPUs,
- LAPACK ("Linear Algebra Package"): linear system solving, eigen-decomposition, etc.,
- cuDNN ("NVIDIA CUDA Deep Neural Network library"): computations specific to deep learning on NVIDIA GPUs.

The use of the GPUs in PyTorch is done by using the relevant tensor types. Tensors of torch.cuda types are in the GPU memory; operations on them are done by the GPU, and the resulting tensors are stored in its memory.

Data type             CPU tensor           GPU tensor
8-bit int (unsigned)  torch.ByteTensor     torch.cuda.ByteTensor
64-bit int (signed)   torch.LongTensor     torch.cuda.LongTensor
16-bit float          torch.HalfTensor     torch.cuda.HalfTensor
32-bit float          torch.FloatTensor    torch.cuda.FloatTensor
64-bit float          torch.DoubleTensor   torch.cuda.DoubleTensor

Apart from copy_(), operations cannot mix different tensor types (CPU vs. GPU, or different numerical types):

>>> x = torch.FloatTensor(3, 5).normal_()
>>> y = torch.cuda.FloatTensor(3, 5).normal_()
>>> x.copy_(y)
[torch.FloatTensor of size 3x5]
>>> x + y
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/fleuret/misc/anaconda3/lib/python3.5/site-packages/torch/tensor.py", line 293, in __add__
    return self.add(other)
TypeError: add received an invalid combination of arguments - got (torch.cuda.FloatTensor), but expected one of:
 * (float value) didn't match because some of the arguments have invalid types: (torch.cuda.FloatTensor)
 * (torch.FloatTensor other) didn't match because some of the arguments have invalid types: (torch.cuda.FloatTensor)
 * (torch.SparseFloatTensor other) didn't match because some of the arguments have invalid types: (torch.cuda.FloatTensor)
 * (float value, torch.FloatTensor other)
 * (float value, torch.SparseFloatTensor other)

Operations maintain the type of the tensors, so you generally do not need to worry about making your code generic regarding the tensor types. However, if you have to explicitly create a new tensor, the best is to use the tensor's new() method.

>>> def the_same_full_of_zeros_please(x):
...     return x.new(x.size()).zero_()
...
>>> u = torch.FloatTensor(3, 5).normal_()
>>> the_same_full_of_zeros_please(u)
 0  0  0  0  0
 0  0  0  0  0
 0  0  0  0  0
[torch.FloatTensor of size 3x5]
>>> v = torch.cuda.DoubleTensor(5, 2).fill_(1.0)
>>> the_same_full_of_zeros_please(v)
 0  0
 0  0
 0  0
 0  0
 0  0
[torch.cuda.DoubleTensor of size 5x2 (GPU 0)]

Moving data between the CPU and the GPU memories is far slower than moving it inside the GPU memory.

The method torch.cuda.is_available() returns a Boolean indicating whether a GPU is available.

The tensor method cuda() returns a clone on the GPU if the tensor is not already there, or returns the tensor itself if it already is, keeping the bit precision. Conversely, the method cpu() makes a clone on the CPU if needed. Both leave the original tensor unchanged.

If multiple GPUs are available, cross-GPU operations are not allowed by default, with the exception of copy_(). An operation between tensors on the same GPU produces a result on that same GPU.

Each GPU has a numerical id, and torch.cuda.device(id) specifies on which GPU tensors should be created by cuda(). An explicit GPU id can also be passed to cuda() directly. torch.cuda.device_of(obj) selects the device of the specified tensor or storage.
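Putting these pieces together, a common pattern is to write code once and move tensors to the GPU only when one is present. A minimal hedged sketch of my own, in the same 0.3-era style as the rest of these slides:

import torch

x = torch.FloatTensor(1000, 1000).normal_()

# Run on the GPU if we have one, otherwise stay on the CPU
if torch.cuda.is_available():
    x = x.cuda()

y = torch.mm(x, x)      # computed wherever x lives

result = y.cpu()        # bring the result back, e.g. for saving
print(type(result))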

References

S. Shi, Q. Wang, P. Xu, and X. Chu. Benchmarking state-of-the-art deep learning software tools. CoRR, abs/1608.07249, 2016.
