AMLD Deep Learning in PyTorch. 2. PyTorch tensors
|
|
- Irene Stone
- 6 years ago
- Views:
Transcription
1 AMLD Deep Learning in PyTorch 2. PyTorch tensors François Fleuret February 10, 2018 ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE PyTorch s tensors François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 2 / 48
2 A tensor is a generalized matrix, a finite table of numerical values indexed along several discrete dimensions. A 1d tensor is a vector (e.g. a sound sample), A 2d tensor is a matrix (e.g. a grayscale image), A 3d tensor is a vector of identically sized matrix (e.g. a multi-channel image), A 4d tensor is a matrix of identically sized matrix (e.g. a sequence of multi-channel images), etc. Tensors are used to encode the signal to process, but also the internal states and parameters of the neural networks. Manipulating data through this constrained structure allows to use CPUs and GPUs at peak performance. Compounded data structures can represent more diverse data types. François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 3 / 48 PyTorch is a Python library built on top of torch s THNN computational backend. Its main features are: Efficient tensor operations on CPU/GPU, automatic on-the-fly differentiation (autograd), optimizers, data I/O. Efficient tensor operations encompass both standard linear algebra and, as we will see later, deep-learning specific operations (convolution, pooling, etc.) A key specificity of PyTorch is the central role of autograd: tensor operations are specified dynamically as python operations. We will come back to this. François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 4 / 48
3 >>> from torch import Tensor >>> x = Tensor (5) >>> x. size () torch. Size ([5]) >>> x. fill_ (1.125) [ torch. FloatTensor of size 5] >>> x. sum () >>> x. mean () >>> x. std () The default tensor type is torch.floattensor, but there are others with greater/lesser precision and on CPU/GPU. torch.tensor is an alias for that, and can be changed. In-place operations are suffixed with an underscore. François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 5 / 48 torch.tensor.narrow creates a new tensor which is a sub-part of an existing tensor, by constraining one of the indexes. >>> a = Tensor (4, 5). zero_ () >>> a [ torch. FloatTensor of size 4x5] >>> a. narrow (1, 2, 2). fill_ (1.0) [ torch. FloatTensor of size 4x2] >>> a [ torch. FloatTensor of size 4x5] François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 6 / 48
4 As in numpy, the : symbol allows to take slices of tensors >>> x = Tensor (3, 2). random_ (10) >>> x [ torch. FloatTensor of size 3x2] >>> x [0] 4 8 [ torch. FloatTensor of size 2] >>> x [0,:] 4 8 [ torch. FloatTensor of size 2] >>> x [:,0] [ torch. FloatTensor of size 3] François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 7 / 48 PyTorch provides interfacing to standard linear operations, such as linear system solving or Eigen-decomposition. >>> y = Tensor (3). normal_ () >>> y [ torch. FloatTensor of size 3] >>> m = Tensor (3, 3). normal_ () >>> q, _ = torch. gels (y, m) >>> torch. mm(m, q) [ torch. FloatTensor of size 3x1] François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 8 / 48
5 Example: linear regression François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 9 / 48 Given a list of points (x n, y n ) R R, n = 1,..., N, can we find the best line f (x; a, b) = ax + b going through the points, e.g. minimizing the mean square error argmin a,b 1 N N ( ) 2. axn + b y n }{{} f (x;a,b) n=1 Such a model would allow to predict the y associated to a new x, simply by calculating f (x; a, b). François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 10 / 48
6 bash > cat systolic - blood - pressure -vs - age. dat François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 11 / Systolic Blood Pressure (mmhg) Age (years) François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 12 / 48
7 x 1 y 1 x 2 y 2.. x N y N }{{} data R N 2 x x x N 1.0 }{{} x R N 2 ( ) a b }{{} α R 2 1 y 1 y 2. y N }{{} y R N 1 import torch, numpy data = torch. from_numpy ( numpy. loadtxt ( systolic - blood - pressure -vs - age.dat )). float () nb = data. size (0) x, y = torch. Tensor (nb, 2), torch. Tensor (nb, 1) x [:,0] = data [:,0] x [:,1] = 1 y [:,0] = data [:,1] alpha, _ = torch. gels (y, x) a, b = alpha [0,0], alpha [1, 0] François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 13 / Model Data Systolic Blood Pressure (mmhg) Age (years) François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 14 / 48
8 Manipulating high-dimension signals François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 15 / 48 Data type CPU tensor GPU tensor 32-bit float torch.floattensor torch.cuda.floattensor 64-bit float torch.doubletensor torch.cuda.doubletensor 8-bit int (unsigned) torch.bytetensor torch.cuda.bytetensor 64-bit int (signed) torch.longtensor torch.cuda.longtensor >>> x = torch. LongTensor (12) >>> type (x) <class torch. LongTensor > >>> x = x. float () >>> type (x) <class torch. FloatTensor > >>> x = x. cuda () >>> type (x) <class torch. cuda. FloatTensor > Tensors of the torch.cuda types are physically in the GPU memory, and operations on them are done by the GPU. We will come back to that later. François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 16 / 48
9 The default tensor type can be set with torch.set default tensor type and used through torch.tensor >>> x = torch. Tensor () >>> x [ torch. FloatTensor with no dimension ] >>> torch. set_default_tensor_type ( torch. LongTensor ) >>> x = torch. Tensor () >>> x [ torch. LongTensor with no dimension ] For concision we often start our code with from torch import Tensor and use Tensor in place of torch.tensor. Also, the new operator of a tensor allows to create one of same type >>> y = torch. ByteTensor (10) >>> u = y. new (3). fill_ (123) >>> u [ torch. ByteTensor of size 3] This is key to writing functions able to handle all the tensor types. François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 17 / 48 2d tensor (e.g. grayscale image) 3d tensor (e.g. rgb image) [, ] [,, ] [, ] [,, ] [,, ] 4d tensor (e.g. sequence of rgb images) [,,, ] [,,, ] [,,, ]... [,,, ] François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 18 / 48
10 Here are some examples from the vast library of tensor operations: Creation torch.tensor() torch.tensor(size) torch.tensor(sequence) torch.eye(n) torch.from numpy(ndarray) Indexing, Slicing, Joining, Mutating torch.tensor.view(*args) torch.tensor.expand(*sizes) torch.cat(inputs, dimension=0) torch.chunk(tensor, chunks, dim=0)[source] torch.index select(input, dim, index, out=none) torch.t(input, out=none) torch.transpose(input, dim0, dim1, out=none) Filling Tensor.fill (value) torch.bernoulli(input, out=none) torch.normal() François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 19 / 48 Pointwise math torch.abs(input, out=none) torch.add() torch.cos(input, out=none) torch.sigmoid(input, out=none) (+ many operators) Math reduction torch.dist(input, other, p=2, out=none) torch.mean() torch.norm() torch.std() torch.sum() BLAS and LAPACK Operations torch.eig(a, eigenvectors=false, out=none) torch.gels(b, A, out=none) torch.inverse(input, out=none) torch.mm(mat1, mat2, out=none) torch.mv(mat, vec, out=none) François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 20 / 48
11 x = torch.longtensor([ [ 1, 3, 0 ], [ 2, 4, 6 ] ]) x.t() x.view(-1) x.view(3, -1) x.narrow(1, 1, 2) x.view(1, 2, 3).expand(3, 2, 3) François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 21 / 48 x = torch.longtensor([ [ [ 1, 2, 1 ], [ 2, 1, 2 ] ], [ [ 3, 0, 3 ], [ 0, 3, 0 ] ] ]) x.narrow(0, 0, 1) x.narrow(2, 0, 2) x.transpose(0, 1) x.transpose(0, 2) x.transpose(1, 2) François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 22 / 48
12 PyTorch offers simple interfaces to standard image data-bases. import import torch torchvision # Get the CIFAR10 train images, download if necessary cifar = torchvision. datasets. CIFAR10 (./ data / cifar10 /, train =True, download = True ) # Converts the numpy tensor into a PyTorch one x = torch. from_numpy ( cifar. train_data ). transpose (1, 3). transpose (2, 3) # Prints out some info print ( str ( type (x)), x. size (), x. min (), x. max ()) prints Files already downloaded and verified <class torch. ByteTensor > torch. Size ([50000, 3, 32, 32]) , François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 23 / 48 # Narrow to the first images, make the tensor Float, and move the # values in [-1, 1] x = x.narrow(0, 0, 48).float().div(255) # Save these samples as a single image torchvision.utils.save_image(x, images-cifar-4x12.png, nrow = 12) François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 24 / 48
13 # Switch the row and column indexes x.transpose_(2, 3) torchvision.utils.save_image(x, images-cifar-4x12-rotated.png, nrow = 12) François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 25 / 48 # Kill the green (1) and blue (2) channels x.narrow(1, 1, 2).fill_(-1) torchvision.utils.save_image(x, images-cifar-4x12-rotated-and-red.png, nrow = 12) François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 26 / 48
14 Broadcasting François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 27 / 48 Broadcasting automagically expands dimensions of size 1 by replicating coefficients, when it is necessary to perform operations. >>> A = Tensor ([[1], [2], [3], [4]]) >>> A [ torch. FloatTensor of size 4x1] >>> B = Tensor ([[5, -5, 5, -5, 5]]) >>> B [ torch. FloatTensor of size 1x5] >>> C = A + B >>> C [ torch. FloatTensor of size 4x5] François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 28 / 48
15 A = Tensor ([[1], [2], [3], [4]]) B = Tensor ([[5, -5, 5, -5, 5]]) C = A + B A B C = A + B Broadcasted François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 29 / 48 Precisely, broadcasting proceeds as follows: 1. If one of the tensors has fewer dimensions than the other, it is reshaped by adding as many dimensions of size 1 as necessary in the front; then 2. for every mismatch if one of the two sizes is one, the tensor is expanded along this axis by replicating coefficients. If there is a tensor size mismatch for one of the dimension and neither of them is one, the operation fails. François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 30 / 48
16 >>> x = Tensor ([1, 2, 3, 4, 5]) >>> y = Tensor (3, 5). fill_ (2.0) >>> z = x + y >>> z [ torch. FloatTensor of size 3x5] >>> a = Tensor (3, 1, 5). fill_ (1.0) >>> b = Tensor (1, 3, 5). fill_ (2.0) >>> c = a * b + a >>> c (0,.,.) = (1,.,.) = (2,.,.) = [ torch. FloatTensor of size 3 x3x5 ] François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 31 / 48 Tensor internals François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 32 / 48
17 A tensor is a view of a storage, which is a low-level 1d vector. >>> q = Tensor (2, 4). zero_ () >>> q. storage () [ torch. FloatStorage of size 8] >>> s = q. storage () >>> s [4] = 1.0 >>> s 1.0 [ torch. FloatStorage of size 8] >>> q [ torch. FloatTensor of size 2x4] François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 33 / 48 Multiple tensors can share the same storage. It happens when using operations such as view(), expand() or transpose(). >>> r = q. view (2, 2, 2) >>> r (0,.,.) = (1,.,.) = [ torch. FloatTensor of size 2 x2x2 ] >>> r[1, 1, 0] = 1.0 >>> q [ torch. FloatTensor of size 2x4] >>> r. narrow (0, 1, 1). fill_ (3.0) (0,.,.) = [ torch. FloatTensor of size 1 x2x2 ] >>> q [ torch. FloatTensor of size 2x4] François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 34 / 48
18 The first coefficient of a tensor is the one at storage offset() in storage(). To increment index k by 1, you have to move by stride(k) elements in the storage. >>> q = torch. arange (0, 20). storage () >>> x = torch. Tensor (). set_ (q, storage_offset = 5, size = (3, 2), stride = (4, 1)) >>> x [ torch. FloatTensor of size 3x2] offset stride(0) stride(0) q = x[0,0] x[0,1] x[1,0] x[1,1] x[2,0] x[2,1] stride(1) stride(1) stride(1) François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 35 / 48 We can explicitly create different views of the same storage >>> n = torch. linspace (1, 4, 4) >>> n [ torch. FloatTensor of size 4] >>> Tensor (). set_ (n. storage (), 1, (3, 3), (0, 1)) [ torch. FloatTensor of size 3x3] >>> Tensor (). set_ (n. storage (), 1, (2, 4), (1, 0)) [ torch. FloatTensor of size 2x4] This is in particular how transpositions and broadcasting are implemented. François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 36 / 48
19 This organization explains the following (maybe surprising) error >>> x = Tensor (100, 100) >>> y = x.t() >>> y. view ( -1) Traceback ( most recent call last ): File "<stdin >", line 1, in <module > RuntimeError : input is not contiguous at / home / fleuret / misc / git / pytorch / torch / lib / TH/ generic / THTensor.c :231 >>> y. stride () (1, 100) t() creates a tensor that shares the storage with the original tensor. It cannot be flattened into a 1d contiguous view without a memory copy. François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 37 / 48 Using GPUs François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 38 / 48
20 The size of current state-of-the-art networks makes computation a critical issue, in particular for training and optimizing meta-parameters. Although they were historically developed for mass-market real-time CGI, their massively parallel architecture is extremely fitting to signal processing and high dimension linear algebra. Their use is instrumental in the success of deep-learning. François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 39 / 48 GPU1 RAM GPU1 cores CPU RAM GPU2 RAM GPU2 cores CPU cores Disk and network A standard NVIDIA GTX 1080 has 2, 560 single-precision computing cores clocked at 1.6GHz, and deliver a peak performance of 9 TFlops. The precise structure of a GPU memory and how its cores communicate with it is a complicated topic that we will not cover here. François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 40 / 48
21 TABLE 7. COMPARATIVE EXPERIMENT RESULTS (TIME PER MINI-BATCH IN SECOND) Desktop CPU (Threads used) Server CPU (Threads used) Single GPU G980 G1080 K80 Caffe CNTK FCN-S TF MXNet Torch Caffe CNTK AlexNet-S TF MXNet Torch Caffe CNTK RenNet-50 TF MXNet Torch Caffe CNTK FCN-R TF MXNet Torch Caffe CNTK AlexNet-R TF MXNet Torch Caffe CNTK RenNet-56 TF MXNet Torch Caffe CNTK LSTM TF MXNet Torch Note: The mini-batch sizes for FCN-S, AlexNet-S, ResNet-50, FCN-R, AlexNet-R, ResNet-56 and LSTM are 64, 16, 16, 1024, 1024, 128 and 128 respectively. TABLE 9. THE SIZE OF MINI-BATCH USED FOR DIFFERENT NETWORKS ON CPU PLATFORMS. TABLE 8. COMPARATIVE EXPERIMENT RESULTS François Fleuret BETWEEN SINGLE GPU AND MULTIPLEAMLD GPUS Deep Learning in PyTorch / 2. PyTorch tensors 41 / 48 Network Mini-batch size (TIME PER MINI-BATCH IN SECOND) FCN-S 64 AlexNet-S 16 ResNet FCN-R 1024 AlexNet-R 128 ResNet LSTM 256 # of GK Caffe CNTK FCN-R TF MXNet Torch Caffe CNTK AlexNet-R TF MXNet Torch Caffe CNTK RenNet-56 TF MXNet Torch Note: The mini-batch sizes for FCN-R, AlexNet-R and ResNet-56 are 4096, 1024 and 128 respectively. (Shi et al., 2016) AlexNet-S. The running time results of mini-batch size 16 with different number of threads are shown in Fig. 3. On i7-3820, Caffe surpasses the other four tools under 1, 2, 4 threads, but fails to run with 8 threads. CNTK and Torch have close performance with 1-4 threads and they also fail to execute with 8 threads. On E5-2630v3, Caffe achieves the best performance with 1, 2, 4, and 8 threads. Among all configurations, TensorFlow achieves the best performance by using 16 threads. ResNet-50. Since CNTK does not support batch nor- The current standard to program a GPU is through the CUDA ( Compute Unified Device Architecture ) model, defined by NVIDIA. Alternatives are OpenCL, backed by many CPU/GPU manufacturers, and more recently AMD s HIP ( Heterogeneous-compute Interface for Portability ). Google developed its own processor for deep learning dubbed TPU ( Tensor Processing Unit ) for in-house use. It is targeted at TensorFlow and offers excellent flops/watt performance. In practice, as of today ( ), NVIDIA hardware remains the default choice for deep learning, and CUDA is the reference framework in use. François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 42 / 48
22 From a practical perspective, libraries interface the framework (e.g. PyTorch) with the computational backend (e.g. CPU or GPU) BLAS ( Basic Linear Algebra Subprograms ): vector/matrix products, and the cublas implementation for NVIDIA GPUs, LAPACK ( Linear Algebra Package ): linear system solving, Eigen-decomposition, etc. cudnn ( NVIDIA CUDA Deep Neural Network library ) computations specific to deep-learning on NVIDIA GPUs. François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 43 / 48 The use of the GPUs in PyTorch is done by using the relevant tensor types. Tensors of torch.cuda types are in the GPU memory. Operations on them are done by the GPU and resulting tensors are stored in its memory. Data type CPU tensor GPU tensor 8-bit int (unsigned) torch.bytetensor torch.cuda.bytetensor 64-bit int (signed) torch.longtensor torch.cuda.longtensor 16-bit float torch.halftensor torch.cuda.halftensor 32-bit float torch.floattensor torch.cuda.floattensor 64-bit float torch.doubletensor torch.cuda.doubletensor François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 44 / 48
23 Apart from copy (), operations cannot mix different tensor types (CPU vs. GPU, or different numerical types): >>> x = torch. FloatTensor (3, 5). normal_ () >>> y = torch. cuda. FloatTensor (3, 5). normal_ () >>> x. copy_ (y) [ torch. FloatTensor of size 3x5] >>> x+y Traceback ( most recent call last ): File "<stdin >", line 1, in <module > File "/ home / fleuret / misc / anaconda3 / lib / python3.5/ site - packages / torch / tensor.py", line 293, in add return self. add ( other ) TypeError : add received an invalid combination of arguments - got ( torch. cuda. FloatTensor ), but expected one of: * ( float value ) didn t match because some of the arguments have invalid types : ( torch. cuda. FloatTensor ) * ( torch. FloatTensor other ) didn t match because some of the arguments have invalid types : ( torch. cuda. FloatTensor ) * ( torch. SparseFloatTensor other ) didn t match because some of the arguments have invalid types : ( torch. cuda. FloatTensor ) * ( float value, torch. FloatTensor other ) * ( float value, torch. SparseFloatTensor other ) François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 45 / 48 Operations maintain the type of the tensors, so you generally do not need to worry about making your code generic regarding the tensor types. However, if you have to explicitly create a new tensor, the best is to use variables new() method. >>> def the_same_full_of_zeros_please (x):... return x. new (x. size ()). zero_ ()... >>> u = torch. FloatTensor (3, 5). normal_ () >>> the_same_full_of_zeros_please (u) [ torch. FloatTensor of size 3x5] >>> v = torch. cuda. DoubleTensor (5,2). fill_ (1.0) >>> the_same_full_of_zeros_please (v) [ torch. cuda. DoubleTensor of size 5x2 ( GPU 0)] Moving data between the CPU and the GPU memories is far slower than moving it inside the GPU memory. François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 46 / 48
24 The method torch.cuda.is available() returns a Boolean value indicating if a GPU is available. The Tensor s method cuda() returns a clone on the GPU if the tensor is not already there or returns the tensor itself if it was already there, keeping the bit precision. Conversely the method cpu() makes a clone on the CPU if needed. They both keep the original tensor unchanged. François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 47 / 48 If multiple GPUs are available, cross-gpus operations are not allowed by default, with the exception of copy (). An operation between tensors in the same GPU produces a results in the same GPU also. Each GPU has a numerical id, and torch.cuda.device(id) allows to specify where GPU tensors should be created by cuda(). An explicit GPU id can also be provided to the latter. torch.cuda.device of(obj) selects the device to that of the specified tensor or storage. François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 48 / 48
25 References S. Shi, Q. Wang, P. Xu, and X. Chu. Benchmarking state-of-the-art deep learning software tools. CoRR, abs/ , François Fleuret AMLD Deep Learning in PyTorch / 2. PyTorch tensors 48 / 48
EE-559 Deep learning 1.4. Tensor basics and linear regression. François Fleuret Mon Mar 18 20:04:34 UTC 2019
EE-559 Deep learning 1.4. Tensor basics and linear regression François Fleuret https://fleuret.org/ee559/ Mon Mar 18 20:04:34 UTC 2019 A tensor is a generalized matrix, a finite table of numerical values
More informationEE-559 Deep learning 1.6. Tensor internals
EE-559 Deep learning 1.6. Tensor internals François Fleuret https://fleuret.org/ee559/ Tue Feb 26 15:45:36 UTC 2019 ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE A tensor is a view of a [part of a] storage,
More informationPytorch Tutorial. Xiaoyong Yuan, Xiyao Ma 2018/01
(Li Lab) National Science Foundation Center for Big Learning (CBL) Department of Electrical and Computer Engineering (ECE) Department of Computer & Information Science & Engineering (CISE) Pytorch Tutorial
More informationECE521 W17 Tutorial 1. Renjie Liao & Min Bai
ECE521 W17 Tutorial 1 Renjie Liao & Min Bai Schedule Linear Algebra Review Matrices, vectors Basic operations Introduction to TensorFlow NumPy Computational Graphs Basic Examples Linear Algebra Review
More informationA Practitioner s Guide to MXNet
1/34 A Practitioner s Guide to MXNet Xingjian Shi Hong Kong University of Science and Technology (HKUST) HKUST CSE Seminar, March 31st, 2017 2/34 Outline 1 Introduction Deep Learning Basics MXNet Highlights
More informationINF 5860 Machine learning for image classification. Lecture 5 : Introduction to TensorFlow Tollef Jahren February 14, 2018
INF 5860 Machine learning for image classification Lecture 5 : Introduction to TensorFlow Tollef Jahren February 14, 2018 OUTLINE Deep learning frameworks TensorFlow TensorFlow graphs TensorFlow session
More informationTips Geared Towards R. Adam J. Suarez. Arpil 10, 2015
Tips Geared Towards R Departments of Statistics North Carolina State University Arpil 10, 2015 1 / 30 Advantages of R As an interpretive and interactive language, developing an algorithm in R can be done
More informationEE-559 Deep learning LSTM and GRU
EE-559 Deep learning 11.2. LSTM and GRU François Fleuret https://fleuret.org/ee559/ Mon Feb 18 13:33:24 UTC 2019 ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE The Long-Short Term Memory unit (LSTM) by Hochreiter
More informationNumpy. Luis Pedro Coelho. October 22, Programming for Scientists. Luis Pedro Coelho (Programming for Scientists) Numpy October 22, 2012 (1 / 26)
Numpy Luis Pedro Coelho Programming for Scientists October 22, 2012 Luis Pedro Coelho (Programming for Scientists) Numpy October 22, 2012 (1 / 26) Historical Numeric (1995) Numarray (for large arrays)
More informationHardware Acceleration of DNNs
Lecture 12: Hardware Acceleration of DNNs Visual omputing Systems Stanford S348V, Winter 2018 Hardware acceleration for DNNs Huawei Kirin NPU Google TPU: Apple Neural Engine Intel Lake rest Deep Learning
More informationMore on Neural Networks
More on Neural Networks Yujia Yan Fall 2018 Outline Linear Regression y = Wx + b (1) Linear Regression y = Wx + b (1) Polynomial Regression y = Wφ(x) + b (2) where φ(x) gives the polynomial basis, e.g.,
More informationCNTK Microsoft s Open Source Deep Learning Toolkit. Taifeng Wang Lead Researcher, Microsoft Research Asia 2016 GTC China
CNTK Microsoft s Open Source Deep Learning Toolkit Taifeng Wang Lead Researcher, Microsoft Research Asia 2016 GTC China Deep learning in Microsoft Cognitive Services https://how-old.net http://www.captionbot.ai
More informationQuantum Artificial Intelligence and Machine Learning: The Path to Enterprise Deployments. Randall Correll. +1 (703) Palo Alto, CA
Quantum Artificial Intelligence and Machine : The Path to Enterprise Deployments Randall Correll randall.correll@qcware.com +1 (703) 867-2395 Palo Alto, CA 1 Bundled software and services Professional
More informationMagmaDNN High-Performance Data Analytics for Manycore GPUs and CPUs
MagmaDNN High-Performance Data Analytics for Manycore GPUs and CPUs Lucien Ng The Chinese University of Hong Kong Kwai Wong The Joint Institute for Computational Sciences (JICS), UTK and ORNL Azzam Haidar,
More informationDeep Learning intro and hands-on tutorial
Deep Learning intro and hands-on tutorial Π passalis@csd.auth.gr Ε. ώ Π ώ. Π ΠΘ 1 / 53 Deep Learning 2 / 53 Δ Έ ( ) ω π μ, ώπ ώ π ω (,...), π ώ π π π π ω ω ώπ ώ π, (biologically plausible) Δ π ώώ π ώ Ε
More informationTensorFlow: A Framework for Scalable Machine Learning
TensorFlow: A Framework for Scalable Machine Learning You probably Outline want to know... What is TensorFlow? Why did we create TensorFlow? How does Tensorflow Work? Example: Linear Regression Example:
More informationHigh-performance processing and development with Madagascar. July 24, 2010 Madagascar development team
High-performance processing and development with Madagascar July 24, 2010 Madagascar development team Outline 1 HPC terminology and frameworks 2 Utilizing data parallelism 3 HPC development with Madagascar
More informationHands-on Lab: Deep Learning with the Theano Python Library
Hands-on Lab: Deep Learning with the Python Library Frédéric Bastien Montreal Institute for Learning Algorithms Université de Montréal Montréal, Canada bastienf@iro.umontreal.ca Presentation prepared with
More informationEE-559 Deep learning 9. Autoencoders and generative models
EE-559 Deep learning 9. Autoencoders and generative models François Fleuret https://fleuret.org/dlc/ [version of: May 1, 2018] ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE Embeddings and generative models
More informationOutline. Overview CNTK introduction. Educational resources Conclusions. Symbolic loop Batch scheduling Data parallel training
Outline Overview CNTK introduction Symbolic loop Batch scheduling Data parallel training Educational resources Conclusions Outline Overview CNTK introduction Symbolic loop Batch scheduling Data parallel
More informationAccelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers
UT College of Engineering Tutorial Accelerating Linear Algebra on Heterogeneous Architectures of Multicore and GPUs using MAGMA and DPLASMA and StarPU Schedulers Stan Tomov 1, George Bosilca 1, and Cédric
More informationCSC321 Lecture 16: ResNets and Attention
CSC321 Lecture 16: ResNets and Attention Roger Grosse Roger Grosse CSC321 Lecture 16: ResNets and Attention 1 / 24 Overview Two topics for today: Topic 1: Deep Residual Networks (ResNets) This is the state-of-the
More information@SoyGema GEMA PARREÑO PIQUERAS
@SoyGema GEMA PARREÑO PIQUERAS WHAT IS AN ARTIFICIAL NEURON? WHAT IS AN ARTIFICIAL NEURON? Image Recognition Classification using Softmax Regressions and Convolutional Neural Networks Languaje Understanding
More informationAccelerating Model Reduction of Large Linear Systems with Graphics Processors
Accelerating Model Reduction of Large Linear Systems with Graphics Processors P. Benner 1, P. Ezzatti 2, D. Kressner 3, E.S. Quintana-Ortí 4, Alfredo Remón 4 1 Max-Plank-Institute for Dynamics of Complex
More informationMemory-Augmented Attention Model for Scene Text Recognition
Memory-Augmented Attention Model for Scene Text Recognition Cong Wang 1,2, Fei Yin 1,2, Cheng-Lin Liu 1,2,3 1 National Laboratory of Pattern Recognition Institute of Automation, Chinese Academy of Sciences
More informationAccelerating linear algebra computations with hybrid GPU-multicore systems.
Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)
More informationWelcome to MCS 572. content and organization expectations of the course. definition and classification
Welcome to MCS 572 1 About the Course content and organization expectations of the course 2 Supercomputing definition and classification 3 Measuring Performance speedup and efficiency Amdahl s Law Gustafson
More informationPATTERN RECOGNITION AND MACHINE LEARNING
PATTERN RECOGNITION AND MACHINE LEARNING Slide Set 6: Neural Networks and Deep Learning January 2018 Heikki Huttunen heikki.huttunen@tut.fi Department of Signal Processing Tampere University of Technology
More informationLecture 2: Linear regression
Lecture 2: Linear regression Roger Grosse 1 Introduction Let s ump right in and look at our first machine learning algorithm, linear regression. In regression, we are interested in predicting a scalar-valued
More informationFaster Machine Learning via Low-Precision Communication & Computation. Dan Alistarh (IST Austria & ETH Zurich), Hantian Zhang (ETH Zurich)
Faster Machine Learning via Low-Precision Communication & Computation Dan Alistarh (IST Austria & ETH Zurich), Hantian Zhang (ETH Zurich) 2 How many bits do you need to represent a single number in machine
More informationCENG 783. Special topics in. Deep Learning. AlchemyAPI. Week 8. Sinan Kalkan
CENG 783 Special topics in Deep Learning AlchemyAPI Week 8 Sinan Kalkan Loss functions Many correct labels case: Binary prediction for each label, independently: L i = σ j max 0, 1 y ij f j y ij = +1 if
More informationQuantisation. Efficient implementation of convolutional neural networks. Philip Leong. Computer Engineering Lab The University of Sydney
1/51 Quantisation Efficient implementation of convolutional neural networks Philip Leong Computer Engineering Lab The University of Sydney July 2018 / PAPAA Workshop Australia 2/51 3/51 Outline 1 Introduction
More informationTensor Contractions with Extended BLAS Kernels on CPU and GPU
Tensor Contractions with Extended BLAS Kernels on CPU and GPU Cris Cecka Senior Research Scientist NVIDIA Research, Santa Clara, California Joint work with Yang Shi, U.N. Niranjan, and Animashree Anandkumar
More informationMODELS USING MATRICES WITH PYTHON
MODELS USING MATRICES WITH PYTHON José M. Garrido Department of Computer Science May 2016 College of Computing and Software Engineering Kennesaw State University c 2015, J. M. Garrido Models Using Matrices
More information11 Parallel programming models
237 // Program Design 10.3 Assessing parallel programs 11 Parallel programming models Many different models for expressing parallelism in programming languages Actor model Erlang Scala Coordination languages
More informationDistributed Machine Learning: A Brief Overview. Dan Alistarh IST Austria
Distributed Machine Learning: A Brief Overview Dan Alistarh IST Austria Background The Machine Learning Cambrian Explosion Key Factors: 1. Large s: Millions of labelled images, thousands of hours of speech
More informationComputing least squares condition numbers on hybrid multicore/gpu systems
Computing least squares condition numbers on hybrid multicore/gpu systems M. Baboulin and J. Dongarra and R. Lacroix Abstract This paper presents an efficient computation for least squares conditioning
More informationDeep Learning. Convolutional Neural Networks Applications
Deep Learning Using a Convolutional Neural Network Dr. Ing. Morris Riedel Adjunct Associated Professor School of Engineering and Natural Sciences, University of Iceland Research Group Leader, Juelich Supercomputing
More informationGRASS GIS Development APIs
GRASS GIS Development APIs Lifting the fog on the different ways to develop with and for GRASS Moritz Lennert Member of the GRASS GIS Project Steering Comittee PostGISomics Over 30 years of development
More informationDense Arithmetic over Finite Fields with CUMODP
Dense Arithmetic over Finite Fields with CUMODP Sardar Anisul Haque 1 Xin Li 2 Farnam Mansouri 1 Marc Moreno Maza 1 Wei Pan 3 Ning Xie 1 1 University of Western Ontario, Canada 2 Universidad Carlos III,
More informationPackage magma. February 15, 2013
Package magma February 15, 2013 Title Matrix Algebra on GPU and Multicore Architectures Version 0.2.2-1 Date 2010-08-27 Author Brian J Smith Maintainer Brian J Smith
More informationA CUDA Solver for Helmholtz Equation
Journal of Computational Information Systems 11: 24 (2015) 7805 7812 Available at http://www.jofcis.com A CUDA Solver for Helmholtz Equation Mingming REN 1,2,, Xiaoguang LIU 1,2, Gang WANG 1,2 1 College
More informationHydra. A. Augusto Alves Jr and M.D. Sokoloff. University of Cincinnati
Hydra A library for data analysis in massively parallel platforms A. Augusto Alves Jr and M.D. Sokoloff University of Cincinnati aalvesju@cern.ch Presented at the Workshop Perspectives of GPU computing
More informationIntroduction to Benchmark Test for Multi-scale Computational Materials Software
Introduction to Benchmark Test for Multi-scale Computational Materials Software Shun Xu*, Jian Zhang, Zhong Jin xushun@sccas.cn Computer Network Information Center Chinese Academy of Sciences (IPCC member)
More information>TensorFlow and deep learning_
>TensorFlow and deep learning_ without a PhD deep Science! #Tensorflow deep Code... @martin_gorner Hello World: handwritten digits classification - MNIST? MNIST = Mixed National Institute of Standards
More informationAntti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA
S7255: CUTT: A HIGH- PERFORMANCE TENSOR TRANSPOSE LIBRARY FOR GPUS Antti-Pekka Hynninen, 5/10/2017, GTC2017, San Jose CA MOTIVATION Tensor contractions are the most computationally intensive part of quantum
More informationI. Numerical Computing
I. Numerical Computing A. Lectures 1-3: Foundations of Numerical Computing Lecture 1 Intro to numerical computing Understand difference and pros/cons of analytical versus numerical solutions Lecture 2
More informationMachine Learning for Gravitational Wave signals classification in LIGO and Virgo
Machine Learning for Gravitational Wave signals classification in LIGO and Virgo Elena Cuoco European Gravitational Observatory www.elenacuoco.com @elenacuoco 2 About me About me Working as Data Analyst
More informationarxiv: v1 [cs.lg] 26 Jul 2017
Tensor Regression Networks Jean Kossaifi Amazon AI Imperial College London jean.kossaifi@imperial.ac.uk Zachary Lipton Amazon AI University of California, San Diego zlipton@cs.ucsd.edu arxiv:1707.08308v1
More informationScalable and Power-Efficient Data Mining Kernels
Scalable and Power-Efficient Data Mining Kernels Alok Choudhary, John G. Searle Professor Dept. of Electrical Engineering and Computer Science and Professor, Kellogg School of Management Director of the
More informationHydra. A library for data analysis in massively parallel platforms. A. Augusto Alves Jr and Michael D. Sokoloff
Hydra A library for data analysis in massively parallel platforms A. Augusto Alves Jr and Michael D. Sokoloff University of Cincinnati aalvesju@cern.ch Presented at NVIDIA s GPU Technology Conference,
More informationIndex. Santanu Pattanayak 2017 S. Pattanayak, Pro Deep Learning with TensorFlow,
Index A Activation functions, neuron/perceptron binary threshold activation function, 102 103 linear activation function, 102 rectified linear unit, 106 sigmoid activation function, 103 104 SoftMax activation
More informationAgenda. Digit Classification using CNN Digit Classification using SAE Visualization: Class models, filters and saliency 2 DCT
versus 1 Agenda Deep Learning: Motivation Learning: Backpropagation Deep architectures I: Convolutional Neural Networks (CNN) Deep architectures II: Stacked Auto Encoders (SAE) Caffe Deep Learning Toolbox:
More informationIlya A. Kaliman* and Anna I. Krylov. Introduction
SOFTWARE NEWS AND UPDATES WWW.C-CHEM.ORG New Algorithm for Tensor Contractions on Multi-Core CPUs, GPUs, and Accelerators Enables CCSD and EOM-CCSD Calculations with over 1000 Basis Functions on a Single
More informationPractical Free-Start Collision Attacks on 76-step SHA-1
Practical Free-Start Collision Attacks on 76-step SHA-1 Inria and École polytechnique, France Nanyang Technological University, Singapore Joint work with Thomas Peyrin and Marc Stevens CWI, Amsterdam 2015
More informationJulian Merten. GPU Computing and Alternative Architecture
Future Directions of Cosmological Simulations / Edinburgh 1 / 16 Julian Merten GPU Computing and Alternative Architecture Institut für Theoretische Astrophysik Zentrum für Astronomie Universität Heidelberg
More informationIntroduction to numerical computations on the GPU
Introduction to numerical computations on the GPU Lucian Covaci http://lucian.covaci.org/cuda.pdf Tuesday 1 November 11 1 2 Outline: NVIDIA Tesla and Geforce video cards: architecture CUDA - C: programming
More informationIntroduction to Deep Learning CMPT 733. Steven Bergner
Introduction to Deep Learning CMPT 733 Steven Bergner Overview Renaissance of artificial neural networks Representation learning vs feature engineering Background Linear Algebra, Optimization Regularization
More informationA Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters
A Quantum Chemistry Domain-Specific Language for Heterogeneous Clusters ANTONINO TUMEO, ORESTE VILLA Collaborators: Karol Kowalski, Sriram Krishnamoorthy, Wenjing Ma, Simone Secchi May 15, 2012 1 Outline!
More informationMaximum likelihood fits with TensorFlow
Maximum likelihood fits with TensorFlow Josh Bendavid, M. Dunser (CERN) E. Di Marco, M. Cipriani (INFN/Roma) July 31, 2018 Josh Bendavid (CERN) TensorFlow Fits 1 Introduction Likelihood for for W helicity/rapidity
More informationEvaluation and Benchmarking of Highly Scalable Parallel Numerical Libraries
Evaluation and Benchmarking of Highly Scalable Parallel Numerical Libraries Christos Theodosiou (ctheodos@grid.auth.gr) User and Application Support Scientific Computing Centre @ AUTH Presentation Outline
More informationintroduction to convolutional networks using tensorflow
introduction to convolutional networks using tensorflow Jesús Fernández Bes, jfbes@ing.uc3m.es 8 de febrero de 2016 contents Install What is Tensorflow? Implementing Softmax Regression Deep Convolutional
More informationS0214 : GPU Based Stacking Sequence Generation For Composite Skins Using GA
S0214 : GPU Based Stacking Sequence Generation For Composite Skins Using GA Date: 16th May 2012 Wed, 3pm to 3.25pm(Adv. Session) Sathyanarayana K., Manish Banga, and Ravi Kumar G. V. V. Engineering Services,
More informationALU A functional unit
ALU A functional unit that performs arithmetic operations such as ADD, SUB, MPY logical operations such as AND, OR, XOR, NOT on given data types: 8-,16-,32-, or 64-bit values A n-1 A n-2... A 1 A 0 B n-1
More informationLecture 7 Convolutional Neural Networks
Lecture 7 Convolutional Neural Networks CMSC 35246: Deep Learning Shubhendu Trivedi & Risi Kondor University of Chicago April 17, 2017 We saw before: ŷ x 1 x 2 x 3 x 4 A series of matrix multiplications:
More informationPython & Numpy A tutorial
Python & Numpy A tutorial Devert Alexandre School of Software Engineering of USTC 13 February 2012 Slide 1/38 Table of Contents 1 Why Python & Numpy 2 First steps with Python 3 Fun with lists 4 Quick tour
More informationPractical Combustion Kinetics with CUDA
Funded by: U.S. Department of Energy Vehicle Technologies Program Program Manager: Gurpreet Singh & Leo Breton Practical Combustion Kinetics with CUDA GPU Technology Conference March 20, 2015 Russell Whitesides
More informationLecture 8: Introduction to Deep Learning: Part 2 (More on backpropagation, and ConvNets)
COS 402 Machine Learning and Artificial Intelligence Fall 2016 Lecture 8: Introduction to Deep Learning: Part 2 (More on backpropagation, and ConvNets) Sanjeev Arora Elad Hazan Recap: Structure of a deep
More informationIntroduction to TensorFlow
Large Scale Data Analysis Using Deep Learning (Prof. U Kang) Introduction to TensorFlow 2017.04.17 Beunguk Ahn ( beunguk.ahn@gmail.com) 1 What is TensorFlow? Consturction Phase Execution Phase Examples
More informationIntroduction to TensorFlow
Introduction to TensorFlow Oliver Dürr Datalab-Lunch Seminar Series Winterthur, 17 Nov, 2016 1 Abstract Introduc)on to TensorFlow TensorFlow is a mul/purpose open source so2ware library for numerical computa/on
More informationCSE370: Introduction to Digital Design
CSE370: Introduction to Digital Design Course staff Gaetano Borriello, Brian DeRenzi, Firat Kiyak Course web www.cs.washington.edu/370/ Make sure to subscribe to class mailing list (cse370@cs) Course text
More informationCS 542G: Conditioning, BLAS, LU Factorization
CS 542G: Conditioning, BLAS, LU Factorization Robert Bridson September 22, 2008 1 Why some RBF Kernel Functions Fail We derived some sensible RBF kernel functions, like φ(r) = r 2 log r, from basic principles
More informationCS 229 Project Final Report: Reinforcement Learning for Neural Network Architecture Category : Theory & Reinforcement Learning
CS 229 Project Final Report: Reinforcement Learning for Neural Network Architecture Category : Theory & Reinforcement Learning Lei Lei Ruoxuan Xiong December 16, 2017 1 Introduction Deep Neural Network
More informationImage Processing in Numpy
Version: January 17, 2017 Computer Vision Laboratory, Linköping University 1 Introduction Image Processing in Numpy Exercises During this exercise, you will become familiar with image processing in Python.
More informationOn Portability, Performance and Scalability of a MPI OpenCL Lattice Boltzmann Code
On Portability, Performance and Scalability of a MPI OpenCL Lattice Boltzmann Code E Calore, S F Schifano, R Tripiccione Enrico Calore INFN Ferrara, Italy 7 th Workshop on UnConventional High Performance
More informationHybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS
Hybrid CPU/GPU Acceleration of Detection of 2-SNP Epistatic Interactions in GWAS Jorge González-Domínguez*, Bertil Schmidt*, Jan C. Kässens**, Lars Wienbrandt** *Parallel and Distributed Architectures
More information<Special Topics in VLSI> Learning for Deep Neural Networks (Back-propagation)
Learning for Deep Neural Networks (Back-propagation) Outline Summary of Previous Standford Lecture Universal Approximation Theorem Inference vs Training Gradient Descent Back-Propagation
More informationTR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems
TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a
More informationJug: Executing Parallel Tasks in Python
Jug: Executing Parallel Tasks in Python Luis Pedro Coelho EMBL 21 May 2013 Luis Pedro Coelho (EMBL) Jug 21 May 2013 (1 / 24) Jug: Coarse Parallel Tasks in Python Parallel Python code Memoization Luis Pedro
More informationCrash Course on TensorFlow! Vincent Lepetit!
Crash Course on TensorFlow Vincent Lepetit 1 TensorFlow Created by Google for easily implementing Deep Networks; Library for Python, but can also be used with C and Java; Exists for Linux, Mac OSX, Windows;
More informationParallelism of MRT Lattice Boltzmann Method based on Multi-GPUs
Parallelism of MRT Lattice Boltzmann Method based on Multi-GPUs 1 School of Information Engineering, China University of Geosciences (Beijing) Beijing, 100083, China E-mail: Yaolk1119@icloud.com Ailan
More informationResearch on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method
NUCLEAR SCIENCE AND TECHNIQUES 25, 0501 (14) Research on GPU-accelerated algorithm in 3D finite difference neutron diffusion calculation method XU Qi ( 徐琪 ), 1, YU Gang-Lin ( 余纲林 ), 1 WANG Kan ( 王侃 ),
More informationAlgorithms for Uncertainty Quantification
Technische Universität München SS 2017 Lehrstuhl für Informatik V Dr. Tobias Neckel M. Sc. Ionuț Farcaș April 26, 2017 Algorithms for Uncertainty Quantification Tutorial 1: Python overview In this worksheet,
More informationLevel-3 BLAS on a GPU
Level-3 BLAS on a GPU Picking the Low Hanging Fruit Francisco Igual 1 Gregorio Quintana-Ortí 1 Robert A. van de Geijn 2 1 Departamento de Ingeniería y Ciencia de los Computadores. University Jaume I. Castellón
More informationSolving PDEs with CUDA Jonathan Cohen
Solving PDEs with CUDA Jonathan Cohen jocohen@nvidia.com NVIDIA Research PDEs (Partial Differential Equations) Big topic Some common strategies Focus on one type of PDE in this talk Poisson Equation Linear
More informationDynamic Scheduling for Work Agglomeration on Heterogeneous Clusters
Dynamic Scheduling for Work Agglomeration on Heterogeneous Clusters Jonathan Lifflander, G. Carl Evans, Anshu Arya, Laxmikant Kale University of Illinois Urbana-Champaign May 25, 2012 Work is overdecomposed
More informationClojure Concurrency Constructs, Part Two. CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014
Clojure Concurrency Constructs, Part Two CSCI 5828: Foundations of Software Engineering Lecture 13 10/07/2014 1 Goals Cover the material presented in Chapter 4, of our concurrency textbook In particular,
More informationImprovements for Implicit Linear Equation Solvers
Improvements for Implicit Linear Equation Solvers Roger Grimes, Bob Lucas, Clement Weisbecker Livermore Software Technology Corporation Abstract Solving large sparse linear systems of equations is often
More informationECS289: Scalable Machine Learning
ECS289: Scalable Machine Learning Cho-Jui Hsieh UC Davis Sept 27, 2015 Outline Linear regression Ridge regression and Lasso Time complexity (closed form solution) Iterative Solvers Regression Input: training
More informationIntroduction to Neural Networks
Introduction to Neural Networks Philipp Koehn 3 October 207 Linear Models We used before weighted linear combination of feature values h j and weights λ j score(λ, d i ) = j λ j h j (d i ) Such models
More informationepistasis Documentation
epistasis Documentation Release 0.1 Zach Sailer May 19, 2017 Contents 1 Table of Contents 3 1.1 Setup................................................... 3 1.2 Basic Usage...............................................
More informationLab Course: distributed data analytics
Lab Course: distributed data analytics 01. Threading and Parallelism Nghia Duong-Trung, Mohsan Jameel Information Systems and Machine Learning Lab (ISMLL) University of Hildesheim, Germany International
More informationMultiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU
Multiphase Flow Simulations in Inclined Tubes with Lattice Boltzmann Method on GPU Khramtsov D.P., Nekrasov D.A., Pokusaev B.G. Department of Thermodynamics, Thermal Engineering and Energy Saving Technologies,
More informationIntegrated Electricity Demand and Price Forecasting
Integrated Electricity Demand and Price Forecasting Create and Evaluate Forecasting Models The many interrelated factors which influence demand for electricity cannot be directly modeled by closed-form
More informationINF2270 Spring Philipp Häfliger. Lecture 8: Superscalar CPUs, Course Summary/Repetition (1/2)
INF2270 Spring 2010 Philipp Häfliger Summary/Repetition (1/2) content From Scalar to Superscalar Lecture Summary and Brief Repetition Binary numbers Boolean Algebra Combinational Logic Circuits Encoder/Decoder
More informationMachine Learning to Automatically Detect Human Development from Satellite Imagery
Technical Disclosure Commons Defensive Publications Series April 24, 2017 Machine Learning to Automatically Detect Human Development from Satellite Imagery Matthew Manolides Follow this and additional
More informationCOMS 6100 Class Notes
COMS 6100 Class Notes Daniel Solus September 20, 2016 1 General Remarks The Lecture notes submitted by the class have been very good. Integer division seemed to be a common oversight when working the Fortran
More informationGPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic
GPU acceleration of Newton s method for large systems of polynomial equations in double double and quad double arithmetic Jan Verschelde joint work with Xiangcheng Yu University of Illinois at Chicago
More informationINITIAL INTEGRATION AND EVALUATION
INITIAL INTEGRATION AND EVALUATION OF SLATE PARALLEL BLAS IN LATTE Marc Cawkwell, Danny Perez, Arthur Voter Asim YarKhan, Gerald Ragghianti, Jack Dongarra, Introduction The aim of the joint milestone STMS10-52
More informationTENSOR LAYERS FOR COMPRESSION OF DEEP LEARNING NETWORKS. Cris Cecka Senior Research Scientist, NVIDIA GTC 2018
TENSOR LAYERS FOR COMPRESSION OF DEEP LEARNING NETWORKS Cris Cecka Senior Research Scientist, NVIDIA GTC 2018 Tensors Computations and the GPU AGENDA Tensor Networks and Decompositions Tensor Layers in
More information