Code Generation for GPU Accelerators in the Domain of Image Preprocessing
1 Code Generation for GPU Accelerators in the Domain of Image Preprocessing
Oliver Reiche, Richard Membarth, Frank Hannig, and Jürgen Teich
Hardware/Software Co-Design, University of Erlangen-Nuremberg
Dagstuhl, April 4, 2013
2 Motivation: Medical Image Preprocessing
- Keep the X-ray dosage and the contrast agent as low as possible, which results in noisy images.
- Improve the quality of medical images: remove noise, detect edges, compensate for detector defects, ...
- An implementation must be as efficient as possible.
- Most algorithms are well known, so why do we need code generation?
3 Challenge: How to Target Multiple Architectures?
Efficient code generation for different target architectures calls for domain-specific languages that are:
- performance portable: high performance on different target hardware
- competitive: performance comparable to hand-written code
- productive: the algorithm is described at a high level, and low-level details are hidden from the programmer
- portable: different target architectures and different target languages are supported from the same algorithm description
Domain-specific languages offer both functional and performance portability.
4 Agenda
- HIPAcc
- Results
- Summary
5 HIPAcc
6 HIPAcc: The Heterogeneous Image Processing Acceleration Framework
[Figure: a C++ embedded DSL is translated by a source-to-source compiler (Clang/LLVM), which uses domain knowledge and architecture knowledge, into CUDA (GPU), OpenCL (x86/GPU), C/C++ (x86), or Renderscript (x86/ARM/GPU) code that runs on the CUDA/OpenCL/Renderscript runtime library]
7 Domain Analysis: Image Processing Kernel Categorization
Three groups of kernels were identified:
- Point operators [HPPC 11]: each pixel is updated independently of all other pixels.
- Local operators [IPDPS 12]: the operator is centered at the pixel it is applied to, [0,0], and bounded to the neighborhood [-m,+m] × [-n,+n]. It can be applied to all pixels in parallel.
- Global operators [ISPDC 12]: pixels of the whole image contribute to the result, for instance, reduction operators.

[HPPC 11] Richard Membarth, Anton Lokhmotov, and Jürgen Teich. Generating GPU Code from a High-level Representation for Image Processing Kernels. In: Proceedings of the 5th Workshop on Highly Parallel Processing on a Chip (HPPC). Springer. Bordeaux, France, Aug. 30, 2011, pp. …. DOI: …_31.
[IPDPS 12] Richard Membarth et al. Generating Device-specific GPU Code for Local Operators in Medical Imaging. In: Proceedings of the 26th IEEE International Parallel & Distributed Processing Symposium (IPDPS). IEEE. Shanghai, China, May 21–25, 2012, pp. …. DOI: …/IPDPS….
[ISPDC 12] Richard Membarth et al. Automatic Optimization of In-Flight Memory Transactions for GPU Accelerators based on a Domain-Specific Language for Medical Imaging. In: Proceedings of the 11th International Symposium on Parallel and Distributed Computing (ISPDC). IEEE. Munich, Germany, June 25–29, 2012, pp. …. DOI: …/ISPDC….
8 HIPAcc: The Heterogeneous Image Processing Acceleration Framework
Domain-specific extensions:
- IterationSpace: defines the region of interest (ROI) of the output image
- Accessor: defines the input ROI, with filtering (nearest, bilinear, bicubic, ...)
- BoundaryCondition: boundary handling modes
- Mask: convolution mask
[Figure: output image; crop of the output image; crop of the output image with offset]
9 Domain-specific extensions (continued): Accessor, the input ROI with filtering (nearest, bilinear, bicubic, ...)
[Figure: image and boundary; image crop; image crop with offset; image offset]
10 Domain-specific extensions (continued): BoundaryCondition, the boundary handling modes
[Figure: the modes Repeat, Clamp, Mirror, and Constant applied to a 4×4 image with pixels A to P]
- Repeat: the image is continued periodically.
- Clamp: the nearest edge pixel is replicated.
- Mirror: the image is reflected at its border.
- Constant: pixels outside the image take a constant value (Q in the figure).
11 Domain-specific extensions (continued): Mask, the convolution mask
[Figure: mask coefficients f(x, y) over the offsets x and y]
12 HIPAcc Example: Gaussian Blur

    /* ... */
    // Input and output images and the convolution mask
    Image<uchar> in(width, height);
    Image<float> out(width, height);
    Mask<float> mask(size, size);

    // Initialize images and mask from host data
    in = in_image;
    out = out_image;
    mask = filter_mask;

    // Clamp at the image border; read the input through a linear-filtering accessor
    BoundaryCondition<uchar> bound(in, mask, BOUNDARY_CLAMP);
    AccessorLF<uchar> acc(bound, width, height, 0, 0);
    // Compute only a (width/2) x (height/2) output ROI at offset (width/4, height/4)
    IterationSpace<float> iter(out, width/2, height/2, width/4, height/4);
    GaussianBlur filter(iter, acc, mask, size/2);

    filter.execute();
    // Copy the result back to the host
    out_image = out;
13 HIPAcc Example: Gaussian Blur Kernel

    class GaussianBlur : public Kernel<float> {
        Accessor<uchar> &input;
        Mask<float> &mask;
        const int range;  // mask radius; signed, so that -range is well defined

      public:
        GaussianBlur(IterationSpace<float> &iter, Accessor<uchar> &acc,
                     Mask<float> &mask, int range)
            : Kernel(iter), input(acc), mask(mask), range(range) {
            addAccessor(&input);  // register the input accessor with the kernel
        }

        // Executed once per pixel of the iteration space
        void kernel() {
            float sum = 0.0f;
            for (int yf = -range; yf <= range; ++yf)
                for (int xf = -range; xf <= range; ++xf)
                    sum += input(xf, yf) * mask(xf, yf);
            output() = sum;
        }
    };
14 HIPAcc Example: Gaussian Blur Kernel + Lambda Function
The members and the constructor are the same as on slide 13. Only kernel() changes: it now uses a lambda function for the convolution.

    void kernel() {
        output() = convolve(mask, HipaccSUM, [&] () {
            return input(mask) * mask();
        });
    }
15 Efficient Code Generation for Boundary Handling
[Figure: the image, with clamped pixels outside its border, partitioned into the regions BH_TL, BH_T, BH_TR, BH_L, BH_NO, BH_R, BH_BL, BH_B, and BH_BR]
- Generates 10 different code variants.
- Minimizes executed conditionals.
- Minimizes divergence.
- The block index determines the code variant.
- Boundary handling is limited to where it is necessary, with respect to the mask size and the image padding.
16 Mapping of GPU Memory Accesses
- Image preprocessing: mostly load, compute, store, and therefore memory bound.
- Architecture model:
  - memory types: global memory, constant memory, texture memory, surface memory, local memory
  - optimizations: memory access alignment, unrolling
  - target (e.g., Kepler35, SouthernIsland, Midgard, ...)
17 Results
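18 Results
Gaussian blur, 5×5 (separated): 12 vs. 238 lines of CUDA code.
Execution times in ms for an image of … pixels on a Tesla C2050 with the boundary modes Undefined, Clamp, Repeat, Mirror, and Constant. The compared implementations are a naïve one, OpenCV, RapidMind, Halide, NPP (8-bit), and HIPAcc (CUDA). Halide results are available only for Clamp (8.93 ms), and NPP results only for Undefined (6.86 ms). The naïve and RapidMind versions crash in at least one mode, and several modes are n/a for the library implementations.
Bilateral grid filter: 62 vs. 386 lines of CUDA code.
Execution times in ms for an image of … pixels on a GTX 680 and an i…, comparing hand-tuned CUDA, Halide, HIPAcc (CUDA), and HIPAcc (OpenCL); some combinations are n/a.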
19 Summary
20 Summary
HIPAcc:
- domain-specific language for image preprocessing
- optimizations tailored to the application domain
- architecture model for GPU accelerators
- target-specific code generator for CUDA and OpenCL
- transformations based on domain knowledge and architecture information
- provides performance, productivity, and portability
Recent work:
- three new target backends: Renderscript, Renderscript GPU, and Filterscript
- support for the Midgard architecture (ARM Mali-T604)
21 Questions?
The HIPAcc framework sources are released under the Simplified BSD License.
22 Results: Gaussian Blur, 5×5 Window, … Pixels
[Figure: execution time (ms) on the Samsung Exynos 5 for CPU, GPU, CL, RS, FS, CV, and SI, and on the Qualcomm Snapdragon S4 Pro for CPU, GPU, RS, FS, CV, and SI]
More informationMAGMA MIC 1.0: Linear Algebra Library for Intel Xeon Phi Coprocessors
MAGMA MIC 1.0: Linear Algebra Library for Intel Xeon Phi Coprocessors J. Dongarra, M. Gates, A. Haidar, Y. Jia, K. Kabir, P. Luszczek, and S. Tomov University of Tennessee, Knoxville 05 / 03 / 2013 MAGMA:
More informationMultimedia Databases. Previous Lecture. 4.1 Multiresolution Analysis. 4 Shape-based Features. 4.1 Multiresolution Analysis
Previous Lecture Multimedia Databases Texture-Based Image Retrieval Low Level Features Tamura Measure, Random Field Model High-Level Features Fourier-Transform, Wavelets Wolf-Tilo Balke Silviu Homoceanu
More informationParallel Longest Common Subsequence using Graphics Hardware
Parallel Longest Common Subsequence using Graphics Hardware John Kloetzli rian Strege Jonathan Decker Dr. Marc Olano Presented by: rian Strege 1 Overview Introduction Problem Statement ackground and Related
More informationLeone B.Bosi, Loriano Storchi et al. CCR Workshop Stato e Prospettive del Calcolo Scientifico
Leone B.Bosi, Loriano Storchi et al MaCGO Project - Einstein Telescope Project INFN Perugia CCR Workshop 2011 - Stato e Prospettive del Calcolo Scientifico Laboratori Nazionali di Legnaro 16-18 Febbraio
More informationSpatially adaptive alpha-rooting in BM3D sharpening
Spatially adaptive alpha-rooting in BM3D sharpening Markku Mäkitalo and Alessandro Foi Department of Signal Processing, Tampere University of Technology, P.O. Box FIN-553, 33101, Tampere, Finland e-mail:
More informationTobias Markus. January 21, 2015
Automata Advanced Seminar Computer Engineering January 21, 2015 (Advanced Seminar Computer Engineering ) Automata January 21, 2015 1 / 35 1 2 3 4 5 6 obias Markus (Advanced Seminar Computer Engineering
More informationFPGA Implementation of a Predictive Controller
FPGA Implementation of a Predictive Controller SIAM Conference on Optimization 2011, Darmstadt, Germany Minisymposium on embedded optimization Juan L. Jerez, George A. Constantinides and Eric C. Kerrigan
More informationMachine Learning for Gravitational Wave signals classification in LIGO and Virgo
Machine Learning for Gravitational Wave signals classification in LIGO and Virgo Elena Cuoco European Gravitational Observatory www.elenacuoco.com @elenacuoco 2 About me About me Working as Data Analyst
More informationJonghwa Lee assistant engineer Samsung Electronics
Jonghwa Lee assistant engineer Samsung Electronics Contents Generic Thermal Framework Thermal zone device Cooling device Binding & Thermal instance Governors SYSFS interfaces Thermal management CPU Cooling
More informationMultimedia Databases. Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig
Multimedia Databases Wolf-Tilo Balke Philipp Wille Institut für Informationssysteme Technische Universität Braunschweig http://www.ifis.cs.tu-bs.de 4 Previous Lecture Texture-Based Image Retrieval Low
More informationOn the computation of the reciprocal of floating point expansions using an adapted Newton-Raphson iteration
On the computation of the reciprocal of floating point expansions using an adapted Newton-Raphson iteration Mioara Joldes, Valentina Popescu, Jean-Michel Muller ASAP 2014 1 / 10 Motivation Numerical problems
More informationCSE 473/573 Computer Vision and Image Processing (CVIP)
CSE 473/573 Computer Vision and Image Processing (CVIP) Ifeoma Nwogu inwogu@buffalo.edu Lecture 11 Local Features 1 Schedule Last class We started local features Today More on local features Readings for
More informationIMAGE ENHANCEMENT II (CONVOLUTION)
MOTIVATION Recorded images often exhibit problems such as: blurry noisy Image enhancement aims to improve visual quality Cosmetic processing Usually empirical techniques, with ad hoc parameters ( whatever
More information