SLIM. University of British Columbia
Transcription
1 Accelerating an Iterative Helmholtz Solver Using Reconfigurable Hardware. Art Petrenko, M.Sc. Defence, April 9, 2014. Seismic Laboratory for Imaging and Modelling, Department of Earth, Ocean and Atmospheric Sciences, UBC.
2 Oh by the way: I have a stutter.
3 Seismic Wave Simulation
4 Seismic Exploration for Oil and Gas
5 Full-waveform Inversion: Seismic Wavefield (u), Earth model (m)
6 Full-waveform inversion is SLOW
7 The Accelerators Have Arrived: Top 10 of Top 500 Supercomputers
8 FPGAs: Reconfigurable Hardware Accelerators
9 The Punchline
10 Modelling Seismic Waves: Mathematical Formulation
11 Modelling Seismic Waves: The Wave Equation. $(\omega^2 m + \nabla^2)\,u = q$, with frequency $\omega$, Laplacian $\nabla^2$, earth model $m$, seismic source $q$ (right-hand side) and wavefield $u$ (iterate).
12 Modelling Seismic Waves: Discretization [Operto, 2007]. $A(m, \omega)\,u = q$
13 Solving the Helmholtz System
14 The Kaczmarz Algorithm [Kaczmarz, 1937]. $\mathrm{DKSWP}(A, u, q, \lambda)$. Adapted from [van Leeuwen, 2012].
15 The Kaczmarz Algorithm: Equivalent to SSOR-NE [Björck and Elfving, 1979]. Double Kaczmarz sweep on the original system: $Au = q$. One iteration of SSOR on the normal equations: $AA^{*}y = q$, $A^{*}y = u$. Both are computed as: $u_{k+1} = u_k + \lambda\,\dfrac{b_i - \langle a_i, u_k\rangle}{\|a_i\|^2}\,a_i^{*}$, with $k: 1 \to 2N$ and $i: 1 \to N,\ N \to 1$.
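The row-projection update on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the accelerator implementation: the function name `dkswp`, the dense matrix storage, the relaxation argument `lam` and the small random test system are all assumptions made for the example.

```python
import numpy as np

def dkswp(A, u, q, lam):
    """One double Kaczmarz sweep: rows 1..N, then N..1 (dense A for clarity)."""
    u = np.array(u, dtype=complex)            # work on a copy of the iterate
    n_rows = A.shape[0]
    order = list(range(n_rows)) + list(range(n_rows - 1, -1, -1))
    for i in order:
        a_i = A[i]
        residual_i = q[i] - a_i @ u           # q_i - <a_i, u>
        # Relaxed projection onto the hyperplane defined by row i.
        u += lam * residual_i * a_i.conj() / np.vdot(a_i, a_i).real
    return u

# Hypothetical well-conditioned test system (not a Helmholtz matrix).
rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5)) + 10 * np.eye(5)
u_true = rng.standard_normal(5) + 1j * rng.standard_normal(5)
q = A @ u_true

u = np.zeros(5, dtype=complex)
for _ in range(200):
    u = dkswp(A, u, q, lam=1.0)
error = np.linalg.norm(u - u_true)
```

Repeating the sweep on its own converges for a consistent system when the relaxation parameter lies between 0 and 2, but it can be slow on realistic Helmholtz matrices, which is the motivation for combining it with conjugate gradients.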
16 Kaczmarz + CG = CGMN [Björck & Elfving 1979]
17 CGMN: Solves for Fixed Point of Kaczmarz Row Projections. $\mathrm{DKSWP}(A, u, q, \lambda) = Q_1 \cdots Q_N Q_N \cdots Q_1 u + Rq = Qu + Rq$. Assume $u$ is a solution and re-arrange: $(I - Q)u = Rq$.
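18 Contribution of This Work
19 Compute Node Overview [Maxeler Technologies, 2011], adapted from [Pell, 2013]. Kernel: running on accelerator.
Algorithm 1 CGMN (Björck and Elfving [4])
Input: $A$, $u$, $q$, $\lambda$
1: $Rq \leftarrow \mathrm{DKSWP}(A, 0, q, \lambda)$
2: $r \leftarrow Rq - u + \mathrm{DKSWP}(A, u, 0, \lambda)$
3: $p \leftarrow r$
4: while $\|r\|^2 > \mathrm{tol}$ do
5: $s \leftarrow (I - Q)p = p - \mathrm{DKSWP}(A, p, 0, \lambda)$
6: $\alpha \leftarrow \|r\|^2 / \langle p, s\rangle$
7: $u \leftarrow u + \alpha p$
8: $r \leftarrow r - \alpha s$
9: $\beta \leftarrow \|r\|^2_{\mathrm{curr}} / \|r\|^2_{\mathrm{prev}}$
10: $\|r\|^2_{\mathrm{prev}} \leftarrow \|r\|^2_{\mathrm{curr}}$
11: $p \leftarrow r + \beta p$
12: end while
Output: $u$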
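As a rough host-side sketch of Algorithm 1, the following NumPy code applies conjugate gradients to the fixed-point system of the double sweep. It is not the dataflow kernel from the talk: the names `dkswp` and `cgmn`, the default relaxation value, the tolerance, the iteration cap and the random test matrix are assumptions chosen for illustration.

```python
import numpy as np

def dkswp(A, u, q, lam):
    """Double Kaczmarz sweep (forward, then backward, over the rows of A)."""
    u = np.array(u, dtype=complex)
    n = A.shape[0]
    for i in list(range(n)) + list(range(n - 1, -1, -1)):
        a = A[i]
        u += lam * (q[i] - a @ u) * a.conj() / np.vdot(a, a).real
    return u

def cgmn(A, u, q, lam=1.5, tol=1e-20, max_iter=200):
    """CG applied to (I - Q) u = Rq, following the steps of Algorithm 1."""
    u = np.array(u, dtype=complex)
    zero_u = np.zeros_like(u)
    zero_q = np.zeros_like(q)
    Rq = dkswp(A, zero_u, q, lam)                  # step 1
    r = Rq - u + dkswp(A, u, zero_q, lam)          # step 2
    p = r.copy()                                   # step 3
    rr_prev = np.vdot(r, r).real
    for _ in range(max_iter):
        if rr_prev <= tol:                         # step 4
            break
        s = p - dkswp(A, p, zero_q, lam)           # step 5: (I - Q) p
        alpha = rr_prev / np.vdot(p, s).real       # step 6
        u += alpha * p                             # step 7
        r -= alpha * s                             # step 8
        rr_curr = np.vdot(r, r).real
        beta = rr_curr / rr_prev                   # step 9
        rr_prev = rr_curr                          # step 10
        p = r + beta * p                           # step 11
    return u

# Hypothetical test system, chosen to be well conditioned.
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6)) + 10 * np.eye(6)
u_true = rng.standard_normal(6) + 1j * rng.standard_normal(6)
q = A @ u_true
u_cgmn = cgmn(A, np.zeros(6, dtype=complex), q)
```

Each iteration costs one double sweep (step 5) plus a few inner products and vector updates, which is the split between accelerator and host work that the later slides discuss.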
20 Low levels of abstraction are scary
21 Design at high level of abstraction
22 Implementation Details
23 Layout of 3D Wavefields in 1D Memory
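One common way to store a 3D wavefield in linear memory is to let one axis vary fastest. The sketch below assumes x is the fastest dimension and z the slowest; the slide does not say which ordering the accelerator uses, so the ordering and the function names are hypothetical. The grid size is borrowed from the 432 x 240 x 25 system mentioned on a later slide.

```python
def linear_index(ix, iy, iz, nx, ny):
    """Map a 3D grid index to a 1D offset, x fastest and z slowest (assumed)."""
    return ix + nx * (iy + ny * iz)

def grid_index(k, nx, ny):
    """Inverse mapping: recover (ix, iy, iz) from a 1D offset."""
    iz, rem = divmod(k, nx * ny)
    iy, ix = divmod(rem, nx)
    return ix, iy, iz

NX, NY, NZ = 432, 240, 25

# Neighbours along x, y and z sit at very different distances in memory.
stride_x = linear_index(1, 0, 0, NX, NY) - linear_index(0, 0, 0, NX, NY)
stride_y = linear_index(0, 1, 0, NX, NY) - linear_index(0, 0, 0, NX, NY)
stride_z = linear_index(0, 0, 1, NX, NY) - linear_index(0, 0, 0, NX, NY)
print(stride_x, stride_y, stride_z)  # 1 432 103680
```

The large stride in the slowest dimension is why a stencil that touches neighbours in all three directions benefits from on-chip buffering of whole planes.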
24 Buffering: Overcoming Latency of Memory Access
25 Pipelining: Overcoming Latency of Computation
26 Pipelining: Overcoming Latency of Computation
27 Pipelining: Overcoming Latency of Computation
28 Memory Access: 384 bytes / burst. [Table; cell values not recovered in this transcription. Columns: number of bits in a real number, number of bits in a complex number, complex numbers per burst. Rows: single precision, double precision.]
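The quantities named in this table follow from simple arithmetic on the 384-byte burst size. The sketch below assumes standard IEEE 754 widths (32 bits for single precision, 64 bits for double precision) and a complex number stored as two reals; these widths are assumptions and are not read off the slide.

```python
BURST_BYTES = 384  # burst size stated on the slide

def complex_per_burst(bits_per_real):
    """Number of whole complex values (two reals each) in one memory burst."""
    bits_per_complex = 2 * bits_per_real
    return (BURST_BYTES * 8) // bits_per_complex

for name, bits in [("single precision", 32), ("double precision", 64)]:
    print(name, bits, 2 * bits, complex_per_burst(bits))
# single precision 32 64 48
# double precision 64 128 24
```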
29 Backward Sweep: Double Buffering
30 Number Representation. [Figure: matrix row storage efficiency versus number of bits in a real number, with the bit width used in this work marked; the value was not recovered in this transcription.]
31 Results
32 End-to-end Execution Time. [Figure comparing one Intel Xeon E core with an Intel core + FPGA-based accelerator.]
33 Kaczmarz Sweeps: No Longer the Bottleneck. [Figure legend: other CGMN operations (inner products, vector addition, etc.); Kaczmarz sweeps.]
34 Effect of matrix row ordering on CGMN convergence. [Figure legend: sequential (1 to N) ordering; accelerator ordering.]
35 FPGA Resource Usage
36 Recent Work: Multiple Kaczmarz Sweeps / CGMN Iteration (432 x 240 x 25 system). [Figure legend: total; Kaczmarz sweeps on accelerator; CGMN on CPU host.]
37 Avoiding Future Communication Bottlenecks. Memory bandwidth limit: 38.4 GB/s. PCIe bandwidth limit: 2 GB/s.
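To get a feel for the gap between the two limits, the following back-of-the-envelope sketch estimates how long one wavefield takes to move at each rate. Pairing these bandwidths with the 432 x 240 x 25 grid from the "Recent Work" slide, storing each value as an 8-byte single-precision complex number, and taking 1 GB as 10^9 bytes are all assumptions made for the example.

```python
# Grid size from the "Recent Work" slide; 8 bytes per value assumes
# single-precision complex numbers (not stated on this slide).
NX, NY, NZ = 432, 240, 25
BYTES_PER_VALUE = 8

MEMORY_BW = 38.4e9  # bytes per second, assuming 1 GB = 1e9 bytes
PCIE_BW = 2.0e9     # bytes per second, same assumption

def transfer_seconds(n_bytes, bandwidth):
    """Idealised transfer time: ignores latency and protocol overhead."""
    return n_bytes / bandwidth

wavefield_bytes = NX * NY * NZ * BYTES_PER_VALUE
t_pcie = transfer_seconds(wavefield_bytes, PCIE_BW)
t_memory = transfer_seconds(wavefield_bytes, MEMORY_BW)
slowdown = t_pcie / t_memory

print(f"{wavefield_bytes} bytes: PCIe {t_pcie * 1e3:.3f} ms, "
      f"memory {t_memory * 1e3:.3f} ms, ratio {slowdown:.1f}")
```

Under these assumptions the PCIe link is about 19 times slower than on-board memory, which suggests why keeping vectors on the accelerator between sweeps matters.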
38 The Next Step. Problem: On-chip memory (4 MB) limits block size to 300 x 300 in the two faster dimensions. Solution: Implement domain decomposition for larger systems.
39 Straightforward Extension. Goal: Systematically use all 4 accelerators. Solution: Solve several forward problems at once.
40 Future Work. Problem: Kaczmarz sweeps now account for only approximately 10% of CGMN time. Solution: Port all of CGMN to the DFE.
41 Future Work. Fact: Reading A from memory limits optimizations like increasing FPGA frequency. Result: Read only the earth model m and generate A on the DFE.
42 Future Work. Problem: Domain size is limited by memory size: 24 GB. Solution: Parallelize CGMN to CARP-CG [Gordon & Gordon, 2010].
43 Conclusion. Have implemented frequency-domain wave simulation using reconfigurable hardware. A speed-up of 2x over one Intel Xeon core results from a dataflow computing paradigm.
44 Acknowledgements. Thank you to: Felix Herrmann, Henryk Modzelewski, Diego Oriato, Simon Tilbury, Tristan van Leeuwen, Eddie Hung, Lina Miao, Rafael Lago, my Master's committee members: Michael Friedlander, Christian Schoof, my external examiner: Steve Wilton, Maxeler Technologies, and everyone in the SLIM group! This work was financially supported in part by the Natural Sciences and Engineering Research Council of Canada Discovery Grant (RGPIN ) and the Collaborative Research and Development Grant DNOISE II (CDRP J ). This research was carried out as part of the SINBAD II project with support from the following organizations: BG Group, BGP, BP, Chevron, ConocoPhillips, CGG, ION GXT, Petrobras, PGS, Statoil, Total SA, WesternGeco, Woodside.
45 References
Å. Björck and T. Elfving. Accelerated projection methods for computing pseudoinverse solutions of systems of linear equations. BIT Numerical Mathematics, 19(2), 1979.
D. Gordon and R. Gordon. Component-averaged row projections: A robust, block-parallel scheme for sparse linear systems. SIAM Journal on Scientific Computing, 27(3).
D. Gordon and R. Gordon. CARP-CG: A robust and efficient parallel solver for linear systems, applied to strongly convection dominated PDEs. Parallel Computing, 36(9), 2010.
F. Grüll, M. Kunz, M. Hausmann, and U. Kebschull. An implementation of 3D electron tomography on FPGAs. In Reconfigurable Computing and FPGAs (ReConFig), 2012 International Conference on, pages 1-5.
S. Kaczmarz. Angenäherte Auflösung von Systemen linearer Gleichungen. Bulletin International de l'Academie Polonaise des Sciences et des Lettres, 35, 1937.
S. Kaczmarz. Approximate solution of systems of linear equations. International Journal of Control, 57(6). (translation)
T. van Leeuwen, D. Gordon, R. Gordon, and F. J. Herrmann. Preconditioning the Helmholtz equation via row-projections. In EAGE technical program. EAGE, 2012.
H. Meuer, E. Strohmaier, J. Dongarra, and H. Simon. Top 500 supercomputer sites, November.
S. Operto, J. Virieux, P. Amestoy, J.-Y. L'Excellent, L. Giraud, and H. B. H. Ali. 3D finite-difference frequency-domain modeling of visco-acoustic wave propagation using a massively parallel direct solver: A feasibility study. Geophysics, 72(5):SM195-SM211, 2007.
O. Pell, J. Bower, R. Dimond, O. Mencer, and M. J. Flynn. Finite-difference wave propagation modeling on special-purpose dataflow machines. Parallel and Distributed Systems, IEEE Transactions on, 24(5), 2013.
More informationFrom PZ summation to wavefield separation, mirror imaging and up-down deconvolution: the evolution of ocean-bottom seismic data processing
From PZ summation to wavefield separation, mirror imaging and up-down deconvolution: the evolution of ocean-bottom seismic data processing Sergio Grion, CGGVeritas Summary This paper discusses present
More informationParallel Numerical Algorithms
Parallel Numerical Algorithms Chapter 6 Structured and Low Rank Matrices Section 6.3 Numerical Optimization Michael T. Heath and Edgar Solomonik Department of Computer Science University of Illinois at
More informationWeaker assumptions for convergence of extended block Kaczmarz and Jacobi projection algorithms
DOI: 10.1515/auom-2017-0004 An. Şt. Univ. Ovidius Constanţa Vol. 25(1),2017, 49 60 Weaker assumptions for convergence of extended block Kaczmarz and Jacobi projection algorithms Doina Carp, Ioana Pomparău,
More informationLow-rank Promoting Transformations and Tensor Interpolation - Applications to Seismic Data Denoising
Low-rank Promoting Transformations and Tensor Interpolation - Applications to Seismic Data Denoising Curt Da Silva and Felix J. Herrmann 2 Dept. of Mathematics 2 Dept. of Earth and Ocean Sciences, University
More informationMatrix Computations: Direct Methods II. May 5, 2014 Lecture 11
Matrix Computations: Direct Methods II May 5, 2014 ecture Summary You have seen an example of how a typical matrix operation (an important one) can be reduced to using lower level BS routines that would
More informationA Sampling Kaczmarz-Motzkin Algorithm for Linear Feasibility
A Sampling Kaczmarz-Motzkin Algorithm for Linear Feasibility Jamie Haddock Graduate Group in Applied Mathematics, Department of Mathematics, University of California, Davis Copper Mountain Conference on
More informationIntroduction to Seismic Imaging
Introduction to Seismic Imaging Alison Malcolm Department of Earth, Atmospheric and Planetary Sciences MIT August 20, 2010 Outline Introduction Why we image the Earth How data are collected Imaging vs
More informationP016 Toward Gauss-Newton and Exact Newton Optimization for Full Waveform Inversion
P016 Toward Gauss-Newton and Exact Newton Optiization for Full Wavefor Inversion L. Métivier* ISTerre, R. Brossier ISTerre, J. Virieux ISTerre & S. Operto Géoazur SUMMARY Full Wavefor Inversion FWI applications
More informationDomain Decomposition-based contour integration eigenvalue solvers
Domain Decomposition-based contour integration eigenvalue solvers Vassilis Kalantzis joint work with Yousef Saad Computer Science and Engineering Department University of Minnesota - Twin Cities, USA SIAM
More information