SLIM. University of British Columbia

1 Accelerating an Iterative Helmholtz Solver Using Reconfigurable Hardware. Art Petrenko, M.Sc. Defence, April 9, 2014. Seismic Laboratory for Imaging and Modelling (SLIM), Department of Earth, Ocean and Atmospheric Sciences, University of British Columbia (UBC).

2 Oh by the way: I have a stutter.

3 Seismic Wave Simulation

4 Seismic Exploration for Oil and Gas

5 Full-waveform Inversion: Seismic Wavefield (u), Earth model (m)

6 Full-waveform inversion is SLOW

7 The Accelerators Have Arrived: Top 10 of Top 500 Supercomputers

8 FPGAs: Reconfigurable Hardware Accelerators

9 The Punchline

10 Modelling Seismic Waves: Mathematical Formulation

11 Modelling Seismic Waves: The Wave Equation. "... medium can be written as $(\omega^2 m + \nabla^2)\,u = q$, ... Helmholtz equation. As written above, the Helmholtz equation ... constant-density isotropic medium which only ... are modelled heuristically by allowing $m$ ..." Labels: frequency ($\omega$), Laplacian ($\nabla^2$), earth model ($m$), seismic source ($q$, right-hand side), wavefield ($u$, iterate).

12 Modelling Seismic Waves: Discretization [Operto, 2007]. $A(m, \omega)\,u = q$
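To make the discretized system A(m, ω)u = q concrete, here is a minimal one-dimensional sketch. It is an illustration under stated assumptions, not the scheme used in the talk: it assumes a 3-point second-difference stencil and zero (Dirichlet) values outside the grid, whereas the slide cites the 3D finite-difference scheme of [Operto, 2007]. The function name `helmholtz_1d` and its parameters are hypothetical.

```python
def helmholtz_1d(m, omega, h):
    """Dense 1D matrix for (omega^2 * m + d^2/dx^2) u = q.

    m     : list of earth-model values, one per grid point (may be complex)
    omega : angular frequency
    h     : grid spacing
    Assumption: 3-point stencil, zero values outside the grid (illustrative only).
    """
    n = len(m)
    A = [[0j] * n for _ in range(n)]
    for i in range(n):
        # frequency/model term plus the centre of the second-difference stencil
        A[i][i] = omega ** 2 * m[i] - 2.0 / h ** 2
        if i > 0:
            A[i][i - 1] = 1.0 / h ** 2   # left neighbour
        if i < n - 1:
            A[i][i + 1] = 1.0 / h ** 2   # right neighbour
    return A
```

Each row has at most three non-zeros here; the point is only that the matrix depends on both the model m and the frequency ω, and that it is sparse and banded.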

13 Solving the Helmholtz System

14 The Kaczmarz Algorithm [Kaczmarz, 1937]. $\mathrm{DKSWP}(A, u, q, \lambda)$. Adapted from [van Leeuwen, 2012].

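15 The Kaczmarz Algorithm: Equivalent to SSOR-NE [Björck and Elfving, 1979]
Double Kaczmarz sweep on the original system: $Au = q$
One iteration of SSOR on the normal equations: $AA^{*}y = q$, $A^{*}y = u$
Both are computed as: $u^{k+1} = u^{k} + \lambda\,\dfrac{b_i - \langle a_i, u^{k}\rangle}{\|a_i\|^{2}}\,a_i^{*}$, with $k: 1 \to 2N$ and $i: 1 \to N,\ N \to 1$
(Here $a_i$ is the $i$-th row of $A$, $b_i$ is the $i$-th element of the right-hand side $q$, and $\lambda$ is the relaxation parameter.)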
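The row-projection update on this slide can be sketched in plain Python. This is a minimal sketch assuming a small dense matrix stored as a list of rows; the function name `dkswp` follows the slide's DKSWP notation, but the signature and data layout are assumptions made for illustration and say nothing about how the accelerator kernel is organised.

```python
def dkswp(A, u, q, lam):
    """One double Kaczmarz sweep on A u = q with relaxation parameter lam.

    A is a list of rows; u and q are lists (real or complex entries).
    Rows are visited in the order 1..N and then N..1.
    Returns a new iterate; the input u is not modified.
    """
    u = [complex(x) for x in u]
    n = len(A)
    order = list(range(n)) + list(range(n - 1, -1, -1))
    for i in order:
        row = A[i]
        norm2 = sum(abs(a) ** 2 for a in row)
        if norm2 == 0:
            continue  # an all-zero row defines no hyperplane to project onto
        # residual of equation i: q_i - <a_i, u>
        resid = q[i] - sum(a * x for a, x in zip(row, u))
        step = lam * resid / norm2
        # move along the conjugated row (for real rows this is just the row)
        for j, a in enumerate(row):
            u[j] += step * complex(a).conjugate()
    return u
```

With `lam = 1` each step projects the iterate exactly onto the hyperplane of one equation, so for a system with mutually orthogonal rows a single sweep already returns the solution.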

16 Kaczmarz + CG = CGMN [Björck and Elfving, 1979]

17 CGMN: Solves for Fixed Point of Kaczmarz Row Projections
$\mathrm{DKSWP}(A, u, q, \lambda) = Q_1 \cdots Q_N\, Q_N \cdots Q_1\, u + Rq = Qu + Rq.$
Assume $u$ is a solution and re-arrange: $(I - Q)\,u = Rq$

18 Contribution of This Work

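19 Compute Node Overview [Maxeler Technologies, 2011]. Adapted from [Pell, 2013]. Kernel: running on accelerator.
Algorithm 1 CGMN (Björck and Elfving [4])
Input: $A, u, q, \lambda$
1: $Rq \leftarrow \mathrm{DKSWP}(A, 0, q, \lambda)$
2: $r \leftarrow Rq - u + \mathrm{DKSWP}(A, u, 0, \lambda)$
3: $p \leftarrow r$
4: while $\|r\|^2 > \mathrm{tol}$ do
5: $s \leftarrow (I - Q)p = p - \mathrm{DKSWP}(A, p, 0, \lambda)$
6: $\alpha \leftarrow \|r\|^2 / \langle p, s\rangle$
7: $u \leftarrow u + \alpha p$
8: $r \leftarrow r - \alpha s$
9: $\beta \leftarrow \|r\|^2_{\mathrm{curr}} / \|r\|^2_{\mathrm{prev}}$
10: $\|r\|^2_{\mathrm{prev}} \leftarrow \|r\|^2_{\mathrm{curr}}$
11: $p \leftarrow r + \beta p$
12: end while
Output: $u$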
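Algorithm 1 can be traced in a few lines of plain Python. The sketch below is a host-side illustration only, assuming small dense matrices stored as lists of rows; the helper names `inner`, `dkswp` and `cgmn`, the default tolerance and the `max_iter` guard are assumptions for the example. In the talk only the Kaczmarz sweeps run on the accelerator, which this sketch does not model.

```python
def inner(x, y):
    """Complex inner product <x, y> = sum(conj(x_i) * y_i)."""
    return sum(complex(a).conjugate() * b for a, b in zip(x, y))


def dkswp(A, u, q, lam):
    """One double Kaczmarz sweep (rows 1..N, then N..1) on A u = q."""
    u = [complex(x) for x in u]
    n = len(A)
    for i in list(range(n)) + list(range(n - 1, -1, -1)):
        row = A[i]
        norm2 = sum(abs(a) ** 2 for a in row)
        if norm2 == 0:
            continue
        step = lam * (q[i] - sum(a * x for a, x in zip(row, u))) / norm2
        for j, a in enumerate(row):
            u[j] += step * complex(a).conjugate()
    return u


def cgmn(A, u, q, lam, tol=1e-24, max_iter=1000):
    """Conjugate gradients on (I - Q) u = Rq, following Algorithm 1."""
    zero_u = [0j] * len(u)
    zero_q = [0j] * len(q)
    Rq = dkswp(A, zero_u, q, lam)                                # step 1
    Qu = dkswp(A, u, zero_q, lam)
    r = [a - b + c for a, b, c in zip(Rq, u, Qu)]                # step 2
    p = list(r)                                                  # step 3
    rr = inner(r, r).real
    for _ in range(max_iter):
        if rr <= tol:                                            # step 4
            break
        Qp = dkswp(A, p, zero_q, lam)
        s = [a - b for a, b in zip(p, Qp)]                       # step 5
        alpha = rr / inner(p, s).real                            # step 6
        u = [a + alpha * b for a, b in zip(u, p)]                # step 7
        r = [a - alpha * b for a, b in zip(r, s)]                # step 8
        rr_new = inner(r, r).real
        beta = rr_new / rr                                       # step 9
        rr = rr_new                                              # step 10
        p = [a + beta * b for a, b in zip(r, p)]                 # step 11
    return u
```

The operator I - Q is never formed: every application costs one double sweep with a zero right-hand side, which is why the sweeps dominate the run time of an unaccelerated implementation.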

20 Low levels of abstraction are scary

21 Design at high level of abstraction

22 Implementation Details

23 Layout of 3D Wavefields in 1D Memory
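One common way to lay a 3D grid out in 1D memory is to let one index vary fastest. The sketch below assumes x is the fastest dimension and z the slowest; the transcript does not record the ordering actually used in the talk, so the convention and the function names `linear_index` and `grid_index` are assumptions for illustration.

```python
def linear_index(ix, iy, iz, nx, ny):
    """Map grid coordinates to a 1D offset, with x varying fastest."""
    return ix + nx * (iy + ny * iz)


def grid_index(k, nx, ny):
    """Inverse of linear_index: recover (ix, iy, iz) from a 1D offset."""
    iz, rem = divmod(k, nx * ny)
    iy, ix = divmod(rem, nx)
    return ix, iy, iz
```

With this layout, neighbours in x are adjacent in memory, neighbours in y are `nx` elements apart, and neighbours in z are `nx * ny` elements apart, so a stencil that reaches into adjacent z-planes touches widely separated addresses.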

24 Buffering: Overcoming Latency of Memory Access

25-27 Pipelining: Overcoming Latency of Computation

28 Memory Access: 384 bytes / burst. Table (values not preserved): number of bits in a real number, number of bits in a complex number, and complex numbers per burst, for single precision and double precision.
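The relationship between burst size and number width is simple arithmetic. The sketch below takes only the 384-byte burst size from the slide and assumes a complex number is stored as two reals of equal width, with 32 bits for single and 64 bits for double precision. The function name `complex_per_burst` is hypothetical, and the results are computed here rather than copied from the slide's table.

```python
BURST_BYTES = 384  # burst size stated on the slide


def complex_per_burst(bits_per_real, burst_bytes=BURST_BYTES):
    """Number of whole complex values that fit in one memory burst."""
    bits_per_complex = 2 * bits_per_real  # real part + imaginary part
    return (burst_bytes * 8) // bits_per_complex


single = complex_per_burst(32)  # assumes 32-bit reals
double = complex_per_burst(64)  # assumes 64-bit reals
```

Halving the width of a real number doubles how many wavefield or matrix elements arrive per burst, which is the motivation for considering narrower number representations.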

29 Backward Sweep: Double Buffering

30 Number Representation. Plot: matrix row storage efficiency versus number of bits in a real number; annotation: bits used in this work.

31 Results

32 End-to-end Execution Time. Compared: one Intel Xeon E core; Intel core + FPGA-based accelerator.

33 Kaczmarz Sweeps: No Longer the Bottleneck. Legend: other CGMN operations (inner products, vector addition, etc.); Kaczmarz sweeps.

34 Effect of Matrix Row Ordering on CGMN Convergence. Legend: sequential (1 to N); accelerator ordering.

35 FPGA Resource Usage

36 Recent Work: Multiple Kaczmarz Sweeps / CGMN Iteration (432 x 240 x 25 system). Legend: total; Kaczmarz sweeps on accelerator; CGMN on CPU host.

37 Avoiding Future Communication Bottlenecks. 38.4 GB/s: memory bandwidth limit. 2 GB/s: PCIe bandwidth limit.

38 The Next Step. Problem: On-chip memory (4 MB) limits block size to 300 x 300 in the two faster dimensions. Solution: Implement domain decomposition for larger systems.

39 Straightforward Extension. Goal: Systematically use all 4 accelerators. Solution: Solve several forward problems at once.

40 Future Work. Problem: Kaczmarz sweeps now account for only approximately 10% of CGMN time. Solution: Port all of CGMN to the DFE.

41 Future Work. Fact: Reading A from memory limits optimizations like increasing FPGA frequency. Result: Read only earth model m and generate A on the DFE.

42 Future Work. Problem: Domain size limited by memory size: 24 GB. Solution: Parallelize CGMN to CARP-CG [Gordon & Gordon, 2010].

43 Conclusion. Implemented frequency-domain wave simulation using reconfigurable hardware. A speed-up of 2x relative to 1 Intel Xeon core results from a dataflow computing paradigm.

44 Acknowledgements. Thank you to: Felix Herrmann, Henryk Modzelewski, Diego Oriato, Simon Tilbury, Tristan van Leeuwen, Eddie Hung, Lina Miao, Rafael Lago, my Master's committee members: Michael Friedlander, Christian Schoof, my external examiner: Steve Wilton, Maxeler Technologies, and everyone in the SLIM group! This work was financially supported in part by the Natural Sciences and Engineering Research Council of Canada Discovery Grant (RGPIN ) and the Collaborative Research and Development Grant DNOISE II (CDRP J ). This research was carried out as part of the SINBAD II project with support from the following organizations: BG Group, BGP, BP, Chevron, ConocoPhillips, CGG, ION GXT, Petrobras, PGS, Statoil, Total SA, WesternGeco, Woodside.

45 References
Å. Björck and T. Elfving. Accelerated projection methods for computing pseudoinverse solutions of systems of linear equations. BIT Numerical Mathematics, 19(2), 1979.
D. Gordon and R. Gordon. Component-averaged row projections: A robust, block-parallel scheme for sparse linear systems. SIAM Journal on Scientific Computing, 27(3).
D. Gordon and R. Gordon. CARP-CG: A robust and efficient parallel solver for linear systems, applied to strongly convection dominated PDEs. Parallel Computing, 36(9), 2010.
F. Grüll, M. Kunz, M. Hausmann, and U. Kebschull. An implementation of 3D electron tomography on FPGAs. In Reconfigurable Computing and FPGAs (ReConFig), 2012 International Conference on, pages 1-5.
S. Kaczmarz. Angenäherte Auflösung von Systemen linearer Gleichungen. Bulletin International de l'Academie Polonaise des Sciences et des Lettres, 35, 1937.
S. Kaczmarz. Approximate solution of systems of linear equations. International Journal of Control, 57(6). (translation)
T. van Leeuwen, D. Gordon, R. Gordon, and F. J. Herrmann. Preconditioning the Helmholtz equation via row-projections. In EAGE technical program. EAGE, 2012.
H. Meuer, E. Strohmaier, J. Dongarra, and H. Simon. Top 500 supercomputer sites, November.
S. Operto, J. Virieux, P. Amestoy, J.-Y. L'Excellent, L. Giraud, and H. B. H. Ali. 3D finite-difference frequency-domain modeling of visco-acoustic wave propagation using a massively parallel direct solver: A feasibility study. Geophysics, 72(5):SM195-SM211, 2007.
O. Pell, J. Bower, R. Dimond, O. Mencer, and M. J. Flynn. Finite-difference wave propagation modeling on special-purpose dataflow machines. Parallel and Distributed Systems, IEEE Transactions on, 24(5), 2013.

Wavefield Reconstruction Inversion (WRI) a new take on wave-equation based inversion Felix J. Herrmann

Wavefield Reconstruction Inversion (WRI) a new take on wave-equation based inversion Felix J. Herrmann Wavefield Reconstruction Inversion (WRI) a new take on wave-equation based inversion Felix J. Herrmann SLIM University of British Columbia van Leeuwen, T and Herrmann, F J (2013). Mitigating local minima

More information

Source estimation for frequency-domain FWI with robust penalties

Source estimation for frequency-domain FWI with robust penalties Source estimation for frequency-domain FWI with robust penalties Aleksandr Y. Aravkin, Tristan van Leeuwen, Henri Calandra, and Felix J. Herrmann Dept. of Earth and Ocean sciences University of British

More information

FWI with Compressive Updates Aleksandr Aravkin, Felix Herrmann, Tristan van Leeuwen, Xiang Li, James Burke

FWI with Compressive Updates Aleksandr Aravkin, Felix Herrmann, Tristan van Leeuwen, Xiang Li, James Burke Consortium 2010 FWI with Compressive Updates Aleksandr Aravkin, Felix Herrmann, Tristan van Leeuwen, Xiang Li, James Burke SLIM University of British Columbia Full Waveform Inversion The Full Waveform

More information

Application of matrix square root and its inverse to downward wavefield extrapolation

Application of matrix square root and its inverse to downward wavefield extrapolation Application of matrix square root and its inverse to downward wavefield extrapolation Polina Zheglova and Felix J. Herrmann Department of Earth and Ocean sciences, University of British Columbia, Vancouver,

More information

Simultaneous estimation of wavefields & medium parameters

Simultaneous estimation of wavefields & medium parameters Simultaneous estimation of wavefields & medium parameters reduced-space versus full-space waveform inversion Bas Peters, Felix J. Herrmann Workshop W- 2, The limit of FWI in subsurface parameter recovery.

More information

Sparsity-promoting migration with multiples

Sparsity-promoting migration with multiples Sparsity-promoting migration with multiples Tim Lin, Ning Tu and Felix Herrmann SLIM Seismic Laboratory for Imaging and Modeling the University of British Columbia Courtesy of Verschuur, 29 SLIM Motivation..

More information

Mitigating data gaps in the estimation of primaries by sparse inversion without data reconstruction

Mitigating data gaps in the estimation of primaries by sparse inversion without data reconstruction Mitigating data gaps in the estimation of primaries by sparse inversion without data reconstruction Tim T.Y. Lin SLIM University of British Columbia Talk outline Brief review of REPSI Data reconstruction

More information

Structured tensor missing-trace interpolation in the Hierarchical Tucker format Curt Da Silva and Felix J. Herrmann Sept. 26, 2013

Structured tensor missing-trace interpolation in the Hierarchical Tucker format Curt Da Silva and Felix J. Herrmann Sept. 26, 2013 Structured tensor missing-trace interpolation in the Hierarchical Tucker format Curt Da Silva and Felix J. Herrmann Sept. 6, 13 SLIM University of British Columbia Motivation 3D seismic experiments - 5D

More information

Full-Waveform Inversion with Gauss- Newton-Krylov Method

Full-Waveform Inversion with Gauss- Newton-Krylov Method Full-Waveform Inversion with Gauss- Newton-Krylov Method Yogi A. Erlangga and Felix J. Herrmann {yerlangga,fherrmann}@eos.ubc.ca Seismic Laboratory for Imaging and Modeling The University of British Columbia

More information

Migration with Implicit Solvers for the Time-harmonic Helmholtz

Migration with Implicit Solvers for the Time-harmonic Helmholtz Migration with Implicit Solvers for the Time-harmonic Helmholtz Yogi A. Erlangga, Felix J. Herrmann Seismic Laboratory for Imaging and Modeling, The University of British Columbia {yerlangga,fherrmann}@eos.ubc.ca

More information

Compressive sampling meets seismic imaging

Compressive sampling meets seismic imaging Compressive sampling meets seismic imaging Felix J. Herrmann fherrmann@eos.ubc.ca http://slim.eos.ubc.ca joint work with Tim Lin and Yogi Erlangga Seismic Laboratory for Imaging & Modeling Department of

More information

Uncertainty quantification for Wavefield Reconstruction Inversion

Uncertainty quantification for Wavefield Reconstruction Inversion Uncertainty quantification for Wavefield Reconstruction Inversion Zhilong Fang *, Chia Ying Lee, Curt Da Silva *, Felix J. Herrmann *, and Rachel Kuske * Seismic Laboratory for Imaging and Modeling (SLIM),

More information

Seismic data interpolation and denoising using SVD-free low-rank matrix factorization

Seismic data interpolation and denoising using SVD-free low-rank matrix factorization Seismic data interpolation and denoising using SVD-free low-rank matrix factorization R. Kumar, A.Y. Aravkin,, H. Mansour,, B. Recht and F.J. Herrmann Dept. of Earth and Ocean sciences, University of British

More information

Uncertainty quantification for Wavefield Reconstruction Inversion

Uncertainty quantification for Wavefield Reconstruction Inversion using a PDE free semidefinite Hessian and randomize-then-optimize method Zhilong Fang *, Chia Ying Lee, Curt Da Silva, Tristan van Leeuwen and Felix J. Herrmann Seismic Laboratory for Imaging and Modeling

More information

(This is a sample cover image for this issue. The actual cover is not yet available at this time.)

(This is a sample cover image for this issue. The actual cover is not yet available at this time.) (This is a sample cover image for this issue. The actual cover is not yet available at this time.) This article appeared in a journal published by Elsevier. The attached copy is furnished to the author

More information

FPGA Implementation of a Predictive Controller

FPGA Implementation of a Predictive Controller FPGA Implementation of a Predictive Controller SIAM Conference on Optimization 2011, Darmstadt, Germany Minisymposium on embedded optimization Juan L. Jerez, George A. Constantinides and Eric C. Kerrigan

More information

Uncertainty quantification for inverse problems with a weak wave-equation constraint

Uncertainty quantification for inverse problems with a weak wave-equation constraint Uncertainty quantification for inverse problems with a weak wave-equation constraint Zhilong Fang*, Curt Da Silva*, Rachel Kuske** and Felix J. Herrmann* *Seismic Laboratory for Imaging and Modeling (SLIM),

More information

On the exponential convergence of. the Kaczmarz algorithm

On the exponential convergence of. the Kaczmarz algorithm On the exponential convergence of the Kaczmarz algorithm Liang Dai and Thomas B. Schön Department of Information Technology, Uppsala University, arxiv:4.407v [cs.sy] 0 Mar 05 75 05 Uppsala, Sweden. E-mail:

More information

Block AIR Methods. For Multicore and GPU. Per Christian Hansen Hans Henrik B. Sørensen. Technical University of Denmark

Block AIR Methods. For Multicore and GPU. Per Christian Hansen Hans Henrik B. Sørensen. Technical University of Denmark Block AIR Methods For Multicore and GPU Per Christian Hansen Hans Henrik B. Sørensen Technical University of Denmark Model Problem and Notation Parallel-beam 3D tomography exact solution exact data noise

More information

Compressive Sensing Applied to Full-wave Form Inversion

Compressive Sensing Applied to Full-wave Form Inversion Compressive Sensing Applied to Full-wave Form Inversion Felix J. Herrmann* fherrmann@eos.ubc.ca Joint work with Yogi Erlangga, and Tim Lin *Seismic Laboratory for Imaging & Modeling Department of Earth

More information

A 27-point scheme for a 3D frequency-domain scalar wave equation based on an average-derivative method

A 27-point scheme for a 3D frequency-domain scalar wave equation based on an average-derivative method Geophysical Prospecting 04 6 58 77 doi: 0./365-478.090 A 7-point scheme for a 3D frequency-domain scalar wave equation based on an average-derivative method Jing-Bo Chen Key Laboratory of Petroleum Resources

More information

SUMMARY. The main contribution of this paper is threefold. First, we show that FWI based on the Helmholtz equation has the advantage

SUMMARY. The main contribution of this paper is threefold. First, we show that FWI based on the Helmholtz equation has the advantage Randomized full-waveform inversion: a dimenstionality-reduction approach Peyman P. Moghaddam and Felix J. Herrmann, University of British Columbia, Canada SUMMARY Full-waveform inversion relies on the

More information

Recent results in curvelet-based primary-multiple separation: application to real data

Recent results in curvelet-based primary-multiple separation: application to real data Recent results in curvelet-based primary-multiple separation: application to real data Deli Wang 1,2, Rayan Saab 3, Ozgur Yilmaz 4, Felix J. Herrmann 2 1.College of Geoexploration Science and Technology,

More information

Convergence Rates for Greedy Kaczmarz Algorithms

Convergence Rates for Greedy Kaczmarz Algorithms onvergence Rates for Greedy Kaczmarz Algorithms Julie Nutini 1, Behrooz Sepehry 1, Alim Virani 1, Issam Laradji 1, Mark Schmidt 1, Hoyt Koepke 2 1 niversity of British olumbia, 2 Dato Abstract We discuss

More information

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters

Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters Lattice Boltzmann simulations on heterogeneous CPU-GPU clusters H. Köstler 2nd International Symposium Computer Simulations on GPU Freudenstadt, 29.05.2013 1 Contents Motivation walberla software concepts

More information

SUMMARY. H (ω)u(ω,x s ;x) :=

SUMMARY. H (ω)u(ω,x s ;x) := Interpolating solutions of the Helmholtz equation with compressed sensing Tim TY Lin*, Evgeniy Lebed, Yogi A Erlangga, and Felix J Herrmann, University of British Columbia, EOS SUMMARY We present an algorithm

More information

Matrix Probing and Simultaneous Sources: A New Approach for Preconditioning the Hessian

Matrix Probing and Simultaneous Sources: A New Approach for Preconditioning the Hessian Matrix Probing and Simultaneous Sources: A New Approach for Preconditioning the Hessian Curt Da Silva 1 and Felix J. Herrmann 2 1 Dept. of Mathematics 2 Dept. of Earth and Ocean SciencesUniversity of British

More information

P214 Efficient Computation of Passive Seismic Interferometry

P214 Efficient Computation of Passive Seismic Interferometry P214 Efficient Computation of Passive Seismic Interferometry J.W. Thorbecke* (Delft University of Technology) & G.G. Drijkoningen (Delft University of Technology) SUMMARY Seismic interferometry is from

More information

Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method

Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method Ilya B. Labutin A.A. Trofimuk Institute of Petroleum Geology and Geophysics SB RAS, 3, acad. Koptyug Ave., Novosibirsk

More information

Time domain sparsity promoting LSRTM with source estimation

Time domain sparsity promoting LSRTM with source estimation Time domain sparsity promoting LSRTM with source estimation Mengmeng Yang, Philipp Witte, Zhilong Fang & Felix J. Herrmann SLIM University of British Columbia Motivation Features of RTM: pros - no dip

More information

ERLANGEN REGIONAL COMPUTING CENTER

ERLANGEN REGIONAL COMPUTING CENTER ERLANGEN REGIONAL COMPUTING CENTER Making Sense of Performance Numbers Georg Hager Erlangen Regional Computing Center (RRZE) Friedrich-Alexander-Universität Erlangen-Nürnberg OpenMPCon 2018 Barcelona,

More information

Seismic wavefield inversion with curvelet-domain sparsity promotion

Seismic wavefield inversion with curvelet-domain sparsity promotion Seismic wavefield inversion with curvelet-domain sparsity promotion Felix J. Herrmann* fherrmann@eos.ubc.ca Deli Wang** wangdeli@email.jlu.edu.cn *Seismic Laboratory for Imaging & Modeling Department of

More information

Building a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI

Building a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI Building a Multi-FPGA Virtualized Restricted Boltzmann Machine Architecture Using Embedded MPI Charles Lo and Paul Chow {locharl1, pc}@eecg.toronto.edu Department of Electrical and Computer Engineering

More information

Comparison between least-squares reverse time migration and full-waveform inversion

Comparison between least-squares reverse time migration and full-waveform inversion Comparison between least-squares reverse time migration and full-waveform inversion Lei Yang, Daniel O. Trad and Wenyong Pan Summary The inverse problem in exploration geophysics usually consists of two

More information

A High-Performance Parallel Hybrid Method for Large Sparse Linear Systems

A High-Performance Parallel Hybrid Method for Large Sparse Linear Systems Outline A High-Performance Parallel Hybrid Method for Large Sparse Linear Systems Azzam Haidar CERFACS, Toulouse joint work with Luc Giraud (N7-IRIT, France) and Layne Watson (Virginia Polytechnic Institute,

More information

Accelerating linear algebra computations with hybrid GPU-multicore systems.

Accelerating linear algebra computations with hybrid GPU-multicore systems. Accelerating linear algebra computations with hybrid GPU-multicore systems. Marc Baboulin INRIA/Université Paris-Sud joint work with Jack Dongarra (University of Tennessee and Oak Ridge National Laboratory)

More information

SUMMARY INTRODUCTION BLOCK LOW-RANK MULTIFRONTAL METHOD

SUMMARY INTRODUCTION BLOCK LOW-RANK MULTIFRONTAL METHOD D frequency-domain seismic modeling with a Block Low-Rank algebraic multifrontal direct solver. C. Weisbecker,, P. Amestoy, O. Boiteau, R. Brossier, A. Buttari, J.-Y. L Excellent, S. Operto, J. Virieux

More information

Some Geometric and Algebraic Aspects of Domain Decomposition Methods

Some Geometric and Algebraic Aspects of Domain Decomposition Methods Some Geometric and Algebraic Aspects of Domain Decomposition Methods D.S.Butyugin 1, Y.L.Gurieva 1, V.P.Ilin 1,2, and D.V.Perevozkin 1 Abstract Some geometric and algebraic aspects of various domain decomposition

More information

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics

SPARSE SOLVERS POISSON EQUATION. Margreet Nool. November 9, 2015 FOR THE. CWI, Multiscale Dynamics SPARSE SOLVERS FOR THE POISSON EQUATION Margreet Nool CWI, Multiscale Dynamics November 9, 2015 OUTLINE OF THIS TALK 1 FISHPACK, LAPACK, PARDISO 2 SYSTEM OVERVIEW OF CARTESIUS 3 POISSON EQUATION 4 SOLVERS

More information

A robust multilevel approximate inverse preconditioner for symmetric positive definite matrices

A robust multilevel approximate inverse preconditioner for symmetric positive definite matrices DICEA DEPARTMENT OF CIVIL, ENVIRONMENTAL AND ARCHITECTURAL ENGINEERING PhD SCHOOL CIVIL AND ENVIRONMENTAL ENGINEERING SCIENCES XXX CYCLE A robust multilevel approximate inverse preconditioner for symmetric

More information

Sparse LU Factorization on GPUs for Accelerating SPICE Simulation

Sparse LU Factorization on GPUs for Accelerating SPICE Simulation Nano-scale Integrated Circuit and System (NICS) Laboratory Sparse LU Factorization on GPUs for Accelerating SPICE Simulation Xiaoming Chen PhD Candidate Department of Electronic Engineering Tsinghua University,

More information

Claude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique

Claude Tadonki. MINES ParisTech PSL Research University Centre de Recherche Informatique Claude Tadonki MINES ParisTech PSL Research University Centre de Recherche Informatique claude.tadonki@mines-paristech.fr Monthly CRI Seminar MINES ParisTech - CRI June 06, 2016, Fontainebleau (France)

More information

Incomplete Cholesky preconditioners that exploit the low-rank property

Incomplete Cholesky preconditioners that exploit the low-rank property anapov@ulb.ac.be ; http://homepages.ulb.ac.be/ anapov/ 1 / 35 Incomplete Cholesky preconditioners that exploit the low-rank property (theory and practice) Artem Napov Service de Métrologie Nucléaire, Université

More information

Faster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs

Faster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs Faster Kinetics: Accelerate Your Finite-Rate Combustion Simulation with GPUs Christopher P. Stone, Ph.D. Computational Science and Engineering, LLC Kyle Niemeyer, Ph.D. Oregon State University 2 Outline

More information

Elastic full waveform inversion for near surface imaging in CMP domain Zhiyang Liu*, Jie Zhang, University of Science and Technology of China (USTC)

Elastic full waveform inversion for near surface imaging in CMP domain Zhiyang Liu*, Jie Zhang, University of Science and Technology of China (USTC) Elastic full waveform inversion for near surface imaging in CMP domain Zhiyang Liu*, Jie Zhang, University of Science and Technology of China (USTC) Summary We develop an elastic full waveform inversion

More information

A simple Concept for the Performance Analysis of Cluster-Computing

A simple Concept for the Performance Analysis of Cluster-Computing A simple Concept for the Performance Analysis of Cluster-Computing H. Kredel 1, S. Richling 2, J.P. Kruse 3, E. Strohmaier 4, H.G. Kruse 1 1 IT-Center, University of Mannheim, Germany 2 IT-Center, University

More information

Technology USA; 3 Dept. of Mathematics, University of California, Irvine. SUMMARY

Technology USA; 3 Dept. of Mathematics, University of California, Irvine. SUMMARY Adrien Scheuer, Leonardo Zepeda-Núñez,3*), Russell J. Hewett 2 and Laurent Demanet Dept. of Mathematics and Earth Resources Lab, Massachusetts Institute of Technology; 2 Total E&P Research & Technology

More information

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota

Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota Multilevel low-rank approximation preconditioners Yousef Saad Department of Computer Science and Engineering University of Minnesota SIAM CSE Boston - March 1, 2013 First: Joint work with Ruipeng Li Work

More information

Janus: FPGA Based System for Scientific Computing Filippo Mantovani

Janus: FPGA Based System for Scientific Computing Filippo Mantovani Janus: FPGA Based System for Scientific Computing Filippo Mantovani Physics Department Università degli Studi di Ferrara Ferrara, 28/09/2009 Overview: 1. The physical problem: - Ising model and Spin Glass

More information

An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages

An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages PIERS ONLINE, VOL. 6, NO. 7, 2010 679 An H-LU Based Direct Finite Element Solver Accelerated by Nested Dissection for Large-scale Modeling of ICs and Packages Haixin Liu and Dan Jiao School of Electrical

More information

Measuring freeze-out parameters on the Bielefeld GPU cluster

Measuring freeze-out parameters on the Bielefeld GPU cluster Measuring freeze-out parameters on the Bielefeld GPU cluster Outline Fluctuations and the QCD phase diagram Fluctuations from Lattice QCD The Bielefeld hybrid GPU cluster Freeze-out conditions from QCD

More information

A Fast High Order Algorithm for 3D Helmholtz Equation with Dirichlet Boundary

A Fast High Order Algorithm for 3D Helmholtz Equation with Dirichlet Boundary Applied and Computational Mathematics 2018; 7(4): 180-187 http://www.sciencepublishinggroup.com/j/acm doi: 10.11648/j.acm.20180704.11 ISSN: 2328-5605 (Print); ISSN: 2328-5613 (Online) A Fast High Order

More information

Accelerated large-scale inversion with message passing Felix J. Herrmann, the University of British Columbia, Canada

Accelerated large-scale inversion with message passing Felix J. Herrmann, the University of British Columbia, Canada Accelerated large-scale inversion with message passing Felix J. Herrmann, the University of British Columbia, Canada SUMMARY To meet current-day challenges, exploration seismology increasingly relies on

More information

Multi-source inversion of TEM data: with field applications to Mt. Milligan

Multi-source inversion of TEM data: with field applications to Mt. Milligan Multi-source inversion of TEM data: with field applications to Mt. Milligan D. Oldenburg, E. Haber, D. Yang Geophysical Inversion Facility, University of British Columbia, Vancouver, BC, Canada Summary

More information

Tuning And Understanding MILC Performance In Cray XK6 GPU Clusters. Mike Showerman, Guochun Shi Steven Gottlieb

Tuning And Understanding MILC Performance In Cray XK6 GPU Clusters. Mike Showerman, Guochun Shi Steven Gottlieb Tuning And Understanding MILC Performance In Cray XK6 GPU Clusters Mike Showerman, Guochun Shi Steven Gottlieb Outline Background Lattice QCD and MILC GPU and Cray XK6 node architecture Implementation

More information

Acoustics Analysis of Speaker ANSYS, Inc. November 28, 2014

Acoustics Analysis of Speaker ANSYS, Inc. November 28, 2014 Acoustics Analysis of Speaker 1 Introduction ANSYS 14.0 offers many enhancements in the area of acoustics. In this presentation, an example speaker analysis will be shown to highlight some of the acoustics

More information

SUMMARY INTRODUCTION. f ad j (t) = 2 Es,r. The kernel

SUMMARY INTRODUCTION. f ad j (t) = 2 Es,r. The kernel The failure mode of correlation focusing for model velocity estimation Hyoungsu Baek 1(*), Henri Calandra 2, and Laurent Demanet 1 1 Dept. of Mathematics and Earth Resources Lab, Massachusetts Institute

More information

HOW LFE WORKS SIMEN GAURE

HOW LFE WORKS SIMEN GAURE HOW LFE WORKS SIMEN GAURE Abstract. Here is a proof for the demeaning method used in lfe, and a description of the methods used for solving the residual equations. As well as a toy-example. This is a preliminary

More information

The Block Kaczmarz Algorithm Based On Solving Linear Systems With Arrowhead Matrices

The Block Kaczmarz Algorithm Based On Solving Linear Systems With Arrowhead Matrices Applied Mathematics E-Notes, 172017, 142-156 c ISSN 1607-2510 Available free at mirror sites of http://www.math.nthu.edu.tw/ amen/ The Block Kaczmarz Algorithm Based On Solving Linear Systems With Arrowhead

More information

Parallelism in Structured Newton Computations

Parallelism in Structured Newton Computations Parallelism in Structured Newton Computations Thomas F Coleman and Wei u Department of Combinatorics and Optimization University of Waterloo Waterloo, Ontario, Canada N2L 3G1 E-mail: tfcoleman@uwaterlooca

More information

W011 Full Waveform Inversion for Detailed Velocity Model Building
