Susumu YAMADA 1,3 Toshiyuki IMAMURA 2,3, Masahiko MACHIDA 1,3


1 Dynamical Variation of Eigenvalue Problems in Density-Matrix Renormalization-Group Code
PP12, Feb. 15
Susumu YAMADA 1,3, Toshiyuki IMAMURA 2,3, Masahiko MACHIDA 1,3
1 Center for Computational Science and e-systems, Japan Atomic Energy Agency
2 The University of Electro-Communications
3 CREST (JST)

2 Outline
Strongly correlated quantum system
Parallelization scheme for the density matrix renormalization group method
Communication strategy for a massively parallel computer
Numerical experiment
Auto-tuning for the parallel DMRG method
Conclusion

3 Strongly-correlated Quantum Systems
A typical example: high-Tc cuprate superconductors, e.g. $\mathrm{Bi_2Sr_2CaCu_2O_{8-\delta}}$.
[Figure: crystalline structure with the layer stack BiO / SrO / CuO / Ca / CuO / SrO / BiO, alternating superconducting layers (SL) and insulating layers (IL), and the $\mathrm{CuO_2}$ plane of Cu and O atoms.]
The simple model: the Hubbard model. Its Hamiltonian has two parameters:
$U$: Coulomb interaction
$t$: hopping parameter
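
For reference, the standard single-band Hubbard Hamiltonian can be written in terms of these two parameters as below. The notation is the conventional one and is an assumption here, not taken from the slide: $c^{\dagger}_{i\sigma}$ creates an electron with spin $\sigma$ on site $i$, $n_{i\sigma} = c^{\dagger}_{i\sigma} c_{i\sigma}$, and $\langle i,j \rangle$ runs over nearest-neighbour pairs.

```latex
H = -t \sum_{\langle i,j \rangle, \sigma}
      \left( c^{\dagger}_{i\sigma} c_{j\sigma} + \mathrm{h.c.} \right)
    + U \sum_{i} n_{i\uparrow} n_{i\downarrow}
```

The first term moves electrons between neighbouring sites with amplitude $t$; the second costs an energy $U$ for every doubly occupied site.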

4 Density Matrix Renormalization Group
[Figure: superblock made of a system block (A L) and an environment block (A R); renormalization proceeds along the leg direction, and the 2-D direction is indicated.]
Direct extension of the DMRG method toward 2D models: the dimension of the Hamiltonian increases exponentially.
This motivates the parallelization of DMRG.

5 Target of parallelization
The time-consuming operations of the DMRG method:
Solving all eigenpairs of a density matrix (dense matrix)
Solving the ground state of the Hamiltonian for the superblock
All eigenstates of the density matrix: dense matrix, solved with ScaLAPACK.
The ground state of the Hamiltonian: large sparse matrix. An iterative method is generally used (Lanczos method, LOBPCG method, etc.).
The most time-consuming operation of the iterative method: Hamiltonian (large sparse matrix)-vector multiplication.

6 Parallelization using feature of model
Superblock for quasi-2D model: divide the model into 3 blocks.
[Figure: superblock consisting of Block 1, Block 2, Block 3, and Block 4 with state indices $i_1$, $i_2$, $i_3$, $i_4$; $H$ is split into $H_l$, $H_c$, and $H_r$.]
The Hamiltonian $H$ is decomposed as
$H = I_4 \otimes I_3 \otimes H_l + I_4 \otimes H_c \otimes I_1 + H_r \otimes I_2 \otimes I_1$
$I_i$: the identity matrix whose dimension is the same as the number of states of block $i$.
Hamiltonian-vector multiplication:
$Hv = (I_4 \otimes I_3 \otimes H_l) v + (I_4 \otimes H_c \otimes I_1) v + (H_r \otimes I_2 \otimes I_1) v$
that is, 3 matrix-vector multiplications.

7 Parallelization of matrix-vector multiplication
Convert the vector $v$ into matrices $V_l$, $V_c$, and $V_r$ in consideration of the direct product with the identity matrix:
$(I_4 \otimes I_3 \otimes H_l) v \rightarrow H_l V_l$
$(I_4 \otimes H_c \otimes I_1) v \rightarrow H_c V_c$
$(H_r \otimes I_2 \otimes I_1) v \rightarrow H_r V_r$
The three sparse matrix-vector multiplications become three sparse matrix-dense matrix multiplications.
Parallelization of the sparse matrix-dense matrix multiplication: partition the dense matrix columnwise. The computation cost can be partitioned equally.
Transformation of the partitioned data between the matrices $V_l$, $V_c$, and $V_r$: all-to-all communication.
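
The conversion of the vector into matrices can be illustrated with a small serial NumPy sketch. It is not the authors' code: the function names, the dense test matrices, and the index ordering (row-major storage with the block-4 index slowest and the block-1 index fastest) are assumptions made for illustration. Each Kronecker-product term is applied as one matrix-matrix product on a reshaped copy of `v`, and the result is compared with the explicitly assembled Hamiltonian.

```python
import numpy as np


def apply_h(Hl, Hc, Hr, v, dims):
    """Compute H v for H = I4(x)I3(x)Hl + I4(x)Hc(x)I1 + Hr(x)I2(x)I1
    without assembling H. dims = (d1, d2, d3, d4) are the block dimensions."""
    d1, d2, d3, d4 = dims
    # Left term: Hl acts on the combined (block 2, block 1) index.
    Vl = v.reshape(d4 * d3, d2 * d1).T            # shape (d2*d1, d4*d3)
    yl = (Hl @ Vl).T.reshape(-1)
    # Centre term: Hc acts on the combined (block 3, block 2) index.
    Vc = (v.reshape(d4, d3 * d2, d1)
           .transpose(1, 0, 2)
           .reshape(d3 * d2, d4 * d1))
    yc = ((Hc @ Vc).reshape(d3 * d2, d4, d1)
                   .transpose(1, 0, 2)
                   .reshape(-1))
    # Right term: Hr acts on the combined (block 4, block 3) index.
    Vr = v.reshape(d4 * d3, d2 * d1)
    yr = (Hr @ Vr).reshape(-1)
    return yl + yc + yr


def explicit_h(Hl, Hc, Hr, dims):
    """Assemble the full Hamiltonian with Kronecker products (small tests only)."""
    d1, d2, d3, d4 = dims
    return (np.kron(np.eye(d4 * d3), Hl)
            + np.kron(np.eye(d4), np.kron(Hc, np.eye(d1)))
            + np.kron(Hr, np.eye(d2 * d1)))


def random_case(dims, seed=0):
    """Random dense stand-ins for Hl, Hc, Hr and a random vector v."""
    d1, d2, d3, d4 = dims
    rng = np.random.default_rng(seed)
    Hl = rng.standard_normal((d2 * d1, d2 * d1))
    Hc = rng.standard_normal((d3 * d2, d3 * d2))
    Hr = rng.standard_normal((d4 * d3, d4 * d3))
    v = rng.standard_normal(d1 * d2 * d3 * d4)
    return Hl, Hc, Hr, v
```

In a parallel setting the columns of `Vl`, `Vc`, and `Vr` would be distributed over processes; the three reshapes then correspond to the data redistributions that require communication.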

8 Communication for transformation between partitioned matrices
The all-to-all communication can realize the transformation between the data of the partitioned matrices $V_l$, $V_c$, and $V_r$.
[Figure: Ex. all-to-all communication on 4 processes (process 0 to process 3), with the communications COM1 to COM4 between $V_l$, $V_c$, and $V_r$; conflicts are marked.]
Communication conflicts occur because all processes communicate simultaneously.
The all-to-all communication is not suitable for a massively parallel computer.

9 2-step communication
All-to-all communication on all processes can be avoided by doubling the communication.
[Figure: Ex. 2-step communication on 4 processes (process 0 to process 3), exchanging the data of $V_l$, $V_c$, and $V_r$ in communication steps labelled COM.]
The total amount of communication data is the same as the all-to-all communication.
The communication conflict decreases, but the amount of communication data doubles.
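
A generic two-phase all-to-all can be simulated in plain Python to show how a global exchange is replaced by two exchanges inside smaller sets of processes. This is a sketch under assumptions, not the scheme from the slides: ranks are grouped consecutively, `group_size` is a hypothetical parameter, and no MPI is involved. In step 1 every item is passed to a relay process inside the sender's group; in step 2 the relay forwards it to the final destination, which has the same local index in another group.

```python
def all_to_all(send):
    """Reference exchange: send[p][q] ends up as recv[q][p]."""
    n = len(send)
    return [[send[p][q] for p in range(n)] for q in range(n)]


def two_step_all_to_all(send, group_size):
    """Same result as all_to_all, but every message stays either inside
    one group (step 1) or among processes sharing a local index (step 2)."""
    n = len(send)
    if n % group_size != 0:
        raise ValueError("group_size must divide the number of processes")

    # Step 1: inside each group, route every item to the relay process
    # whose local index matches the local index of the final destination.
    relay_buffers = [dict() for _ in range(n)]
    for p in range(n):
        group = p // group_size
        for q in range(n):
            relay = group * group_size + q % group_size
            relay_buffers[relay][(p, q)] = send[p][q]

    # Step 2: each relay forwards its items to the destination, which has
    # the same local index and may sit in any group.
    recv = [[None] * n for _ in range(n)]
    for relay, buffer in enumerate(relay_buffers):
        for (p, q), item in buffer.items():
            assert q % group_size == relay % group_size
            recv[q][p] = item
    return recv
```

Each item is forwarded at most twice, which is the price paid for never running one exchange over all processes at once.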

10 Numerical Experiment
T2K Open Supercomputer (Todai Combined Cluster), The University of Tokyo
Processor: AMD Opteron 8356 quad-core (2.3 GHz)
Number of processors per node: 4 (16 cores)
Network: Myrinet-10G link
Bandwidth: 5 GB/s full-duplex
Compiler: Intel Fortran Compiler 11.0
Option: -O3 -ip
Parallelization: flat MPI

11 Numerical Experiment
4x10-site Hubbard model, 19 up-spins, 19 down-spins, U/t=10
[Figure: total elapsed time (sec) versus number of states kept (m) on 64, 128, 256, 512, and 1024 cores. Panels: conventional all-to-all communication; 2-step communication.]
Slowdown on 1024 cores.

12 Reason for speed down
[Figure: communication and calculation time distribution for matrix-vector multiplication (m=200). Elapsed time (sec) of COM 1, COM 2, COM 3, COM 4, and calculation, for conventional communication and 2-step communication; two panels, one of them for 1024 cores.]
Panel annotations: "All communication times decrease." and "COM2 and COM3 increase."
COM1 and COM4: no problem.
COM2 and COM3: factor in the slowdown.

13 Reason for speed down
Ex. Parallel computer with 8 dual-core processors (cores P0 to P15)
COM1, COM4: conflicts hardly occur, because the communication is local.
COM2, COM3: conflicts may occur frequently, because the communication is global.
[Figure: communication partners among the cores P0 to P15 for COM1/COM4 and for COM2/COM3.]

14 Scheduling for overlapping the calculation and the communication
[Figure: cores P0 to P15 divided into communication groups.]
Execute the communication for each group one by one.
Some cores, which are not executing the communication, become idle.
Communication conflict can be avoided.
Execute the calculation on the idle cores.
Result: overlapping of calculation and communication.

15 Effect of overlapping the calculation and the communication
4x10-site Hubbard model, 19 up-spins, 19 down-spins, U/t=
T2K Open Supercomputer (Todai Combined Cluster), parallelization: flat MPI
[Figure, left: total elapsed time (sec) versus number of states kept (m) for the conventional method, the 2-step communication method, and the overlap method, on core counts up to 1024. Right: matrix-vector multiplication time (overlap method), elapsed time (sec) split into COM 1, COM 2, COM 3, COM 4, calculation, and calculation+communication.]
Speedup up to 1024 cores.

16 Targets of auto-tuning for parallel DMRG method
In our parallel strategy, the performance of two operations strongly depends on the computer architecture:
Pattern of communication groups for the 2-step all-to-all communication
Eigenvalue problem for the density matrix

17 Pattern of communication group for 2-step all-to-all communication
The network architecture of a multi-core parallel computer system is complex and often heterogeneous.
We can choose various patterns of communication groups for the 2-step all-to-all communication.
Example patterns of communication groups for COM1 and COM4 on a parallel computer with 8 dual-core processors (cores P0 to P15):
4 groups of 4 processes
8 groups of 2 processes
[Figure: the two groupings drawn over the cores P0 to P15.]
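
The set of possible group patterns can be enumerated with a few lines of Python. This is a minimal sketch with hypothetical helper names; it assumes equal-sized groups built from consecutive ranks, which is only one possible layout and is not stated in the slides.

```python
def candidate_patterns(n_procs):
    """All (number_of_groups, group_size) pairs with equal-sized groups."""
    return [(n_procs // size, size)
            for size in range(1, n_procs + 1)
            if n_procs % size == 0]


def communication_groups(n_procs, group_size):
    """Split ranks 0..n_procs-1 into consecutive groups of group_size ranks."""
    if n_procs % group_size != 0:
        raise ValueError("group_size must divide n_procs")
    return [list(range(start, start + group_size))
            for start in range(0, n_procs, group_size)]
```

For 16 processes, `candidate_patterns(16)` contains both patterns named above, `(4, 4)` and `(8, 2)`, along with the remaining divisor pairs.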

18 Elapsed time for patterns of communication group
4x10-site Hubbard model, 18 up-spins, 18 down-spins, U/t=10, number of states kept: 400
FUJITSU PRIMERGY BX900 (Japan Atomic Energy Agency), 1024 cores
[Figure: total elapsed time (sec) versus number of communication groups of COM1 and COM4; the optimal case is marked.]
The performance strongly depends on the number of communication groups.
We have to optimize the pattern by executing the DMRG method on various patterns.
Auto-tuning is required.
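
The empirical search described here, running each candidate pattern and keeping the fastest, can be sketched as a simple exhaustive tuner. The names `autotune` and `timed` are hypothetical, and the cost function is supplied by the caller; nothing below reflects the authors' actual tuning scheme.

```python
import time


def timed(run, *args):
    """Wall-clock seconds taken by one call of run(*args)."""
    start = time.perf_counter()
    run(*args)
    return time.perf_counter() - start


def autotune(candidates, cost):
    """Evaluate cost(candidate) for every candidate and return the
    cheapest candidate together with all measured costs."""
    results = {candidate: cost(candidate) for candidate in candidates}
    best = min(results, key=results.get)
    return best, results
```

In practice `cost` would wrap a short trial run, for example `lambda size: timed(trial_run, size)` with a user-provided `trial_run` function (an assumed name), so that the chosen pattern reflects the machine actually in use.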

19 Eigenvalue problem for density matrix
The density matrix is a block diagonal matrix. The dimensions of the block matrices vary.
Option 1: assign all processors to the large matrix.
Option 2: assign the optimal number of processors to each problem.
[Figure: two examples for blocks A, B, C, D. First example labels: all PEs; serial computing (1 PE). Second example labels: 1000 PEs, 100 PEs, 10 PEs, 1120 PEs.]
It is very difficult to estimate the optimal number of processors theoretically.
Auto-tuning is required.
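
Because the density matrix is block diagonal, each block can be diagonalized as an independent eigenvalue problem, which is what makes a per-block processor assignment possible. The serial NumPy sketch below only checks this property on small dense blocks; the function names are hypothetical, and a real code would hand each block to a parallel dense eigensolver instead of `numpy.linalg.eigh`.

```python
import numpy as np


def eigh_blockwise(blocks):
    """Diagonalize every symmetric block on its own.
    Returns the sorted spectrum and the per-block (values, vectors) pairs."""
    per_block = [np.linalg.eigh(block) for block in blocks]
    spectrum = np.sort(np.concatenate([values for values, _ in per_block]))
    return spectrum, per_block


def assemble_block_diagonal(blocks):
    """Build the full block diagonal matrix (for checking small cases only)."""
    n = sum(block.shape[0] for block in blocks)
    full = np.zeros((n, n))
    offset = 0
    for block in blocks:
        k = block.shape[0]
        full[offset:offset + k, offset:offset + k] = block
        offset += k
    return full


def random_symmetric_blocks(sizes, seed=0):
    """Random symmetric test blocks of the given sizes."""
    rng = np.random.default_rng(seed)
    blocks = []
    for k in sizes:
        a = rng.standard_normal((k, k))
        blocks.append((a + a.T) / 2.0)
    return blocks
```

The blockwise spectrum agrees with that of the assembled matrix, so the only open question is how many processors each block should receive, which is the tuning problem stated above.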

20 Conclusion
We proposed a parallelization strategy of the DMRG method for quasi-2-dimensional quantum models.
Key points:
Hamiltonian (sparse matrix)-vector multiplication: sparse matrix-dense matrix multiplication using the property of the quantum model
Parallelization by decomposing the dense matrix: all-to-all communication
Strategy for avoiding conflict: 2-step communication; overlapping of communication and calculation
Our method retains parallel efficiency up to 1024 cores.
Future work: we will develop auto-tuning schemes to optimize
the dividing pattern of communication groups for 2-step communication,
the parallel eigenvalue solver for the density matrix.


More information

The Hubbard model out of equilibrium - Insights from DMFT -

The Hubbard model out of equilibrium - Insights from DMFT - The Hubbard model out of equilibrium - Insights from DMFT - t U Philipp Werner University of Fribourg, Switzerland KITP, October 212 The Hubbard model out of equilibrium - Insights from DMFT - In collaboration

More information

Introduction to Superconductivity. Superconductivity was discovered in 1911 by Kamerlingh Onnes. Zero electrical resistance

Introduction to Superconductivity. Superconductivity was discovered in 1911 by Kamerlingh Onnes. Zero electrical resistance Introduction to Superconductivity Superconductivity was discovered in 1911 by Kamerlingh Onnes. Zero electrical resistance Meissner Effect Magnetic field expelled. Superconducting surface current ensures

More information

H ψ = E ψ. Introduction to Exact Diagonalization. Andreas Läuchli, New states of quantum matter MPI für Physik komplexer Systeme - Dresden

H ψ = E ψ. Introduction to Exact Diagonalization. Andreas Läuchli, New states of quantum matter MPI für Physik komplexer Systeme - Dresden H ψ = E ψ Introduction to Exact Diagonalization Andreas Läuchli, New states of quantum matter MPI für Physik komplexer Systeme - Dresden http://www.pks.mpg.de/~aml laeuchli@comp-phys.org Simulations of

More information

Performance Analysis of Lattice QCD Application with APGAS Programming Model

Performance Analysis of Lattice QCD Application with APGAS Programming Model Performance Analysis of Lattice QCD Application with APGAS Programming Model Koichi Shirahata 1, Jun Doi 2, Mikio Takeuchi 2 1: Tokyo Institute of Technology 2: IBM Research - Tokyo Programming Models

More information

Introduction to numerical computations on the GPU

Introduction to numerical computations on the GPU Introduction to numerical computations on the GPU Lucian Covaci http://lucian.covaci.org/cuda.pdf Tuesday 1 November 11 1 2 Outline: NVIDIA Tesla and Geforce video cards: architecture CUDA - C: programming

More information

Introduction to tensor network state -- concept and algorithm. Z. Y. Xie ( 谢志远 ) ITP, Beijing

Introduction to tensor network state -- concept and algorithm. Z. Y. Xie ( 谢志远 ) ITP, Beijing Introduction to tensor network state -- concept and algorithm Z. Y. Xie ( 谢志远 ) 2018.10.29 ITP, Beijing Outline Illusion of complexity of Hilbert space Matrix product state (MPS) as lowly-entangled state

More information

Time Evolving Block Decimation Algorithm

Time Evolving Block Decimation Algorithm Time Evolving Block Decimation Algorithm Application to bosons on a lattice Jakub Zakrzewski Marian Smoluchowski Institute of Physics and Mark Kac Complex Systems Research Center, Jagiellonian University,

More information

arxiv:cond-mat/ v2 [cond-mat.str-el] 24 Feb 2006

arxiv:cond-mat/ v2 [cond-mat.str-el] 24 Feb 2006 Applications of Cluster Perturbation Theory Using Quantum Monte Carlo Data arxiv:cond-mat/0512406v2 [cond-mat.str-el] 24 Feb 2006 Fei Lin, Erik S. Sørensen, Catherine Kallin and A. John Berlinsky Department

More information

Communication-avoiding LU and QR factorizations for multicore architectures

Communication-avoiding LU and QR factorizations for multicore architectures Communication-avoiding LU and QR factorizations for multicore architectures DONFACK Simplice INRIA Saclay Joint work with Laura Grigori INRIA Saclay Alok Kumar Gupta BCCS,Norway-5075 16th April 2010 Communication-avoiding

More information

A knowledge-based approach to high-performance computing in ab initio simulations.

A knowledge-based approach to high-performance computing in ab initio simulations. Mitglied der Helmholtz-Gemeinschaft A knowledge-based approach to high-performance computing in ab initio simulations. AICES Advisory Board Meeting. July 14th 2014 Edoardo Di Napoli Academic background

More information

Quantum Cluster Methods (CPT/CDMFT)

Quantum Cluster Methods (CPT/CDMFT) Quantum Cluster Methods (CPT/CDMFT) David Sénéchal Département de physique Université de Sherbrooke Sherbrooke (Québec) Canada Autumn School on Correlated Electrons Forschungszentrum Jülich, Sept. 24,

More information

High-T c superconductors

High-T c superconductors High-T c superconductors Parent insulators Carrier doping Band structure and Fermi surface Pseudogap, superconducting gap, superfluid Nodal states Bilayer, trilayer Stripes High-T c superconductors Parent

More information

WRF performance tuning for the Intel Woodcrest Processor

WRF performance tuning for the Intel Woodcrest Processor WRF performance tuning for the Intel Woodcrest Processor A. Semenov, T. Kashevarova, P. Mankevich, D. Shkurko, K. Arturov, N. Panov Intel Corp., pr. ak. Lavrentieva 6/1, Novosibirsk, Russia, 630090 {alexander.l.semenov,tamara.p.kashevarova,pavel.v.mankevich,

More information

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2

Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption. Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 1 / 23 Parallel Asynchronous Hybrid Krylov Methods for Minimization of Energy Consumption Langshi CHEN 1,2,3 Supervised by Serge PETITON 2 Maison de la Simulation Lille 1 University CNRS March 18, 2013

More information

QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment

QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment QR Factorization of Tall and Skinny Matrices in a Grid Computing Environment Emmanuel AGULLO (INRIA / LaBRI) Camille COTI (Iowa State University) Jack DONGARRA (University of Tennessee) Thomas HÉRAULT

More information

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems

TR A Comparison of the Performance of SaP::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems TR-0-07 A Comparison of the Performance of ::GPU and Intel s Math Kernel Library (MKL) for Solving Dense Banded Linear Systems Ang Li, Omkar Deshmukh, Radu Serban, Dan Negrut May, 0 Abstract ::GPU is a

More information

An introduction to the dynamical mean-field theory. L. V. Pourovskii

An introduction to the dynamical mean-field theory. L. V. Pourovskii An introduction to the dynamical mean-field theory L. V. Pourovskii Nordita school on Photon-Matter interaction, Stockholm, 06.10.2016 OUTLINE The standard density-functional-theory (DFT) framework An

More information

Computational Approaches to Quantum Critical Phenomena ( ) ISSP. Fermion Simulations. July 31, Univ. Tokyo M. Imada.

Computational Approaches to Quantum Critical Phenomena ( ) ISSP. Fermion Simulations. July 31, Univ. Tokyo M. Imada. Computational Approaches to Quantum Critical Phenomena (2006.7.17-8.11) ISSP Fermion Simulations July 31, 2006 ISSP, Kashiwa Univ. Tokyo M. Imada collaboration T. Kashima, Y. Noda, H. Morita, T. Mizusaki,

More information

Parallelization of the Dirac operator. Pushan Majumdar. Indian Association for the Cultivation of Sciences, Jadavpur, Kolkata

Parallelization of the Dirac operator. Pushan Majumdar. Indian Association for the Cultivation of Sciences, Jadavpur, Kolkata Parallelization of the Dirac operator Pushan Majumdar Indian Association for the Cultivation of Sciences, Jadavpur, Kolkata Outline Introduction Algorithms Parallelization Comparison of performances Conclusions

More information

Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems

Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems Mitglied der Helmholtz-Gemeinschaft Block Iterative Eigensolvers for Sequences of Dense Correlated Eigenvalue Problems Birkbeck University, London, June the 29th 2012 Edoardo Di Napoli Motivation and Goals

More information

Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method

Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method Algorithm for Sparse Approximate Inverse Preconditioners in the Conjugate Gradient Method Ilya B. Labutin A.A. Trofimuk Institute of Petroleum Geology and Geophysics SB RAS, 3, acad. Koptyug Ave., Novosibirsk

More information

FROM NODAL LIQUID TO NODAL INSULATOR

FROM NODAL LIQUID TO NODAL INSULATOR FROM NODAL LIQUID TO NODAL INSULATOR Collaborators: Urs Ledermann and Maurice Rice John Hopkinson (Toronto) GORDON, 2004, Oxford Doped Mott insulator? Mott physics: U Antiferro fluctuations: J SC fluctuations

More information

Techniques for translationally invariant matrix product states

Techniques for translationally invariant matrix product states Techniques for translationally invariant matrix product states Ian McCulloch University of Queensland Centre for Engineered Quantum Systems (EQuS) 7 Dec 2017 Ian McCulloch (UQ) imps 7 Dec 2017 1 / 33 Outline

More information

Performance Evaluation of MPI on Weather and Hydrological Models

Performance Evaluation of MPI on Weather and Hydrological Models NCAR/RAL Performance Evaluation of MPI on Weather and Hydrological Models Alessandro Fanfarillo elfanfa@ucar.edu August 8th 2018 Cheyenne - NCAR Supercomputer Cheyenne is a 5.34-petaflops, high-performance

More information

Porting a sphere optimization program from LAPACK to ScaLAPACK

Porting a sphere optimization program from LAPACK to ScaLAPACK Porting a sphere optimization program from LAPACK to ScaLAPACK Mathematical Sciences Institute, Australian National University. For presentation at Computational Techniques and Applications Conference

More information

Introduction to Density Functional Theory

Introduction to Density Functional Theory 1 Introduction to Density Functional Theory 21 February 2011; V172 P.Ravindran, FME-course on Ab initio Modelling of solar cell Materials 21 February 2011 Introduction to DFT 2 3 4 Ab initio Computational

More information

Superconductivity in Fe-based ladder compound BaFe 2 S 3

Superconductivity in Fe-based ladder compound BaFe 2 S 3 02/24/16 QMS2016 @ Incheon Superconductivity in Fe-based ladder compound BaFe 2 S 3 Tohoku University Kenya OHGUSHI Outline Introduction Fe-based ladder material BaFe 2 S 3 Basic physical properties High-pressure

More information

New trends in density matrix renormalization

New trends in density matrix renormalization Advances in Physics, Vol. 55, Nos. 5 6, July October 2006, 477 526 New trends in density matrix renormalization KAREN A. HALLBERG Instituto Balseiro and Centro Ato mico Bariloche, Comisio n Nacional de

More information

All-electron density functional theory on Intel MIC: Elk

All-electron density functional theory on Intel MIC: Elk All-electron density functional theory on Intel MIC: Elk W. Scott Thornton, R.J. Harrison Abstract We present the results of the porting of the full potential linear augmented plane-wave solver, Elk [1],

More information

Magnetic-field-tuned superconductor-insulator transition in underdoped La 2-x Sr x CuO 4

Magnetic-field-tuned superconductor-insulator transition in underdoped La 2-x Sr x CuO 4 Magnetic-field-tuned superconductor-insulator transition in underdoped La 2-x Sr x CuO 4 Dragana Popović National High Magnetic Field Laboratory Florida State University, Tallahassee, FL, USA Collaborators

More information

arxiv:cond-mat/ v1 [cond-mat.str-el] 4 Sep 2006

arxiv:cond-mat/ v1 [cond-mat.str-el] 4 Sep 2006 Advances in Physics Vol. 00, No. 00, January-February 2005, 1 54 arxiv:cond-mat/0609039v1 [cond-mat.str-el] 4 Sep 2006 New Trends in Density Matrix Renormalization KAREN A. HALLBERG Instituto Balseiro

More information