An FPGA Implementation of Reciprocal Sums for SPME
Sam Lee and Paul Chow
Edward S. Rogers Sr. Department of Electrical and Computer Engineering
University of Toronto
Objectives
Accelerate part of a Molecular Dynamics simulation: an FPGA-based implementation of Smooth Particle Mesh Ewald.
Try it and learn.
Investigate the acceleration bottleneck, the precision requirement, and the parallelization strategy.
Presentation Outline
Molecular Dynamics
SPME
The Reciprocal Sum Compute Engine
Speedup and Parallelization
Precision
Future work
Molecular Dynamics Simulation
Molecular Dynamics
Combines empirical force calculations with Newton's equations of motion.
Predicts the time trajectory of small atomic systems.
Computationally demanding:
1. Calculate interatomic forces.
2. Calculate the net force.
3. Integrate Newton's equations of motion.
$$a = \frac{F}{m}$$
$$r(t + \delta t) = r(t) + \delta t\, v(t) + 0.5\, \delta t^2\, a(t)$$
$$v(t + \delta t) = v(t) + 0.5\, \delta t\, [a(t) + a(t + \delta t)]$$
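The three update equations above are the velocity Verlet scheme. A minimal sketch of one way to code them follows, for a single particle in one dimension. The harmonic spring force, the unit mass, and the time step are assumptions chosen only to make the example checkable; they are not taken from the slides.

```python
import math

def velocity_verlet(r, v, force, mass, dt, steps):
    """Advance one particle (1-D) with the velocity Verlet update equations."""
    a = force(r) / mass                      # a = F / m
    for _ in range(steps):
        r = r + dt * v + 0.5 * dt * dt * a   # r(t + dt)
        a_new = force(r) / mass              # a(t + dt) from the new position
        v = v + 0.5 * dt * (a + a_new)       # v(t + dt)
        a = a_new
    return r, v

def spring(r):
    """Hypothetical test force: harmonic spring with k = 1."""
    return -1.0 * r

# Start at r = 1, v = 0 and integrate to t = 10 (assumed values).
r, v = velocity_verlet(1.0, 0.0, spring, mass=1.0, dt=0.01, steps=1000)
energy = 0.5 * v * v + 0.5 * r * r           # should stay close to 0.5
expected_r = math.cos(10.0)                  # analytic solution r(t) = cos(t)
```

In a real MD step the force evaluation is the expensive part, which is why the rest of the talk concentrates on it.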
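Molecular Dynamics
$$U = \sum_{\text{All Bonds}} k_b (l - l_o)^2 + \sum_{\text{All Angles}} k_\Theta (\Theta - \Theta_o)^2 + \sum_{\text{All Torsions}} A[1 + \cos(n\tau + \phi)] + \sum_{\text{All Pairs}} \frac{q_1 q_2}{r} + \sum_{\text{All Pairs}} 4\varepsilon \left[ \left(\frac{\sigma}{r}\right)^{12} - \left(\frac{\sigma}{r}\right)^{6} \right]$$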
MD Simulation
The problem scientists are facing: it is SLOW!
O(N^2) complexity.
0 CPU Years
Solutions
Parallelize to more compute engines.
Accelerate with FPGA, especially the non-bonded calculations.
To be more specific, this paper addresses the electrostatic interaction (reciprocal space) using the Smooth Particle Mesh Ewald algorithm.
Previous Work
Software SPME implementations:
Original PME package written by Toukmaji.
Used in NAMD.
Hardware implementations:
No previous hardware implementation of the reciprocal sums calculation.
MD-Grape and MD-Engine use Ewald Summation.
Ewald Summation is O(N^2); SPME is O(NLogN)!
Smooth Particle Mesh Ewald
Electrostatic Interaction
Coulombic equation:
$$v_{coulomb} = \frac{q_1 q_2}{4 \pi \varepsilon_0 r}$$
Under the Periodic Boundary Condition, the summation to calculate the electrostatic energy is only conditionally convergent:
$$U = \frac{1}{2} \sum_{n}{}' \sum_{i=1}^{N} \sum_{j=1}^{N} \frac{q_i q_j}{r_{ij,n}}$$
Periodic Boundary Condition
To combat the surface effect, the simulation box is replicated.
[Figure: the simulation box and its replicas, labeled A to I.]
Ewald Summation
Used for the Periodic Boundary Condition, to calculate the Coulombic interactions.
O(N^2) Direct Sum + O(N^2) Reciprocal Sum
[Figure: charge q plotted against r for the Direct Sum and for the Reciprocal Sum.]
Smooth Particle Mesh Ewald
Shift the workload to the Reciprocal Sum.
Use the Fast Fourier Transform.
O(N) Real + O(NLogN) Reciprocal.
The RSCE calculates the Reciprocal Sums using the SPME algorithm.
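SPME Reciprocal Contribution
Energy:
$$\tilde{E}_{rec} = \frac{1}{2} \sum_{m_1=0}^{K_1-1} \sum_{m_2=0}^{K_2-1} \sum_{m_3=0}^{K_3-1} Q(m_1,m_2,m_3)\, (\theta_{rec} \star Q)(m_1,m_2,m_3)$$
Force:
$$\frac{\partial \tilde{E}_{rec}}{\partial r_{\alpha i}} = \sum_{m_1=0}^{K_1-1} \sum_{m_2=0}^{K_2-1} \sum_{m_3=0}^{K_3-1} \frac{\partial Q}{\partial r_{\alpha i}}(m_1,m_2,m_3)\, (\theta_{rec} \star Q)(m_1,m_2,m_3)$$
The convolution is evaluated with FFTs:
$$\tilde{E}_{rec} = \frac{1}{2 \pi V} \sum_{m \neq 0} \frac{\exp(-\pi^2 m^2 / \beta^2)}{m^2}\, B(m_1,m_2,m_3)\, F(Q)(m_1,m_2,m_3)\, F(Q)(-m_1,-m_2,-m_3)$$
where
$$B(m_1,m_2,m_3) = |b_1(m_1)|^2\, |b_2(m_2)|^2\, |b_3(m_3)|^2$$
$$b_i(m_i) = \exp\!\left(\frac{2 \pi i (n-1) m_i}{K_i}\right) \left[ \sum_{k=0}^{n-2} M_n(k+1) \exp\!\left(\frac{2 \pi i\, m_i k}{K_i}\right) \right]^{-1}$$
$$C(m_1,m_2,m_3) = \frac{1}{\pi V} \frac{\exp(-\pi^2 m^2 / \beta^2)}{m^2} \text{ for } m \neq 0, \qquad C(0,0,0) = 0$$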
Charge Interpolation
[Figure: charge interpolation onto mesh points, with points labeled A to F.]
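Charge interpolation is the step that builds the mesh Q used in the reciprocal sum: each point charge is spread over nearby mesh points with cardinal B-spline weights. The sketch below shows this in one dimension only. The function names, the box length, and the sample charges are hypothetical, and the RSCE works on a 3-D mesh, so this is an illustration of the idea rather than the engine's data path.

```python
import math

def bspline(n, u):
    """Cardinal B-spline M_n(u) of order n, nonzero only for 0 < u < n."""
    if n == 2:
        return max(0.0, 1.0 - abs(u - 1.0))
    # Standard recursion from order n - 1 to order n.
    return (u * bspline(n - 1, u) + (n - u) * bspline(n - 1, u - 1.0)) / (n - 1)

def spread_charges(charges, positions, box, K, order):
    """Spread point charges onto a periodic 1-D mesh with K points."""
    mesh = [0.0] * K
    for q, x in zip(charges, positions):
        u = K * (x / box)              # scaled fractional coordinate
        base = math.floor(u)
        for j in range(order):         # the `order` mesh points with nonzero weight
            k = base - j
            mesh[k % K] += q * bspline(order, u - k)   # wrap around periodically
    return mesh

# Assumed example: two charges in a box of length 8 on an 8-point mesh, order 4.
mesh = spread_charges([1.0, -0.5], [0.3, 7.9], box=8.0, K=8, order=4)
total = sum(mesh)                      # B-spline weights sum to 1 per charge

# Order 2 makes the weights easy to read: a unit charge at u = 2.25.
mesh2 = spread_charges([1.0], [2.25], box=8.0, K=8, order=2)
```

With this uncentered M_n convention the weights land on the mesh points at and below the particle; in SPME the phase factor in b_i(m_i) accounts for that offset.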
Reciprocal Sum Compute Engine
RSCE Architecture
RSCE Verification Testbench
RSCE Validation Environment
Speedup Estimate RSCE vs. Software Implementation
RSCE Speedup
RSCE @ 00MHz vs. P4 Intel @.4GHz.
Speedup: x to 4x
Why so insignificant?
Reciprocal Sums calculations are not easily parallelizable.
QMM memory bandwidth is a limitation.
Improvement:
Using more QMM memories can improve the speedup.
Slight design modifications are required.
Parallelization Strategy Multiple RSCE
RSCE Parallelization Strategy
Assume a -D simulation system.
Assume P=, K=8, N=6.
Assume NumP = 4.
An 8x8x8 mesh
Four 4x4x4 Mini Meshes
RSCE Parallelization Strategy
Mini-meshes composed -> 2D-IFFT
2D-IFFT = two passes of 1D-FFT (X and Y).
[Figure: two panels, X Direction FFT and Y Direction FFT, each showing the Kx-Ky mesh with its 1D FFTs divided among the RSCEs.]
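The decomposition of a 2-D transform into an X pass and a Y pass of 1-D transforms can be checked numerically. The sketch below uses a naive forward DFT in plain Python on an assumed 4x4 grid and compares the two-pass result with a direct 2-D sum. The inverse transform used by the engine differs only in the sign of the exponent and a scale factor, so the same row-column argument applies. All names and values here are illustrative.

```python
import cmath

def dft_1d(x):
    """Naive 1-D forward DFT, O(n^2); stands in for a 1-D FFT pass."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def dft_2d_row_column(grid):
    """2-D DFT as two passes of 1-D DFTs: along X (rows), then along Y (columns)."""
    rows = [dft_1d(row) for row in grid]                 # X direction pass
    cols = [dft_1d(list(col)) for col in zip(*rows)]     # Y direction pass
    return [list(row) for row in zip(*cols)]             # back to [ky][kx] layout

def dft_2d_direct(grid):
    """Reference: direct double sum over the whole grid."""
    ny, nx = len(grid), len(grid[0])
    return [[sum(grid[y][x] * cmath.exp(-2j * cmath.pi * (kx * x / nx + ky * y / ny))
                 for y in range(ny) for x in range(nx))
             for kx in range(nx)] for ky in range(ny)]

# Assumed sample data, not taken from the slides.
grid = [[1.0, 2.0, 0.0, -1.0],
        [0.5, 0.0, 3.0, 1.0],
        [2.0, -2.0, 1.0, 0.0],
        [0.0, 1.0, -1.0, 4.0]]
two_pass = dft_2d_row_column(grid)
direct = dft_2d_direct(grid)
max_diff = max(abs(a - b) for ra, rd in zip(two_pass, direct) for a, b in zip(ra, rd))
```

Within each pass the 1-D transforms are independent of one another, which is what lets them be shared out among several engines.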
Parallelization Strategy
2D-IFFT -> Energy Calculation -> 2D-FFT -> Force Calculation
Energy Calculation: the total energy is the sum of the partial energies from each RSCE,
$$E_{Total} = \sum_{P} E_P$$
MD Simulations
RSCE + NAMD
RSCE Precision
Precision goal: relative error bound < 10^-5.
Two major calculation steps:
B-Spline calculation.
FFT/IFFT calculation.
Because of the limited logic resources and the limited precision of the FFT LogiCore, the precision goal cannot be achieved.
RSCE Precision
To achieve the relative error bound of < 10^-5, the minimum calculation precision is:
FFT {4.0}, B-Spline {.7}
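The link between fixed-point width and relative error can be illustrated with a small rounding experiment. The sketch below assumes that the second number in the braces counts fractional bits, which the slides do not state, and the bit widths and sample values are hypothetical. It only shows how rounding to a fixed-point grid bounds the error; it does not reproduce the RSCE precision analysis.

```python
def quantize(x, frac_bits):
    """Round x to the nearest multiple of 2**-frac_bits (a fixed-point grid)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def max_relative_error(values, frac_bits):
    """Largest relative rounding error over a set of nonzero values."""
    return max(abs(quantize(v, frac_bits) - v) / abs(v) for v in values)

# Assumed sample values and bit widths, chosen only for illustration.
values = [0.3, 0.7, 1.9]
err_10 = max_relative_error(values, 10)   # 10 fractional bits
err_20 = max_relative_error(values, 20)   # 20 fractional bits

# The absolute rounding error never exceeds half of one grid step.
half_step_10 = 2.0 ** -11
abs_err_10 = abs(quantize(0.3, 10) - 0.3)
```

Each extra fractional bit halves the worst-case rounding error of a single value, but errors accumulate through the B-spline and FFT stages, so the end-to-end bound has to be measured rather than read off the bit width.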
MD Simulation with RSCE
RMS Energy Error Fluctuation:
$$\text{RMS Energy Fluctuation} = \frac{\sqrt{\langle E^2 \rangle - \langle E \rangle^2}}{|\langle E \rangle|}$$
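A minimal sketch of how this figure of merit can be computed from a series of total-energy samples follows. It assumes the standard definition, the standard deviation of the energy divided by the magnitude of its mean; the function name and the sample energies are made up for illustration.

```python
import math

def rms_energy_fluctuation(energies):
    """sqrt(<E^2> - <E>^2) / |<E>| over a list of energy samples."""
    n = len(energies)
    mean = sum(energies) / n
    mean_sq = sum(e * e for e in energies) / n
    variance = max(mean_sq - mean * mean, 0.0)   # guard against tiny negative round-off
    return math.sqrt(variance) / abs(mean)

# Assumed sample trajectory energies (arbitrary units).
samples = [-100.0, -100.2, -99.8, -100.0]
fluctuation = rms_energy_fluctuation(samples)

# A perfectly conserved energy gives zero fluctuation.
flat = rms_energy_fluctuation([5.0, 5.0, 5.0])
```

A smaller value means the simulation conserves energy better, which is how runs with different FFT precisions can be compared.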
FFT Precision Vs. Energy Fluctuation
Summary
Implementation of an FPGA-based Reciprocal Sums Compute Engine and its SystemC model.
Integration of the RSCE into NAMD, a widely used Molecular Dynamics program, for verification.
RSCE speedup estimate: x to 4x.
Precision requirement: B-Spline {.7} and FFT {4:0} => 10^-5 relative error.
Parallelization strategy.
Future Work
More in-depth precision analysis.
Investigation of how to further speed up the SPME algorithm with FPGAs.
Questions