Recent Progress of Parallel SAMCEF with MUMPS - MUMPS User Group Meeting 2013


1 Recent Progress of Parallel SAMCEF with MUMPS. MUMPS User Group Meeting 2013. Jean-Pierre Delsemme, Product Development Manager

2 Summary: SAMCEF, a brief history; Co-simulation, a good candidate for parallel processing; MAAXIMUS, the "Gigadof" challenge; SLEPC/MUMPS, a parallel eigenvalue solver; Test the limits of your machine; Conclusions.

3 SAMCEF, a brief history. SAMCEF: general-purpose FE software. First line of code written in 1967 at the University of Liège. First customer in aeronautics in 1977. Creation of SAMTECH in 1986 (9 people). Implementation of BCSLIB in 1998, implementation of MUMPS in 2005. Acquisition of 6% of SAMTECH by LMS in 2011 (9 subsidiaries, 3 people, 3 MEuro per year).

4 3rd participation to the MUMPS User Group Meeting, Toulouse, April 15th and 16th.

5 Parallel computing in SAMCEF. What is parallel computing in SAMCEF?
ASEF (linear static analysis): use of MUMPS for the linear system resolution.
MECANO (non-linear static and dynamic analysis): elements generation / system resolution / results archive (depending on PARALL).
DYNAM (modal analysis): use of SLEPC and MUMPS for the eigenvalue problem resolution.
STABI (linear buckling analysis): use of SLEPC and MUMPS for the eigenvalue problem resolution.
BACON (ray tracing, .MCRT command): distribution of the ray tracing computation.
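The ASEF entry above hands the linear system resolution to MUMPS. For reference only (this is not SAMCEF code), a minimal petsc4py sketch of a distributed direct solve through the PETSc/MUMPS interface, with a toy tridiagonal matrix standing in for a stiffness matrix:

```python
# Minimal sketch, assuming petsc4py is built against a PETSc with MUMPS enabled.
from petsc4py import PETSc

n = 1000                                            # toy problem size
K = PETSc.Mat().createAIJ([n, n], nnz=3)            # distributed sparse matrix
K.setOption(PETSc.Mat.Option.NEW_NONZERO_ALLOCATION_ERR, False)
rstart, rend = K.getOwnershipRange()
for i in range(rstart, rend):                       # simple SPD tridiagonal stencil
    K.setValue(i, i, 2.0)
    if i > 0:
        K.setValue(i, i - 1, -1.0)
    if i < n - 1:
        K.setValue(i, i + 1, -1.0)
K.assemble()

f = K.createVecLeft();  f.set(1.0)                  # right-hand side
u = K.createVecRight()                              # solution

ksp = PETSc.KSP().create()
ksp.setOperators(K)
ksp.setType('preonly')                              # no Krylov iterations: pure direct solve
pc = ksp.getPC()
pc.setType('lu')
pc.setFactorSolverType('mumps')                     # petsc4py >= 3.9; older releases use setFactorSolverPackage
ksp.setFromOptions()
ksp.solve(f, u)
```

Run under mpiexec, the factorization and the triangular solves are distributed over the MPI ranks, which is the division of labour SAMCEF relies on when it delegates the resolution to MUMPS.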

6 Summary: SAMCEF, a brief history; Co-simulation, a good candidate for parallel processing; MAAXIMUS, the "Gigadof" challenge; SLEPC/MUMPS, a parallel eigenvalue solver; Test the limits of your machine; Conclusions.

7 MECANO-MECANO Co-simulation. Co-simulation principle: 2 fields of unknowns and 2 governing equations, functions of time, f(x, y, t) = 0 and g(x, y, t) = 0. Staggered solution scheme: fields are exchanged at time-step level; the approximation due to the delay of the information can lead to numerical problems.
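To make the delay concrete, here is a toy staggered loop (scalar fields and invented right-hand sides, not the MECANO equations): each solver advances its own field while holding the other field frozen at its last exchanged value, which is exactly the approximation the slide warns about.

```python
# Minimal sketch of a staggered (time-step level) exchange between two solvers.
def advance_x(x, y_frozen, dt):
    return x + dt * (-x + y_frozen)        # toy "solver 1": dx/dt = -x + y

def advance_y(y, x_frozen, dt):
    return y + dt * (-2.0 * y + x_frozen)  # toy "solver 2": dy/dt = -2y + x

x, y = 1.0, 0.0
dt, n_steps = 0.01, 100
for _ in range(n_steps):
    x_new = advance_x(x, y, dt)            # uses y from the previous step (the delay)
    y_new = advance_y(y, x, dt)            # uses x from the previous step
    x, y = x_new, y_new                    # fields exchanged only once per time step
```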

8 MECANO-MECANO Co-simulation. Monolithic solution scheme: coupling at iteration level, all equations treated at once, master-slave approach. Internal dof's "1" and interface dof's "2", so each subsystem stiffness is partitioned as K = [[K_11, K_12], [K_21, K_22]]. At each iteration the slave exchanges with the master its interface unknowns q_2 and R_2, the Schur complement of the interface, S*_22 = K^S_22 - K^S_21 (K^S_11)^{-1} K^S_12, and the linear constraints B; the master assembles these into its own interface equations, and the coupled system is solved for the increments dq_1 and dq_2 of both subsystems.
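A minimal dense sketch of this condensation (toy numbers, not the SAMCEF implementation): the slave eliminates its internal dof's and produces the interface Schur complement and the condensed residual that are exchanged with the master.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 6, 2                                    # internal / interface dof's of the slave
A = rng.standard_normal((n1 + n2, n1 + n2))
K = A @ A.T + (n1 + n2) * np.eye(n1 + n2)        # SPD stiffness-like matrix
R = rng.standard_normal(n1 + n2)                 # residual

K11, K12 = K[:n1, :n1], K[:n1, n1:]
K21, K22 = K[n1:, :n1], K[n1:, n1:]
R1, R2 = R[:n1], R[n1:]

K11_inv_K12 = np.linalg.solve(K11, K12)
K11_inv_R1 = np.linalg.solve(K11, R1)
S22 = K22 - K21 @ K11_inv_K12                    # interface Schur complement S22*
R2_cond = R2 - K21 @ K11_inv_R1                  # condensed interface residual

# The master adds S22 and R2_cond (through the constraints B) to its own interface
# equations; here a plain solve stands in for that coupled master solve.
dq2 = np.linalg.solve(S22, R2_cond)              # interface increment
dq1 = K11_inv_R1 - K11_inv_K12 @ dq2             # slave recovers its internal increment
```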

9 MECANO-MECANO Co-simulation. Example: 4-wheel car model, 4 dof's per tire. 2-phase loading: tires inflated without contact, then gravity applied on the wheels to force contact with the ground. Non-optimum automatic domain decomposition.

10 MECANO-MECANO Co-simulation. Example: 8-wheel car model, 4 dof's per tire. Impressive computation time reduction. Configurations compared (time in minutes and speed-up): monolithic on 1 node / 4 procs, 2 nodes / 8 procs, 4 nodes / 16 procs and 8 nodes / 32 procs; new method on 5 nodes / 18 procs and 9 nodes / 34 procs.

11 Summary: SAMCEF, a brief history; Co-simulation, a good candidate for parallel processing; MAAXIMUS, the "Gigadof" challenge; SLEPC/MUMPS, a parallel eigenvalue solver; Test the limits of your machine; Conclusions.

12 Maaximus, the "Gigadof" challenge. Calculate a fuselage made of, as much as possible, identical single barrels, ~4.5 million dof's per barrel. Identification of new limitations; "standardization" of the long long integer version.

13 Maaximus, the "Gigadof" challenge. 3 sections of fuselage: 13,956,62 dof's, 1,891,176 shell elem. 7 h 24' on a cluster of 8 nodes, up to 100% of the prescribed load; 5 time steps, 1 rejected, 19 iterations. Intel(R) Core(TM) 2.67 GHz, 12 GB per node.

14 Maaximus, the "Gigadof" challenge. 4 sections of fuselage: 18,58,217 dof's, 2,521,568 shell elem., 3,92,81 elem. in total. 28 h 3' on a cluster of 10 nodes; 7 time steps, 5 rejected, 57 iterations, up to 91% of the load. Intel(R) Core(TM) 2.67 GHz, 12 GB per node.

15 Maaximus, the "Gigadof" challenge. 6 sections of fuselage: 27,823,85 dof's, 4,635,44 elem. in total. 37 h 3' on a cluster of 8 nodes; 7 time steps, 3 rejected, 41 iterations, up to 100% of the load. Intel(R) Core(TM) 2.67 GHz, 12 GB per node.

16 Maaximus, the "Gigadof" challenge. 7 sections of fuselage: 37,127,423 dof's, 6,177,11 elem. in total, on a cluster of 8 nodes. Intel(R) Core(TM) 2.67 GHz, 12 GB per node. Segmentation fault in METIS! Maybe long long integers needed. Can anyone help?
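A back-of-the-envelope check, with an assumed (hypothetical) average connectivity, of why 32-bit indices become the limiting factor at this scale and why a long long (64-bit) integer build of METIS and of the solver chain is needed:

```python
# Illustrative numbers only; the average entries per row is an assumption.
INT32_MAX = 2**31 - 1                    # 2,147,483,647

ndof = 37_127_423                        # the 7-barrel model above
avg_entries_per_row = 80                 # hypothetical average graph/fill connectivity
total_indices = ndof * avg_entries_per_row

print(total_indices > INT32_MAX)         # True: ~2.97e9 entries overflow a signed 32-bit index
```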

17 Summary: SAMCEF, a brief history; Co-simulation, a good candidate for parallel processing; MAAXIMUS, the "Gigadof" challenge; SLEPC/MUMPS, a parallel eigenvalue solver; Test the limits of your machine; Conclusions.

18 Parallel Eigenvalue Solver. Since version 15 (partially available in 14), an interface with the SLEPC solver has been implemented. The SLEPC library is public domain software developed by the Universidad Politecnica de Valencia, Spain: V. Hernandez, J. E. Roman and V. Vidal (2005), "SLEPc: A Scalable and Flexible Toolkit for the Solution of Eigenvalue Problems", ACM Trans. Math. Softw. 31(3), 351-362. SLEPC performs the mathematical operations in parallel based on PETSC for the matrix-vector products and MUMPS for the linear system solves.
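As an illustration of the ingredients named above (SLEPC on top of PETSC, with MUMPS doing the linear solves), here is a minimal slepc4py sketch for a generalized eigenproblem K x = lambda M x. It is not the SAMCEF interface; K and M are assumed to be already-assembled PETSc matrices.

```python
from petsc4py import PETSc   # noqa: F401  (ensures PETSc is initialized)
from slepc4py import SLEPc

def lowest_eigenvalues(K, M, nev=10, shift=0.0):
    eps = SLEPc.EPS().create()
    eps.setOperators(K, M)
    eps.setProblemType(SLEPc.EPS.ProblemType.GHEP)     # symmetric generalized problem
    eps.setDimensions(nev=nev)
    eps.setWhichEigenpairs(SLEPc.EPS.Which.TARGET_MAGNITUDE)
    eps.setTarget(shift)

    st = eps.getST()
    st.setType(SLEPc.ST.Type.SINVERT)                  # shift-and-invert spectral transform
    ksp = st.getKSP()
    ksp.setType('preonly')                             # direct solve of (K - shift*M)
    pc = ksp.getPC()
    pc.setType('lu')
    pc.setFactorSolverType('mumps')                    # MUMPS does the factorization

    eps.setFromOptions()
    eps.solve()
    return [eps.getEigenvalue(i).real for i in range(eps.getConverged())]
```

PETSC provides the distributed matrices and matrix-vector products, SLEPC the Krylov eigensolver, and MUMPS the factorization behind each shift-and-invert solve, matching the split described on the slide.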

19 Parallel Eigenvalue Solver. Benefits for users: faster computation; possibility to use several computers and thus more memory. Possible disadvantage: MUMPS is able to work out-of-core but PETSC isn't, so on a given computer (fixed memory) there is a maximum problem size, which is lower than the BCSLIB maximum size.

20 Parallel Eigenvalue Solver. Scalable test case: parametric model. Comparison of a sequential BCSLIB run with a parallel 4-proc. SLEPC run. Intel(R) 2.66 GHz, 48 GB of memory, 8 cores. [Chart: elapsed time (sec) of BCSLIB and PEIGDY2 and the resulting speedup versus problem size (Mdof).]

21 Parallel Eigenvalue Solver. Scalable test case: parametric model. Allows using several nodes in a cluster and thus more memory. Intel(R) Core(TM) 3.4 GHz, 16 GB of memory per node, 4 cores / 4 threads per node; 1 eigenvalues. [Chart: elapsed solver time, log10 (sec.), versus degrees of freedom (Mdof), for out-of-core, 4-node and 8-node runs.]

22 Summary: SAMCEF, a brief history; Co-simulation, a good candidate for parallel processing; MAAXIMUS, the "Gigadof" challenge; SLEPC/MUMPS, a parallel eigenvalue solver; Test the limits of your machine; Conclusions.

23 Customer question. On my machine, what is the maximum model size I can analyze? Or: to analyze my model, what is the minimum (cheapest) machine configuration I should acquire? Answer: "test it yourself".

24 Parametric model. Realistic "fuselage" model made of shell elements, with skin, frames and stringers. 2 types of analysis: a non-linear static loading, 2 iterations (2 factors, 2 solves); an eigenvalue analysis, 2 eigenvalues, 1 factor, n solves (18 or 27).
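The split between factorizations and solves in these two analysis types is what drives the timings in the following slides. A toy cost model (illustrative timings only, not measured values) makes the difference explicit:

```python
def analysis_time(n_factor, n_solve, t_factor, t_solve):
    """Total solver time as factorizations plus triangular solves."""
    return n_factor * t_factor + n_solve * t_solve

t_factor, t_solve = 120.0, 4.0     # hypothetical seconds per factorization / per solve
static = analysis_time(2, 2, t_factor, t_solve)    # non-linear static: 2 factors, 2 solves
eigen = analysis_time(1, 27, t_factor, t_solve)    # eigenvalue: 1 factor, n solves (n = 27 here)
print(static, eigen)               # 248.0 vs 228.0: the eigenvalue run is dominated by solves
```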

25 Design of experiment. Hardware: cluster of 16 (identical) nodes. Processor: Intel(R) Core(TM) i7-3930K, 3.2 GHz, 6 cores.

26 Design of experiment. Software: 10 model sizes, [1-10] million dof's; 2 solvers, BCSLIB and MUMPS or SLEPC/MUMPS. Sequential BCSLIB with 1, 2, 4 or 6 MKL threads; sequential MUMPS with 1, 2, 4 or 6 MKL threads; parallel MUMPS on 2 procs with 1 or 2 MKL threads, on 3 procs with 2 MKL threads, on 4 procs with 1 MKL thread, on 6 procs with 1 MKL thread (9 among 14 possible cases). Total of 10 x 2 x 13 = 260 analyses, taking 19 days 14 h 3 min 26.5 s.
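The run matrix above can be written down compactly; the sketch below only enumerates the configurations and prints placeholder launch commands (the driver name and flags are invented, not the actual SAMCEF command line):

```python
from itertools import product

sizes_mdof = range(1, 11)                           # 10 model sizes, 1 to 10 Mdof
analyses = ["nonlinear_static", "eigenvalue"]
configs = (
    [("BCSLIB", 1, t) for t in (1, 2, 4, 6)] +      # sequential BCSLIB, MKL threads
    [("MUMPS", 1, t) for t in (1, 2, 4, 6)] +       # sequential MUMPS, MKL threads
    [("MUMPS", 2, t) for t in (1, 2)] +             # parallel MUMPS, 2 procs
    [("MUMPS", 3, 2), ("MUMPS", 4, 1), ("MUMPS", 6, 1)]
)

runs = list(product(sizes_mdof, analyses, configs))
print(len(runs), "analyses")                        # 10 * 2 * 13 = 260
for size, analysis, (solver, nproc, nthreads) in runs[:3]:
    print(f"mpirun -np {nproc} solver_driver --solver {solver} "
          f"--analysis {analysis} --mdof {size} --mkl-threads {nthreads}")
```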

27 Model properties. [Charts: factorization floating-point operations and arithmetic intensity for BCSLIB and MUMPS versus number of dof's (up to 1.2e7); minimum out-of-core and in-core memory (MB) for BCSLIB and MUMPS versus number of dof's, with the MUMPS ICNTL(22) (out-of-core) setting and the node MemTotal indicated.]
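The ICNTL(22) entry in the memory chart refers to the MUMPS in-core/out-of-core switch. When MUMPS is driven through PETSc, as in the earlier sketches, one way to flip it is the PETSc options database; the option name is PETSc's, and SAMCEF sets the same control through its own interface:

```python
from petsc4py import PETSc

opts = PETSc.Options()
opts['mat_mumps_icntl_22'] = 1      # ICNTL(22) = 1: out-of-core factorization (0 = in-core)
# Any subsequent KSP/PC that uses MUMPS picks this up when setFromOptions() is called.
```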

28 Non-linear static analysis. [Charts: solver elapsed time (sec.) for the MUMPS configurations (1 to 6 procs, 1 to 6 MKL threads) and the BCSLIB configurations (1 to 6 MKL threads).]

29 Non-linear static analysis. Total elapsed time, 8 Mdof. [Chart: total elapsed time (sec.) and speedup for the BCSLIB configurations (1 to 6 MKL threads) and the MUMPS configurations (1 to 6 procs, 1 to 2 MKL threads).]

30 Non-linear static analysis. Total elapsed time, 4 Mdof. [Chart: total elapsed time (sec.) and speedup for the BCSLIB configurations (1 to 6 MKL threads) and the MUMPS configurations (1 to 6 procs, 1 to 2 MKL threads).]

31 Eigenvalue analysis. [Charts: total elapsed time (sec.) for the MUMPS configurations (1 to 6 procs, 1 to 6 MKL threads) and the BCSLIB configurations (1 to 6 MKL threads).]

32 Eigenvalue analysis. Total elapsed time, 8 Mdof. [Chart: total elapsed time (sec.) and speedup for the MUMPS and BCSLIB configurations.]

33 Eigenvalue analysis. Total elapsed time, 4 Mdof. [Chart: total elapsed time (sec.) and speedup for the MUMPS and BCSLIB configurations.]

34 Conclusions. SAMCEF still improves parallel computing: better parallel processing of non-linear analysis; co-simulation helps parallel processing; parallel eigenvalue solver SLEPC/MUMPS; help needed for a METIS long long integer version. Test the limits of your machine: parametric model available; no need to try too far out of memory; an optimum balance between MKL threads and procs has to be found; SLEPC shows good performance if it can run in-core; real need for an efficient eigenvalue solver for large problems.

35 Thank You Jean-Pierre Delsemme Product Development Manager

36 If you come to Liège
