Recent Progress of Parallel SAMCEF with MUMPS - MUMPS User Group Meeting 2013


1 Recent Progress of Parallel SAMCEF with MUMPS. MUMPS User Group Meeting 2013. Jean-Pierre Delsemme, Product Development Manager

2 Summary: SAMCEF, a brief history; Co-simulation, a good candidate for parallel processing; MAAXIMUS, the "Gigadof" challenge; SLEPC/MUMPS, a parallel eigenvalue solver; Test the limits of your machine; Conclusions.

3 SAMCEF, a brief history. SAMCEF: general-purpose FE software. First line of code written in 1967 at the University of Liège. First customer in aeronautics in 1977. Creation of SAMTECH in 1986 (9 people). Implementation of BCSLIB in 1998, implementation of MUMPS in 2005. Acquisition of 6% of SAMTECH by LMS in 2011 (9 subsidiaries, 3 people, 3 MEuro per year).

4 3rd participation to the MUMPS User Group Meeting, Toulouse, April 15th and 16th.

5 Parallel computing in SAMCEF. What is parallel computing in SAMCEF?
ASEF (linear static analysis): use of MUMPS for the linear system resolution.
MECANO (non-linear static and dynamic analysis): elements generation / system resolution / results archive (depending on PARALL).
DYNAM (modal analysis): use of SLEPC and MUMPS for the eigenvalue problem resolution.
STABI (linear buckling analysis): use of SLEPC and MUMPS for the eigenvalue problem resolution.
BACON (ray tracing, .MCRT command): distribution of the ray tracing computation.
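The ASEF entry above hands the linear system resolution to MUMPS. For reference only (this is not SAMCEF code), a minimal petsc4py sketch of a distributed direct solve through the PETSc/MUMPS interface, with a toy tridiagonal matrix standing in for a stiffness matrix:

```python
# Minimal sketch, assuming petsc4py is built against a PETSc with MUMPS enabled.
from petsc4py import PETSc

n = 1000                                            # toy problem size
K = PETSc.Mat().createAIJ([n, n], nnz=3)            # distributed sparse matrix
K.setOption(PETSc.Mat.Option.NEW_NONZERO_ALLOCATION_ERR, False)
rstart, rend = K.getOwnershipRange()
for i in range(rstart, rend):                       # simple SPD tridiagonal stencil
    K.setValue(i, i, 2.0)
    if i > 0:
        K.setValue(i, i - 1, -1.0)
    if i < n - 1:
        K.setValue(i, i + 1, -1.0)
K.assemble()

f = K.createVecLeft();  f.set(1.0)                  # right-hand side
u = K.createVecRight()                              # solution

ksp = PETSc.KSP().create()
ksp.setOperators(K)
ksp.setType('preonly')                              # no Krylov iterations: pure direct solve
pc = ksp.getPC()
pc.setType('lu')
pc.setFactorSolverType('mumps')                     # petsc4py >= 3.9; older releases use setFactorSolverPackage
ksp.setFromOptions()
ksp.solve(f, u)
```

Run under mpiexec, the factorization and the triangular solves are distributed over the MPI ranks, which is the division of labour SAMCEF relies on when it delegates the resolution to MUMPS.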

6 Summary: SAMCEF, a brief history; Co-simulation, a good candidate for parallel processing; MAAXIMUS, the "Gigadof" challenge; SLEPC/MUMPS, a parallel eigenvalue solver; Test the limits of your machine; Conclusions.

7 MECANO-MECANO Co-simulation. Co-simulation principle: 2 fields of unknowns and 2 governing equations, functions of time, f(x, y, t) = 0 and g(x, y, t) = 0. Staggered solution scheme: fields are exchanged at time-step level; the approximation due to the delay of the information can lead to numerical problems.
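To make the delay concrete, here is a toy staggered loop (scalar fields and invented right-hand sides, not the MECANO equations): each solver advances its own field while holding the other field frozen at its last exchanged value, which is exactly the approximation the slide warns about.

```python
# Minimal sketch of a staggered (time-step level) exchange between two solvers.
def advance_x(x, y_frozen, dt):
    return x + dt * (-x + y_frozen)        # toy "solver 1": dx/dt = -x + y

def advance_y(y, x_frozen, dt):
    return y + dt * (-2.0 * y + x_frozen)  # toy "solver 2": dy/dt = -2y + x

x, y = 1.0, 0.0
dt, n_steps = 0.01, 100
for _ in range(n_steps):
    x_new = advance_x(x, y, dt)            # uses y from the previous step (the delay)
    y_new = advance_y(y, x, dt)            # uses x from the previous step
    x, y = x_new, y_new                    # fields exchanged only once per time step
```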

8 MECANO-MECANO Co-simulation. Monolithic solution scheme: coupling at iteration level, all equations treated at once, master-slave approach. Internal dof's "1" and interface dof's "2", so each subsystem stiffness is partitioned as K = [[K_11, K_12], [K_21, K_22]]. At each iteration the slave exchanges with the master its interface unknowns q_2 and R_2, the Schur complement of the interface, S*_22 = K^S_22 - K^S_21 (K^S_11)^{-1} K^S_12, and the linear constraints B; the master assembles these into its own interface equations, and the coupled system is solved for the increments dq_1 and dq_2 of both subsystems.
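A minimal dense sketch of this condensation (toy numbers, not the SAMCEF implementation): the slave eliminates its internal dof's and produces the interface Schur complement and the condensed residual that are exchanged with the master.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2 = 6, 2                                    # internal / interface dof's of the slave
A = rng.standard_normal((n1 + n2, n1 + n2))
K = A @ A.T + (n1 + n2) * np.eye(n1 + n2)        # SPD stiffness-like matrix
R = rng.standard_normal(n1 + n2)                 # residual

K11, K12 = K[:n1, :n1], K[:n1, n1:]
K21, K22 = K[n1:, :n1], K[n1:, n1:]
R1, R2 = R[:n1], R[n1:]

K11_inv_K12 = np.linalg.solve(K11, K12)
K11_inv_R1 = np.linalg.solve(K11, R1)
S22 = K22 - K21 @ K11_inv_K12                    # interface Schur complement S22*
R2_cond = R2 - K21 @ K11_inv_R1                  # condensed interface residual

# The master adds S22 and R2_cond (through the constraints B) to its own interface
# equations; here a plain solve stands in for that coupled master solve.
dq2 = np.linalg.solve(S22, R2_cond)              # interface increment
dq1 = K11_inv_R1 - K11_inv_K12 @ dq2             # slave recovers its internal increment
```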

9 MECANO-MECANO Co-simulation. Example: 4-wheel car model, 4 dof's per tire. 2-phase loading: tires inflated without contact, then gravity applied on the wheels to force contact with the ground. Non-optimum automatic domain decomposition.

10 MECANO-MECANO Co-simulation. Example: 8-wheel car model, 4 dof's per tire. Impressive computation time reduction. Configurations compared (time in minutes and speed-up): monolithic on 1 node / 4 procs, 2 nodes / 8 procs, 4 nodes / 16 procs and 8 nodes / 32 procs; new method on 5 nodes / 18 procs and 9 nodes / 34 procs.

11 Summary: SAMCEF, a brief history; Co-simulation, a good candidate for parallel processing; MAAXIMUS, the "Gigadof" challenge; SLEPC/MUMPS, a parallel eigenvalue solver; Test the limits of your machine; Conclusions.

12 Maaximus, the "Gigadof" challenge. Calculate a fuselage made of, as much as possible, identical single barrels, ~4.5 million dof's per barrel. Identification of new limitations; "standardization" of the long long integer version.

13 Maaximus, the "Gigadof" challenge. 3 sections of fuselage: 13,956,62 dof's, 1,891,176 shell elem. 7 h 24' on a cluster of 8 nodes, up to 100% of the prescribed load; 5 time steps, 1 rejected, 19 iterations. Intel(R) Core(TM) 2.67 GHz, 12 GB per node.

14 Maaximus, the "Gigadof" challenge. 4 sections of fuselage: 18,58,217 dof's, 2,521,568 shell elem., 3,92,81 elem. in total. 28 h 3' on a cluster of 10 nodes; 7 time steps, 5 rejected, 57 iterations, up to 91% of the load. Intel(R) Core(TM) 2.67 GHz, 12 GB per node.

15 Maaximus, the "Gigadof" challenge. 6 sections of fuselage: 27,823,85 dof's, 4,635,44 elem. in total. 37 h 3' on a cluster of 8 nodes; 7 time steps, 3 rejected, 41 iterations, up to 100% of the load. Intel(R) Core(TM) 2.67 GHz, 12 GB per node.

16 Maaximus, the "Gigadof" challenge. 7 sections of fuselage: 37,127,423 dof's, 6,177,11 elem. in total, on a cluster of 8 nodes. Intel(R) Core(TM) 2.67 GHz, 12 GB per node. Segmentation fault in METIS! Maybe long long integers needed. Can anyone help?
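A back-of-the-envelope check, with an assumed (hypothetical) average connectivity, of why 32-bit indices become the limiting factor at this scale and why a long long (64-bit) integer build of METIS and of the solver chain is needed:

```python
# Illustrative numbers only; the average entries per row is an assumption.
INT32_MAX = 2**31 - 1                    # 2,147,483,647

ndof = 37_127_423                        # the 7-barrel model above
avg_entries_per_row = 80                 # hypothetical average graph/fill connectivity
total_indices = ndof * avg_entries_per_row

print(total_indices > INT32_MAX)         # True: ~2.97e9 entries overflow a signed 32-bit index
```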

17 Summary: SAMCEF, a brief history; Co-simulation, a good candidate for parallel processing; MAAXIMUS, the "Gigadof" challenge; SLEPC/MUMPS, a parallel eigenvalue solver; Test the limits of your machine; Conclusions.

18 Parallel Eigenvalue Solver. Since version 15 (partially available in 14), an interface with the SLEPC solver has been implemented. The SLEPC library is public domain software developed by the Universidad Politecnica de Valencia, Spain: V. Hernandez, J. E. Roman and V. Vidal (2005), "SLEPc: A Scalable and Flexible Toolkit for the Solution of Eigenvalue Problems", ACM Trans. Math. Softw. 31(3), 351-362. SLEPC performs the mathematical operations in parallel based on PETSC for the matrix-vector products and MUMPS for the linear system solves.
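As an illustration of the ingredients named above (SLEPC on top of PETSC, with MUMPS doing the linear solves), here is a minimal slepc4py sketch for a generalized eigenproblem K x = lambda M x. It is not the SAMCEF interface; K and M are assumed to be already-assembled PETSc matrices.

```python
from petsc4py import PETSc   # noqa: F401  (ensures PETSc is initialized)
from slepc4py import SLEPc

def lowest_eigenvalues(K, M, nev=10, shift=0.0):
    eps = SLEPc.EPS().create()
    eps.setOperators(K, M)
    eps.setProblemType(SLEPc.EPS.ProblemType.GHEP)     # symmetric generalized problem
    eps.setDimensions(nev=nev)
    eps.setWhichEigenpairs(SLEPc.EPS.Which.TARGET_MAGNITUDE)
    eps.setTarget(shift)

    st = eps.getST()
    st.setType(SLEPc.ST.Type.SINVERT)                  # shift-and-invert spectral transform
    ksp = st.getKSP()
    ksp.setType('preonly')                             # direct solve of (K - shift*M)
    pc = ksp.getPC()
    pc.setType('lu')
    pc.setFactorSolverType('mumps')                    # MUMPS does the factorization

    eps.setFromOptions()
    eps.solve()
    return [eps.getEigenvalue(i).real for i in range(eps.getConverged())]
```

PETSC provides the distributed matrices and matrix-vector products, SLEPC the Krylov eigensolver, and MUMPS the factorization behind each shift-and-invert solve, matching the split described on the slide.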

19 Parallel Eigenvalue Solver. Benefits for users: faster computation; possibility to use several computers and thus more memory. Possible disadvantage: MUMPS is able to work out-of-core but PETSC isn't, so on a given computer (fixed memory) there is a maximum problem size, which is lower than the BCSLIB maximum size.

20 Parallel Eigenvalue Solver. Scalable test case: parametric model. Comparison of a sequential BCSLIB run with a parallel 4-proc. SLEPC run. Intel(R) 2.66 GHz, 48 GB of memory, 8 cores. [Chart: elapsed time (sec) of BCSLIB and PEIGDY2 and the resulting speedup versus problem size (Mdof).]

21 Parallel Eigenvalue Solver. Scalable test case: parametric model. Allows using several nodes in a cluster and thus more memory. Intel(R) Core(TM) 3.4 GHz, 16 GB of memory per node, 4 cores / 4 threads per node; 1 eigenvalues. [Chart: elapsed solver time, log10 (sec.), versus degrees of freedom (Mdof), for out-of-core, 4-node and 8-node runs.]

22 Summary: SAMCEF, a brief history; Co-simulation, a good candidate for parallel processing; MAAXIMUS, the "Gigadof" challenge; SLEPC/MUMPS, a parallel eigenvalue solver; Test the limits of your machine; Conclusions.

23 Customer question. On my machine, what is the maximum model size I can analyze? Or: to analyze my model, what is the minimum (cheapest) machine configuration I should acquire? Answer: "test it yourself".

24 Parametric model. Realistic "fuselage" model made of shell elements, with skin, frames and stringers. 2 types of analysis: a non-linear static loading, 2 iterations (2 factors, 2 solves); an eigenvalue analysis, 2 eigenvalues, 1 factor, n solves (18 or 27).
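The split between factorizations and solves in these two analysis types is what drives the timings in the following slides. A toy cost model (illustrative timings only, not measured values) makes the difference explicit:

```python
def analysis_time(n_factor, n_solve, t_factor, t_solve):
    """Total solver time as factorizations plus triangular solves."""
    return n_factor * t_factor + n_solve * t_solve

t_factor, t_solve = 120.0, 4.0     # hypothetical seconds per factorization / per solve
static = analysis_time(2, 2, t_factor, t_solve)    # non-linear static: 2 factors, 2 solves
eigen = analysis_time(1, 27, t_factor, t_solve)    # eigenvalue: 1 factor, n solves (n = 27 here)
print(static, eigen)               # 248.0 vs 228.0: the eigenvalue run is dominated by solves
```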

25 Design of experiment. Hardware: cluster of 16 (identical) nodes. Processor: Intel(R) Core(TM) i7-3930K, 3.2 GHz, 6 cores.

26 Design of experiment. Software: 10 model sizes, [1-10] million dof's; 2 solvers, BCSLIB and MUMPS or SLEPC/MUMPS. Sequential BCSLIB with 1, 2, 4 or 6 MKL threads; sequential MUMPS with 1, 2, 4 or 6 MKL threads; parallel MUMPS on 2 procs with 1 or 2 MKL threads, on 3 procs with 2 MKL threads, on 4 procs with 1 MKL thread, on 6 procs with 1 MKL thread (9 among 14 possible cases). Total of 10 x 2 x 13 = 260 analyses, taking 19 days 14 h 3 min 26.5 s.
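The run matrix above can be written down compactly; the sketch below only enumerates the configurations and prints placeholder launch commands (the driver name and flags are invented, not the actual SAMCEF command line):

```python
from itertools import product

sizes_mdof = range(1, 11)                           # 10 model sizes, 1 to 10 Mdof
analyses = ["nonlinear_static", "eigenvalue"]
configs = (
    [("BCSLIB", 1, t) for t in (1, 2, 4, 6)] +      # sequential BCSLIB, MKL threads
    [("MUMPS", 1, t) for t in (1, 2, 4, 6)] +       # sequential MUMPS, MKL threads
    [("MUMPS", 2, t) for t in (1, 2)] +             # parallel MUMPS, 2 procs
    [("MUMPS", 3, 2), ("MUMPS", 4, 1), ("MUMPS", 6, 1)]
)

runs = list(product(sizes_mdof, analyses, configs))
print(len(runs), "analyses")                        # 10 * 2 * 13 = 260
for size, analysis, (solver, nproc, nthreads) in runs[:3]:
    print(f"mpirun -np {nproc} solver_driver --solver {solver} "
          f"--analysis {analysis} --mdof {size} --mkl-threads {nthreads}")
```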

27 Model properties. [Charts: factorization floating-point operations and arithmetic intensity for BCSLIB and MUMPS versus number of dof's (up to 1.2e7); minimum out-of-core and in-core memory (MB) for BCSLIB and MUMPS versus number of dof's, with the MUMPS ICNTL(22) (out-of-core) setting and the node MemTotal indicated.]
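The ICNTL(22) entry in the memory chart refers to the MUMPS in-core/out-of-core switch. When MUMPS is driven through PETSc, as in the earlier sketches, one way to flip it is the PETSc options database; the option name is PETSc's, and SAMCEF sets the same control through its own interface:

```python
from petsc4py import PETSc

opts = PETSc.Options()
opts['mat_mumps_icntl_22'] = 1      # ICNTL(22) = 1: out-of-core factorization (0 = in-core)
# Any subsequent KSP/PC that uses MUMPS picks this up when setFromOptions() is called.
```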

28 Non-linear static analysis. [Charts: solver elapsed time (sec.) for the MUMPS configurations (1 to 6 procs, 1 to 6 MKL threads) and the BCSLIB configurations (1 to 6 MKL threads).]

29 Non-linear static analysis. Total elapsed time, 8 Mdof. [Chart: total elapsed time (sec.) and speedup for the BCSLIB configurations (1 to 6 MKL threads) and the MUMPS configurations (1 to 6 procs, 1 to 2 MKL threads).]

30 Non-linear static analysis. Total elapsed time, 4 Mdof. [Chart: total elapsed time (sec.) and speedup for the BCSLIB configurations (1 to 6 MKL threads) and the MUMPS configurations (1 to 6 procs, 1 to 2 MKL threads).]

31 Eigenvalue analysis. [Charts: total elapsed time (sec.) for the MUMPS configurations (1 to 6 procs, 1 to 6 MKL threads) and the BCSLIB configurations (1 to 6 MKL threads).]

32 Eigenvalue analysis. Total elapsed time, 8 Mdof. [Chart: total elapsed time (sec.) and speedup for the MUMPS and BCSLIB configurations.]

33 Eigenvalue analysis. Total elapsed time, 4 Mdof. [Chart: total elapsed time (sec.) and speedup for the MUMPS and BCSLIB configurations.]

34 Conclusions. SAMCEF still improves parallel computing: better parallel processing of non-linear analysis; co-simulation helps parallel processing; parallel eigenvalue solver SLEPC/MUMPS; help needed for a METIS long long integer version. Test the limits of your machine: parametric model available; no need to try too far out of memory; an optimum balance between MKL threads and procs has to be found; SLEPC shows good performance if it can run in-core; real need for an efficient eigenvalue solver for large problems.

35 Thank You Jean-Pierre Delsemme Product Development Manager

36 If you come to Liège
