Scientific Computing on Supercomputers III

Edited by Jozef T. Devreese and Piet E. Van Camp
Universiteit Antwerpen, Antwerpen, Belgium

SPRINGER SCIENCE+BUSINESS MEDIA, LLC

Proceedings of the Sixth International Workshop on the Use of Supercomputers in Theoretical Science, held January 24-25, 1991, at Universiteit Antwerpen, Antwerpen, Belgium

ISBN 978-1-4899-2583-1
ISBN 978-1-4899-2581-7 (ebook)
DOI 10.1007/978-1-4899-2581-7

Springer Science+Business Media New York 1992
Originally published by Plenum Press, New York in 1992
Softcover reprint of the hardcover 1st edition 1992

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, microfilming, recording, or otherwise, without written permission from the Publisher.

PREFACE

The International Workshop on "The Use of Supercomputers in Theoretical Science" took place on January 24 and 25, 1991, at the University of Antwerp (UIA), Antwerpen, Belgium. It was the sixth in a series of workshops, the first of which took place in 1984.

The principal aim of these workshops is to present the state of the art in large-scale and high-speed scientific computation. Computational science has developed into a third methodology, now as important as its theoretical and experimental companions. Academic researchers have gradually acquired access to a variety of supercomputers, and as a consequence computational science has become a major tool for their work.

It is a pleasure to thank the Belgian National Science Foundation (NFWO-FNRS) and the Ministry of Scientific Affairs for sponsoring the workshop. It was organized in the framework of both the Third Cycle "Vectorization, Parallel Processing and Supercomputers" and the "Governmental Program in Information Technology". We would also very much like to thank the University of Antwerp (Universitaire Instelling Antwerpen - UIA) for financial and material support.

Special thanks are due to Mrs. H. Evans for the typing and editing of the manuscripts and for the preparation of the author and subject indexes.

J.T. Devreese
P.E. Van Camp
University of Antwerp
July 1991

CONTENTS

High Performance Numerically Intensive Applications on Distributed Memory Parallel Computers
F.W. Wray
  Abstract
  1. Introduction ... 2
  2. A Parallel Implementation of Gaussian Elimination ... 4
  3. The Parallel Solution of Tridiagonal Systems of Equations ... 14
  4. The Parallel Solution of Computational Fluid Dynamics Problems ... 24
  5. Conclusions ... 35
  6. References ... 36

Parallel Computational Fluid Dynamics on a Meiko Transputer System with Express in Comparison to iPSC Systems ... 37
L. Beernaert, D. Roose and W. Verhoeven
  Abstract ... 37
  1. Introduction ... 37
  2. Express ... 38
  3. Benchmark Results for Express on a Meiko Transputer System ... 39
    3.1. Computation benchmarks ... 40
    3.2. Communication benchmarks ... 40
      3.2.1. Nearest neighbour communication ... 40
      3.2.2. Multi-hop communication ... 42
      3.2.3. Message exchange ... 44
      3.2.4. Global communication ... 46
  4. Parallelization of a Fluid Dynamics Application ... 46
    4.1. The Euler equations and numerical solution techniques ... 46
    4.2. The test problem ... 48
    4.3. Solution methods: relaxation and multigrid ... 49
  5. Parallelization of the Code ... 50
    5.1. Relaxation solvers ... 50
    5.2. Multigrid solver ... 52
    5.3. Implementation details ... 53
  6. Timing and Efficiency Results ... 54
    6.1. Relaxation methods ... 54
      6.1.1. Red-black point Gauss-Seidel relaxation ... 54
      6.1.2. Red-black line Gauss-Seidel relaxation ... 56
    6.2. Multigrid methods ... 57
  7. Conclusion ... 58
  Acknowledgement ... 59
  References ... 59

Preconditioned Conjugate Gradients on the PUMA Architecture ... 61
R. Cook
  Abstract ... 61
  1. Introduction ... 61
  2. Preconditioned Conjugate Gradient ... 62
    2.1. The algorithm ... 62
    2.2. Preconditioner ... 63
    2.3. Matrix vector multiplications ... 64
    2.4. Vector updates ... 64
    2.5. Dot products ... 64
    2.6. Parallel implementation ... 65
  3. Reformulation ... 65
  4. Timing Models ... 68
    4.1. Sparse matrix multiply ... 69
    4.2. Vector updates ... 70
    4.3. Dot product ... 70
    4.4. Preconditioned conjugate gradients ... 72
    4.5. Preconditioned conjugate gradients (reformulated) ... 72
  5. Conclusions ... 74
  References ... 75

Parallel Discrete Event Simulation: Opportunities and Pitfalls ... 77
E. Dirkx and F. Verboven
  Abstract ... 77
  1. Introduction ... 77
    1.1. Discrete event simulation ... 77
    1.2. Modellization ... 78
    1.3. Implementation ... 79
  2. Discrete Event Simulation ... 80
    2.1. Event and time driven simulation ... 80
    2.2. Sequential event driven simulation ... 80
    2.3. Experimental results ... 81
  3. Parallel Discrete Event Simulation ... 82
    3.1. Parallel computer architectures ... 82
    3.2. Heuristics ... 83
      3.2.1. Algorithmic parallelism ... 83
      3.2.2. Farming ... 84
    3.3. Interconnection topology ... 86
  4. Conclusion ... 87
  Acknowledgements ... 87
  References ... 87

Parallel Programming on Amoeba Using Different Distributed Shared Memory ... 89
H.E. Bal, M.P. Kaashoek and A.S. Tanenbaum
  Abstract ... 89
  1. Introduction ... 89
  2. A Distributed Shared Memory Model Based on Shared Objects ... 91
  3. A RPC-Based Implementation ... 92
    3.1. The invalidation protocol ... 93
    3.2. The update protocol ... 94
    3.3. Performance ... 94
  4. A Multicast-Based Implementation ... 95
    4.1. Reliable multicast ... 96
    4.2. An update protocol using reliable multicasts ... 96
    4.3. Performance ... 97
  5. Example Applications and Their Performance ... 98
    5.1. The all-pairs shortest paths problem ... 98
    5.2. Branch-and-bound ... 99
    5.3. Successive overrelaxation ... 100
  6. A Comparison with Other DSM Systems ... 101
  7. Conclusions ... 103
  References ... 103

3D Shallow Water Model on the CRAY Y-MP4/464 ... 107
E.D. de Goede
  Abstract ... 107
  1. Introduction ... 107
  2. Mathematical Model ... 108
  3. Implementation ... 109
  4. Scalar and Vector Performance ... 110
  5. Parallelism ... 110
  6. Numerical Results ... 111
  7. Conclusions ... 113
  References ... 113

Simulating Compressible Flow on a Distributed Memory Machine ... 115
P. Batten, O. Tutty and J. Reeve
  Abstract ... 115
  1. Introduction ... 115
  2. Software Tools and Current Hardware ... 116
    2.1. The T800 transputer ... 116
    2.2. Transputer based machines ... 117
    2.3. CAD and domain decomposition tool ... 117
    2.4. The SHAPE router ... 118
    2.5. Parallel mesh generator ... 118
  3. Shock Capturing ... 118
    3.1. Conservation form and the entropy condition ... 119
    3.2. Total variation diminishing methods ... 119
    3.3. Simplified TVD schemes ... 120
    3.4. Extension to systems of equations (the Euler equations) ... 122
  4. Parallel Implementation ... 123
    4.1. Finite volume ... 123
    4.2. Artificial viscosity method ... 124
    4.3. Geometric parallelism ... 124
    4.4. The TVD method ... 125
    4.5. Future hardware ... 126
    4.6. Virtual channel router (VCR) ... 127
  5. Summary and Results ... 128
  References ... 130

Principles of Code Optimization on Convex-C230 ... 133
F. Brosens
  Abstract ... 133
  I. Introduction ... 134
  II. Basic Vector Concept ... 138
  III. Subarray Syntax (FORTRAN-XX) ... 140
    III.A. FORTRAN ARRAY LAYOUT in memory ... 140
    III.B. FORTRAN-XX ARRAY SECTIONS, WHERE and VECTOR statements ... 141
      III.B.1. Array section syntax ... 141
      III.B.2. VECTORIZATION and PARALLEL processing ... 146
  IV. VECTORIZABLE DO-loops ... 147
    IV.A. Non-vectorizable statements in DO-loops ... 148
      IV.A.1. Recurrence ... 148
      IV.A.2. I/O statements ... 148
      IV.A.3. GOTO statements ... 149
      IV.A.4. Subprogram calls ... 149
      IV.A.5. Nested IF-blocks ... 152
    IV.B. VECTORIZATION of the CANDIDATES FOR VECTORIZATION ... 152
      IV.B.1. Vectorization of SCALAR references ... 152
      IV.B.2. Vectorization of ARRAY references ... 154
      IV.B.3. Recurrence ... 155
  V. VECLIB Library ... 160
    V.A. DYNAMIC MEMORY allocation ... 160
      V.A.1. DYNAMIC ... 161
      V.A.2. MALLOC ... 163
      V.A.3. NALLOC, RALLOC, DALLOC ... 165
    V.B. VECTOR programs provided by VECLIB ... 165
  VI. Some Worked Examples ... 172
    VI.A. Solution of a set of linear equations ... 172
    VI.B. Polynomial evaluation ... 175
    VI.C. Integration with equally spaced abscissas ... 176
    VI.D. Gaussian quadrature ... 177
    VI.E. Chebychev approximation ... 180
  Conclusion ... 184
  Acknowledgement ... 184
  References ... 185

On the Vectorization and Parallelization of a Finite Difference Scheme ... 187
R.I. van der Pas
  Abstract ... 187
  1. Introduction ... 187
  2. The Convex C2 Architecture ... 188
  3. A Block Iterative Method ... 190
  4. An Implementation ... 194
  5. Performance Considerations ... 196
  6. An Improved Implementation ... 198
  7. Conclusions ... 206
  8. Acknowledgements ... 206
  9. References ... 206

Author Index ... 207
Subject Index ... 211