EDGE-BASED DATA STRUCTURES FOR THE PARALLEL SOLUTION OF 3D STEADY INCOMPRESSIBLE FLUID FLOW WITH THE SUPG/PSPG FINITE ELEMENT FORMULATION.

Similar documents
3-2-1 ANN Architecture

Available online at ScienceDirect. Procedia Engineering 126 (2015 )

y=h B 2h Z y=-h ISSN (Print) Dr. Anand Swrup Sharma

A Sub-Optimal Log-Domain Decoding Algorithm for Non-Binary LDPC Codes

Finite element discretization of Laplace and Poisson equations

AS 5850 Finite Element Analysis

Dynamic Modelling of Hoisting Steel Wire Rope. Da-zhi CAO, Wen-zheng DU, Bao-zhu MA *

Effects of Couple Stress Lubricants on Pressure and Load Capacity of Infinitely Wide Exponentially Shaped Slider Bearing

is an appropriate single phase forced convection heat transfer coefficient (e.g. Weisman), and h

Large Scale Topology Optimization Using Preconditioned Krylov Subspace Recycling and Continuous Approximation of Material Distribution

Middle East Technical University Department of Mechanical Engineering ME 413 Introduction to Finite Element Analysis

Finite Element Models for Steady Flows of Viscous Incompressible Fluids

Symmetric centrosymmetric matrix vector multiplication

Characteristics of Gliding Arc Discharge Plasma

Difference -Analytical Method of The One-Dimensional Convection-Diffusion Equation

Einstein Equations for Tetrad Fields

1 Isoparametric Concept

Data Assimilation 1. Alan O Neill National Centre for Earth Observation UK

Analysis of potential flow around two-dimensional body by finite element method

A Propagating Wave Packet Group Velocity Dispersion

Addition of angular momentum

Elements of Statistical Thermodynamics

Use of Proper Closure Equations in Finite Volume Discretization Schemes

Homotopy perturbation technique

A NEW SIGNATURE PROTOCOL BASED ON RSA AND ELGAMAL SCHEME

dy 1. If fx ( ) is continuous at x = 3, then 13. If y x ) for x 0, then f (g(x)) = g (f (x)) when x = a. ½ b. ½ c. 1 b. 4x a. 3 b. 3 c.

(Upside-Down o Direct Rotation) β - Numbers

COMPUTATIONAL NUCLEAR THERMAL HYDRAULICS

TOPOLOGY DESIGN OF STRUCTURE LOADED BY EARTHQUAKE. Vienna University of Technology

2008 AP Calculus BC Multiple Choice Exam

Direct Approach for Discrete Systems One-Dimensional Elements

A robust multigrid method for discontinuous Galerkin discretizations of Stokes and linear elasticity equations

Rational Approximation for the one-dimensional Bratu Equation

Middle East Technical University Department of Mechanical Engineering ME 413 Introduction to Finite Element Analysis

Dynamic response of a finite length euler-bernoulli beam on linear and nonlinear viscoelastic foundations to a concentrated moving force

A Weakly Over-Penalized Non-Symmetric Interior Penalty Method

Physics 43 HW #9 Chapter 40 Key

Finite Element Simulation of Viscous Fingering in Miscible Displacements at High Mobility-Ratios

ME469A Numerical Methods for Fluid Mechanics

VSMN30 FINITA ELEMENTMETODEN - DUGGA

Full Waveform Inversion Using an Energy-Based Objective Function with Efficient Calculation of the Gradient

Quasi-Classical States of the Simple Harmonic Oscillator

Higher-Order Discrete Calculus Methods

NONCONFORMING FINITE ELEMENTS FOR REISSNER-MINDLIN PLATES

Nonlinear Bending of Strait Beams

3 Finite Element Parametric Geometry

Trigonometric functions

Exam 1. It is important that you clearly show your work and mark the final answer clearly, closed book, closed notes, no calculator.

A LOCALLY CONSERVATIVE LDG METHOD FOR THE INCOMPRESSIBLE NAVIER-STOKES EQUATIONS

22/ Breakdown of the Born-Oppenheimer approximation. Selection rules for rotational-vibrational transitions. P, R branches.

An Investigation on the Effect of the Coupled and Uncoupled Formulation on Transient Seepage by the Finite Element Method

University of Washington Department of Chemistry Chemistry 453 Winter Quarter 2014 Lecture 20: Transition State Theory. ERD: 25.14

A dynamic boundary model for implementation of boundary conditions in lattice-boltzmann method

In this lecture... Subsonic and supersonic nozzles Working of these nozzles Performance parameters for nozzles

4.2 Design of Sections for Flexure

Addition of angular momentum

Fourier Transforms and the Wave Equation. Key Mathematics: More Fourier transform theory, especially as applied to solving the wave equation.

Selective Mass Scaling (SMS)

ruhr.pad An entropy stable spacetime discontinuous Galerkin method for the two-dimensional compressible Navier-Stokes equations

Finite Strain Elastic-Viscoplastic Model

Uniformly stable discontinuous Galerkin discretization and robust iterative solution methods for the Brinkman problem

Introduction to Computational Fluid Dynamics: Governing Equations, Turbulence Modeling Introduction and Finite Volume Discretization Basics.

10. The Discrete-Time Fourier Transform (DTFT)

cycle that does not cross any edges (including its own), then it has at least

Finite Element Model of a Ferroelectric

Properties of Phase Space Wavefunctions and Eigenvalue Equation of Momentum Dispersion Operator

Linear Non-Gaussian Structural Equation Models

Lecture 37 (Schrödinger Equation) Physics Spring 2018 Douglas Fields

Recursive Estimation of Dynamic Time-Varying Demand Models

LINEAR DELAY DIFFERENTIAL EQUATION WITH A POSITIVE AND A NEGATIVE TERM

Journal of Computational and Applied Mathematics. An adaptive discontinuous finite volume method for elliptic problems

u 3 = u 3 (x 1, x 2, x 3 )

Machine Detector Interface Workshop: ILC-SLAC, January 6-8, 2005.

Kinetic Integrated Modeling of Heating and Current Drive in Tokamak Plasmas

Construction of Mimetic Numerical Methods

COHORT MBA. Exponential function. MATH review (part2) by Lucian Mitroiu. The LOG and EXP functions. Properties: e e. lim.

Discontinuous Galerkin methods

Estimation of apparent fraction defective: A mathematical approach

2.3 Matrix Formulation

INFLUENCE OF GROUND SUBSIDENCE IN THE DAMAGE TO MEXICO CITY S PRIMARY WATER SYSTEM DUE TO THE 1985 EARTHQUAKE

MAE4700/5700 Finite Element Analysis for Mechanical and Aerospace Design

Higher order derivatives

On the Design of an On-line Complex FIR Filter

A ROBUST NUMERICAL METHOD FOR THE STOKES EQUATIONS BASED ON DIVERGENCE-FREE H(DIV) FINITE ELEMENT METHODS

Extraction of Doping Density Distributions from C-V Curves

Reparameterization and Adaptive Quadrature for the Isogeometric Discontinuous Galerkin Method

A Comparative study of Load Capacity and Pressure Distribution of Infinitely wide Parabolic and Inclined Slider Bearings

Construction of asymmetric orthogonal arrays of strength three via a replacement method

Discontinuous Galerkin and Mimetic Finite Difference Methods for Coupled Stokes-Darcy Flows on Polygonal and Polyhedral Grids

Observer Bias and Reliability By Xunchi Pu

That is, we start with a general matrix: And end with a simpler matrix:

c 2007 Society for Industrial and Applied Mathematics

ELECTRON-MUON SCATTERING

STRESSES FROM LOADING ON RIGID PAVEMENT COURSES

A UNIFIED A POSTERIORI ERROR ESTIMATOR FOR FINITE VOLUME METHODS FOR THE STOKES EQUATIONS

Rotor Stationary Control Analysis Based on Coupling KdV Equation Finite Steady Analysis Liu Dalong1,a, Xu Lijuan2,a

Chapter 3 Lecture 14 Longitudinal stick free static stability and control 3 Topics

UNIFIED ERROR ANALYSIS

NEW APPLICATIONS OF THE ABEL-LIOUVILLE FORMULA

Apparent Power and Power Factor Measurement by Using FPGA-based Chip Considering Nonsinusoidal and Unbalanced Conditions

Transcription:

Procdings of t XXVI Ibrian Latin-Amrican Congrss on Computational Mtods in Enginring CILAMCE 2005 Brazilian Assoc. for Comp. Mcanics (ABMEC) & Latin Amrican Assoc. of Comp. Mtods in Enginring (AMC), Guarapari, Espírito Santo, Brazil, 19 t 21 st Octobr 2005 Papr CIL 12-0423 EDGE-BASED DATA STRUCTURES FOR THE PARALLEL SOLUTION OF 3D STEADY INCOMPRESSIBLE FLUID FLOW WITH THE SUPG/PSPG FINITE ELEMENT FORMULATION. Rnato N. Elias Marcos A. D. Martins Alvaro L. G. A. Coutino Rubns M. Sydnstrickr Cntr for Paralll Computations and Dpartmnt of Civil Enginring Fdral Univrsity of Rio d Janiro, P. O. Box 68506, RJ 21945-970 Rio d Janiro, Brazil {rnato, marcos, alvaro, rubns}@nacad.ufrj.br Abstract. T paralll dg-basd SUPG/PSPG finit lmnt formulation applid to 3D stady incomprssibl Navir-Stoks quations is prsntd. T igly coupld vlocityprssur nonlinar systm of quations is solvd wit an inxact Nwton-lik mtod. T locally linar systm of quations originatd by t inxact nonlinar mtod is solvd wit a nodal block diagonal prconditiond GMRES solvr. Matrix-vctor computations witin t GMRES solvr ar computd dg-by-dg, diminising floating point oprations, and indirct mmory addrssing and mmory rquirmnts. T paralll implmntation comprising mssag passing paralllism is addrssd and t rsults for t tr dimnsional lid drivn cavity flows and t laminar flow troug a rac car body ar discussd. Rsults av sown tat dg-basd data structur is mor fficint tan traditional lmnt-basd scms and t paralll inxact non-linar solvr as sown good prformanc and robustnss. Kywords. Edg-basd data structur, Paralll finit lmnts, Inxact Nwton mtods, Navir-Stoks quations. 1 INTRODUCTION W considr t simulation of stady incomprssibl fluid flow govrnd by Navir-Stoks quations using t stabilizd finit lmnt formulation as sown in Tzduyar (1991). Tis formulation allows tat qual-ordr-intrpolation vlocity-prssur lmnts ar mployd by introducing two stabilization trms: t Stramlin Upwind Ptrov-Galrkin (SUPG) and t Prssur Stabilizing Ptrov Galrkin stabilization (PSPG). Wn discrtizd, t incomprssibl Navir-Stoks quations giv ris to a fully coupld vlocity-prssur systm of nonlinar quations du t prsnc of convctiv trms in momntum quation. T inxact Nwton mtod (Dmbo t al., 1982) associatd wit a propr prconditiond itrativ Krylov solvr, suc as GMRES, prsnts an appropriatd framwork to solv nonlinar systms, offring a trad-off btwn accuracy and t amount of computational ffort spnt pr itration. Elmnt-basd data structurs av bn xtnsivly usd in t implmntation of itrativ solvrs. A wid class of prconditionrs spcially dsignd for lmnt-by-lmnt

(EBE) tcniqus can b found in litratur (Frncz and Hugs, 1998). Tis typ of data structur, bsids fully xploiting t sparsity of t systm, is suitabl for paralllization and vctorization, as matrix-vctor (matvc) products can b writtn in t form of a singl loop structur (Coutino t al., 1991). Anotr way of writing an fficint cod for matvc products is by using t Comprssd Storag Row (CSR) format (Saad, 1996), in wic only global nonzro cofficints ar stord. In t CSR format, t algoritm for matrix-vctor products rquirs a two-loop structur: an outr loop ovr t quations and an innr loop ovr ac nonzro contribution. In comparison wit EBE implmntations, tis format as t advantag of storing only global cofficints, but wit t ovrad of an innr loop tat can not b unrolld, spcially if unstructurd mss ar to b considrd. Edg-basd data structurs or EDE for sort can b viwd as a blnd of EBE and CSR formats: only global nonzro cofficints ar stord and t singl loop structur is maintaind. In t contxt of finit lmnts, tis typ of data structur as bn usd sinc t arly 1990s. Prair t al. (1993) and Lyra t al. (1995) usd dg-basd data structurs to comput t nodal balanc of fluxs for comprssibl flow. Luo t al. (1994) usd an dgbasd approac in t dvlopmnt of an upwind finit lmnt scm for t solution of t Eulr quations. Lönr (1994) sowd diffrnt ways of grouping dgs in t valuation of rsiduals for t Laplacian oprator, aiming to rduc t numbr i/a oprations. Mor rcntly, Ribiro t al. (2001) prsntd an dg-basd implmntation for stabilizd smidiscrt and spac-tim finit lmnt formulations for sallow watr quations. Otr rcnt works on t subjct includ tos of Coutino t al. (2001), wit applications to nonlinar solid mcanics, Catabriga and Coutino (2002), for t implicit SUPG solution of t Eulr quations, and Soto t al. (2004) for incomprssibl flow problms. It as bn sown by Ribiro and Coutino (2005) tat for unstructurd grids composd by ttradra, dg-basd data structurs offr mor advantags tan CSR, particularly for problms involving many dgrs of frdom. Wn daling wit larg scal problms t us of paralll solvrs is an ssntial condition and t us of an algoritm abl to run fficintly in sard, distributd or ybrid mmory systms as bn a motivation for many rsarcrs to turn t solvr stratgy mor indpndnt of t ardwar rsourcs. T prsnt papr is organizd as follows. Sctions 2 prsnt t SUPG/PSPG finit lmnt formulation and nonlinar solution adoptd. In sction 3 w dscrib t dg-basd data structur. Som paralll aspcts ar brifly introducd in sction 4 and t last two sctions prsnt t rsults for two bncmarks problms considrd and our final discussions. 2 GOVERNING EQUATIONS n Lt Ω sd b t spatial domain, wr n sd is t numbr of spac dimnsions. Lt Γ dnot t boundary of Ω. W considr t following vlocity-prssur formulation of t Navir-Stoks quations govrning stady incomprssibl flows: ( ) ρ u u f σ = 0 on Ω (1) u = 0 on Ω (2) wr ρ and u ar t dnsity and vlocity, σ is t strss tnsor givn as

T ssntial and natural boundary conditions associatd wit quations (1) and (2) can b imposd at diffrnt portions of t boundary Γ and rprsntd by, u = g on Γ g n σ= on Γ (3) (4) wr Γ g and Γ ar complmntary substs of Γ. Lt us assum following Tzduyar (1991) tat w av som suitably dfind finitdimnsional trial solution and tst function spacs for vlocity and prssur,,, S and Vp = Sp. T finit lmnt formulation of quations (1) and (2) using SUPG and PSPG stabilizations for incomprssibl fluid flows can b writtn as follows: Find p S p suc tat w V u and q V : ( ) ( ) ( ) p w ρ u u f dω+ ε w : σ p, u dω w dγ+ q u dω S u u V u S Ω Ω Γ Ω nl SUPG σ p, = 1 Ω nl 1 PSPG q ( ) ( p ) = 1 ρ σ Ω ( ) ( ) + τ u w ρ u u u ρf dω p u and + τ ρ u u, u ρf dω = 0 In t abov quation t first four intgrals on t lft and sid rprsnt trms tat appar in t Galrkin formulation of t problm (1)-(4), wil t rmaining intgral xprssions rprsnt t additional trms wic aris in t stabilizd finit lmnt formulation. Not tat t stabilization trms ar valuatd as t sum of lmnt-wis intgral xprssions, wr n l is t numbr of lmnts in t ms. T first summation corrsponds to t SUPG (Stramlin Upwind Ptrov/Galrkin) trm and t scond to t PSPG (Prssur Stabilization Ptrov/Galrkin) trm. W av calculatd t stabilization paramtrs according to Tzduyar t al. (1992), as follows: (5) τ SUPG 2 2 4ν τ u = PSPG = + 9 # # 2 ( ) 2 12 (6) Hr u is t local vlocity vctor, ν rprsnt t kinmatic viscosity and t lmnt # lngt is dfind to b qual to t diamtr of t spr wic is volum-quivalnt to t lmnt. T spatial discrtization of Eq. (11) lads to t following systm of nonlinar quations,

Nu ( ) + Nδ( u) + Ku ( G+ Gδ) p = f u T G u + N ( u) + G p = f p ψ ψ (7) wr u is t vctor of unknown nodal valus of u and p is t vctor of unknown nodal valus of p. T non-linar vctors N(u), N δ (u), N ψ (u), and t matrics K, G, G δ, and G ψ manat, rspctivly, from t convctiv, viscous and prssur trms. T vctors f u and f p ar du to t boundary conditions (3) and (4). T subscripts δ and ψ idntify t SUPG and PSPG contributions rspctivly. In ordr to simplify t notation w dnot by x = (u, p) a vctor of nodal variabls comprising bot nodal vlocitis and prssurs. Tus, Eq. (7) can b writtn as, Fx ( ) = 0 (8) wr Fx ( ) rprsnts a nonlinar vctor function. A particularly simpl scm for solving t nonlinar systm of quations (8) is a fixd point itration procdur known as t succssiv substitution, (also known as Picard itration, functional itration or succssiv itration) wic may b writtn as J0( xk ) xk+ 1 = r k (9) r t nonlinarity is valuatd at t known itrat x k wr J 0 is a first ordr approximation of t Jacobian J(x), r k is t rsidual vctor, and a non-symmtric linar systm must b formd and solvd at ac itration. Altoug t convrgnc rat of tis scm can b slow (its convrgnc rat is only asymptotically linar), t mtod convrgs for a fair rang of Rynolds numbrs. Tis, togtr wit its simplicity of implmntation and rlativity insnsitivity to t initial itrat x 0, accounts for its popularity and rcommnds its us for t solution of a wid varity of problms. Howvr, as obsrvd by Elias t al (2004), t fully coupld Jacobian may b numrically approximatd by a truncatd Taylor xpansion, rducing t numbr of nonlinar itrations. On drawback of nonlinar mtods is t nd to solv a local linar systm (9) at ac stag. Computing t xact solution using a dirct mtod can b xpnsiv if t numbr of unknowns is larg and may not b justifid wn x k is far from a solution (Dmbo t al., 1982). Tus, on migt prfr to comput som approximat solution, lading to t following algoritm: For k = 0 stp 1 until convrgnc do Find som ηk [0,1) AND sk tat satisfy rk + J0( xk ) sk+ 1 ηk r k updat xk+ 1 = xk + s k (10) for som η k [0,1), wr is a norm of coic. Tis formulation naturally allows t us of an itrativ solvr lik GMRES or BiCGSTAB: on first cooss η k and tn applis t itrativ solvr to (9) until a s k is dtrmind for wic t rsidual norm satisfis (10). In tis contxt η k is oftn calld t forcing trm, sinc its rol is to forc t rsidual of (9) to b

suitably small. T forcing trm can b spcifid in svral ways (s, Eisnstat and Walkr, 1996 for dtails) and in tis work t following dfinition was mployd η 2 2 k k k 1 = 0.9 F( x ) F( x ). (11) 3 EDGE-BASED DATA STRUCTURE Edg-basd finit lmnt data structurs av bn introducd for xplicit computations of comprssibl flow in unstructurd grids composd by ttradra (Prair t al. 1993; Luo and Lönr, 1994). It was obsrvd in ts works tat rsidual computations wit dg-basd data structurs wr fastr and rquird lss mmory tan standard lmnt-basd rsidual valuations. Following ts idas, Coutino t al (2001) and Catabriga and Coutino (2002) drivd dg-basd formulations rspctivly for lasto-plasticity and t SUPG formulation for inviscid comprssibl flows. Ty usd t concpt of disassmbling t finit lmnt matrics to build t dg matrics. For tr dimnsional viscoplastic flow problms on unstructurd mss, w may driv an dg-basd finit lmnt scm by noting tat t lmnt matrics can b disassmbld into tir dg contributions as, m J = T s (12) T s s= 1 wr is t contribution of dg s to J and m is t numbr of lmnt dgs pr lmnt (six for ttradral). For instanc, Figur 1 sows a ttradron and its dg disassmbling into its six dgs. s In Figur 1, ac T ij is a ndof ndof sub-matrix, wr ndof is t numbr of dgrs of frdom (four for fully coupld incomprssibl fluid flow). T arrows rprsnt t adoptd dg orintation. Dnoting by E t st of all lmnts saring a givn dg s, w may add tir contributions, arriving to t dg matrix, J s s E s = T (13) T rsulting matrix for incomprssibl fluid flow is non-symmtric, prsrving t structur of t dg matrics givn in Figur 1. Tus, in principl w nd to stor two off-diagonal 4 4 blocks and two 4 4 diagonal blocks pr dg. As w sall s blow tis storag ara can b rducd.

4 6 4 5 3 3 1 2 1 2 T T 1 1 1 T 11 T12 0 0 1 1 = T21 T22 0 0 0 0 0 0 0 0 0 0 Edg matrix 1 3 3 T 11 0 T13 0 = 0 0 0 0 T31 0 T33 0 0 0 0 0 Edg matrix 3 3 3 3 0 0 0 0 5 5 = 0 T22 0 T24 T 5 0 0 0 0 5 5 0 T42 0 T44 Edg matrix 5 J 11 J12 J13 J14 = J21 J22 J23 J24 J J31 J32 J33 J34 J41 J42 J43 J44 Elmnt stiffnss matrix 0 0 0 0 2 2 = 0 T22 T23 0 T 2 2 2 0 T32 T33 0 0 0 0 0 Edg matrix 2 T T 4 4 4 T 11 0 0 T14 = 0 0 0 0 0 0 0 0 4 4 T41 0 0 T44 Edg matrix 4 0 0 0 0 = 0 0 0 0 0 0 T33 T34 6 6 0 0 T43 T44 Edg matrix 6 6 6 6 Figur 1- Edg disassmbling of a ttradral finit lmnt for incomprssibl flow problms. Wn working wit itrativ solvrs lik GMRES, it is ncssary to comput spars matrixvctor products at ac itration. A straigtforward way to implmnt t dg-by-dg (or lmnt-by-lmnt) matrix-vctor product is, J p n l l = Jp l = 1 (14) wr n is t total numbr of local structurs (dgs or lmnts) in t ms and p l is t rstriction of p to t dg or lmnt dgrs-of-frdom. Considring tat w may add all t local nodal block diagonals in a global nodal block diagonal matrix and w may stor only t off-diagonal blocks (by dgs or lmnts), it is possibl to improv t matrix-vctor product (14) disrgarding rdundant nodal block diagonal cofficints for ac local structur (Catabriga and Coutino, 2002). Tis scm is an xtnsion of tat proposd by Gijzn (1995) wr is considrd only t main diagonal instad of t nodal block diagonal. Tus, t matrix-vctor product may b rwrittn as n l l l A (15) l = 1 Jp = B ( J ) p + J B ( J ) p

wr B(J) stors t nodal block diagonal of J and A is t assmbling oprator. In t abov matrix-vctor product t first sum involvs only global quantitis wil t scond is vry similar to t standard matrix-vctor product (14). Not tat w nd to stor only t off-diagonal blocks associatd to ac structur (dg or lmnt), tus rducing mmory rquirmnts. It is also important to obsrv tat in t cas of t dg-basd matrix-vctor product (15) only global cofficints ar involvd in t computations. T rsulting algoritm is, For ac local structur l do: Rcovr t global numbring of t local dgrs of frdom. l Gatring opration: p p Prform product: jp l l l l = ( ) J B J p l Scattring and accumulation oprations: jp jp + jp nd for l jp = B() J p + jp T B(J)p product prformd at nodal lvl follows t sam idas dscribd in t abov algoritm (gatr, local computations and scattr and accumulation). Considring t standard pointrs, localization matrics and dstination arrays of finit lmnt implmntations (Hugs, 1987) applid to dg data structurs, it is asy to vrify tat pointrs, localization matrics and dstination arrays corrspond to a two-nod lmnt, tat is, an dg. In Tabl 1 w compar t storag rquirmnts to old t cofficints of t lmnt and dg matrics as wll as t numbr of floating point (flop) and indirct addrssing (i/a) oprations for computing matrix-vctor products (15) using lmnt and dg-basd data structurs. Tabl 1. Mmory to old matrix cofficints and computational costs for lmnt and dgbasd matrix-vctor products for ttradral finit lmnt mss. Data structur Mmory flop i/a Elmnt 1056 nnods 2112 nnods 1408 nnods Edg 224 nnods 448 nnods 448 nnods All data in ts tabls ar rfrrd to nnods, t numbr of nods in t finit lmnt ms. According to Lönr (1994), t following stimats ar valid for unstructurd 3D grids, nl 5.5 nnods, ndgs 7 nnods. Clarly data in Tabl 1 favors t dg-basd scm. Howvr, compard to t lmnt data structur, t dg scm dos not prsnt a good balanc btwn flop and i/a oprations. Minimizing indirct addrssing and improving data locality ar major concrns to aciv ig prformanc on currnt procssors. Lönr and Gall (2002) and Coutino t al (2005) discuss svral nancmnts to t simpl dg scm introducd r, tat mitigat t ffcts of indirct addrssing and bad data locality. In t nxt sction w dscrib t dg and nod rordring tcniqus usd r, wic ar lss involvd tan tos of Coutino t al (2005).

4 PARALLEL IMPLEMENTATION T paralll inxact nonlinar solvr prsntd in t prvious sction was implmntd basd in t mssag passing paralllism modl (MPI). T original unstructurd grid was partitiond into non-ovrlappd sub-domains by t us of t METIS_PartMsDual routin providd by Mtis packag (Karypis and Kumar, 1998). Aftrwards, t partitiond data is rordrd to avoid mmory dpndncy wit a ms coloring algoritm and t data locality is improvd troug a Rvrs Cutill Mck algoritm. Finally, t nods sard among partitions ar indxd to b usd on mssag passing communications. Most of t computational ffort spnt during t itrativ solution of linar systms is du to valuations of matrix-vctor products or matvc for sort. In our tsts matvc oprations acivd 92% of t total computational costs for lmnt-by-lmnt and 60% for dg-by-dg data structur. In lmnt-by-lmnt (EBE) and dg-by-dg (EDE) data structurs tis task is mssag passing paralllizabl by prforming matvc oprations at ac partition lvl, tn assmbling t contribution of t intrfac quations calling MPI_AllRduc routin. Finally, it is important to not tat dg (and lmnt) matrix cofficints ar computd in singl DO-LOOPS also in ac partition. Figur 2 sktcs ow w av bn computing ybrid (sard and distributd paralllisms simultanously) matvc oprations. In tis figur w sow a ms wit two partitions and colord in disjoint blocks to avoid data dpndncy for vctorization, piplining and/or sard mmory paralllism purposs. Not tat by using pr-compilr dirctivs on can gnrat cod for sard only, distributd only and ybrid programming modls. Rsults will b prsntd r only for t distributd mmory modl du t caractristics of t ardwar mployd. Howvr, ivdp dirctivs and t ncssary ms coloring wr usd to improv Itanium-2 prformanc troug an ffctiv us of its piplins wit no data dpndncis. isid = 0 DO iblk = 1, ndblk nvc = idblk(iblk)!dir$ ivdp!$omp PARALLEL DO DO i = isid+1, isid+nvc, 1...MATVEC computations... ENDDO!$OMP END PARALLEL DO isid ENDDO = isid + nvc...ovr intrfac nods... #ifdf MPICODE call MPI_AllRduc #ndif Figur 2 Matrix-vctor product for distributd (Mtis partitioning) and sard paralllisms (ms coloring).

5 RESULTS In t following sctions two bncmark problms ar sown to valuat t prformanc improvmnts, du t dg-basd data structur, and accuracy of our implmntation. T first problm is t wll known lid drivn cavity problm xtndd for t tr dimnsional cas and t scond complis a ypottical laminar flow troug t body of a L Mans rac car. All computations wr mad on a SGI Altix 350 systm (Intl Itanium-2 CPUs wit 1.5 GHz and 24 Gb of NUMA flx mmory). T systm run Linux and Intl Fortran compilr 8.1. No optimizations furtr tos providd by standard compilr flags (-O3) wr mad. For all tsts t numrical procdur considrd a fully coupld u-p vrsion of t stabilizd formulation using linar ttradron lmnts. T paralll solvr is composd by an outr inxact-nwton loop and an innr nodal block diagonal prconditiond GMRES linar solvr. 5.1 Tr dimnsional laky lid-drivn cavity flow In tis wll known problm t fluid confind in a cubic cavity is drivn by t motion of a laky lid. Boundary conditions consist in a unitary vlocity spcifid along t ntir top surfac and zro vlocity on t otr surfacs. T modl built wit 5,151,505 ttradral lmnts; 6,273,918 dgs and 1,061,208 nods summarizing 4,101,106 quations is sown in Figur 3. Figur 3 Flow in a laky lid-drivn cavity. Gomtry and boundary conditions. Figur 4 sow t computd vrtical and orizontal vlocitis at t cntrlin for Rynolds 1000 (lft) and Rynolds 2000 (rigt). T rsults ar compard wit tos rcntly prsntd by Lo t al. (2005). W may obsrv tat all rsults ar in good agrmnt.

1.00 1.00 0.80 Lo t al. (2005) Tis work 0.80 Lo t al. (2005) Tis work 0.60 Lo t al. (2005) 0.60 Lo t al. (2005) 0.40 Tis work 0.40 Tis work Vrtical vlocity (V z ) 0.20 0.00-0.20-0.40 Vrtical vlocity (V z ) 0.20 0.00-0.20-0.40-0.60-0.60-0.80-0.80-1.00-1.00-0.80-0.60-0.40-0.20 0.00 0.20 0.40 0.60 0.80 1.00-1.00-1.00-0.80-0.60-0.40-0.20 0.00 0.20 0.40 0.60 0.80 1.00 Horizontal vlocity (V x ) Horizontal vlocity (V x ) Figur 4 Tr dimnsional lid drivn cavity flow validation (lft) R 1000 (rigt) R 2000 In Figur 5 w may obsrv t stramlins (lft) for t cas of R=1000 and t corrsponding prssur and vlocity filds (rigt) at t cavity midplan. Du to t particular boundary conditions tis problm prsnts strongly tr-dimnsional faturs. Figur 5 Stramlins (lft), prssur and vlocity at t midplan for R=1000 In Figur 6 w sow t wall clock tims for t EBE and EDE paralll solutions. Not tat t EDE solutions ar always fastr tan t EBE, but mor pronouncd for fw procssors. T EDE solution wit on procssor is fastr tan t EBE solution wit two procssors. Howvr, w obsrvd gains for t EDE solution until 16 procssors, wil for t EBE wall clock tims diminis until 32 procssors.

4000 3500 Elmnt Simpl dg 3000 Tim (sconds) 2500 2000 1500 1000 500 0 1 2 4 8 16 32 Numbr of CPUs Figur 6 Wall tim comparison for lmnt-by-lmnt (EBE) and dg-by-dg (EDE) data structurs 5.2 Laminar flow troug a L Mans rac car Tis problm consists on a ypottical tr-dimnsional simulation of a laminar flow troug t body of a L Mans rac car modl for robustnss and prformanc valuation purposs. T modl wit 10,264,863 lmnts; 1,858,246 nods; 12,434,433 dgs rsulting in 7,720,432 quations is sown in Figur 7. T computations wr carrid out up to t rlativ rsidual ( r / r o ) and t rlativ rror ( Δu / u ) dcrasd 5 ordrs of magnitud. Som qualitativ rsults ar prsntd in Figur 8 wit t stramlins around t car and prssur contour plottd ovr t car body. Figur 7 L Mans rac car modl and ms.

Figur 8 Rsults for t laminar flow troug t L Mans rac car Tabl 2 lists t total mmory, t mmory pr procssor and t wall clock tims for t EDE solution of tis problm wit an incrasing numbr of procssors. Sinc t SGI Altix provids us wit a singl imag of t oprational systm, w wr abl to run tis problm in only on procssor vn spnding mor mmory tan tat availabl for ac computing nod. For t systm mployd, ac computing nod is built wit 2 CPUs and 2 GB of mmory and SGI rfrs tis nod as a C-Brick part of t systm. As t numbr of procssors is incrasd, total mmory incrass, but t mmory pr procssor diminiss and racs t mmory availabl locally at ac SGI Altix 350 C-Brick In Figur 9 w sow t scald spdups and fficincis computd according to t Gustafson s law (s Gustafson t al., 1988 for dtails) and dfind by S s = n + (1-n) s, wr n is t numbr of procssors and s corrsponds to t normalizd tim spnt in t srial portion of t program. Tss rsults sow good paralll prformanc dspit t cas wit 2 procssors wr t paralll prformanc as falln down blow of xpctd. Tabl 2. Mmory and timings for t L Mans car problm. Total mmory Gb Mmory pr CPU Gb Wall tim :mm:ss 1 7.07 7.07 06:58:34 2 7.10 3.55 05:16:05 4 7.21 1.80 02:00:10 8 7.56 0.95 01:05:41 12 8.04 0.67 00:53:03

Scald spdup 14 13 12 11 10 9 8 7 6 5 4 3 2 1 Linar Masurd 1.49 1.00 3.85 7.75 Masurd 11.49 0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 Numbr of CPUs Scald fficincy (%) 120.00 100.00 80.00 60.00 40.00 20.00 0.00 100.00 74.51 96.36 96.89 95.74 1 2 4 8 12 Numbr of CPUs Figur 9 Paralll prformanc on SGI Altix 350 (lft) Scald spdup and (rigt) Scald paralll fficincy 0.00 1.20 700.0-1.00 1.00 GMRES tolranc tim (sconds) 600.0 rlativ rsidual (log) -2.00-3.00-4.00 GMRES tolranc 0.80 0.60 0.40 500.0 400.0 300.0 200.0 Tim (sconds) -5.00 0.20 100.0-6.00 0 4 8 12 16 20 24 28 32 36 Nonlinar itration 0.00 0 5 10 15 20 25 30 nonlinar itrations 0.0 Figur 10 Inxact nonlinar mtod bavior (lft) Rlativ nonlinar rsidual. (rigt) GMRES tolranc (bars) controlld by inxact nonlinar mtod and tim pr nonlinar itration (lins) In Figur 10 w may obsrv t bavior of t inxact nonlinar solution mtod. T stopping critrion was satisfid aftr 35 nonlinar itrations and t linar tolranc rangd from 0.99 to 0.075 at nonlinar itration 27. T first 15 itrations wr vry fast, and as soon as t rlativ rsidual starts to drop, t linar tolranc is tigtnd and oscillats until convrgnc. 6 CONCLUSIONS W av tstd t prformanc of an dg-basd inxact Nwton-typ algoritm to solv nonlinar systms of quations arising from t SUPG/PSPG finit lmnt formulation of stady incomprssibl flows. Numrical tsts in two 3D problms, t lid drivn cavity flow at Rynolds numbrs 1000 and 2000 and t flow around a L Mans car, av sown t cod good paralll prformanc in t SGI Altix. W av obsrvd tat t inxact mtods adapt t innr Krylov spac itrativ mtod tolranc to follow t solution progrss towards t solution. T cod supports sard, distributd and ybrid programming modls wit standard

libraris and intrfacs and tanks to t dg-basd data structurs, w av bn abl to solv complx flow problms involving 12 million of ttradra in modrat sizd macins. Acknowldgmnts T autors would lik to tank t financial support of t Ptrolum National Agncy (ANP, Brazil) and MCT/CNPq, T Brazilian Council for Scintific Rsarc. T Cntr for Paralll Computations (NACAD) at t Fdral Univrsity of Rio d Janiro and SGI Brazil providd t computational rsourcs for tis rsarc. REFERENCES Catabriga L and Coutino ALGA., 2002. Implicit SUPG solution of Eulr quations using dg-basd data structurs. Computr Mtods in Applid Mcanics and Enginring, 32:3477-3490. Coutino ALGA, Alvs JLD and Ebckn NFF, 1991. A study of implmntation scms for vctorizd spars EBE matrix-vctor multiplication. Advancs in Enginring Softwar and Workstations, 13(3):130-134. Coutino ALGA, Martins MAD, Alvs JLD, Landau L and Moras A, 2001. Edg-basd finit lmnt tcniqus for non-linar solid mcanics problms. Int. J. for Num. Mt. in Engrg, 50(9):2053-2068 Coutino ALGA, Martins MAD, Sydnstrickr R and Elias RN. Prformanc comparison of data rordring algoritms for spars matrix-vctor multiplication in dg-basd unstructurd grid computations, submittd Dmbo, RS, Eisnstat, SC and Stiaug, 1982 T. Inxact Nwton mtods, SIAM J. Numr. Anal., 19: 400-408. Eisnstat, SC and Walkr, H F, 1996. Coosing t forcing trms in inxact Nwton mtod. SIAM. J. Sci. Comput., 17(1): 16 32. Elias, R. N., Coutino, 2003, Inxact Nwton-typ mtods for non-linar problms arising from t supg/pspg solution of stady incomprssibl Navir-Stoks quations. J. of t Braz. Soc. of Mc. Sci. & Eng., 26(3): 330-339. Frncz RM and Hugs TJR, 1998. Itrativ finit lmnt solutions in solid mcanics. Handbook for Numrical Analysis, Vol. VI, ds, P.G. Ciarlt and J.L. Lions, Elsvir Scinc. Gijzn MB, 1995. Larg scal finit lmnt computations wit GMRES-lik mtods on a CRAY Y-MP. Applid Numrical Matmatics, 19:51-62. Gustafson, J. L., Montry, G. R. and Bnnr, R. E., 1988. Dvlopmnt of Paralll Mtods for a 1024-Procssor Hyprcub, SIAM J. on Sci. and Stat. Comp., 9(4):609-638 Hugs TJR, 1987. T finit lmnt mtod: linar static and dynamic finit lmnt analysis. Prntic-Hall. NJ. Karypis G. and Kumar V., 1998, Mtis 4.0: Unstructurd Grap Partitioning and Spars Matrix Ordring Systm. Tcnical rport, Dpartmnt of Computr Scinc, Univrsity of Minnsota, Minnapolis; (ttp://www.usrs.cs.umn.du/~karypis/mtis). Lo, DC, Murugsan, K and Young, DL, 2005. Numrical solution of tr-dimnsional vlocity-vorticity Navir-Stoks quations by finit diffrnc mtod. Int. J. Numr. Mt. Fluids, 47:1469-1487. Lonr R, 1994. Edgs, stars, suprdgs and cains. Computr Mtods in Applid Mcanics and Enginring, 111(3-4): 255-263.

Lönr, R., Gall, 2002. M. Minimization of indirct addrssing for dg-basd fild solvrs. Communications in Numrical Mtods in Enginring, 18: 335-343. Luo H, Baum JD, Lönr R, 1994. Edg-basd finit lmnt scm for t Eulr quations, AIAA Journal, 32(6):1183-1190. Lyra PRM, Manzari MT, Morgan K, Hassan O and Prair J, 1995. Sid-basd unstructurd grid algoritms for comprssibl viscous flow computations. Intrnational Journal for Enginring Analysis and Dsign. 2:88-102. Prair J, Piro J, Morgan K, 1993. Multigrid solution of t 3d-comprssibl Eulr quations on unstructurd grids. Int. J. Num. Mt. Engrg.. 36(6): 1029-1044. Ribiro FLB and Coutino ALGA, 2005. Comparison btwn lmnt, dg and comprssd storag scms for itrativ solutions in finit lmnt analyss. Int. J. Num. Mt. Engrg. 63(4): 569-588. Ribiro FLB, Galão AC and Landau L, 2001. Edg-basd finit lmnt mtod for sallow-watr quations. Int. J. for Num. Mt. in Fluids, 36: 659-685. Saad Y, 1996. Itrativ mtods for spars linar systms. PWS Publising Company: Boston. Soto O, Lönr R, Cbral JR and Camlli F, 2004. A stabilizd dg-basd implicit incomprssibl flow formulation. Computr Mtods in Applid Mcanics and Enginring, 193: 2139-2154. Tzduyar, TE, 1991. Stabilizd finit lmnt formulations for incomprssibl flow computations, Advancs in Applid Mcanics, 28: 1-44. Tzduyar, TE, Mittal S, Ray SE and Sin R, 1992. Incomprssibl flow computations wit stabilizd bilinar and linar qual-ordr-intrpolation vlocity-prssur lmnts. Comput. Mtods Appl. Mc. Engrg., 95: 221-242.