Improvement in Computational Efficiency for HCCI Engine Modeling by Using Reduced Mechanisms and Parallel Computing

Amr Ali*, Giulio Cazzoli, Song-Charng Kong, Rolf D. Reitz
Engine Research Center, University of Wisconsin-Madison
and
Christopher J. Montgomery
Reaction Engineering International

Abstract
Detailed chemistry was used with an engine CFD code for HCCI engine combustion modeling in order to achieve a more accurate analysis. The present study improved the computational efficiency by using reduced mechanisms and parallel computing schemes. In the reduced mechanism, the numbers of species and reactions were reduced systematically by introducing quasi-steady-state assumptions. Results obtained with the reduced mechanism in the engine study agreed reasonably well with those obtained with the original mechanism. In addition, the CFD/detailed chemistry code was parallelized using two different methods, OpenMP and the Message-Passing Interface (MPI). Both methods significantly reduced the computational time when multiple CPUs were used. These results indicate that highly efficient HCCI engine simulations using detailed chemistry with CFD are attainable by combining the above approaches.

Introduction
Homogeneous Charge Compression Ignition (HCCI) engines are receiving considerable attention and research effort because of their potential to reduce engine emissions significantly and to increase engine efficiency, especially at part load. HCCI operation is based on burning a homogeneous mixture under conditions that do not allow high-temperature combustion. The relatively low combustion temperature significantly reduces NOx emissions, and the homogeneity of the mixture reduces soot emissions. Homogeneous HCCI mixtures can be achieved either by using a premixed charge or by injecting the fuel directly into the cylinder.
In the latter case, early or late injection should be used to increase the ignition delay period so that the fuel has enough time to mix with the air. The advantage of fuel injection over premixing is that it provides some control over the ignition timing. Controlling the start of ignition is one of the difficult problems with HCCI engines, in which uncontrolled combustion occurs as the mixture is compressed and its temperature increases, producing hot spots that start ignition at many locations simultaneously. In this type of combustion, knock and very high combustion rates may result unless the mixture is diluted. This dilution is usually achieved using high air-to-fuel ratios and/or high EGR rates. The high combustion rates generated in HCCI engines currently limit their efficient operation to part loads. A review of HCCI engine research was given by Stanglmaier and Roberts (1999).

Various types of models have been developed to simulate HCCI combustion, including zero-dimensional models (Dec, 2002), multi-zone models (Aceves et al., 2000; Fiveland and Assanis, ), and multidimensional CFD models coupled with detailed chemistry (Kong et al., 2001; Kong and Reitz, 2002). Each type of model represents a different compromise, and the user must trade off computer time against model accuracy. The combustion process in HCCI engines does not involve flame propagation, as in spark-ignition engines, or a diffusion flame, as in diesel engines; therefore, HCCI combustion is controlled mainly by chemical kinetics, and a detailed chemistry mechanism is essential for modeling it. For accurate simulations of HCCI engines, a detailed chemistry mechanism must be coupled with CFD calculations to account for temperature gradients in the cylinder and, in the case of fuel injection, to capture the details of fuel spray and evaporation. However, a coupled CFD and detailed chemistry simulation requires substantial amounts of memory and CPU time, which may make it impractical with current computer capabilities.
In the present paper, two approaches are pursued to improve the computational efficiency of the coupled CFD and detailed chemistry models. In the first approach, we minimize the size of the reaction mechanisms while retaining the essential features of the detailed chemistry. The general idea in reducing complex kinetic schemes is to introduce quasi-steady-state (QSS) assumptions. Several systematic approaches have been proposed, e.g., by Peters and Kee (1987), Smooke (1991), Frenklach (1991), and Peters and Rogg (1993). The number of species and reactions required to simulate combustion depends on the nature of the autoignition and combustion processes that the reduced mechanism must reproduce. In the present study, a computer program that automates the reduction procedure was used so that large mechanisms could be reduced quickly (Chen, 1997).

In the second approach, parallel computing is used to speed up the computations. Parallel computing obtains additional computational power by aggregating the capabilities of multiple processors. The use of the Message-Passing Interface (MPI) to expedite detailed chemistry calculations was recently reported by Senecal et al. (2003). Two different methods of parallel computing are used in this paper: MPI (MPI-Forum, 1995; Jimack and Touheed, 2003) and OpenMP (Chandra et al., 2001; OpenMP web site, 2003). OpenMP is designed primarily for shared-memory computer systems, while MPI is a more general, standard approach that can be implemented on almost all computer systems. However, this generality of MPI requires additional effort in explicit programming and implementation. A comparison of the code performance with the two methods is presented.

* Corresponding author: aali@erc.wisc.edu. Associated Web site: http://www.erc.wisc.edu/
Ph.D. student at DIEM, University of Bologna, Italy, currently visiting the Engine Research Center, University of Wisconsin-Madison.

Table 1 Caterpillar engine specifications and conditions.
Engine: Caterpillar 3401E SCOTE
Bore × Stroke: 137.2 mm × 165.1 mm
Compression Ratio: 16.1:1
Displacement: 2.44 liters
Connecting Rod Length: 261.6 mm
Squish Height: 1.57 mm
Combustion Chamber Geometry: In-piston Mexican hat with sharp-edged crater
Piston: Articulated
Charge Mixture Motion: Quiescent
Max. Injection Pressure: 190 MPa
Number of Nozzle Holes: 6
Nozzle Hole Diameter: . mm
Included Spray Angle: 3
Injection Rate Shape: Rising

Experiments
All experiments were performed on a Caterpillar 3401E single-cylinder oil test engine, as described by Klingbeil et al. (). The fuel injector was a production-style Caterpillar electronic unit injector (EUI). The test data at Mode (i.e., 8 rev/min, 5% load) were used for model validation. Detailed engine specifications and conditions are listed in Table 1.

Engine Simulation Model
The present study used a multidimensional engine CFD code coupled with a detailed chemistry code. The CFD code is a version of KIVA-3V (Amsden, 1997) with improvements in various physical and chemistry models developed at the Engine Research Center, University of Wisconsin-Madison. The major model improvements include the spray atomization, drop-wall impingement, wall heat transfer, and piston-ring crevice flow models. The RNG k-ε turbulence model was used for in-cylinder flow simulations. A detailed reaction mechanism for n-heptane was used to simulate diesel fuel chemistry (Golovitchev, ). The CHEMKIN chemistry solver (Kee et al., 1989) was integrated into the KIVA-3V code to solve the chemistry during multidimensional engine simulations, and the chemistry and flow solutions were coupled. Details of the model, which has been used to simulate various HCCI engine configurations including premixed and direct-injection cases, can be found in the original literature (Kong et al., 2003).

Automated Mechanism Reduction
An interactive Computer Assisted Reduction Mechanism (CARM) code was used to generate the reduced mechanism for the present study (Chen, 1997). CARM automates the following basic procedure for generating a reduced mechanism:
(1) Identification of a short mechanism containing only the most essential species and reactions.
(2) Identification of appropriate steady-state approximations.
(3) Elimination of reactions through use of the algebraic equations obtained in the previous step.
(4) Solution of the coupled, nonlinear set of algebraic equations obtained in the previous steps to find the reaction rates of the remaining species.
As inputs, CARM uses an existing detailed mechanism and a set of test-problem results representing the conditions of interest; it ranks species by the error induced by assuming that they are in steady state. CARM generates the reduced mechanism together with a new subroutine, CKWYP, which is used in the CHEMKIN package to compute the reaction rates. Details of CARM can be found in the original literature (Chen, 1997).

The CARM code was used to reduce an n-heptane mechanism of 40 species and 165 reactions. The final reduced mechanism consists of 25 species and 21 steps, as listed in Table 2. The reduced mechanism was first validated by predicting the ignition delay as a function of temperature under various initial conditions. Figure 1 shows example comparisons of ignition delay predictions under typical engine conditions by both the original mechanism (40 species) and the reduced
mechanism (25 species). Results obtained with a detailed mechanism by Curran et al. (1998) and with a curve-fit correlation (the EMIT model of Galligan, ) are also shown. Good agreement is seen.

Table 2 Reaction steps of the reduced mechanism
HO=O+.5H
HO+OH=O+HO
HO=O+HO
C7H+O=HO+C7H5O
O+C7H5O=OH+O+CHO+C5HCHO
O+CH=HO+CH3
CH+O=OH+CH3
CH+HO+CH3+C5HCHO=C7H+O+CO+CH
C7H+O=CH+HO+C3H
C7H+O=CH+HO+C3H+CH5
HO+C3H=O+CH+CH3
.5H+C3H=.5CH+CH3
C3H=.5H+C3H5
C3H5=.5H+C3H
.5H+OH+C3H=.5CH+CH3+CHO
.5CH+OH+CH3+C3H=CO+.5H+CH
CO+CH3O=CO+CH3
CH3O=.5H+CHO
HO+CH3=OH+CH3O
O+CH3=CH3O
O+CH=OH+CH5

Parallelization
A performance study was carried out on the coupled KIVA/CHEMKIN code to identify its most time-demanding parts, so that parallelization could be applied incrementally, starting with those parts, without restructuring the entire code. Code performance tools are provided on many modern microprocessors designed for high-performance computing. Most of our computations were carried out on an SGI Origin multiprocessor, on which SpeedShop is available for code performance measurement. It was found that for a 2D simulation with 5 grid cells and an extended chemistry mechanism of 40 species and 165 reactions, 9% of the computational time was spent in the chemistry subroutines. When a sector mesh of degrees with 5 cells was simulated, this percentage increased to 98%. These results indicate that parallelizing the chemistry part alone can be almost as efficient as parallelizing the entire code, especially for 3D detailed chemistry simulations. Thus, for the parallel simulations shown in this paper, the code starts the calculations on a single processor (the master processor) and runs until it reaches the chemistry subroutines, where the computational load is divided among all available processors. After the chemistry calculations are completed, the processors return their part of the results to the master processor.
This process is repeated for every time step, as shown in Figure 2.

[Figure 1 plots: ignition delay (ms) versus 1000/T in four panels for different EGR levels, pressures p (bar) and equivalence ratios φ; curves: detailed, 40 spec, 25 spec, EMIT.]
Figure 1 Comparisons of ignition delay (ms) predictions by various mechanisms for various EGR levels, pressures (p, bar) and equivalence ratios, φ.

[Figure 2 schematic: Start → master processor → chemistry subroutines distributed over processors (P) → master processor → next time step, or End.]
Figure 2 Schematic of the parallel computing process (P stands for processor).
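The master/worker chemistry split described above can be sketched in Python under stated assumptions. A hypothetical one-step Arrhenius reaction (the constants A_PRE and EA_OVER_R are invented for illustration) stands in for the per-cell CHEMKIN integration, and a thread pool stands in for the OpenMP/MPI processors. Because of CPython's global interpreter lock, the sketch shows the data decomposition rather than a real speed-up. The decomposition works because, during the chemistry step, each cell depends only on its own state, so the cells can be split into chunks, integrated independently, and gathered back in order.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical one-step Arrhenius constants, chosen only for illustration;
# they are NOT taken from the paper's n-heptane mechanism.
A_PRE = 1.0e9        # pre-exponential factor, 1/s
EA_OVER_R = 15000.0  # activation temperature, K


def cell_chemistry(fuel_frac, temp, dt):
    """Advance one cell's fuel mass fraction over dt (stand-in for a CHEMKIN call).

    Uses the exact solution of dY/dt = -k(T) * Y at fixed temperature.
    """
    k = A_PRE * math.exp(-EA_OVER_R / temp)
    return fuel_frac * math.exp(-k * dt)


def chunk_bounds(n_cells, n_workers):
    """Split range(n_cells) into n_workers contiguous, nearly equal chunks."""
    base, extra = divmod(n_cells, n_workers)
    bounds, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def solve_chunk(fuel, temps, dt, start, stop):
    """Worker: integrate the chemistry of cells start..stop-1 independently."""
    return [cell_chemistry(fuel[i], temps[i], dt) for i in range(start, stop)]


def parallel_chemistry(fuel, temps, dt, n_workers):
    """Master: scatter cell chunks to workers, then gather results in cell order."""
    bounds = chunk_bounds(len(fuel), n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(solve_chunk, fuel, temps, dt, a, b) for a, b in bounds]
        gathered = []
        for fut in futures:  # submission order == cell order
            gathered.extend(fut.result())
    return gathered


def run(n_steps, fuel, temps, dt, n_workers):
    """Time loop: a serial flow solve (omitted here) alternates with parallel chemistry."""
    for _ in range(n_steps):
        # ... serial transport/flow update on the master processor would go here ...
        fuel = parallel_chemistry(fuel, temps, dt, n_workers)
    return fuel


if __name__ == "__main__":
    temps = [900.0 + 10.0 * i for i in range(10)]  # hypothetical cell temperatures, K
    fuel0 = [0.05] * 10                            # hypothetical initial fuel mass fractions
    serial = run(5, fuel0, temps, 1.0e-5, n_workers=1)
    parallel = run(5, fuel0, temps, 1.0e-5, n_workers=4)
    print(serial == parallel)  # same per-cell arithmetic, so the results match
```

A contiguous block split keeps the gather step trivial. If some cells (for example, those near ignition) are much stiffer than others, a dynamic schedule that hands out smaller chunks on demand would balance the load better.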
According to Amdahl's law (Amdahl, 1967), the theoretical speed-up S achieved by parallelizing a fraction F of the code is

$$S = \frac{1}{(1 - F) + F/N} \qquad (1)$$

where N is the number of processors.

Choices about parallel computing are tied not only to the kind of job to be done but also, strongly, to the architecture of the system on which the job runs. The main division is by how memory is managed: is it centralized or distributed? The first architecture (known as shared memory) assumes that each processor shares equal access to all system resources and has read and write access to all of the memory in the machine through a logically direct connection. Distributed shared memory (DSM) systems (e.g., the SGI Origin) also belong to this class. In the second architecture (known as distributed memory), each processor has exclusive access to a particular part of the system memory, typically located physically near that processor.

In this paper, two methods are used for code parallelization: MPI and OpenMP. OpenMP comprises a set of compiler directives that describe the parallelism in the source code, along with a supporting library of subroutines available to applications (Chandra et al., 2001). OpenMP is designed primarily for shared-memory multiprocessors. The message-passing programming model, on the other hand, has been effectively standardized by MPI, which is portable, widely available, and usable with both distributed and shared memory. However, message passing is generally regarded as a difficult way to program.

Results
Application of Reduced Mechanism
The reduced chemistry mechanism shown in Table 2 was used to simulate HCCI combustion with the KIVA/CHEMKIN code. Since the in-cylinder environment features time-varying pressure and temperature, the mechanism needed to be tested further under such conditions.
The computations used a .5-degree sector mesh and assumed a homogeneous mixture. Figure 3 compares the computed cylinder pressure and heat release rate obtained with the original and reduced mechanisms. Good agreement was obtained, although the reduced mechanism predicted a slightly earlier first-stage ignition. The evolution of the major combustion species, including fuel, CO, and CO2, is shown in Figure 4. Note that the computer time was reduced by a factor of 1.7 (from 3 to 7 minutes on an SGI Origin machine during the main combustion period, i.e., BTDC to ATDC) by using the present reduced mechanism. This speed-up is close to that expected from the reduced number of species, since 40/25 = 1.6.

[Figure 3 plots: cylinder pressure (MPa) and heat release rate (J/deg) versus crank angle (ATDC); original and reduced mechanisms.]
Figure 3 Comparisons of calculated cylinder pressure and heat release rate using both mechanisms.

[Figure 4 plots: total in-cylinder mass (g) of fuel, CO and CO2 versus crank angle (ATDC); original and reduced mechanisms.]
Figure 4 Comparisons of the total in-cylinder mass of fuel, CO and CO2 using the two mechanisms.
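The quasi-steady-state idea behind the reduction can be illustrated on a toy problem. The following is a minimal sketch, assuming a hypothetical two-step chain A → B → C in which the intermediate B is consumed much faster than it is formed; it is not part of the n-heptane mechanism. Setting dB/dt ≈ 0 turns B's differential equation into an algebraic relation, removing one species from the integration and collapsing two steps into one global step, which is the same kind of trade that CARM makes species by species. The sketch also shows why candidates must be ranked by the error they induce: the QSS value is accurate to roughly k1/k2 once the fast transient has decayed, but badly wrong during that transient.

```python
import math


def full_solution(a0, k1, k2, t):
    """Exact solution of the two-step chain A -> B -> C (first order, k1 != k2).

    Three species and two elementary steps; B is integrated like any other species.
    """
    a = a0 * math.exp(-k1 * t)
    b = a0 * k1 / (k2 - k1) * (math.exp(-k1 * t) - math.exp(-k2 * t))
    c = a0 - a - b  # conservation: A + B + C = a0
    return a, b, c


def qss_solution(a0, k1, k2, t):
    """Reduced model with B in quasi-steady state.

    Setting dB/dt = k1*A - k2*B ~ 0 gives the algebraic relation B = k1*A/k2,
    so only A is integrated; the chain collapses into one global step A -> C.
    """
    a = a0 * math.exp(-k1 * t)  # dA/dt = -k1*A is unchanged
    b = k1 * a / k2             # algebraic (QSS) value instead of an ODE
    c = a0 - a - b
    return a, b, c


if __name__ == "__main__":
    k1, k2 = 1.0, 1000.0  # hypothetical rate constants, 1/s
    for t in (1.0e-4, 1.0e-2, 1.0):
        b_full = full_solution(1.0, k1, k2, t)[1]
        b_qss = qss_solution(1.0, k1, k2, t)[1]
        err = abs(b_qss - b_full) / b_full
        print(f"t={t:g} s  B_full={b_full:.4e}  B_qss={b_qss:.4e}  rel.err={err:.2%}")
```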
Parallel Computing
To benchmark the performance of the KIVA/CHEMKIN code on single and multiple processors using OpenMP and MPI, a .5-degree sector mesh with 5 cells was considered. The computations started after intake valve closure (IVC) and ran at 3 to + deg ATDC, assuming a homogeneous mixture. They were carried out on an SGI Origin with 3 MIPS R 3 MHz processors. Figure 5 shows how much faster the problem can be solved using multiple processors. For each number of processors there are three columns, representing the theoretical speed-up predicted by Amdahl's law (Equation (1) with F = .9), the speed-up obtained with OpenMP, and the speed-up obtained with MPI, respectively. Significant speed-up is obtained (e.g., the code runs times faster with 8 processors using OpenMP). As the number of processors increases, the speed-up obtained with both OpenMP and MPI falls below the theoretical value. This is expected, because the time spent in communication between processors, which Amdahl's law does not account for, grows with the number of processors. OpenMP performs better than MPI, with a speed-up very close to the theoretical one. This advantage grows as the number of processors increases, indicating that the communication time needed by OpenMP, which is optimized for shared memory, is lower than that needed by MPI. The column at the far right of the graph represents the theoretical speed-up for an infinite number of processors. In this limit the computational time equals the time spent in the serial part of the code, since the time for the parallel part goes to zero; Equation (1) then gives S = 1/(1 − F).

On the other hand, MPI has the advantage of also running on distributed-memory systems. The same simulation was therefore also run on a cluster of PCs (all 8 MHz Pentium III, connected with 100 Mbit Fast Ethernet, running Linux).
Figure 6 compares the speed-up of the simulation on the Origin and on the PC cluster, along with the theoretical speed-up predicted by Amdahl's law. Because of differences between the two systems and between the Fortran compilers they use, the chemistry subroutines (the parallelized part of the code) account for different fractions of the total computational time: 9% on the Origin and 95% on the PC cluster. A separate theoretical curve (Amdahl's law) is therefore shown in Figure 6 for each computer system. The speed-up obtained on the PC cluster is higher than that obtained on the Origin because the parallelized fraction of the code is larger. The performance of the code is also closer to the theoretical one on the PC cluster.

3D simulations were also carried out for a -degree sector mesh of 5 cells, with fuel injection starting at - deg ATDC and lasting 7 degrees. The simulation starts at 3 deg ATDC and ends at 35 deg ATDC. Tables 3 and 4 show the computational time and the actual and theoretical speed-ups obtained using MPI and OpenMP, respectively, on the Origin for different numbers of processors. The tables show a significant reduction in computational time. For instance, using processors reduced the computational time from almost 9 days to less than one day. When combined with the factor of 1.7 speed-up from the reduced chemistry, such runs become possible in a matter of hours. These results indicate that parallel computing is a very efficient means of improving code efficiency and speeding up the calculations.

[Figure 5 plot: speed-up versus number of processors; columns for the Amdahl's-law limit, OpenMP and MPI, plus a far-right column for an infinite number of processors.]
Figure 5 Comparison of the speed-up obtained using OpenMP and MPI, along with the theoretical speed-up predicted by Amdahl's law.

[Figure 6 plot: speed-up versus number of processors; curves: PC-Theoretical, PC-Actual, SGI-Theoretical, SGI-Actual.]
Figure 6 Comparison of the speed-up obtained using MPI on the SGI Origin and on a PC cluster. Also shown is the theoretical speed-up predicted by Amdahl's law.
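The theoretical curves above follow directly from Amdahl's law as stated earlier. A minimal Python sketch of that formula is given below, evaluated for F = 0.95 and F = 0.98, the chemistry fractions quoted in the text for the PC cluster and for the sector-mesh case. The function names are illustrative. The third function is a hypothetical extension, not used in the paper: it adds a communication term c·(N − 1) (c is an assumed per-processor cost, expressed as a fraction of the single-processor run time) to mimic why the measured speed-ups drop below Amdahl's curve as N grows.

```python
import math


def amdahl_speedup(f, n):
    """Theoretical speed-up S = 1 / ((1 - F) + F / N) for parallel fraction F on N processors."""
    if not 0.0 <= f <= 1.0:
        raise ValueError("parallel fraction F must lie in [0, 1]")
    if n < 1:
        raise ValueError("need at least one processor")
    return 1.0 / ((1.0 - f) + f / n)


def amdahl_limit(f):
    """Speed-up as N -> infinity: only the serial time (1 - F) remains, so S -> 1 / (1 - F)."""
    return math.inf if f >= 1.0 else 1.0 / (1.0 - f)


def speedup_with_overhead(f, n, c):
    """Hypothetical extension (not from the paper): add a communication cost c * (N - 1),
    as a fraction of the single-processor run time, to Amdahl's denominator."""
    return 1.0 / ((1.0 - f) + f / n + c * (n - 1))


if __name__ == "__main__":
    for f in (0.95, 0.98):
        row = "  ".join(f"N={n}: {amdahl_speedup(f, n):5.2f}" for n in (1, 2, 4, 8, 16))
        print(f"F={f}:  {row}  N->inf: {amdahl_limit(f):.1f}")
```

The limit function makes the far-right column of the speed-up comparison explicit: even with unlimited processors, a 5% serial fraction caps the speed-up at 20.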
Conclusions
The computational efficiency of HCCI engine simulations was improved by using both reduced mechanisms and parallel computing. The reduced mechanism was generated by the interactive CARM code and was shown to predict the ignition and combustion processes reasonably well under HCCI engine conditions. An evaluation of the KIVA/CHEMKIN code showed that most of the computational time is spent in the chemistry subroutines. OpenMP and MPI were implemented successfully in the chemistry part of the code, significantly reducing the computational time. OpenMP performed slightly better than MPI on the Silicon Graphics Origin shared-memory machine, especially as the number of processors increased.

Table 3 Computational time and speed-up in 3D simulations using MPI.
No. of Processors; Computational Time (hours); Actual Speed-up; Theoretical Speed-up
. 7.7.9 5.5 3.8.53 8.7 9.3.7.9

Table 4 Computational time and speed-up in 3D simulations using OpenMP.
No. of Processors; Computational Time (hours); Actual Speed-up; Theoretical Speed-up
. 9.7.79.9 8 37.9 5.7 7.

Acknowledgment
The authors acknowledge the financial support of NSF under grant no. DMI-9593, Caterpillar Inc., DOE/Sandia National Laboratories, and TACOM. The authors also thank Prof. J.-Y. Chen of the University of California at Berkeley for his help with the generation of the reduced mechanism.

References
Aceves, S. M., Flowers, D. L., Westbrook, C. K., Smith, J. R., Pitz, W., Dibble, R., Christensen, M. and Johansson, B., SAE Paper 2000-01-0327 (2000).
Amdahl, G., in Proc. 1967 AFIPS Conf., Vol. 30, p. 483, AFIPS Press (1967).
Amsden, A. A., Technical Report LA-13313-MS, Los Alamos National Laboratory (1997).
Chandra, R., Dagum, L., Kohr, D., Maydan, D., McDonald, J. and Menon, R., Morgan Kaufmann Publishers (2001).
Chen, J.-Y., Workshop on Numerical Aspects of Reduction in Chemical Kinetics, CERMICS-ENPC, Cité Descartes, Champs-sur-Marne, France, Dept. (1997).
Curran, H. J., Gaffuri, P., Pitz, W. J. and Westbrook, C. K., Combustion and Flame 114:149 (1998).
Dec, J., SAE Paper 2002-01-1309 (2002).
Fiveland, S. B. and Assanis, D. N., SAE Paper --757 ().
Frenklach, M., Progress in Astronautics and Aeronautics 135, 129-154 (1991).
Galligan, D., PhD Thesis, University of Wisconsin-Madison ().
Golovitchev, V. L., http://www.tfd.chalmers.se/~valeri/mech.html, Chalmers University of Technology, Göteborg, Sweden ().
Jimack, P. K. and Touheed, N., PDE Unit, School of Computer Studies, University of Leeds, http://www.scs.leeds.ac.uk/cpde/tutorial.html (2003).
Kee, R. J., Rupley, F. M. and Miller, J. A., Technical Report SAND89-8009, Sandia National Laboratories, Livermore, CA (1989).
Klingbeil, A. E., MS Thesis, University of Wisconsin-Madison ().
Kong, S. C., Marriott, C. D., Reitz, R. D. and Christensen, M., SAE Paper 2001-01-1026 (2001).
Kong, S. C. and Reitz, R. D., in 29th International Symposium on Combustion (2002).
Kong, S. C., Patel, A., Yin, Q., Klingbeil, A. and Reitz, R. D., SAE Paper 2003-01-1087 (2003).
MPI-Forum, University of Tennessee (1995), http://www.mpi-forum.org/
OpenMP web site, http://www.openmp.org (2003).
Peters, N. and Kee, R. J., Combustion and Flame 68:17 (1987).
Peters, N. and Rogg, B., Reduced Kinetic Mechanisms for Applications in Combustion Systems, Lecture Notes in Physics m15, Springer-Verlag (1993).
Senecal, P. K., Pomraning, E., Richards, K. J., Briggs, T. E., Choi, C. Y., McDavid, R. M. and Patterson, M. A., SAE Paper 2003-01-1043 (2003).
Smooke, M. D., Reduced Kinetic Mechanisms and Asymptotic Approximations for Methane-Air Flames, Lecture Notes in Physics 384, Springer-Verlag, 1-28 (1991).
Stanglmaier, R. H. and Roberts, C. E., SAE Paper 1999-01-3682 (1999).