Stochastic Modelling of Electron Transport on different HPC architectures www.hp-see.eu E. Atanassov, T. Gurov, A. Karaivanova, Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, (emanouil, gurov, anet)@parallel.bas.bg Supported by SuperCA++, Grant #ДЦВП02/1 with NSF of Bulgaria
OUTLINE Bulgarian and regional HPC resources; Monte Carlo modelling of semiconductor devices; Improvements to Monte Carlo; Numerical results; Conclusions and future work
Bulgarian HPC Infrastructure The biggest HPC resource for research in Bulgaria is the IBM BlueGene/P supercomputer with 8192 cores. Two HPC clusters with Intel CPUs and Infiniband interconnection at IICT-BAS and IOCCP-BAS. 8196 CPU cores; 576 CPU cores; 4x 480 GPU cores; vendors: HP and Fujitsu. In addition, GPU-enabled servers equipped with state-of-the-art GPUs are available for applications that can take advantage of them. 1 Gb/s Ethernet fiber optics links between centers. 1 Gbps; 100 Mbps; 800 CPU cores; HPC Linux Cluster
Bulgarian HPC Resources HPC Cluster at IICT-BAS: 3 chassis HP Cluster Platform Express 7000, 36 blades BL 280c, dual Intel Xeon X5560 @ 2.8 GHz (total 576 cores), 24 GB RAM; 8 servers HP DL 380 G6, dual Intel X5560 @ 2.8 GHz, 32 GB RAM; fully non-blocking DDR Infiniband interconnection; Voltaire Grid director 2004 non-blocking DDR Infiniband switch; 2 disk arrays with 96 TB, 2 Lustre file systems. Peak performance 3.2 TF, achieved performance more than 3 TF, 92% efficiency. HP ProLiant SL390s G7 Server with 4 M2090 graphics cards
Regional HPC Infrastructure The HP-SEE project provides access to regional HPC centers: BlueGene/P in Romania, 4096 cores; several HPC clusters with Infiniband; one SMP machine with 1152 cores, 6 TB RAM, 10 TF, Intel Xeon X7542 (Nehalem EX) @ 2.67 GHz. GPU capabilities are being added in several installations.
Simulation of electron transport in semiconductors Application area: SET is developed for solving various computationally intensive problems which describe ultrafast carrier transport in semiconductors. Expected results and their consequences: the application studies memory and quantum effects during the relaxation process due to electron-phonon interaction in semiconductors; the present version explores electron kinetics in GaAs nano-wires. Studying the quantum effects that occur at the nanometer and femtosecond scale leads to important scientific results: novel advanced methods and the investigation of novel physical phenomena.
Quantum-kinetic equation (inhomogeneous case) The integral form of the equation: Kernels:
Quantum-kinetic equation (cont.) Bose function: The phonon energy (ħω) depends on : Electron energy: The electron-phonon coupling constant according to Fröhlich polar optical interaction: The Fourier transform of the square of the ground state wave function:
Monte Carlo method Backward time evolution of the numerical trajectories Wigner function: Energy (or momentum) distribution: Density distribution:
Monte Carlo Method (cont.) Biased MC estimator: Weights: The Markov chain: Initial density function Transition density function:
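Monte Carlo
$$\xi_s[J_g(f)] = \frac{g(z,k_z,t)}{p_{in}(z,k_z,t)}\, W_0\, f_{w,0}(\cdot,k_z,0) + \frac{g(z,k_z,t)}{p_{in}(z,k_z,t)} \sum_{j=1}^{s} W_j^{\alpha}\, f_{w,0}(\cdot,k_{z,j}^{\alpha},t_j),$$
where
$$f_{w,0}(\cdot,k_{z,j}^{\alpha},t_j) = \begin{cases} f_{w,0}\big(z + h(k_{z,j-1}, q'_{z,j}, t_{j-1}, t'_j, t_j),\, k_{z,j},\, t_j\big), & \alpha = 1, \\ f_{w,0}\big(z + h(k_{z,j-1}, q'_{z,j}, t_{j-1}, t'_j, t_j),\, k_{z,j-1},\, t_j\big), & \alpha = 2, \end{cases}$$
$$W_j^{\alpha} = W_{j-1}^{\alpha}\, \frac{K_{\alpha}(k_{z,j-1}, k_j, t'_j, t_j)}{p_{\alpha}\, p_{tr}(k_{j-1}, k_j, t'_j, t_j)}, \qquad W_0^{\alpha} = W_0 = 1, \quad \alpha = 1, 2, \quad j = 1, \dots, s.$$
$$\frac{1}{N} \sum_{i=1}^{N} \big(\xi_s[J_g(f)]\big)_i \approx J_g(f)$$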
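The weight recursion W_j = W_{j-1} K / p and the truncation after s steps can be illustrated on a much simpler problem. The sketch below is not the SET code: it assumes a toy Volterra equation f(t) = 1 + ∫_0^t f(u) du (exact solution e^t), a kernel K = 1, and a uniform transition density on [0, t_{j-1}]; all function names are hypothetical. It shows why the estimator is biased: stopping the chain after s transitions reproduces only the first s + 1 terms of the Neumann series.

```python
import math
import random

def xi_s(t, s, rng):
    """One realisation of the truncated estimator for f(t) = 1 + int_0^t f(u) du."""
    weight = 1.0              # W_0 = 1
    total = 1.0               # W_0 * f0(t_0), with free term f0 = 1
    t_prev = t
    for _ in range(s):
        t_next = rng.uniform(0.0, t_prev)   # transition density p = 1 / t_prev
        weight *= t_prev                    # W_j = W_{j-1} * K / p with K = 1
        total += weight                     # add W_j * f0(t_j), f0 = 1
        t_prev = t_next
    return total

def estimate(t, s, n, seed=0):
    """Sample mean of n independent realisations of xi_s."""
    rng = random.Random(seed)
    return sum(xi_s(t, s, rng) for _ in range(n)) / n

def truncated_mean(t, s):
    """Expected value of xi_s: the Neumann (Taylor) series cut after s terms."""
    return sum(t ** j / math.factorial(j) for j in range(s + 1))
```

Calling `estimate(1.0, 12, 50000)` should land close to `math.e`, while a short chain such as `estimate(1.0, 2, 50000)` converges to `truncated_mean(1.0, 2)` instead, which is the truncation bias in its simplest form.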
Monte Carlo modelling of semiconductor devices The variance increases exponentially with respect to the relaxation time T. The application requires accumulating the results of billions of trajectories. Improvements in variance and execution time can be achieved with low-discrepancy sequences (quasirandom numbers). The use of quasirandom numbers requires a robust and flexible implementation, since it is not feasible to ignore failures and missing results of some trajectories, unlike in Monte Carlo. GPU resources are efficient in computations using the low-discrepancy sequences of Sobol, Halton, etc. Variance reduction in the case of pure MC can be achieved using different transition density functions.
Quasirandom approach We adopted a hybrid approach, where evolution times are sampled using the modified Halton sequence, and space parameters are modeled using pseudorandom sequences. Scrambled modified Halton sequence [Atanassov 2003]:
$$x_n^{(i)} = \sum_{j=0}^{m} \operatorname{imod}\big(a_j^{(i)} k_i^{j+1} + b_j^{(i)},\, p_i\big)\, p_i^{-j-1}$$
(scramblers $b_j^{(i)}$, modifiers $k_i$ in $[0, p_i - 1]$). The use of quasirandom numbers offers a significant advantage because the rate of convergence is almost O(1/N) vs O(1/sqrt(N)) for regular pseudorandom numbers. The disadvantage is that it is not acceptable to lose some part of the computations, and therefore the execution mechanism should be more robust and lead to repeatable results.
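The formula for one coordinate of the sequence can be sketched in a few lines. This is an illustration, not the generator used in the application: it assumes that the a_j are the base-p digits of the index n (least significant first), and the modifier and scrambler values in the usage note are arbitrary placeholders, not the optimised ones from [Atanassov 2003]. The function names are hypothetical.

```python
def digits(n, base):
    """Base-`base` digits of n, least significant first."""
    out = []
    while n > 0:
        n, r = divmod(n, base)
        out.append(r)
    return out

def modified_halton(n, p, k, b=None):
    """n-th term of one coordinate: sum_j imod(a_j * k^(j+1) + b_j, p) * p^(-j-1).

    p is the prime base, k the modifier, b an optional list of scramblers
    (missing entries are treated as 0).
    """
    x = 0.0
    k_pow = k % p          # k^(j+1) mod p, starting at j = 0
    scale = 1.0 / p        # p^(-j-1), starting at j = 0
    for j, a_j in enumerate(digits(n, p)):
        b_j = b[j] if b is not None and j < len(b) else 0
        x += ((a_j * k_pow + b_j) % p) * scale
        k_pow = (k_pow * k) % p
        scale /= p
    return x
```

With k = 1 and no scramblers the function reduces to the ordinary Halton (van der Corput) sequence in base p. A call such as `[modified_halton(n, 3, 2, b=[1, 0, 2]) for n in range(1, 10)]` gives the first few scrambled points in base 3. Because each point depends only on its index n, a lost or failed batch of trajectories can be recomputed exactly, which is the repeatability requirement mentioned on the slide.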
Monte Carlo modelling of semiconductor devices Variance reduction approach: because of the high variance, it is justified to study and optimize the transfer functions. Thus a parallel version of the genetic optimisation library galib was developed and successfully run on the BlueGene/P. It was used to optimise the transfer function related to the evolution time (instead of using a constant one). So far the gains are not more than 20%, but we are considering the possibility of optimising the other kernels, which are more complex and will probably lead to better results.
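Why the choice of the density for the evolution time matters can be seen on a one-dimensional toy integral. The sketch below is an assumption-laden illustration, not the optimisation performed with galib: it estimates the integral of exp(2t) over [0, T] once with a constant (uniform) density and once with a density proportional to exp(lam * t), where lam is a hypothetical tunable parameter of the kind an optimiser could adjust.

```python
import math
import random
import statistics

def g(t):
    """Toy integrand that grows with time."""
    return math.exp(2.0 * t)

def sample_uniform(rng, T):
    """One sample of g(t) / p(t) with the constant density p = 1 / T."""
    t = rng.random() * T
    return g(t) * T

def sample_tilted(rng, T, lam):
    """One sample of g(t) / p(t) with p(t) = exp(lam * t) / norm on [0, T]."""
    norm = math.expm1(lam * T) / lam                            # integral of exp(lam * t)
    t = math.log1p(rng.random() * math.expm1(lam * T)) / lam    # inverse CDF
    return g(t) * norm / math.exp(lam * t)

def compare(T=1.0, lam=1.5, n=20000, seed=1):
    """Return (mean, variance) for the uniform and the tilted density."""
    rng = random.Random(seed)
    u = [sample_uniform(rng, T) for _ in range(n)]
    v = [sample_tilted(rng, T, lam) for _ in range(n)]
    return (statistics.mean(u), statistics.variance(u),
            statistics.mean(v), statistics.variance(v))
```

Both estimators target the same value, (exp(2T) - 1) / 2, but the tilted density follows the shape of the integrand more closely and gives a smaller sample variance. The size of the gain in this toy case says nothing about the gains reported for the real transfer functions.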
Monte Carlo modelling of semiconductor devices Various physically interesting quantities, expressed as linear functionals of the solution for the Wigner function, can be computed. Example results for 175 fs relaxation times
Numerical results
Results on Blue Gene/P

Cores   Time      Seconds
2048    3:21:22   12082
1024    6:31:38   23498
Numerical results
Results with electric field, 180 fs, on Intel X5560 @ 2.8 GHz, Infiniband cluster

Nodes   Cores   Time      Seconds   Samples
8       64      7:03:43   25423     10^9
8       128     5:16:02   18962     10^9
16      128     3:31:45   12705     10^9
16      256     2:39:12   9552      10^9
1       1       27:07     1627      10^6
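The scalability claim can be checked directly from the reported timings. The short sketch below computes speedup and parallel efficiency relative to the smallest run in each group; the helper name `scaling` is hypothetical. For the Intel cluster only the 8-node/64-core and 16-node/128-core rows are used, since the slide does not say how the runs with twice as many processes per node were mapped to cores.

```python
def scaling(runs):
    """runs: list of (cores, seconds) for the same workload.

    Returns (cores, speedup, efficiency) rows relative to the run with the
    fewest cores.
    """
    base_cores, base_secs = min(runs)
    rows = []
    for cores, secs in sorted(runs):
        speedup = base_secs / secs
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows

# Timings copied from the two results slides (seconds).
blue_gene = [(1024, 23498), (2048, 12082)]
intel_cluster = [(64, 25423), (128, 12705)]

for name, runs in (("Blue Gene/P", blue_gene), ("Intel cluster", intel_cluster)):
    for cores, speedup, efficiency in scaling(runs):
        print(f"{name}: {cores} cores, speedup {speedup:.3f}, efficiency {efficiency:.1%}")
```

Doubling the core count roughly halves the run time in both cases, which is what one expects from independent trajectories with little communication.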
Numerical results Time evolution
Using cloud storage for results Users register at the web portal and obtain access to cloud storage at IICT-BAS. Access via a Windows or Linux app. Can use curl or libcurl clients from BlueGene/P, where the home directory has 72G free and is 97% used.
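The slides describe the storage security model only as "signed http requests", so the details of the portal's scheme are not known here. The sketch below shows one common way such signing works, purely as an assumption: an HMAC-SHA256 over the method, path and date, sent in an Authorization header. The header layout, the function names and the example values are all hypothetical and would have to be replaced by whatever the IICT-BAS portal actually specifies.

```python
import hashlib
import hmac

def sign_request(secret_key, method, path, date):
    """Hex HMAC-SHA256 over method, path and date (hypothetical scheme)."""
    message = "\n".join([method.upper(), path, date]).encode("utf-8")
    return hmac.new(secret_key.encode("utf-8"), message, hashlib.sha256).hexdigest()

def signed_headers(user, secret_key, method, path, date):
    """Headers a client would attach to the request (layout is an assumption)."""
    signature = sign_request(secret_key, method, path, date)
    return {"Date": date, "Authorization": f"{user}:{signature}"}

def verify(secret_key, method, path, date, signature):
    """Server-side check: recompute the signature and compare in constant time."""
    expected = sign_request(secret_key, method, path, date)
    return hmac.compare_digest(expected, signature)
```

Because the signature is just a header value, the same request can be issued from curl, libcurl or any other HTTP client, which fits the point that the scheme deploys easily on machines such as the BlueGene/P.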
Status of GPU-based version Generators for the scrambled Sobol sequence and modified Halton sequence have been developed and tested. For Monte Carlo we use CURAND. Code tested on our PC cluster of GTX 295, our M2090 cards and Amazon EC2 nodes equipped with M2050 cards ($2 per hour). The code has been refactored to enable the main computations to be put in a GPU kernel function. One kernel, related to initialization of pseudorandom or quasirandom numbers, is invoked once. Recent results: the code compiles. What remains to be done: verification, testing and performance tuning.
Conclusions and future work The code has excellent scalability on clusters and supercomputers. Considering that the problem at hand is highly CPU intensive, it is justified to attempt to tune the transition densities before moving to more demanding computations. Access to cloud storage provides a simple security model (signed http requests), which also offers easy deployment across all the available architectures.