Scientific Computing Group, Universität Hamburg (Germany)
Local Project ID:
HPC Platform used: Hazel Hen of HLRS
Molecular dynamics simulations have become an important research and development tool in various process engineering applications. Among other things, molecular dynamics makes it possible to gain a better understanding of complex multicomponent mixtures, such as vapor-liquid interfacial systems, including bubble formation and droplet coalescence. The latter affects, for example, fuel injection processes in combustion and spray cooling.
Studying droplet coalescence at the molecular level is computationally very demanding for several reasons. Although nanometer-sized droplets appear to be very small, up to hundreds of millions of molecules are required to model the droplets and the surrounding vapor phase. The computation of molecular trajectories is typically carried out via time stepping with time step sizes on the order of femtoseconds. With droplet coalescence occurring on time scales of nanoseconds, a droplet coalescence study requires the computation of millions of time steps. This, together with the large number of molecules, implies an extreme computational load that demands supercomputing power and, thus, the exploitation of hundreds of thousands of processors.
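The orders of magnitude above can be checked with a short back-of-the-envelope calculation. The following Python sketch uses assumed round numbers (a 1 fs time step, 5 ns of simulated time, 100 million molecules); they are chosen only to match the orders of magnitude named in the text and are not actual project parameters.

```python
# Back-of-the-envelope estimate of the work in a coalescence study.
# All concrete values are illustrative assumptions, not project data.
FEMTOSECOND = 1e-15  # seconds
NANOSECOND = 1e-9    # seconds


def count_time_steps(simulated_time_s, time_step_s):
    """Number of time steps needed to cover the simulated time span."""
    return round(simulated_time_s / time_step_s)


time_step = 1 * FEMTOSECOND        # assumed time step size
simulated_time = 5 * NANOSECOND    # assumed duration of the coalescence
molecules = 100_000_000            # assumed system size

steps = count_time_steps(simulated_time, time_step)
molecule_updates = steps * molecules  # one update = one molecule, one step

print(f"time steps:       {steps:,}")
print(f"molecule updates: {molecule_updates:,}")
```

Under these assumptions the study needs five million time steps, and every one of them touches all molecules, which is why the total number of molecule updates reaches the order of 10^14.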
To exploit supercomputers in molecular dynamics, the computational domain is typically split into subdomains. Molecule trajectories within each subdomain are computed by a particular process, that is, on a particular compute core or processor. Molecules are densely packed within the droplets but populate the rest of the computational domain at rather low density, and the droplets slowly merge over time. A uniform splitting and distribution of the computational domain among processes will therefore result in computational load imbalances: processes handling the droplet regions need to compute significantly more molecule trajectories than processes that take care of vapor regions. Moreover, various algorithms are available to actually compute the molecule trajectories, and which one is favorable depends on, e.g., the local density and particle distribution.
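A toy example may help to illustrate the imbalance caused by uniform splitting. The Python sketch below assumes a one-dimensional row of 16 cells with made-up molecule counts (a dense droplet in the middle, thin vapor on both sides) and takes the molecule count of a subdomain as its load. Both the numbers and the load measure are simplifying assumptions; the function names are hypothetical and not taken from ls1 mardyn.

```python
# Toy illustration of load imbalance under a uniform domain split.
# Cell contents and the "load = molecule count" model are assumptions.

def uniform_split_loads(cell_counts, n_procs):
    """Give each process the same number of consecutive cells and
    return the number of molecules each process has to handle."""
    cells_per_proc = len(cell_counts) // n_procs
    return [
        sum(cell_counts[p * cells_per_proc:(p + 1) * cells_per_proc])
        for p in range(n_procs)
    ]


def imbalance(loads):
    """Ratio of the largest load to the average load (1.0 = balanced)."""
    return max(loads) / (sum(loads) / len(loads))


# 16 cells: vapor (2 molecules per cell), droplet (100), vapor (2).
cells = [2] * 6 + [100] * 4 + [2] * 6

loads = uniform_split_loads(cells, n_procs=4)
print(loads)                      # [8, 204, 204, 8]
print(round(imbalance(loads), 2))
```

In this made-up setup the two processes that own the droplet carry almost twice the average load, so the two vapor processes spend most of each time step waiting.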
Load-Balanced High-Performance Molecular Dynamics with Auto-Tuning
In this project, the highly optimized, massively parallel molecular dynamics software ls1 mardyn is extended and used to study various droplet coalescence scenarios. An auto-tuning extension is incorporated into ls1 mardyn which, at runtime, detects the best solution strategy and automatically switches to it. An example is shown in Figure 1, considering two variants of shared-memory parallelization for an evaporating droplet: after approx. 12000 time steps, the auto-tuning approach automatically switches from a coloring (c08) approach to a slicing (sli) approach, resulting in optimal compute time throughout the course of the simulation.
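The basic idea of runtime auto-tuning can be sketched in a few lines of Python. This is a minimal illustration and not the ls1 mardyn implementation: the functions `pick_fastest`, `run` and `toy_cost` are hypothetical, the re-tuning interval is arbitrary, and the cost model is invented so that the result is reproducible. A real tuner would measure wall-clock time of actual time steps instead. Only the strategy names c08 and sli are taken from the text.

```python
# Minimal sketch of runtime auto-tuning: periodically measure all
# candidate strategies and continue with the cheapest one.
# The cost model below is invented for illustration only.

def pick_fastest(strategies, measure):
    """Return the strategy with the smallest measured cost."""
    timings = {name: measure(name) for name in strategies}
    return min(timings, key=timings.get)


def run(n_steps, strategies, cost_model, retune_interval):
    """Simulate n_steps and record which strategy is active per step.

    cost_model(name, step) stands in for timing one time step.
    """
    active = None
    history = []
    for step in range(n_steps):
        if step % retune_interval == 0:  # tuning phase
            active = pick_fastest(
                strategies, lambda name: cost_model(name, step)
            )
        history.append(active)
    return history


def toy_cost(name, step):
    """Invented costs: c08 is constant, sli gets cheaper over time
    (loosely mimicking a droplet that evaporates)."""
    return 1.0 if name == "c08" else 2.0 - 0.125 * step


history = run(20, ["c08", "sli"], toy_cost, retune_interval=5)
switch_step = history.index("sli")
print(f"switched from c08 to sli at step {switch_step}")
```

With this toy cost model the tuner keeps c08 during the first two tuning phases and switches to sli at step 10, the first tuning phase in which sli is measured to be cheaper.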
To achieve optimal load distribution among the processes, a load balancing algorithm based on k-d trees is employed in ls1 mardyn, cf. Figure 2. This algorithm recursively decomposes the computational domain such that each resulting subdomain carries approximately the same computational load. This approach and the corresponding communication routines between the processes are being improved in the project. This includes non-blocking computation of domain-global quantities of interest, such as global pressure or energy values, as well as the improvement of molecule migration between neighboring processes through the eighth-shell method; the latter is work in progress.
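The recursive decomposition can be illustrated with a small Python sketch. It is a simplified stand-in under several assumptions, not the algorithm as implemented in ls1 mardyn: the domain is a two-dimensional grid of cells with made-up per-cell loads, the split axis simply alternates, and each cut is placed where the load on one side best matches the share of processes assigned to it. All function names are hypothetical.

```python
# Simplified k-d tree style decomposition of a 2D grid of cell loads.
# Regions are half-open boxes (x0, x1, y0, y1); loads[x][y] is the
# assumed cost of one cell.

def region_load(loads, region):
    x0, x1, y0, y1 = region
    return sum(loads[x][y] for x in range(x0, x1) for y in range(y0, y1))


def split(region, axis, cut):
    """Cut a region at index `cut` along the given axis (0 = x, 1 = y)."""
    x0, x1, y0, y1 = region
    if axis == 0:
        return (x0, cut, y0, y1), (cut, x1, y0, y1)
    return (x0, x1, y0, cut), (x0, x1, cut, y1)


def kd_decompose(loads, region, n_procs, axis=0):
    """Recursively split `region` into n_procs boxes of similar load."""
    if n_procs == 1:
        return [region]
    x0, x1, y0, y1 = region
    extent = (x1 - x0, y1 - y0)
    if extent[axis] < 2:          # cannot cut here, try the other axis
        axis = 1 - axis
    if extent[axis] < 2:
        raise ValueError("region too small to split further")
    left_procs = n_procs // 2
    # Load the left part should receive, proportional to its processes.
    target = region_load(loads, region) * left_procs / n_procs
    start, end = region[2 * axis], region[2 * axis + 1]
    best_cut = min(
        range(start + 1, end),
        key=lambda c: abs(region_load(loads, split(region, axis, c)[0]) - target),
    )
    left, right = split(region, axis, best_cut)
    return (kd_decompose(loads, left, left_procs, 1 - axis)
            + kd_decompose(loads, right, n_procs - left_procs, 1 - axis))


# 5 x 2 grid: one dense "droplet" column (8 per cell), four vapor columns (2).
loads = [[8, 8]] + [[2, 2] for _ in range(4)]
regions = kd_decompose(loads, (0, 5, 0, 2), n_procs=4)
for r in regions:
    print(r, region_load(loads, r))
```

In this example the dense column is cut into two single-cell subdomains while each vapor subdomain spans four cells, and all four subdomains end up with the same load of 8. Subdomain size thus adapts to the local density, which is the effect the load balancing aims for.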
The overall performance of ls1 mardyn has been tuned and investigated in scalability experiments on up to 7168 compute nodes of the supercomputer Hazel Hen at HLRS. Up to twenty trillion molecules could be simulated in a world-record simulation, corresponding to a five-fold increase in the size of the molecular system compared to previous large-scale scenarios. The simulation achieved a throughput of 189 billion molecule updates (i.e., advancing a molecule by one time step) per second and 1.33 Petaflops (in single precision).
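A few derived quantities put these figures into perspective. The Python sketch below only rearranges the numbers quoted above; it assumes that the throughput and the floating-point rate both refer to the twenty-trillion-molecule run on 7168 nodes, and the derived values are rough estimates rather than reported measurements.

```python
# Rough derived quantities from the figures quoted in the text.
# Assumption: all figures refer to the same record run on 7168 nodes.
MOLECULES = 20 * 10**12            # twenty trillion molecules
UPDATES_PER_SECOND = 189 * 10**9   # molecule updates per second
FLOP_PER_SECOND = 1.33e15          # 1.33 Petaflops, single precision
NODES = 7168

seconds_per_time_step = MOLECULES / UPDATES_PER_SECOND
flop_per_update = FLOP_PER_SECOND / UPDATES_PER_SECOND
molecules_per_node = MOLECULES / NODES

print(f"wall-clock time per time step: {seconds_per_time_step:.0f} s")
print(f"floating-point ops per update: {flop_per_update:.0f}")
print(f"molecules per compute node:    {molecules_per_node:.2e}")
```

Under this assumption a single time step of the full system takes on the order of a hundred seconds, each molecule update costs roughly seven thousand floating-point operations, and every node holds a few billion molecules.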
Future work will focus on integrating all components, that is, auto-tuning and the eighth-shell method, and on carrying out full production runs on selected droplet coalescence scenarios. Figure 3 shows an exemplary simulation containing two argon droplets with a diameter of 50 nm.
Related project: TaLPas: Task-based Load Balancing and Auto-Tuning in Particle Simulations. Funded by the Federal Ministry of Education and Research (BMBF), grant number 01IH16008, www.talpas.de
Further material and reading:
• N. Tchipev, S. Seckler, M. Heinen, J. Vrabec, F. Gratl, M. Horsch, M. Bernreuther, C.W. Glass, C. Niethammer, N. Hammer, B. Krischok, M. Resch, D. Kranzlmüller, H. Hasse, H.-J. Bungartz, P. Neumann. TweTriS: Twenty Trillion-atom simulation. Accepted for publication in International Journal of High Performance Computing Applications. 2018
• P. Neumann, N. Tchipev, S. Seckler, M. Heinen, J. Vrabec, H.-J. Bungartz. PetaFLOP Molecular Dynamics for Engineering Applications. Accepted for publication in High Performance Computing in Science and Engineering ’18, Transactions of the High Performance Computing Center Stuttgart. 2018
Dr. rer. nat. Philipp Neumann
Bundesstr. 45a, D-20146 Hamburg (Germany)
e-mail: philipp.neumann [@] uni-hamburg.de
HLRS Project ID: GCS-mddc