Parallel Multigrid Methods for Non-linear, Time-dependent and Incompressible Fluid Flow

**Principal Investigator:**

Barbara Wohlmuth

**Affiliation:**

Lehrstuhl für Numerische Mathematik, Technische Universität München

**Local Project ID:**

pr74ne

**HPC Platform used:**

SuperMUC and SuperMUC-NG of LRZ

**Date published:**

Coupled multi physics problems describe a large class of physical phenomena. For a better understanding of such problems, large scale simulations are particularly valuable and important. In this project, the focus lies on developing new methods for solving large scale simulations of coupled non-linear and time-dependent problems. Particularly, the interest lies in the Navier–Stokes equations, coupled to a transport equation for the concentration, which describes diluted polymeric fluids. Moreover, the goal is to perform simulations for geodynamical model problems which involve non-linearities. This research is part of the SPPEXA project TerraNeo [5] in collaboration with Prof. Dr. Hans-Peter Bunge (Ludwig-Maximilian Universität München) and Prof. Dr. Ulrich Rüde (Friedrich-Alexander Universität Erlangen-Nürnberg).

For solving the arising partial differential equations, finite element methods are a popular discretization scheme, since they allow flexible and unstructured meshes. Additionally, and in contrast to finite differences, regularity requirements for the solution are lower. For the resulting systems of equations, preconditioned Krylov subspace methods, e.g., preconditioned conjugate gradient (CG) or preconditioned minimal residual, are widely used solvers. For large scale problems, CG allows a straightforward parallelization, but it suffers from not having an optimal complexity. Multigrid methods (MG), in contrast, have optimal complexity. Geometric MG relies on a given hierarchy of nested meshes, allowing an extremely high performance in terms of unknowns computed per second. However, in case of geometric MG, the generated coarse grids require a good implementation strategy to achieve an efficient parallel execution.

The HHG framework [7, 10], used here, is based on a distributed memory parallel communication model using the message passing interface (MPI). The unknowns of the refined meshes are grouped in data structures corresponding to the geometric primitives (i.e. elements, faces, edges, and vertices) of the input mesh. They are communicated and updated according to these groups and caching strategies are employed. Moreover, certain dependencies that exist between primitives of the same dimension are ignored for the sake of parallel efficiency, i.e., these dependencies are treated using a block Jacobi-like execution instead of a Gauß–Seidel fashion. This leads to the following processing structure: first all vertices of the input grid are updated, then all unknowns on the edges, followed by all faces, and finally all elements. Serendipitously, this modification has a negligible effect on smoothing properties and multigrid convergence. OpenMP parallelization is implemented for additional node-level parallelism. This hybrid parallelism can lead to advantages since it allows to reduce the number of MPI processes and increases the available amount of main memory that is available per process. Also within this project, the HHG framework has been redesigned, resulting in the HyTeG framework [11] which introduces more types of finite elements and achieves better generality and performance through code generation.

The project is split into two sub-projects which are presented separately, namely FLUID and GEO.

**Efficient multi-physics solver for incompressible fluid flow (FLUID)**

The FLUID sub-project aims to develop an efficient multi-physics solver for incompressible fluid flow coupled to a transport equation. For this purpose, the HHG software framework has been extended to solve multi-physics problems in this direction. It was already shown that it is possible to solve the Stokes problem with excellent scalability for large-size problems using HHG. Therefore, based on the efficient solver for the stationary Stokes problem [8, 9], preconditioners and iterative solvers for the time-dependent and non-linear generalized Navier-Stokes equations have been implemented. Such systems of equations typically arise in the study of polymeric fluids where the concentration of the polymers is quite low (2-3%). A typical example of such a fluid is blood plasma, which may be modeled by a Carreau-Yasuda type viscosity model depending on the concentration and the velocity gradient. For the spatial discretization, linear conforming finite elements on block-structured tetrahedral meshes are used. Since a discretization by equal-order finite elements does not satisfy the discrete inf-sup condition one has to either stabilize the method or use the so-called P1-iso-P2/P1 element in which different levels of refinement are used for the velocity and the pressure. The advective terms are discretized via an upwind scheme using finite volumes on the dual mesh with a coupling to the finite elements on the primal mesh. For the integration in time, the implicit-explicit (IMEX) splitting scheme [6] is used, in which the viscous parts of the differential equations are treated implicitly and the non-linear advective parts explicitly.

Applying this method results in a two-step process for each time-step. In a first step, the Navier-Stokes equations are solved with the non-linear term being moved to the right-hand side by using the solution from the previous time-step. Solving this system of equations corresponds to solving a Stokes problem. In a second step, the polymer concentration is being transported by the resulting velocity field along with some physical diffusion.

In order to study the method, benchmark simulations were performed. The largest simulations were conducted on SuperMUC Phase 1, considering arterial blood flow for patient specific arteries. Therefore, the simulation of a part of an artery affected by an aneurysm has been investigated. Geometries from arteries obtained from patients were used for this purpose. An illustration of one such artery can be found in Figure 1.

At the inflow boundary of this artery a parabolic velocity profile with a periodically changing peak value is prescribed. This mimics the velocity resulting from the pulsating heart beat; see Figure 2 for a plot of the first five periods. At the outflow boundaries, homogeneous Neumann boundary conditions are prescribed. In Figure 3, the streamlines of the velocity are illustrated. It can be observed that the aneurysm is introducing vortices which disturb the blood flow.

**Large scale simulations for geodynamical model problems (GEO)**

The GEO sub-project aims to develop new methods for simulating Earth mantle convection with large viscosity contrasts. As a prerequisite to solving such problems efficiently, Darcy type equations with highly variable coefficients were studied. For this purpose, a novel approach which can be used to assemble finite element stiffness matrices for linear finite elements on-the-fly and in a computationally efficient way was introduced. This approach modifies the stiffness matrix from the assembly with a constant coefficient by appropriately scaling the matrix components. Theoretical cost considerations show that this novel approach requires roughly only one third of the floating-point operations compared to a classical finite element assembly scheme with nodal integration. Numerical experiments verify the theoretical expectations with respect to accuracy and run-time. For details, please refer to [1].

Since these considerations only apply to scalar Darcy type equations, this work has been extended to the case of linear elasticity and the Stokes problems, in which the variable coefficient enters in the enhanced strain-displacement tensor [2]. The performance of this method has been analyzed on SuperMUC-NG by performing a roofline analysis and comparing it to conventional compressed matrix based approaches. In order to show the applicability of this method, a benchmark Stokes problem with 1.03·10^{11} degrees of freedom has been solved using 12288 compute cores on SuperMUC-NG. In this scenario, the newly introduced matrix-free method reduced the compute time by about 25%. Moreover, we performed numerical experiments with non-linear viscosity models, resulting in the streamlines and viscosity in a channel shown in Figure 4.

Since a computational and scalable solver for the Stokes equations is essential for efficient Earth mantle convection simulations, in [3,4,5], other novel matrix-free approaches for large scale geodynamical simulations in three dimensions have been developed and investigated. These new approaches are tested in a geophysical setup for which a temperature-dependent viscosity model is used and the asthenosphere, the more viscous upper part of the mantle, is modeled by a viscosity jump at around 410km depth. Temperature and plate velocity data are employed that originate from real world measurements. To study the performance, a weak scaling experiment has been performed on SuperMUC Phase I up to 47 250 cores. The global resolution of the Earth’s mantle increases to less than 1.7km in our experiments. To solve the arising linear system of equations, an all-at-once multigrid V-cycle scheme is applied and the residual is reduced by eight orders of magnitude. An overall parallel efficiency of more than 83% is achieved, as it can be observed in Table 1. In Figure 5, the results of our simulations are illustrated by showing velocity streamlines underneath the Himalaya mountain range.

Cores | DOFs | Global resolution | Time-to-solution | Parallel efficiency |

5 580 | 1.3x10^{11} | 3.4km | 1 347s | 1.00 |

12 000 | 2.7x10^{11} | 2.8km | 1 493s | 0.97 |

21 600 | 4.8x10^{11} | 2.3km | 1 468s | 0.92 |

47 250 | 1.1x10^{12} | 1.7km | 1 609s | 0.83 |

**Research Team**

Daniel Drzisga, Dr. Markus Huber, Prof. Dr. Barbara Wohlmuth (PI) – all: Lehrstuhl für Numerische Mathematik, Technische Universität München

**Publications**

[1] Bauer, S., Drzisga, D., Mohr, M., Rüde, U., Waluga, C. and Wohlmuth, B., 2018. A stencil scaling approach for accelerating matrix-free finite element implementations. *SIAM Journal on Scientific Computing*, *40*(6), pp.C748-C778.

[2] Drzisga, D., Rüde, U. and Wohlmuth, B., 2020. Stencil scaling for vector-valued PDEs on hybrid grids with applications to generalized Newtonian fluids. *To be published in SIAM Journal on Scientific Computing.*

[3] Bauer, S., Huber, M., Mohr, M., Rüde, U. and Wohlmuth, B., 2018, June. A new matrix-free approach for large-scale geodynamic simulations and its performance. In *International Conference on Computational Science* (pp. 17-30). Springer, Cham.

[4] Bauer, S., Huber, M., Ghelichkhan, S., Mohr, M., Rüde, U. and Wohlmuth, B., 2019. Large-scale simulation of mantle convection based on a new matrix-free approach. *Journal of Computational Science*, *31*, pp.60-76.

[5] Bauer, S., Bunge H.-P., Drzisga, D., Ghelichkhan, S., Huber, M., Kohl, N., Mohr, M., Rüde, U., Thönnes, D., Wohlmuth, W., 2020. TerraNeo—Mantle Convection Beyond a Trillion Degrees of Freedom – SPPEXA 2016-2019. In* Lecture Notes in Computational Science and Engineering, vol. 136.*

**Additional references**

[6] Ascher, U.M., Ruuth, S.J. and Spiteri, R.J., 1997. Implicit-explicit Runge-Kutta methods for time-dependent partial differential equations. Applied Numerical Mathematics, 25(2-3), pp.151-167.

[7] Bergen, B., Gradl, T., Hulsemann, F. and Rude, U., 2006. A massively parallel multigrid method for finite elements. Computing in science & engineering, 8(6), pp.56-62.

[8] Drzisga, D., John, L., Rude, U., Wohlmuth, B. and Zulehner, W., 2018. On the analysis of block smoothers for saddle point problems. SIAM Journal on Matrix Analysis and Applications, 39(2), pp.932-960.

[9] Gmeiner, B., Rüde, U., Stengel, H., Waluga, C. and Wohlmuth, B., 2015. Performance and scalability of hierarchical hybrid multigrid solvers for Stokes systems. SIAM Journal on Scientific Computing, 37(2), pp.C143-C168.

[10] Gmeiner, B., Rüde, U., Stengel, H., Waluga, C. and Wohlmuth, B., 2015. Towards textbook efficiency for parallel multigrid. Numerical Mathematics: Theory, Methods and Applications, 8(1), pp.22-46.

[11] Kohl, N., Thönnes, D., Drzisga, D., Bartuschat, D. and Rüde, U., 2019. The HyTeG finite-element software framework for scalable multigrid solvers. International Journal of Parallel, Emergent and Distributed Systems, 34(5), pp.477-496.

**Theses completed within this project**

Huber, M., 2018. Massively parallel and fault-tolerant multigrid solvers on peta-scale systems (Doctoral dissertation, Technische Universität München).

**Scientific Contact**

Daniel Drzisga

Lehrstuhl für Numerische Mathematik

Technische Universität München

Boltzmannstraße 3, D-85748 Garching (Germany)

e-mail: drzisga [@] ma.tum.de

*LRZ project ID: pr74ne*

*October 2020*