Abstract
In steady-state Computational Fluid Dynamics (CFD) applications, a large percentage
of the run time is spent solving a large, sparse linear system of equations at every pseudo time-step.
This thesis focuses on improving the parallel performance of linear solvers for
compressible flow CFD problems. The recently proposed Alternating Anderson-Richardson (AAR)
method requires fewer operations and fewer global communications than the
Generalised Minimal Residual (GMRES) method.
However, its performance has previously been investigated only by solving matrices from a matrix
repository and Helmholtz and Poisson equations on structured meshes. This thesis contributes by
investigating the performance of AAR in solving compressible flow problems. Results show
that the maximum speedup of AAR with respect to GMRES is around 17%. However, the
speedup increases with the number of linear iterations because the number of operations
grows more slowly in AAR than in GMRES. Due to imbalance in load and cache misses,
the speedup from the lower number of global communications is only around 2% when 10
processes are used, but increases to a maximum of around 10% when 640 processes
are used. In addition, this research investigates the efficiency and stability of mixed-precision
linear solvers in solving compressible flow problems. Mixed-precision arithmetic lowers
the cost of operations and the cost of data transfer across the memory hierarchy and
the interconnect.
This thesis proposes a novel inner-outer flexible GMRES (FGMRES) framework, named
the two-stage preconditioned FGMRES. The two-stage preconditioned FGMRES-GMRES
shows a maximum speedup of around 70% with respect to GMRES in double precision,
and an additional speedup of around 50% can be achieved in mixed precision without
sacrificing the numerical stability of the linear solver. The speedup shown by the linear
solvers developed in this thesis allows faster CFD simulations with Hydra and thus reduces
the cost of turbomachinery design for Rolls-Royce. Future work should focus on implementing
the two-stage preconditioned FGMRES on GPUs while exploring the potential of
half-precision arithmetic. In addition, the potential of the two-stage preconditioned FGMRES
can be explored in other applications, such as the linear analysis of aeroelasticity and
aerodynamic shape optimisation. These applications usually require more accurate solutions
of the linear systems of equations and thus a larger number of linear iterations.