Efficient vectorised CUDA kernels for high-order finite element flow solvers
2020-04-09T09:33:59Z (GMT) by
In this work, we develop efficient kernels for the elemental operators of matrix-free solvers of the Helmholtz equation, which are the core operations of more complete Navier-Stokes solvers. We consider straight-sided and deformed quadrilateral elements from unstructured high-order meshes. We investigate two types of CUDA kernel over a range of polynomial orders: the first maps each elemental operation to a CUDA thread, and the second maps each element to a CUDA block. Our results show that the first option is beneficial for small elements with low polynomial order, whereas the second is beneficial for larger elements. For both options we show the importance of choosing the right data layout, and we analyse the effect of using the different memory spaces on the GPU.
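The two mappings and their associated data layouts can be illustrated with a small CPU-side C++ sketch. It is not the kernels developed in this work: the function names, the diagonal scaling used as a stand-in for an elemental operator, and the pairing of each mapping with a particular layout are all assumptions made for illustration. The loops stand in for the CUDA thread and block indices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// All names below are hypothetical and chosen for illustration only.

// Element-contiguous layout: the nq values of element e are adjacent.
// Assumed to suit the block-per-element mapping, where thread q of
// block e would read consecutive addresses.
inline std::size_t idx_contiguous(std::size_t e, std::size_t q, std::size_t nq) {
    return e * nq + q;
}

// Element-interleaved layout: value q of neighbouring elements is adjacent.
// Assumed to suit the thread-per-element mapping, where consecutive
// threads handle consecutive elements and so read consecutive addresses.
inline std::size_t idx_interleaved(std::size_t e, std::size_t q, std::size_t nelmt) {
    return q * nelmt + e;
}

// Thread-per-element, emulated on the CPU: the loop over e plays the role
// of the global thread index, and each "thread" walks all points of its
// own element. The operator is a simple pointwise scaling out = d * in.
void scale_thread_per_element(const std::vector<double>& in,
                              const std::vector<double>& d,
                              std::vector<double>& out,
                              std::size_t nelmt, std::size_t nq) {
    assert(in.size() == nelmt * nq && d.size() == in.size() && out.size() == in.size());
    for (std::size_t e = 0; e < nelmt; ++e) {      // stands in for the thread index
        for (std::size_t q = 0; q < nq; ++q) {     // sequential work inside one thread
            const std::size_t i = idx_interleaved(e, q, nelmt);
            out[i] = d[i] * in[i];
        }
    }
}

// Block-per-element, emulated on the CPU: the loop over e plays the role
// of the block index and the loop over q the thread index within a block.
void scale_block_per_element(const std::vector<double>& in,
                             const std::vector<double>& d,
                             std::vector<double>& out,
                             std::size_t nelmt, std::size_t nq) {
    assert(in.size() == nelmt * nq && d.size() == in.size() && out.size() == in.size());
    for (std::size_t e = 0; e < nelmt; ++e) {      // stands in for the block index
        for (std::size_t q = 0; q < nq; ++q) {     // stands in for the thread index
            const std::size_t i = idx_contiguous(e, q, nq);
            out[i] = d[i] * in[i];
        }
    }
}
```

Both functions compute the same values and differ only in where each value lives in memory. On a GPU, that placement determines whether neighbouring threads touch neighbouring addresses, which is one plausible reading of why the layout matters for both options.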