Presentation

P49 - The Task-Based GPU-Enabled Distributed Eigensolver available in DLA-Future
Presenter
Description
DLA-Future implements an efficient GPU-enabled distributed eigenvalue solver using asynchronous methods based on the C++ std::execution API. The task-based approach reduces the number of synchronization points and allows simple overlapping of communication and computation, which helps improve performance relative to the fork-join parallelism found in other libraries such as LAPACK and ScaLAPACK.
In certain cases, when multiple algorithms with suitable problem sizes are run independently, they can be co-scheduled to run at the same time, producing noticeable improvements in time to solution.
We present results of our task-based generalized eigensolver and show the current optimization status on both multicore-only and GPU-enabled systems (including both NVIDIA and AMD devices). We also present full application results generated with CP2K and SIRIUS, where DLA-Future support was easily added thanks to the provided C API, which is compatible with the widely used ScaLAPACK interface.
Time
Tuesday, June 4, 10:01 - 10:01 CEST
Location
HG F 30 Audi Max
Session Chair
Event Type
Poster