Presentation
Challenges and Opportunities in Running Kubernetes Workloads on HPC
Description
Cloud and HPC platforms increasingly converge in hardware capabilities and specifications, yet they still differ substantially in their software stacks and in how those stacks manage available resources. The HPC world typically favors Slurm for job scheduling, whereas Cloud deployments rely on Kubernetes to orchestrate container instances across nodes. Hybrid workloads can run today through bridging mechanisms that submit jobs from one environment to the other. However, such solutions require costly data movement and must operate within the constraints set by each environment's network and access policies. In this presentation, we introduce a container-based design that runs unmodified Kubernetes workloads directly on HPC systems. Users deploy their own private Kubernetes mini-Cloud, which internally translates container lifecycle management commands so that scheduling goes through the system-level Slurm infrastructure and containers run under Singularity/Apptainer. We consider this approach practical for deployment in HPC centers, as it requires minimal pre-configuration and retains existing resource management and accounting policies.
Time
Wednesday, June 5, 12:00 - 12:30 CEST
Location
HG E 1.1
Event Type
Minisymposium
Climate, Weather, and Earth Sciences
Engineering
Computational Methods and Applied Mathematics