Cedana (/ce'dana/) is Save/Migrate/Resume (SMR) for containerized CPU and GPU workloads. Our solution sits between the Linux kernel and your workloads and captures the full state of your container, so you can save, migrate, and resume workloads across instances and vendors.
Our API lets you integrate this capability into your own clusters.
We are building a global, real-time system for compute that is designed to 1) maximize compute utilization by eliminating idle resources, 2) seamlessly access all available compute, 3) increase reliability and availability across node failures, and 4) support policy-based orchestration with workload-level SLAs. This changes how resources are allocated to high-performance computing, numerical simulation, and training and running AI models.
Check out our documentation.