Releases: kubedl-io/kubedl
v0.5.0
Features
In this release, we have brought some major features that help cluster admins manage workloads more easily and run them more efficiently.
- Enable data caching across different jobs, and decouple the lifecycle of the cache system from that of the jobs.
- Introduce job-coordinator to schedule and admit jobs in multi-tenant queues.
- Introduce a new workload named ElasticBatch job, which abstracts offline inference jobs.
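These notes do not describe how job-coordinator decides admission. As a rough illustration only, the sketch below assumes each tenant queue has a hypothetical GPU quota and that jobs are admitted in strict FIFO order while they fit within their tenant's remaining quota. The names `Job`, `TenantQueue`, and `admit` are hypothetical and are not KubeDL's API.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    tenant: str
    gpus: int  # hypothetical resource request


@dataclass
class TenantQueue:
    quota: int     # hypothetical per-tenant GPU quota
    used: int = 0  # GPUs held by jobs already admitted from this queue
    pending: deque = field(default_factory=deque)


def admit(queues: dict[str, TenantQueue]) -> list[str]:
    """Admit pending jobs head-first per tenant while they fit in the quota.

    A job that does not fit blocks the rest of its own tenant's queue
    (strict FIFO), but never blocks other tenants.
    """
    admitted = []
    for q in queues.values():
        while q.pending and q.used + q.pending[0].gpus <= q.quota:
            job = q.pending.popleft()
            q.used += job.gpus
            admitted.append(job.name)
    return admitted
```

For example, with `team-a` (quota 4) holding jobs needing 2, 4 and 1 GPUs and `team-b` (quota 2) holding one 2-GPU job, `admit` returns `['a1', 'b1']`: the 4-GPU job `a2` does not fit, so `a3` waits behind it. Strict FIFO keeps ordering fair within a tenant at the cost of head-of-line blocking; whether job-coordinator makes the same trade-off is not stated here.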
v0.4.3
v0.4.2
Changes Since v0.4.1
API Improvements
- Introduce the job queue API (job-level queueing will be implemented in the next release).
Workloads
- Support the distributed communication style of torch-elastic on both the normal container network and the host network.
- Upgrade vendored dependencies to Kubernetes 1.21 for better performance and other optimizations.
v0.4.1
Version v0.4.1 is a stable release that introduces many stability fixes, API improvements and code optimizations.
Changes Since v0.4.0
API Improvements
- Add modelPath, description and imageTag to the Model/ModelVersion specification.
- Introduce CacheBackend to integrate with cloud-native distributed cache systems for training jobs.
- Introduce Notebook to provide Jupyter virtual environment capability.
Workloads
- Bug fixes for the MPIJob implementation.
- Bug fixes for cron scheduling.
Runtime & Dashboard
- Optimize error and stack-tracing messages.
- Support the Volcano gang scheduler protocol.
- Remove authentication from the dashboard backend.
- Set TerminationMessageFallbackToLogsOnError as the default termination message policy.
- Scale in extra pods/services when the expected replica count decreases.
- Refactor to improve code reusability and robustness.
- Support failover based on failure reasons.
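To picture what scaling in extra pods/services involves, the sketch below compares each replica's index with the new expected replica count and selects the surplus for deletion. It assumes a hypothetical naming scheme `<job>-<replica-type>-<index>` and a list holding a single replica type; it only selects names and is not KubeDL's controller code.

```python
from __future__ import annotations

import re

# Hypothetical naming scheme: <job>-<replica-type>-<index>, e.g. "mnist-worker-3".
_INDEX_RE = re.compile(r"-(\d+)$")


def replicas_to_scale_in(names: list[str], expected: int) -> list[str]:
    """Pick the replicas whose index is >= the new expected replica count.

    Replicas 0..expected-1 are kept; the extras are returned highest index first.
    Names without a trailing numeric index are ignored.
    """
    extra = []
    for name in names:
        match = _INDEX_RE.search(name)
        if match is None:
            continue  # not a replica-indexed object; leave it untouched
        index = int(match.group(1))
        if index >= expected:
            extra.append((index, name))
    return [name for _, name in sorted(extra, reverse=True)]
```

For example, scaling `mnist-worker-0` through `mnist-worker-3` down to 2 replicas selects `['mnist-worker-3', 'mnist-worker-2']`, leaving indices 0 and 1 in place.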
v0.4.0
v0.3.0
v0.2.0
v0.1.0
v0.1.0 is the first formal release of KubeDL, including the following stable features:
- Support running prevalent ML/DL workloads in a single operator.
- Support submitting a job with artifacts synced from a remote source such as GitHub, without rebuilding the image.
- Support advanced scheduling features such as gang scheduling with pluggable backend schedulers.
- Instrumented with unified Prometheus metrics for different types of DL jobs, such as job launch delay and the current number of pending/running jobs.
- Support job metadata persistence with a pluggable storage backend such as MySQL.
- Enable specific workload types automatically according to the installed CRDs, or explicitly through startup flags.
- A modular architecture that can easily be extended to more types of DL/ML workloads with shared libraries; see how to add a custom job workload.
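Gang scheduling means a job's pods are placed all together or not at all, so a partially placed job never holds resources while waiting for the rest. The sketch below illustrates only that all-or-nothing rule, using a simple first-fit placement over hypothetical per-node free CPU counts; `place_gang` is a hypothetical helper, and real pluggable backends use much richer logic.

```python
from __future__ import annotations


def place_gang(pod_cpus: list[int], node_free: dict[str, int]) -> dict[int, str] | None:
    """First-fit placement of every pod in a gang, or of none at all.

    Works on a copy of the free capacity, so a rejected gang leaves
    `node_free` untouched (no partial reservation).
    """
    free = dict(node_free)
    placement: dict[int, str] = {}
    for i, cpus in enumerate(pod_cpus):
        node = next((n for n, avail in free.items() if avail >= cpus), None)
        if node is None:
            return None  # one pod cannot fit, so reject the whole gang
        free[node] -= cpus
        placement[i] = node
    node_free.update(free)  # commit only after every pod has a slot
    return placement
```

With `{"node-a": 4, "node-b": 2}`, a gang of three 2-CPU pods is placed as `{0: 'node-a', 1: 'node-a', 2: 'node-b'}`, while a gang of four 2-CPU pods returns `None` and leaves the capacity map unchanged.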
The official image docker.io/kubedl/kubedl:v0.1.0 is hosted on Docker Hub.