Releases · dstackai/dstack-enterprise

30 Jan 11:31

r4victor

0.18.38-v1

fa33161

0.18.38-v1 Latest

Latest

Intel Gaudi support

dstack now supports Intel Gaudi AI accelerators with SSH fleets. To use them with dstack first create an SSH fleet as usual* and then specify either accelerator name (i.e., Gaudi, Gaudi2, Gaudi3) or intel vendor in the run configuration:

type: dev-environment

python: "3.12"
ide: vscode

resources:
  gpu: gaudi2:8  # 8 × Gaudi 2

(*) Gaudi software and drivers should be already installed on the host, including drivers, hl-smi utility and Habana Container Runtime.

Fleets and Instances UI

The UI has been updated for easier fleet and instance management. The Fleets page now allows terminating fleets and can display both active and terminated fleets. The new Instances page displays active and terminated instances across all fleets.

Improved volume detachment

dstack users perviously reported cases when volumes were stuck in detaching state in AWS, and only manual forced detach allowed reusing the volumes. dstack now guarantees that the volumes are detached after the run terminates. If volumes stuck detaching, dstack automatically force detaches them after stop_duration (5 minutes by default). It also force detaches the volumes when you abort the run. You can also set stop_duration: off to turn off force detach and wait for soft detach indefinitely.

Note: Force detaching a volume is a last resort measure and may corrupt the file system. Contact your cloud support if you're experience volumes stuck in the detaching state.

We also fixed a number of bugs that could lead to dstack thinking a volume is attached to the instance even after it was successfully detached in the cloud.

What's Changed

Add Intel Gaudi support for SSH fleets by @un-def in dstackai/dstack#2216
Support models with non-standard finish_reason by @jvstme in dstackai/dstack#2229
[Internal]: Ensure all files end with a newline by @jvstme in dstackai/dstack#2227
[chore]: Refactor gateway modules by @jvstme in dstackai/dstack#2226
[chore]: Move connection pool to proxy deps by @jvstme in dstackai/dstack#2235
[chore]: Update migration ffa99edd1988 by @jvstme in dstackai/dstack#2217
[chore]: Update/remove dstack-proxy TODOs by @jvstme in dstackai/dstack#2239
[UI] New UI for fleets and instances by @olgenn in dstackai/dstack#2236
Improve UX when no offers found by @jvstme in dstackai/dstack#2240
Implement volumes force detach by @r4victor in dstackai/dstack#2242

Full Changelog: dstackai/dstack@0.18.37...0.18.38

Contributors

un-def, olgenn, and 2 other contributors

Assets 2

24 Jan 12:56

jvstme

0.18.37-v1

fa33161

0.18.37-v1

This release fixes a bug introduced in 0.18.36-v1.

What's Changed

[Docs]: Update Distributed Tasks docs by @jvstme in dstackai/dstack#2212
Fix Internal Server Error in services w/o gateways by @jvstme in dstackai/dstack#2225

Full Changelog: dstackai/dstack@0.18.36...0.18.37

Contributors

jvstme

Assets 2

24 Jan 12:56

jvstme

0.18.36-v1

fa33161

0.18.36-v1

Vultr

Cluster placement

The vultr backend can now provision fleets with cluster placement.

type: fleet

nodes: 4
placement: cluster

resources:
  gpu: 8:MI300X

backends: [vultr]

Nodes in such a cluster will be interconnected and can be used to run distributed tasks.

Performance

The update optimizes the performance of dstack server, allowing a single server replica to handle up to 150 active runs, jobs, and instances. Capacity can be further increased by using PostgreSQL and running multiple server replicas.

Last, getting instance offers from backends when you run dstack apply has also been optimized and now takes less time.

What's changed

Increase max active resources supported by server by @r4victor in dstackai/dstack#2189
Implement bridge network mode for jobs by @un-def in dstackai/dstack#2191
[Internal] Fix python-json-logger deprecation warning by @jvstme in dstackai/dstack#2201
Fix local backend by @r4victor in dstackai/dstack#2203
Implement offers cache by @r4victor in dstackai/dstack#2197
Add /api/instances/list by @jvstme in dstackai/dstack#2199
Allow getting by ID in /api/project/_/fleets/get by @jvstme in dstackai/dstack#2200
Add termination reason and message to the runner API by @r4victor in dstackai/dstack#2204
Add vpc cluster support in Vultr by @Bihan in dstackai/dstack#2196
Fix instance_types not respected for pool instances by @r4victor in dstackai/dstack#2205
Delete manually created empty fleets by @r4victor in dstackai/dstack#2206
Return repo errors from runner by @r4victor in dstackai/dstack#2207
Fix caching offers with GPU requirements by @jvstme in dstackai/dstack#2210
Fix filtering idle instances by instance type by @jvstme in dstackai/dstack#2214
Add more project URLs on PyPI by @jvstme in dstackai/dstack#2215

Full changelog: dstackai/dstack@0.18.35...0.18.36

Contributors

un-def, Bihan, and 2 other contributors

Assets 2

16 Jan 11:14

jvstme

0.18.35-v1

fa33161

0.18.35-v1

Vultr

This release features Vultr as a new backend. This cloud provider offers a wide range of NVIDIA and AMD accelerators, from affordable fractional GPUs to multi-GPU bare metal hosts.

> dstack apply -b vultr --gpu 1.. --region ewr

 #   BACKEND  REGION  RESOURCES                                      PRICE
 1   vultr    ewr     2xCPU, 8GB, 1xA16 (2GB), 50.0GB (disk)         $0.059
 2   vultr    ewr     1xCPU, 5GB, 1xA40 (2GB), 90.0GB (disk)         $0.075
 3   vultr    ewr     1xCPU, 6GB, 1xA100 (4GB), 70.0GB (disk)        $0.123
 ...
 18  vultr    ewr     32xCPU, 375GB, 2xL40S (48GB), 2200.0GB (disk)  $3.342
 19  vultr    ewr     24xCPU, 240GB, 2xA100 (80GB), 1400.0GB (disk)  $4.795
 20  vultr    ewr     96xCPU, 960GB, 16xA16 (16GB), 1700.0GB (disk)  $7.534
 21  vultr    ewr     96xCPU, 1024GB, 4xA100 (80GB), 450.0GB (disk)  $9.589

See the documentation for instructions on configuring Vultr in your project.

Vast.ai

Previously, the vastai backend only allowed using Docker images where root is the default user. This limitation has been removed, so you can now run NVIDIA NIM or any other image regardless of the user.

Backward compatibility

If you are going to configure the vultr backend, make sure you update all your dstack CLI and API clients to the latest version. Clients prior to 0.18.35 will not work when Vultr is configured.

What's changed

[dstack-shim] Revamp logging and CLI by @un-def in dstackai/dstack#2176
Download dstack-runner to a well-known location by @un-def in dstackai/dstack#2179
Add Vultr Support by @Bihan in dstackai/dstack#2132
Support non-root Docker images in Vast.ai by @jvstme in dstackai/dstack#2185
Refactor idle instance termination by @jvstme in dstackai/dstack#2188
Retry instance termination in case of errors by @jvstme in dstackai/dstack#2190
Update PyPI Development Status classifier by @jvstme in dstackai/dstack#2192
Add Vultr to Concepts and Reference pages by @Bihan in dstackai/dstack#2186

Full changelog: dstackai/dstack@0.18.34...0.18.35

Contributors

un-def, Bihan, and jvstme

Assets 2

09 Jan 12:36

jvstme

0.18.34-v1

fa33161

0.18.34-v1

Idle duration

If provisioned fleet instances aren’t used, they are marked as idle for reuse within the configured idle duration. After this period, instances are automatically deleted. This behavior was previously configured using the termination_policy and termination_idle_time properties in run or fleet configurations.

With this update, we replace these two properties with idle_duration, a simpler way to configure this behavior. This property can be set to a specific duration or to off for unlimited time.

type: dev-environment
name: vscode

python: "3.11"
ide: vscode

# Terminate instances idle for more than 1 hour
idle_duration: 1h

resources:
  gpu: 24GB

Docker

Previously, dstack had limitations on Docker images for dev environments, tasks, and services. These have now been lifted, allowing images based on various Linux distributions like Alpine, Rocky Linux, and Fedora.

dstack now also supports Docker images with built-in OpenSSH servers, which previously caused issues.

Documentation

The documentation has been significantly improved:

Backend configuration has been moved from the Reference page to Concepts→Backends.
Major examples related to dev environments, tasks, and services have been relocated from the Reference page to their respective Concepts pages.

Deprecations

The termination_idle_time and termination_policy parameters in run configurations have been deprecated in favor of idle_duration.

What's changed

[dstack-shim] Implement Future API by @un-def in dstackai/dstack#2141
[API] Add API support to get runs by id by @r4victor in dstackai/dstack#2157
[TPU] Update TPU v5e runtime and update vllm-tpu example by @Bihan in dstackai/dstack#2155
[Internal] Skip docs-build on PRs from forks by @r4victor in dstackai/dstack#2159
[dstack-shim] Add API v2 compat support to ShimClient by @un-def in dstackai/dstack#2156
[Run configurations] Support Alpine and more RPM-based images by @un-def in dstackai/dstack#2151
[Internal] Omit id field in (API) Client.runs.get() method by @un-def in dstackai/dstack#2174
[dstack-shim] Remove API v1 by @un-def in dstackai/dstack#2160
[Volumes] Fix volume attachment with dstack backend by @un-def in dstackai/dstack#2175
Replace termination_policy and termination_idle_time with idle_duration: int|str|off by @peterschmidt85 in dstackai/dstack#2167
Allow running sshd in dstack runs by @jvstme in dstackai/dstack#2178
[Docs] Many docs improvements by @peterschmidt85 in dstackai/dstack#2171

Full changelog: dstackai/dstack@0.18.33...0.18.34

Contributors

un-def, Bihan, and 3 other contributors

Assets 2

27 Dec 15:33

jvstme

0.18.33-v1

fa33161

0.18.33-v1

This update fixes TPU v6e support and a potential gateway upgrade issue.

What's Changed

Fix runtime version for TPU v6e by @r4victor in dstackai/dstack#2149
Update state.json migration on gateways by @jvstme in dstackai/dstack#2152
Optimize gateway startup and service update time by @jvstme in dstackai/dstack#2153

Full Changelog: dstackai/dstack@0.18.32...0.18.33

Contributors

r4victor and jvstme

Assets 2

27 Dec 15:32

jvstme

0.18.32-v1

fa33161

0.18.32-v1

TPU

Trillium (`v6e`)

dstack adds support for the latest Trillium TPU (v6e), which became generally available in GCP on December 12th. The new TPU generation doubles the TPU memory and bursts performance, supporting larger workloads.

Resources

dstack now includes CPU, RAM, and TPU memory in Google Cloud TPU offers:

$ dstack apply --gpu tpu

 #  BACKEND  REGION        INSTANCE     RESOURCES                                           SPOT  PRICE   
 1  gcp      europe-west4  v5litepod-1  24xCPU, 48GB, 1xv5litepod-1 (16GB), 100.0GB (disk)  no    $1.56   
 2  gcp      europe-west4  v6e-1        44xCPU, 176GB, 1xv6e-1 (32GB), 100.0GB (disk)       no    $2.97   
 3  gcp      europe-west4  v2-8         96xCPU, 334GB, 1xv2-8 (64GB), 100.0GB (disk)        no    $4.95

Volumes

By default, TPU VMs contain a 100GB boot disk, and its size cannot be changed. Now, you can add more storage using Volumes.

Gateways

In this update, we've greatly refactored Gateways, improving their reliability and fixing several bugs.

Warning

If you are running multiple replicas of the dstack server, ensure all replicas are upgraded promptly. Leaving some replicas on an older version may prevent them from creating or deleting services and could result in minor errors in their logs.

What's changed

[dstack-shim] Rework resource management by @un-def in dstackai/dstack#2093
[Gateways] Restore dstack-proxy state on gateway restarts by @jvstme in dstackai/dstack#2119
[TPU] Support TPU v6e by @r4victor in dstackai/dstack#2124
[UI] Updated Backend config Info section by @peterschmidt85 in dstackai/dstack#2125
[UI] It's not possible to manage fleets by @olgenn in dstackai/dstack#2126
[UI] Improvements by @olgenn in dstackai/dstack#2127
[Gateways] Add migration from state.json on gateways by @jvstme in dstackai/dstack#2128
[Volumes] Forbid deleting backends with active instances or volumes by @r4victor in dstackai/dstack#2131
[TPU] Fix backward compatibility with new TPUs by @r4victor in dstackai/dstack#2138
Update gpuhunt to 0.0.17 by @r4victor in dstackai/dstack#2139
[Docs] Improve docs by @r4victor in dstackai/dstack#2135
[Gateways] Fix certbot process getting stuck in dstack-proxy by @jvstme in dstackai/dstack#2143
[Gateways] Run dstack-proxy on gateways by @jvstme in dstackai/dstack#2136
[Volumes] Support volumes for TPUs by @r4victor in dstackai/dstack#2144
[Gateways] Optimize dstack-gateway installation time by @jvstme in dstackai/dstack#2146
[Gateways] Fix OpenAI endpoint on Kubernetes gateways by @jvstme in dstackai/dstack#2147

Full changelog: dstackai/dstack@0.18.31...0.18.32

Contributors

un-def, olgenn, and 3 other contributors

Assets 2

18 Dec 11:56

r4victor

0.18.31-v1

fa33161

0.18.31-v1

Assigning service account to GCP VMs

Like all major clouds, GCP supports running a VM on behalf of a managed identity using a service account. Now you can assign a service account to a GCP VM with dstack by specifying the vm_service_account property in the GCP config:

type: gcp
project_id: myproject
vm_service_account: [email protected]
creds:
  type: default

Assigning a service account to a VM can be used to access GCP resources from within runs. Another use case is using firewall rules that rely on the service account as the target. Such rules are typical for Shared VPC setups when admins of a host project can create firewall rules for service projects based on their service accounts.

`$HOME` improvements

Following support for non-root users in Docker images, dstack improves handling of users' home directories. Most importantly, the HOME environment variable is set according to /etc/passwd, and the home directory is created automatically if it does not exist.

The update opens up new possibilities including the use of an empty volume for /home:

type: dev-environment
ide: vscode
image: ubuntu
user: ubuntu
volumes:
  - volume-aws:/home

AWS Volumes with non-Nitro instances

dstack users previously reported AWS Volumes not working with some instance types. This is now fixed and tested for all instance types supported by dstack including older Xen-based instances like the P3 family.

Deprecations

The home_dir and setup parameters in run configurations have been deprecated. If you're using setup, move setup commands to the top of init.

What's Changed

[shim] Implement multi-task state by @un-def in dstackai/dstack#2078
Support AWS volumes for Xen-based instances by @r4victor in dstackai/dstack#2088
Handle empty user when processing image manifest by @un-def in dstackai/dstack#2090
[Docs] Move Reference to a separate page for more space and better st… by @peterschmidt85 in dstackai/dstack#2092
Init VirtualRepo when --no-repo specified by @r4victor in dstackai/dstack#2098
Add missing backends docs reference by @r4victor in dstackai/dstack#2099
Support gateway features in dstack-proxy by @jvstme in dstackai/dstack#2087
[Docs] Add Repos page inside Concepts to explain how repos work #2096 by @peterschmidt85 in dstackai/dstack#2097
Allow specifying vm_service_account in GCP config by @r4victor in dstackai/dstack#2110
[shim] Create HOME if missing by @un-def in dstackai/dstack#2109
Disallow remote network connections in tests by @un-def in dstackai/dstack#2111
[Docs] Add Developers page featuring community links, ambassador program, contributing links, etc #2103 by @peterschmidt85 in dstackai/dstack#2104
[Docs] Refactor the reference guide #2112 by @peterschmidt85 in dstackai/dstack#2113
Support tests that access db from a new thread by @r4victor in dstackai/dstack#2116
Deprecate home_dir and setup by @un-def in dstackai/dstack#2115

Full Changelog: dstackai/dstack@0.18.30...0.18.31

Contributors

un-def, r4victor, and 2 other contributors

Assets 2

12 Dec 11:10

r4victor

0.18.30-v1

fa33161

0.18.30-v1

AWS Capacity Reservations and Capacity Blocks

dstack now allows provisioning AWS instances using Capacity Reservations and Capacity Blocks. Given a CapacityReservationId, you can specify it in a fleet or a run configuration:

type: fleet
nodes: 1
name: my-cr-fleet
reservation: cr-0f45ab39cd64a1cee

The instance will use the reserved capacity, so as long as you have enough, the provisioning is guaranteed to succeed.

Non-root users in Docker images

Previously, dstack always executed the workload as root, ignoring the user property set in the image. Now, dstack executes the workload with the default image user, and you can override it with a new user property:

type: task
image: nvcr.io/nim/meta/llama-3.1-8b-instruct
user: nim

The format of the user property is the same as Docker uses: username[:groupname], uid[:gid], and so on.

Improved `dstack apply` and repos UX

Previously, dstack apply used the current directory as the repo that's made available within the run at /workflow. The directory had to be initialized with dstack init before running dstack apply.

Now you can pass --repo to dstack apply. It can be a path to a local directory or a remote Git repo URL. The specified repo will be available within the run at /workflow. You can also specify --no-repo if the run doesn't need any repo. With --repo or --no-repo specified, you don't need to run dstack init:

$ dstack apply -f task.dstack.yaml --repo .
$ dstack apply -f task.dstack.yaml --repo ../parent_dir
$ dstack apply -f task.dstack.yaml --repo https://github.com/dstackai/dstack.git
$ dstack apply -f task.dstack.yaml --no-repo

Specifying --repo explicitly can be useful when running dstack apply from scripts, pipelines, or CI. dstack init stays relevant for use cases when you work with dstack apply interactively and want to set up the repo to work with once.

Lightweight `pip install dstack`

pip install dstack used to install all the dstack server dependencies. Now pip install dstack installs only the CLI and Python API, which is optimal for use cases when a remote dstack server is used. You can do pip install "dstack[server]" to install the server or do pip install "dstack[all]" to install the server with all backends supported.

Breaking changes

pip install dstack no longer install the server dependencies. If you relied on it to install the server, ensure you use pip install "dstack[server]" or pip install "dstack[all]".

What's Changed

[chore]: Move run_async to _internal/utils by @jvstme in dstackai/dstack#2057
Move server deps to dstack[server] extra by @r4victor in dstackai/dstack#2058
Add user property to run configurations by @un-def in dstackai/dstack#2055
[Blog] Exploring inference memory saturation effect: H100 vs MI300x by @peterschmidt85 in dstackai/dstack#2061
[Internal]: Fix building docs in CI by @jvstme in dstackai/dstack#2063
[chore]: Drop unused gateway-related runner code by @jvstme in dstackai/dstack#2062
[shim] Clean up and document API by @un-def in dstackai/dstack#2060
Improve RESP API docs by @r4victor in dstackai/dstack#2064
Allow underscores in custom GCP tags by @r4victor in dstackai/dstack#2065
Make repo optional when submitting runs via HTTP API by @r4victor in dstackai/dstack#2066
Fix changing configuration type with dstack apply by @r4victor in dstackai/dstack#2070
Fix instances stuck in busy status by @r4victor in dstackai/dstack#2071
[Minor] If errors should be passed silently, then in pythonic way by @dimitriillarionov in dstackai/dstack#2075
AWS Capacity Reservation support by @solovyevt in dstackai/dstack#1977
[Blog] Beyond Kubernetes: 2024 recap and what's next for AI infra by @peterschmidt85 in dstackai/dstack#2074
Fix reservation property backward compatibility by @un-def in dstackai/dstack#2077
Fix ~/.ssh write permissions check by @r4victor in dstackai/dstack#2079
Fix errors exit codes in dstack apply by @r4victor in dstackai/dstack#2081
Fix RESERVATIONS display in fleets table by @r4victor in dstackai/dstack#2082
Support --repo, --no-repo, and autoinit in dstack apply by @r4victor in dstackai/dstack#2080
Support AWS partitioned volumes by @r4victor in dstackai/dstack#2084
[shim] Update OpenAPI doc by @un-def in dstackai/dstack#2085

New Contributors

@dimitriillarionov made their first contribution in dstackai/dstack#2075
@solovyevt made their first contribution in dstackai/dstack#1977

Full Changelog: dstackai/dstack@0.18.29...0.18.30

Contributors

un-def, solovyevt, and 4 other contributors

Assets 2

04 Dec 10:45

r4victor

0.18.29-v1

fa33161

0.18.29-v1

Support `internal_ip` for SSH fleet clusters

It's now possible to specify instance IP addresses used for communication inside SSH fleet clusters using the internal_ip property:

type: fleet
name: my-ssh-fleet
placement: cluster
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/dstack/key.pem
  hosts:
    - hostname: "3.79.203.200"
      internal_ip: "172.17.0.1"
    - hostname: "18.184.67.100"
      internal_ip: "172.18.0.2"

If internal_ip is not specified, dstack automatically detects internal IPs by inspecting network interfaces. This works when all instances have IPs belonging to the same subnet and are accessible on those IPs. The explicitly specified internal_ip enables networking configurations when the instances are accessible on IPs that do not belong to the same subnet.

UX enhancements for `dstack apply`

The dstack apply command gets many improvements including more concise and consistent output and better error reporting. When applying run configurations, dstack apply now prints a table similar to the dstack ps output:

✗ dstack apply
 Project                main                                 
 User                   admin                                
 ...                                  

Submit a new run? [y/n]: y
 NAME           BACKEND          RESOURCES       PRICE     STATUS   SUBMITTED 
 spicy-tiger-1  gcp              2xCPU, 8GB,     $0.06701  running  14:52     
                (us-central1)    100.0GB (disk)                               

spicy-tiger-1 provisioning completed (running)

What's Changed

[UX]: live table when provisioning dstack configuration runs #1978 by @Tob-iee in dstackai/dstack#2036
Fix returning metrics from deleted runs by @jvstme in dstackai/dstack#2038
[UI] Migrate the chat components to the new CloudScape chat componets by @olgenn in dstackai/dstack#2044
Recover unreachable instances by @un-def in dstackai/dstack#2043
UX enhancements for dstack apply by @jvstme in dstackai/dstack#2045
Implement /api/fleets/list endpoint by @r4victor in dstackai/dstack#2050
Remove padding in dstack apply live tables by @jvstme in dstackai/dstack#2048
Fix typo in dstack attach --help by @jvstme in dstackai/dstack#2054
Support specifying internal_ip for SSH fleet hosts by @r4victor in dstackai/dstack#2056

New Contributors

@Tob-iee made their first contribution in dstackai/dstack#2036

Full Changelog: dstackai/dstack@0.18.28...0.18.29

Contributors

un-def, olgenn, and 3 other contributors

Assets 2

Releases: dstackai/dstack-enterprise

0.18.38-v1

Intel Gaudi support

Fleets and Instances UI

Improved volume detachment

What's Changed

Contributors

0.18.37-v1

What's Changed

Contributors

0.18.36-v1

Vultr

Cluster placement

Performance

What's changed

Contributors

0.18.35-v1

Vultr

Vast.ai

Backward compatibility

What's changed

Contributors

0.18.34-v1

Idle duration

Docker

Documentation

Deprecations

What's changed

Contributors

0.18.33-v1

What's Changed

Contributors

0.18.32-v1

TPU

Trillium (v6e)

Resources

Volumes

Gateways

What's changed

Contributors

0.18.31-v1

Assigning service account to GCP VMs

$HOME improvements

AWS Volumes with non-Nitro instances

Deprecations

What's Changed

Contributors

0.18.30-v1

AWS Capacity Reservations and Capacity Blocks

Non-root users in Docker images

Improved dstack apply and repos UX

Lightweight pip install dstack

Breaking changes

What's Changed

New Contributors

Contributors

0.18.29-v1

Support internal_ip for SSH fleet clusters

UX enhancements for dstack apply

What's Changed

New Contributors

Contributors

Trillium (`v6e`)

`$HOME` improvements

Improved `dstack apply` and repos UX

Lightweight `pip install dstack`

Support `internal_ip` for SSH fleet clusters

UX enhancements for `dstack apply`