30 Sep 10:32

r4victor

d4ea467

0.18.16-v1

0.18.16

The update includes all the features and bug fixes from version 0.18.16.

New versioning policy

Starting with this release, dstack adopts a new versioning policy to provide better server and client backward compatibility and improve the upgrade experience. dstack continues to follow the semver versioning scheme ({major}.{minor}.{patch}) with the following principles:

Server backward compatibility is maintained across all minor and patch releases. Specific features can be removed, but removal is preceded by deprecation warnings for several minor releases. This means you can use older client versions with newer server versions.
The client backward compatibility is maintained across patch releases. A new minor release indicates that the release breaks client backward compatibility. This means you don't need to update the server when you update the client to a new patch release. Still, upgrading a client to a new minor version requires upgrading the server too.

Previously, dstack never guaranteed client backward compatibility, so you always had to update the server when updating the client. The new versioning policy makes upgrading the client and server more flexible.

Note: The new policy only takes effect after both the clients and the server are upgraded to 0.18.16. The 0.18.15 server still won't work with newer clients.
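
The rules above can be sketched as a small compatibility check. This is an illustration of the policy as described here, not dstack's actual implementation: the function names are hypothetical, and two behaviours are assumptions that the notes do not state (versions older than 0.18.16 are treated as compatible only when identical, and nothing is promised across major versions).

```python
from typing import Tuple

# First release in which the policy applies to both sides (see the note above).
POLICY_START = (0, 18, 16)


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a '{major}.{minor}.{patch}' string into a tuple of ints."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def is_compatible(client: str, server: str) -> bool:
    """Return True if the client may talk to the server under the policy."""
    c, s = parse_version(client), parse_version(server)
    if c < POLICY_START or s < POLICY_START:
        # Assumption: before the policy, only identical versions are safe.
        return c == s
    if c[0] != s[0]:
        # Assumption: nothing is promised across major versions.
        return False
    # Older or same-minor clients work; a newer client minor needs a newer server.
    return c[1] <= s[1]
```

For example, under these assumptions a 0.18.20 client works with a 0.18.16 server, while a 0.19.0 client does not.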

dstack attach

The CLI gets a new dstack attach command that allows attaching to a run. It establishes the SSH tunnel, forwards ports, and streams run logs in real time:

 ✗ dstack attach silent-panther-1
Attached to run silent-panther-1 (replica=0 job=0)
Forwarded ports (local -> remote):
  - localhost:7860 -> 7860
To connect to the run via SSH, use `ssh silent-panther-1`.
Press Ctrl+C to detach...

This command is a replacement for dstack logs --attach with major improvements and bugfixes.

CloudWatch-related bugfixes

The release includes several important bugfixes for CloudWatchLogStorage. We strongly recommend upgrading the dstack server if it's configured to store logs in CloudWatch.

Deprecations

dstack logs --attach is deprecated in favor of dstack attach and may be removed in a future minor release.

What's Changed

Check client-server compatibility according to new versioning policy by @r4victor in dstackai/dstack#1730
[runner] fix MonotonicTimestamp by @un-def in dstackai/dstack#1728
Gateway-in-server early prototype by @jvstme in dstackai/dstack#1718
Implement dstack attach command by @r4victor in dstackai/dstack#1733
Respect CloudWatch timestamp constraints by @un-def in dstackai/dstack#1732
Add AMD examples with vLLM, Axolotl and Trl by @Bihan in dstackai/dstack#1693
dstack-proxy naming tweaks by @jvstme in dstackai/dstack#1734
Fix Failed to attach via Python API by @r4victor in dstackai/dstack#1739
Support calling RunCollection.get_plan() without repo by @r4victor in dstackai/dstack#1741

Full Changelog: dstackai/dstack@0.18.15...0.18.16

Contributors

un-def, Bihan, and 2 other contributors

25 Sep 11:25

r4victor

0.18.15-v1

d4ea467

0.18.15

The update includes all the features and bug fixes from version 0.18.15.

Cluster placement groups

Instances of AWS cluster fleets are now provisioned into cluster placement groups for better connectivity. For example, when you create this fleet:

type: fleet
name: my-cluster-fleet
nodes: 4
placement: cluster
backends: [aws]

dstack will automatically create a cluster placement group and use it to provision the instances.

On-prem and VM-based fleets improvements

All available Nvidia driver capabilities are now requested by default, which makes it possible to run GPU workloads requiring OpenGL/Vulkan/RT/Video Codec SDK libraries. (dstackai/dstack#1714)
Automatic container cleanup. Previously, when the run completed, either successfully or due to an error, its container was not deleted, which led to ever-increasing storage consumption. Now, only the last stopped container is preserved and is available until the next run is completed. (dstackai/dstack#1706)
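
The cleanup rule described above (keep only the last stopped container) can be sketched as follows. This is a minimal illustration, not the shim's actual code: the StoppedContainer record, its field names, and the containers_to_remove helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoppedContainer:
    # Hypothetical record; field names are illustrative only.
    container_id: str
    finished_at: float  # Unix timestamp of when the container stopped


def containers_to_remove(stopped: List[StoppedContainer]) -> List[str]:
    """Return IDs of every stopped container except the most recently finished."""
    if not stopped:
        return []
    newest = max(stopped, key=lambda c: c.finished_at)
    return [c.container_id for c in stopped if c is not newest]
```

Keeping exactly one stopped container bounds disk usage while leaving the most recent run available for inspection.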

Major bug fixes

Fixed a bug where under some conditions logs wouldn't be uploaded to CloudWatch Logs due to size limits. (dstackai/dstack#1712)
Fixed a bug that prevented running services on on-prem instances. (dstackai/dstack#1716)
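
For context on the CloudWatch fix, the sketch below shows one way to split log messages into batches that stay within the PutLogEvents limits (1,048,576 bytes per batch, counted as the UTF-8 size of each message plus 26 bytes per event, and at most 10,000 events). It is an illustration only, not the code from dstackai/dstack#1712; batch_messages is a hypothetical helper, and it assumes each individual message already fits within the batch limit.

```python
from typing import Iterable, Iterator, List

MAX_BATCH_BYTES = 1_048_576   # PutLogEvents batch size limit
EVENT_OVERHEAD_BYTES = 26     # per-event overhead counted toward the limit
MAX_BATCH_EVENTS = 10_000     # maximum number of events per batch


def batch_messages(messages: Iterable[str]) -> Iterator[List[str]]:
    """Yield lists of messages, each list staying within the batch limits."""
    batch: List[str] = []
    batch_bytes = 0
    for message in messages:
        size = len(message.encode("utf-8")) + EVENT_OVERHEAD_BYTES
        too_big = batch_bytes + size > MAX_BATCH_BYTES
        too_many = len(batch) >= MAX_BATCH_EVENTS
        if batch and (too_big or too_many):
            # Close the current batch before it would exceed a limit.
            yield batch
            batch, batch_bytes = [], 0
        batch.append(message)
        batch_bytes += size
    if batch:
        yield batch
```

Each yielded list could then be sent as one request, so a burst of large log lines is spread over several requests rather than rejected as a whole.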

Changelog

Fix cli connection issue with TPU by @Bihan in dstackai/dstack#1705
Rename --default to --yes and --no-default to --no in dstack config and dstack server by @peterschmidt85 in dstackai/dstack#1709
[CI] Fix shim/runner release versions by @un-def in dstackai/dstack#1704
Document run diagnostic logs by @r4victor in dstackai/dstack#1710
[shim] Add old container cleanup routine by @un-def in dstackai/dstack#1706
Write events to CloudWatch in batches by @un-def in dstackai/dstack#1712
[shim] Request all Nvidia driver capabilities by @un-def in dstackai/dstack#1714
Added showing dstack version on the UI by @olgenn in dstackai/dstack#1717
Add missing project SSH key to on-prem instances by @un-def in dstackai/dstack#1716
Simplify handling missing GatewayConfiguration by @jvstme in dstackai/dstack#1724
[shim] Fix container logs processing by @un-def in dstackai/dstack#1721
Support AWS placement groups for cluster fleets by @r4victor in dstackai/dstack#1725

Full Changelog: dstackai/dstack@0.18.14...0.18.15

Contributors

un-def, olgenn, and 4 other contributors

18 Sep 10:17

r4victor

0.18.14-v1

d4ea467

0.18.14

The update includes all the features and bug fixes from version 0.18.14.

Multi-replica server deployment

Previously, the dstack server could only be deployed as a single instance (replica). With 0.18.14, you can deploy multiple replicas, enabling high availability and zero-downtime updates.

Note

Multi-replica server deployment requires using Postgres instead of the default SQLite. To configure Postgres, set the DSTACK_DATABASE_URL environment variable.

Make sure to update to version 0.18.14 before configuring multiple replicas.

Major bug-fixes

[Bugfix] dstack init --git-identity doesn't accept backslashes in path on Windows by @un-def in dstackai/dstack#1686
[Bugfix] Use --tmpfs /dev/shm:rw,nosuid,nodev,exec,size=X instead of --shm-size=X by @un-def in dstackai/dstack#1690
[Bugfix] dstack-shim is not updated when fleet is recreated by @un-def in dstackai/dstack#1698

Other

[Bugfix] Fix SSHAttach.reuse_ports_lock() when no grep matches by @un-def in dstackai/dstack#1700
[Bugfix] Fix logger exception on instance provisioning timeout by @un-def in dstackai/dstack#1697
[Internal] Add JobProvisioningData.base_backend by @r4victor in dstackai/dstack#1682
[Internal] Add Run.error by @r4victor in dstackai/dstack#1684
[Internal] Return server_version in /api/server/get_info by @r4victor in dstackai/dstack#1685
[Internal] Allow gateway to connect to replicated server by @jvstme in dstackai/dstack#1688
[Internal] Adjust gateway management for multiple server replicas by @r4victor in dstackai/dstack#1691
[Internal] Skip gateway update if gateway was updated recently by @r4victor in dstackai/dstack#1695
[Internal] Remove redundant logger.error by @r4victor in dstackai/dstack#1702

Full changelog: dstackai/dstack@0.18.13...0.18.14

Contributors

un-def, r4victor, and jvstme

11 Sep 14:29

peterschmidt85

0.18.13-v1

d4ea467

0.18.13

The update includes all the features and bug fixes from version 0.18.13.

Windows

You can now use the CLI on Windows (WSL 2 is not required).

Ensure that Git and OpenSSH are installed via Git for Windows.

During installation, select the Git from the command line and also from 3rd-party software (or Use Git and optional Unix tools from the Command Prompt) option and the Use bundled OpenSSH option.

Spot policy

Previously, dev environments used the on-demand spot policy, while tasks and services used auto. With this update, we've changed the default spot policy to always be on-demand for all configurations. Users will now need to explicitly specify the spot policy if they want to use spot instances.

Troubleshooting

The documentation now includes a Troubleshooting guide with instructions on how to report issues.

Changelog

[UX] Add Windows support by @un-def in dstackai/dstack#1675
[UX] Changed the default spot_policy to on-demand by @r4victor in dstackai/dstack#1657 and dstackai/dstack#1660
[UI] Minor UI improvements by @olgenn in dstackai/dstack#1658
[UX] Check SSH keys on SSH fleet creation before submission by @r4victor in dstackai/dstack#1661
[Docs] Add TPU examples with Optimum TPU and vLLM by @Bihan in dstackai/dstack#1663
[Troubleshooting] Do not auto-delete failed instances by @r4victor in dstackai/dstack#1665
[Docs] Document SQLite to Postgres migration by @r4victor in dstackai/dstack#1678
[Internal] Implement Postgres locking by @r4victor in dstackai/dstack#1651
[Internal] Refactor SSHTunnel by @jvstme in dstackai/dstack#1669
[Internal] Replace String with Text for long database columns by @r4victor in dstackai/dstack#1677
[Internal] Take advisory lock on server init by @r4victor in dstackai/dstack#1674

All commits: dstackai/dstack@0.18.12...0.18.13

Contributors

un-def, olgenn, and 3 other contributors

04 Sep 12:47

peterschmidt85

0.18.12-v1

d4ea467

0.18.12

The update includes all the features and bug fixes from version 0.18.12.

Features

Added support for ECDSA and Ed25519 keys for on-prem fleets by @swsvc in #1641

Major bugfixes

Fixed the order of CloudWatch log events in the web interface by @un-def in #1613
Fixed a bug where CloudWatch log events might not be displayed in the web interface for old runs by @un-def in #1652
Prevent possible server freeze on SSH connections by @jvstme in #1627

Other changes

[CLI] Show run name before detaching by @jvstme in #1607
Increase time waiting for OCI Bare Metal instances by @jvstme in #1630
Update lambda regions by @r4victor in #1634
Change CloudWatch group check method by @un-def in #1615
Add Postgres tests by @r4victor in #1628
Fix lambda tests by @r4victor in #1635
[Docs] Fixed a bug where search included non-existing pages that land to 404 by @peterschmidt85 in #1646
[Docs] Introduce the Providers page by @peterschmidt85 in #1653
[Docs] Update RunPod & DataCrunch setup guides by @jvstme in #1608
[Docs] Add information about run log storage by @un-def in #1621
[Internal] Update packer templates docs by @jvstme in #1619

Full changelog: dstackai/dstack@0.18.11...0.18.12

Contributors

un-def, r4victor, and 3 other contributors

22 Aug 12:57

peterschmidt85

0.18.11-v1

d4ea467

0.18.11

The update includes all the features and bug fixes from version 0.18.11.

AMD

With the latest update, you can now specify an AMD GPU under resources. Below is an example.

type: service
name: amd-service-tgi

image: ghcr.io/huggingface/text-generation-inference:sha-a379d55-rocm
env:
  - HUGGING_FACE_HUB_TOKEN
  - MODEL_ID=meta-llama/Meta-Llama-3.1-70B-Instruct
  - TRUST_REMOTE_CODE=true
  - ROCM_USE_FLASH_ATTN_V2_TRITON=true
commands:
  - text-generation-launcher --port 8000
port: 8000

resources:
  gpu: MI300X
  disk: 150GB

spot_policy: auto

model:
  type: chat
  name: meta-llama/Meta-Llama-3.1-70B-Instruct
  format: openai

Note

AMD accelerators are currently supported only with the runpod backend. Support for on-prem fleets and more backends is coming soon.

GPU vendors

The gpu property now accepts the vendor attribute, with supported values: nvidia, tpu, and amd.

Alternatively, you can also prefix the GPU name with the vendor name followed by a colon, for example: tpu:v2-8 or amd:192GB, etc. This change ensures consistency in GPU requirements configuration across vendors.
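
To illustrate the prefix form, the sketch below splits a GPU spec such as tpu:v2-8 into a vendor and the remaining requirement. It is a hypothetical helper written for this example, not dstack's parser; it only recognizes the three vendor values listed above and leaves any other spec untouched.

```python
from typing import Optional, Tuple

KNOWN_VENDORS = {"nvidia", "tpu", "amd"}  # vendor values listed above


def split_vendor(gpu_spec: str) -> Tuple[Optional[str], str]:
    """Split 'vendor:rest' into (vendor, rest).

    The vendor is None when the spec has no known vendor prefix.
    """
    prefix, sep, rest = gpu_spec.partition(":")
    if sep and prefix.lower() in KNOWN_VENDORS:
        return prefix.lower(), rest
    return None, gpu_spec
```

With this helper, amd:192GB yields the vendor amd and the requirement 192GB, while a bare spec such as 24GB is returned unchanged with no vendor.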

Encryption

dstack now supports encryption of sensitive data, such as backend credentials, user tokens, etc. Learn more on the reference page.

Storing logs in AWS CloudWatch

By default, the dstack server stores run logs in ~/.dstack/server/projects/<project name>/logs. To store logs in AWS CloudWatch, set the SERVER_CLOUDWATCH_LOG_GROUP environment variable.

Project manager role

With this update, it's now possible to assign any user as a project manager. This role grants permission to manage project users but does not allow management of backends or resources.

Default permissions

By default, all users can create and manage their own projects. If you want only global admins to create projects, add the following to ~/.dstack/server/config.yml:

default_permissions:
  allow_non_admins_create_projects: false

Other

[Bugfix] Provision AWS instances in all eligible availability zones by @r4victor in dstackai/dstack#1585
[Feature] Support the vendor property under resources.gpu by @un-def in dstackai/dstack#1558
[UI] Fix logs appearance in the dark theme by @olgenn in dstackai/dstack#1579
[Docs] Document projects #1547 by @peterschmidt85 in dstackai/dstack#1548
[Internal] Improve gateway auth issues troubleshooting by @jvstme in dstackai/dstack#1569
[Internal] Force root in Kubernetes runs by @jvstme in dstackai/dstack#1555

Full changelog: dstackai/dstack@0.18.10...0.18.11

Contributors

un-def, olgenn, and 3 other contributors

13 Aug 15:20

peterschmidt85

0.18.10-v1

d4ea467

0.18.10

The update includes all the features and bug fixes from version 0.18.10.

Environment variables interpolation

Previously, it wasn't possible to use environment variables to configure credentials for a private Docker registry. With this update, you can now use the following interpolation syntax to avoid hardcoding credentials in the configuration.

type: dev-environment
name: train

env:
  - DOCKER_USER
  - DOCKER_USERPASSWORD

image: dstackai/base:py3.10-0.4-cuda-12.1
registry_auth:
  username: ${{ env.DOCKER_USER }}
  password: ${{ env.DOCKER_USERPASSWORD }}
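
The substitution that the ${{ env.NAME }} syntax implies can be sketched as below. This is an illustration of the idea rather than dstack's implementation: interpolate_env is a hypothetical function, and raising an error for an unset variable is an assumption.

```python
import re
from typing import Mapping

# Matches ${{ env.NAME }} with optional whitespace inside the braces.
_ENV_REF = re.compile(r"\$\{\{\s*env\.([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def interpolate_env(value: str, env: Mapping[str, str]) -> str:
    """Replace each ${{ env.NAME }} reference in value with env[NAME]."""

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in env:
            # Assumption: a missing variable is an error, not an empty string.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _ENV_REF.sub(replace, value)
```

Called with the mapping {"DOCKER_USER": "alice"}, the string ${{ env.DOCKER_USER }} becomes alice, so the credentials never need to appear in the configuration file.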

Network interfaces for port forwarding

When you run a dev environment or a task with dstack apply, it automatically forwards the remote ports to localhost. However, these ports are, by default, bound to 127.0.0.1. If you'd like to make a port available on an arbitrary host, you can now specify the host using the --host option.

For example, this command will make the port available on all network interfaces:

dstack apply --host 0.0.0.0 -f my-task.dstack.yml

Major bugfixes

[Bugfix] Fix http services running on 443 in the logs by @r4victor in dstackai/dstack#1522
[Bugfix] Ensure dstack CLI exits with non-zero exit code on errors by @r4victor in dstackai/dstack#1529
[Bugfix] Force the use of the root user in custom Docker images by @jvstme in dstackai/dstack#1538
[Bugfix] Update Docker to 27.1.1 in dstack VM images by @jvstme in dstackai/dstack#1536

Other

[Feature] Add --host HOST arg to dstack apply command by @un-def in dstackai/dstack#1531
[Feature] Interpolate env in registry_auth by @r4victor in dstackai/dstack#1540
[Docs] Document the nvcc property by @peterschmidt85 in dstackai/dstack#1526
[Internal] Fix unlocking on transaction rollback by @r4victor in dstackai/dstack#1537
[Internal] Bump base dstack image version to 0.5 by @jvstme in dstackai/dstack#1541

All changes: dstackai/dstack@0.18.9...0.18.10

Contributors

un-def, r4victor, and 2 other contributors

07 Aug 16:13

peterschmidt85

0.18.9-v1

d4ea467

0.18.9

The update includes all the features and bug fixes from version 0.18.9.

Base Docker image with `nvcc`

If you don't specify a custom Docker image, dstack uses its own base image with essential CUDA drivers, python, pip, and conda (Miniforge). Previously, this image didn't include nvcc, needed for compiling custom CUDA kernels (e.g., Flash Attention).

With version 0.18.9, you can now include nvcc.

type: task

python: "3.10"
# This line ensures `nvcc` is included into the base Docker image
nvcc: true

commands:
  - pip install -r requirements.txt
  - python train.py

resources:
  gpu: 24GB

Environment variables for on-prem fleets

When you create an on-prem fleet, it's now possible to pre-configure environment variables. These variables will be used when installing the dstack-shim service on hosts and running workloads.

For example, these environment variables can be used to configure dstack to use a proxy:

type: fleet
name: my-fleet

placement: cluster

env:
- HTTP_PROXY=http://proxy.example.com:80
- HTTPS_PROXY=http://proxy.example.com:80
- NO_PROXY=localhost,127.0.0.1

ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 3.255.177.51
    - 3.255.177.52

Examples

New examples include:

Llama 3.1 recipes for inference and fine-tuning
Spark cluster setup
Ray cluster setup

Other

[Bugfix] Fix filtering offers by disk size by @jvstme in dstackai/dstack#1517
[Bugfix] Run containers as root for all images by @r4victor in dstackai/dstack#1499
[Docs] Document GCP permissions for volumes by @r4victor in dstackai/dstack#1501
[Docs] Another batch of docs improvements #1497 by @peterschmidt85 in dstackai/dstack#1498
[Bugfix] Fix creating TensorDock instances by @jvstme in dstackai/dstack#1506
[Bugfix] Launch TensorDock instances with correct disk size by @jvstme in dstackai/dstack#1508
[Bugfix] Set timeouts to TensorDock API requests by @jvstme in dstackai/dstack#1509
[Docs] Update TensorDock setup instructions by @jvstme in dstackai/dstack#1512
[Internal] Implement API endpoint for listing volumes across projects by @r4victor in dstackai/dstack#1519
[Internal] Include Volume.deleted in the API by @r4victor in dstackai/dstack#1520
[Docs] Update the Axolotl example #1493 by @peterschmidt85 in dstackai/dstack#1494
[Internal] Print docker image pulling errors to shim.log by @jvstme in dstackai/dstack#1503
[Feature] Add env setting to fleet config for on-prem fleets by @un-def in dstackai/dstack#1505

Full changelog: https://github.com/dstackai/dstack/releases/0.18.9

Contributors

un-def, r4victor, and 2 other contributors

01 Aug 15:39

peterschmidt85

0.18.8-v1

c43ace7

0.18.8

The update includes all the features and bug fixes from version 0.18.8.

GCP volumes

Now, volumes are also supported for the gcp backend:

type: volume
name: my-gcp-volume
backend: gcp
region: europe-west1
size: 100GB

Previously, volumes were only supported for aws and runpod.

Major bugfixes

The update fixes a major bug introduced in 0.18.7 that could prevent instances from being terminated in the cloud.

Other

[Docs] Updated Alignment Handbook example by @peterschmidt85 in dstackai/dstack#1475
[Fleets] Ensure on-prem fleets' service is initialized after the network goes online by @un-def in dstackai/dstack#1480
[Fleets] Ensure on-prem fleets update the previous configuration by @un-def in dstackai/dstack#1479
[UI] Fixed not-working user token rotation by @r4victor in dstackai/dstack#1487

Full changelog: https://github.com/dstackai/dstack/releases/0.18.8

Contributors

un-def, r4victor, and peterschmidt85

29 Jul 14:12

peterschmidt85

0.18.7-v1

c43ace7

0.18.7

The update brings all the features and bug fixes introduced in version 0.18.7.

Fleets

With fleets, you can now describe clusters declaratively and create them in both cloud and on-prem with a single command. Once a fleet is created, it can be used with dev environments, tasks, and services.

Cloud fleets

To provision a fleet in the cloud, specify the required resources, number of nodes, and other optional parameters.

type: fleet
name: my-fleet
placement: cluster
nodes: 2
resources:
  gpu: 24GB

On-prem fleets

To create a fleet from on-prem servers, specify their hosts along with the user, port, and SSH key for connection via SSH.

type: fleet
name: my-fleet
placement: cluster
ssh_config:
  user: ubuntu
  identity_file: ~/.ssh/id_rsa
  hosts:
    - 3.255.177.51
    - 3.255.177.52

To create or update the fleet, simply call the dstack apply command:

dstack apply -f examples/fleets/my-fleet.dstack.yml

Learn more about fleets in the documentation.

Deprecating `dstack run`

Now that we support dstack apply for gateways, volumes, and fleets, we have extended this support to dev environments, tasks, and services. Instead of using dstack run WORKING_DIR -f CONFIG_FILE, you can now use dstack apply -f CONFIG_FILE.

Also, it's now possible to specify a name for dev environments, tasks, and services, just like for gateways, volumes, and fleets.

type: dev-environment
name: my-ide

python: "3.11"

ide: vscode

resources:
  gpu: 80GB

This name is used as a run name and is more convenient than a random name. However, if you don't specify a name, dstack will assign a random name as before.

Major bugfixes

Important

This update fixes the kubernetes backend, which had been non-functional for the past few updates.

Other

[UX] Make --gpu override YAML's gpu by @r4victor in dstackai/dstack#1455
dstackai/dstack#1431
[Performance] Speed up listing runs for Python API and CLI by @r4victor in dstackai/dstack#1430
[Performance] Speed up project loading by @r4victor in dstackai/dstack#1425
[Bugfix] Remove busy offers from the top of offers list by @jvstme in dstackai/dstack#1452
[Bugfix] Prioritize cheaper offers from the pool by @jvstme in dstackai/dstack#1453
[Bugfix] Fix spot offers suggested for on-demand dev envs by @jvstme in dstackai/dstack#1450
[Feature] Implement dstack volume delete by @r4victor in dstackai/dstack#1434
[UX] Instances were always shown as provisioning for container backends by @r4victor
[Docs] Fix typos by @jvstme in dstackai/dstack#1426
[Docs] Fix a bad link by @tamanobi in dstackai/dstack#1422
[Internal] Add DSTACK_SENTRY_PROFILES_SAMPLE_RATE by @r4victor in dstackai/dstack#1428
[Internal] Update ruff to 0.5.3 by @jvstme in dstackai/dstack#1421
[Internal] Update GitHub Actions dependencies by @jvstme in dstackai/dstack#1436
[Bugfix] Respect regions for runpod by @r4victor in dstackai/dstack#1460

Full changelog: 0.18.7

Contributors

tamanobi, r4victor, and jvstme

Releases: dstackai/dstack-enterprise
