Releases: run-house/runhouse

v0.0.37

12 Dec 00:41
aa92087

Highlights

⚠️⚠️ Envs are no longer supported ⚠️⚠️

The Runhouse Env class (rh.Env or rh.env) is no longer supported. Instead, we introduce the concept of a process to handle running on a specific worker, and the Image replaces the default_env of a cluster, specifying any setup steps to take on the cluster.

Process

To specify env vars, compute, conda envs, or anything else necessary to run on a specific process, you can create a process through the cluster.ensure_process_created() function.

Instead of:

env = rh.env(name="my_env", reqs=["numpy", "pandas"], env_vars=MY_ENV_VARS, compute={"CPU": 0.5})
rh.function(local_fn).to(cluster, env=env)

You can do:

cluster.install_packages(["numpy", "pandas"])
process = cluster.ensure_process_created("my_process", env_vars=MY_ENV_VARS, compute={"CPU": 0.5})
rh.function(local_fn).to(cluster, process=process)

Image

Image is a new primitive that makes it easier to specify environment and setup steps to start the cluster with. It replaces the previous default_env cluster argument. It supports installing packages, running commands, creating a conda env, loading from a machine/docker image, and more.

my_image = (
    rh.Image(name="base_image")
    .setup_conda_env(
        conda_env_name="base_env",
        conda_yaml={"dependencies": ["python=3.11"], "name": "base_env"},
    )
    .install_packages(["numpy", "pandas"])
    .set_env_vars({"OMP_NUM_THREADS": 1})
)
cluster = rh.cluster("rh-cpu", instance_type="CPU:2+", image=my_image)
cluster.up_if_not()

Den Launcher

Now you can manage clusters (up, teardown, and status checks) via the Runhouse launcher / control plane. This is especially useful for cases where a local sky database would not be available, like upping a cluster in distributed workflows. Set launcher="den" in the cluster factory or include launcher: den in your local ~/.rh/config.yaml to use this feature.

You can also track cluster status, memory consumption, and view logs for Den launched clusters in the resources dashboard.

New Features

  • introduce DockerRegistrySecret for pulling from private or cloud provider registries

Updates

  • rename cluster launched_properties as compute_properties, and save generated internal and external ips in there
  • turn autosave to False by default, configurable in your rh config
  • allow cluster name to be passed into runhouse server cli commands
  • add rh cli alias

Bugfixes

  • cluster factory to properly handle differences when loading ints and autostop_mins

Examples

  • updated examples to remove env dependency, and follow the new process/Image flow

v0.0.36

13 Nov 15:04
ddb9dee

Highlights

Enrich CLI commands (and corresponding python APIs) for interacting with Runhouse clusters.

runhouse cluster: list, down, up, keep-warm, logs, status, ssh

runhouse server: start, restart, stop, status

New Features

Improvements

  • Cluster to reuse secrets keys instead of generating new secrets per cluster (#1338, #1344)
  • Log streaming for nested and multinode (#1375, #1377)

Bugfixes

  • Cluster reloading fixes (#1290, #1291)
  • Don't refresh when initializing on-demand clusters via Sky (#1258)
  • Fix notebook support (#1390)
  • Fix multinode K8s (#1376)

Deprecations

  • Python 3.7 support (#1281)
  • Replacing cluster num_instances with num_nodes (#1380, #1405)
  • Replace cluster address with head_ip (#1370)

Build

  • Pin skypilot to 0.7.0 for faster cluster start times

Examples

  • Running Flux1 Schnell on AWS EC2 (#1275)
  • Distributed Examples
    • Distributed Pool (#1280)
    • Pytorch HPO and ResNet (#1378)
    • Dask LGBM Train (#1379)

v0.0.35

18 Sep 03:52
d41bece

Quick release to fix autostop loop for ondemand clusters (#1262). Also, adding Python 3.12 support and dropping Python 3.7 support (#1074).

v0.0.34

12 Sep 11:45
4b5a6b3

Highlights

This release expands reporting and monitoring of cluster status, cloud properties, and utilization metrics (via runhouse status and the Den UI), and improves multi-node and multi-cluster support broadly.

Improvements

Bugfixes

  • Install from explicit dest_path instead of relative path. by @rohinb2 in #1238
  • Temporary fix for pydantic version error with FastAPI. by @rohinb2 in #1237
  • Run ssh commands for docker clusters via ssh + docker exec as opposed to SSH Proxy. by @rohinb2 in (#1235, #1251)

Deprecations

Examples

Full Changelog: 0.0.33...v0.0.34

v0.0.33

04 Sep 17:42
bbbd641

Highlights

Docker dev & prod workflows

We released a new guide showing how you can use the same docker image on a Runhouse cluster for both local development and production.

Local Development: We now support passing in requirements to the Runhouse env object instead of preemptively sending the env to the function, ensuring that local changes are synced over.

dev_env = rh.env(name="dev_env", env_vars={"HF_TOKEN": "****"}, reqs=["diffusers", "transformers"])

# we pass in the env object to ensure any requirements specified in the env will be synced onto the cluster properly
dev_fn = rh.function(is_transformers_available).to(cluster, env=dev_env)

Production: The docker image holds all the packages and versions we want, ensuring reproducibility and reducing the ingress cost of installing dependencies on each production cluster.

Improvements

  • If no package version is specified, install the same version that is installed locally (#1058)
  • Convert folder to a Module (#995)
  • Support constructing an empty env with only a name (#1095)
  • Support loading an on-demand cluster from name locally in addition to Den (#1093)
  • Add option to skip loading resources from Den (#1096)
  • Update client connection flow to reduce check server calls (#1098, #1141)
  • Updating Folder APIs when interacting with a cluster (#1147, #1116, #1108)
  • Add launched properties to ondemand cluster config (#1139)
  • Stream logs in SSH setup commands (#1211)
  • Logging updates and improvements (#1177, #1178, #1204)
  • Update command running for Kubernetes, and support Docker images with Kubernetes clusters (#1173, #1174)
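The "install the same version that is installed locally" behavior (#1058) can be illustrated with a minimal sketch. This is not Runhouse's implementation; the function name pin_to_local_version and the set of characters used to detect an existing specifier are assumptions made for the illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption: any of these characters means the requirement already carries
# a version specifier, extras, a URL, or a filesystem path, so leave it alone.
_SPECIFIER_CHARS = "=<>!~@[;/ "


def pin_to_local_version(req: str) -> str:
    """Return req pinned to the locally installed version when it is unpinned.

    Hypothetical helper: requirements that are already pinned, or that are not
    installed locally, are returned unchanged.
    """
    if any(ch in req for ch in _SPECIFIER_CHARS):
        return req
    try:
        return f"{req}=={version(req)}"
    except PackageNotFoundError:
        # Nothing installed locally to mirror, so let pip pick a version.
        return req
```

Calling pin_to_local_version("pandas") on a machine with pandas installed would yield a "pandas==<local version>" string, which keeps the cluster aligned with the local environment.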

Clean up

  • Remove sshfs dependency (#1129)
  • Convert cluster rsync method to public (#1082)

Bug Fixes

  • Remove callable param from Folder APIs (#1101)
  • Fix folder construction in the Package (#1083)
  • importlib support for Python 3.7 (#1073)
  • Fix conda installation on cluster (#1176)
  • Have modules created in an env be put within that env (#1194)

Deprecations

Deprecate no longer maintained resources and functionality

Examples

  • Docker workflow guide (#1094)
  • FastAPI RAG app with custom vector embeddings (#1118)
  • Torch image classification example (#1086)
  • Airflow Torch training with Runhouse (#1086)
  • LoRA fine-tuning from a notebook example (#1086)

v0.0.32

26 Jul 08:51

Highlights

Changes to reqs, packages, and env installation

More explicit syncing behavior between local and remote development, and more flexible support for package installations on the cluster.

working_dir env argument
Previously, you could specify a "working directory" as part of your Runhouse env to sync over to a cluster. By default, this working directory was based on the folder that you were running your Runhouse Python code from (perhaps totally disconnected from the location of the code you're sending over). This behavior changes in this release, and we now instead sync over the package to which the local class or function being sent to the cluster belongs. This eliminates unexpected sharp edges users encountered with the working_dir setting, and we look to deprecate working_dir sometime soon (though this should not be disruptive). To specify folders to sync, one can directly pass local paths into the reqs of the Runhouse env.

env = rh.env(
    name="env_with_local_package",
    reqs=["pandas", "torch", "~/path/to/package"],
)

Local path detection
When sending over a function or module that imports local code to a remote cluster, we now automatically detect [1] the package on the filesystem corresponding to that code, add it to the env requirements, and sync it over to the cluster.

Local package detection
You can now pass the name of a local Python package, whether installed from source or editably, and we'll detect the correct directory to sync over. This is important for teams collaborating on shared repos, who may not have their clones saved in the same place.

Increased package installation support
Pip install will now work properly for local folders, in addition to package strings. Including a requirement that is installed locally will also sync appropriately to the cluster.

  • Pip install method will correctly work for local folders as well (#998)
  • Importing from a given file will now detect [1] a package on the filesystem to sync based on that file, add that to the env's reqs, and then sync it (#914)

Updated Connection Flow

Improvements to detecting and checking the HTTP server connection, with a faster approach to fallbacks and retries when unable to reach the client. As part of the connection refactor, ondemand clusters are no longer automatically brought up when a call is made to or on them. Rather, one needs to manually call up() or up_if_not() to bring up the cluster if it is not already running.

Deprecations

Remove support and clean up docs for features that are no longer maintained. See release notes below for complete list.

Release Notes

Improvements

Reqs, Packages & Envs Revamp

  • Automatically add module pointers and reqs to env if they don't exist (#914, #968)
  • Change pointers to store absolute path (#989)
  • Insert at the beginning of sys.path for packages (#990)
  • Including a req that is installed locally will now sync appropriately (#997)
  • Support pip installing folders (#998)

Connection Flow

  • Fix tunnel hanging at termination (#957, #959)
  • Convert cluster client to a property (#975)
  • Add wrapper for running client methods (#981)
  • Move check server logic into cluster client call (#982, #983)
  • Handle Ray start in runhouse restart more reliably (#1059)

Other

  • Expand supported Ray versions (#720)
  • Handle SSH secret syncing in login (#887)
  • Add log support for clusters with port forwarding (#928)
  • Remove use of pexpect from password cluster login (#940)
  • Improve and wrap conda install command on the cluster (#954)
  • Add runhouse ssh support to local unsaved clusters (#1012)
  • Check IPs for ondemand clusters when loading from Den (#1018)
  • Add memory percent to cluster status (#1030)
  • Don’t log full config (#1063)

Bug Fixes

  • Propagate self.env if it was set for a module (#916)
    • Previously, if you pass reqs to a resource constructor as a list, it would get wiped when actually running .to
  • Update conda env run commands to handle bash commands (#920)
  • Update config comparison check in cluster factory (#955)
    • Due to string v int mismatch in ips, sometimes cluster would reconstruct rather than load from Den
  • Quote pip installs correctly (#1032)
    • Previously, package strings like numpy==1.24.6 wouldn't install correctly due to parsing
  • Pass through missing k8s cluster factory args (#1067)

BC-Breaking

Package & Envs

  • Locate working dir in breadth-first approach (#908)
    • Search for the location of the module based on its file location, rather than the runhouse or current working directory
  • Change default working dir from ./ to None (#915)
    • Working directory is no longer synced over by default when sending to a cluster. Instead, it can be specified by passing in the corresponding Package in the env reqs
  • Selectively sync runhouse for default restart behavior (#1020)
    • Only resync runhouse during restart server if it is a local editable package, or if it is explicitly set to True

cluster.run()

  • Disallow passing node when on cluster (#946)
  • Remove run name support for running via ssh (#936)
  • Remove port_forward argument (#934)

Deprecations

  • Remove telemetry APIs and otel requirements (#986)
  • Remove KVstore, mapper, tables, and queue (#994)
  • Remove Run from docs (#1011)
  • Deprecate system arg from function and module factories

Other

  • No longer automatically up an ondemand cluster when performing a call on it
    • Previously, when making a call to or on an ondemand cluster that was not already running, Runhouse automatically brought up the cluster. This behavior changes in this release, and one now needs to call cluster.up() or cluster.up_if_not() to start the ondemand cluster.

Doc & Examples

  • Add llama 3 fine tuning example (#939, #942)
  • Update module API example (#950)
  • Update architecture overview (#1036)
  • Add up_if_not() in examples and tutorials (#999, #1009)
  • Minor fixes in docs (#1008, #1040, #1057)

[1] The detection algorithm for what to sync when you have an importable function or module is as follows:

  • Start with the file that contains this importable function or module on the filesystem
  • Go up directories till you find any of the following config files ".git", "setup.py", "setup.cfg", "pyproject.toml", "requirements.txt"
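The two steps above can be sketched as a short upward directory walk. This is an illustrative sketch, not Runhouse's code: the function name find_package_root is hypothetical, and returning None when no marker is found is an assumption, since the note does not describe that case.

```python
from pathlib import Path
from typing import Optional

# Marker names copied from the footnote above.
MARKERS = (".git", "setup.py", "setup.cfg", "pyproject.toml", "requirements.txt")


def find_package_root(module_file: str) -> Optional[Path]:
    """Return the nearest ancestor directory of module_file that holds a marker.

    Hypothetical helper: starts at the directory containing the file and walks
    up toward the filesystem root, stopping at the first directory with a marker.
    """
    start = Path(module_file).resolve().parent
    for directory in (start, *start.parents):
        if any((directory / marker).exists() for marker in MARKERS):
            return directory
    # Assumption: no marker anywhere up the tree means nothing to sync.
    return None
```

For a module at ~/repos/my_project/my_pkg/utils.py with a pyproject.toml in ~/repos/my_project, the sketch would return ~/repos/my_project, which is the directory one would expect to be synced.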

v0.0.31

09 Jul 20:28

Quick release to allow passing Sky kwargs to ondemand_cluster (#978).

Full Changelog: v0.0.30...v0.0.31

v0.0.30

18 Jun 21:52
916c60a

Highlights

[Alpha] On-demand Docker Clusters

This release adds support for using a base Docker image in conjunction with an on-demand cluster. By specifying the image_id field in the format docker:<registry>/<image>:<tag> in the cluster factory, the corresponding Docker container will be downloaded when the cluster is launched. The Runhouse server is then started inside the Docker container, ensuring that anything that goes through Runhouse will be run inside the container environment.

For more information on usage, such as setting up environment variables for using private Docker registries, please refer to the User Guide.

docker_cluster = rh.ondemand_cluster(
    name="pytorch_cluster",
    image_id="docker:nvcr.io/nvidia/pytorch:23.10-py3",
    instance_type="CPU:2+",
    provider="aws",
)
docker_cluster.up_if_not()
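To make the docker:<registry>/<image>:<tag> format concrete, here is a small sketch that splits an image_id string into its parts. It is only an illustration of the format; the names DockerImageRef and parse_image_id are hypothetical and this is not how Runhouse or SkyPilot parse the field.

```python
from typing import NamedTuple


class DockerImageRef(NamedTuple):
    registry: str
    image: str
    tag: str


def parse_image_id(image_id: str) -> DockerImageRef:
    """Split 'docker:<registry>/<image>:<tag>' into registry, image, and tag."""
    prefix = "docker:"
    if not image_id.startswith(prefix):
        raise ValueError(f"image_id must start with {prefix!r}: {image_id!r}")
    reference = image_id[len(prefix):]
    # The registry is everything before the first slash.
    registry, slash, rest = reference.partition("/")
    # The tag is everything after the last colon of the remainder.
    image, colon, tag = rest.rpartition(":")
    if not (slash and colon and registry and image and tag):
        raise ValueError(f"expected docker:<registry>/<image>:<tag>, got {image_id!r}")
    return DockerImageRef(registry, image, tag)
```

Applied to the image_id from the example above, the sketch separates the registry nvcr.io, the image nvidia/pytorch, and the tag 23.10-py3.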

New Features

  • Docker cluster support (#803, #852, #830, #905)
  • Add support for running in a marimo notebook (#892)

Bug Fixes

  • Handle string system for package .to (#875)
  • Properly save config.yaml for default conda env cluster (#910)
  • Minor fixes (#874)

v0.0.29

17 Jun 23:00

Highlights

This release improves autostop stability and robustness considerably, and introduces the ability to send an env or module to a specific node in a multinode cluster.

Improvements

Bugfixes

  • [bug] Make disable_den_auth actually sync. by @rohinb2 in #865
  • Move config.yaml creation to restart server() by @BelSasha in #868
  • Bump SkyPilot Version to 0.6.0 and fix remote SkyPilot dependencies on Start by @dongreenberg in #855
  • Consolidate periodic loops into one function updating Den and updating autostop. by @rohinb2 in #873
  • Fix cluster factory bug with den_auth clusters not being saved. by @rohinb2 in #878
  • Remove resource conversion check for secrets by @carolineechen in #881

Docs

Testing

Full Changelog: v0.0.28...v0.0.29

v0.0.28

30 May 18:47
61da772

Highlights

runhouse status: Improving visibility into cluster utilization and memory consumption

Improved Cluster Status

Runhouse now provides a more comprehensive view of the cluster's utilization and memory consumption, with better coverage of the true utilization numbers across each worker and the head node of the cluster.

Information surfaced includes: PID, CPU utilization, memory consumption, and GPU utilization (where relevant).
This data can be viewed as part of the runhouse status CLI command:

GPU Cluster

>> runhouse status
/sashab/rh-basic-gpu
😈 Runhouse Daemon is running 🏃
Runhouse v0.0.28
server pid: 29486
server port: 32300
den auth: True
server connection type: ssh
backend config:
  • resource subtype: OnDemandCluster
  • use local telemetry: False
  • domain: None
  • server host: 0.0.0.0
  • ips: ['35.171.157.49']
  • resource subtype: OnDemandCluster
  • autostop mins: autostop disabled
Serving 🍦 :
• _cluster_default_env (runhouse.Env)
  This environment has only python packages installed, if such provided. No resources were found.
• np_pd_env (runhouse.Env) | pid: 29672 | node: head (35.171.157.49)
  CPU: 0.0% | Memory: 0.13 / 16 Gb (0.85%)
  • /sashab/summer (runhouse.Function)
  • mult (runhouse.Function)
• sd_env (runhouse.Env) | pid: 29812 | node: head (35.171.157.49)
  CPU: 1.0% | Memory: 4.47 / 16 Gb (28.95%)
  GPU: 0.0% | Memory: 6.89 / 23 Gb (29.96%)
  • sd_generate (runhouse.Function)

CPU cluster

>> runhouse status

/sashab/rh-basic-cpu
😈 Runhouse Daemon is running 🏃
Runhouse v0.0.28
server pid: 29395
server port: 32300
den auth: True
server connection type: ssh
backend config:
  • resource subtype: OnDemandCluster
  • use local telemetry: False
  • domain: None
  • server host: 0.0.0.0
  • ips: ['52.207.212.159']
  • resource subtype: OnDemandCluster
  • autostop mins: autostop disabled
Serving 🍦 :
• _cluster_default_env (runhouse.Env)
  This environment has only python packages installed, if such provided. No resources were found.
• sd_env (runhouse.Env) | pid: 29716 | node: head (52.207.212.159)
  CPU: 0.0% | Memory: 0.13 / 8 Gb (1.65%)
  This environment has only python packages installed, if such provided. No resources were found.
• np_pd_env (runhouse.Env) | pid: 29578 | node: head (52.207.212.159)
  CPU: 0.0% | Memory: 0.13 / 8 Gb (1.71%)
  • /sashab/summer (runhouse.Function)
  • mult (runhouse.Function)
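If you want these figures in a script rather than on screen, the usage lines in the sample output above are regular enough to parse. The sketch below is written against that sample text only; the helper name parse_usage_line is hypothetical, and the output format may differ in other Runhouse versions.

```python
import re
from typing import Dict, Optional

# Pattern written against the sample output above, e.g.
#   "CPU: 1.0% | Memory: 4.47 / 16 Gb (28.95%)"
USAGE_LINE = re.compile(
    r"(?P<device>CPU|GPU): (?P<util>\d+(?:\.\d+)?)% \| "
    r"Memory: (?P<used>\d+(?:\.\d+)?) / (?P<total>\d+(?:\.\d+)?) Gb "
    r"\((?P<mem_pct>\d+(?:\.\d+)?)%\)"
)


def parse_usage_line(line: str) -> Optional[Dict[str, object]]:
    """Extract utilization and memory figures from one status line.

    Returns None for lines that do not carry CPU or GPU usage.
    """
    match = USAGE_LINE.search(line)
    if match is None:
        return None
    return {
        "device": match["device"],
        "util_pct": float(match["util"]),
        "mem_used_gb": float(match["used"]),
        "mem_total_gb": float(match["total"]),
        "mem_pct": float(match["mem_pct"]),
    }
```

Lines that only name a function or an env, such as "• mult (runhouse.Function)", return None and can be skipped by the caller.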

Improvements

  • Cluster status displays additional information. (#653)
  • Polling den with cluster status data (#806)
  • Prevent exposing user Runhouse API tokens on the cluster by saving a modified hashed API token (#797)
  • Use env vars in default env creation (#798)
  • Login flow improvements (#796)

Bug Fixes

  • Fix undefined path when pip installing a folder (#826)
  • Don't pass basic auth to password cluster HTTP calls (#823)
  • Fix env installations that contain a provider secret (#822)
  • Refresh sys.path upon loading a new module (#818)

Docs & Examples

  • Update domain docs (#812)
  • Add default env section to envs tutorial (#810)
  • Minor improvements to parallel embedding example (#795)