12 Dec 00:41

jlewitt1

aa92087

v0.0.37 Latest


Highlights

⚠️⚠️ Envs are no longer supported ⚠️⚠️

The Runhouse Env class (rh.Env or rh.env) is no longer supported. Instead, we introduce the concept of a process for running on a specific worker, and the new Image primitive replaces the cluster's default_env, specifying any setup steps to run on the cluster.

Process

To specify env vars, compute, conda envs, or anything else needed to run on a specific process, create a process with the cluster.ensure_process_created() function.

Instead of:

env = rh.env(name="my_env", reqs=["numpy", "pandas"], env_vars=MY_ENV_VARS, compute={"CPU": 0.5})
rh.function(local_fn).to(cluster, env=env)

You can do:

cluster.install_packages(["numpy", "pandas"])
process = cluster.ensure_process_created("my_process", env_vars=MY_ENV_VARS, compute={"CPU": 0.5})
rh.function(local_fn).to(cluster, process=process)

Image

Image is a new primitive that makes it easier to specify environment and setup steps to start the cluster with. It replaces the previous default_env cluster argument. It supports installing packages, running commands, creating a conda env, loading from a machine/docker image, and more.

my_image = (
    rh.Image(name="base_image")
    .setup_conda_env(
        conda_env_name="base_env",
        conda_yaml={"dependencies": ["python=3.11"], "name": "base_env"},
    )
    .install_packages(["numpy", "pandas"])
    .set_env_vars({"OMP_NUM_THREADS": "1"})
)
cluster = rh.cluster("rh-cpu", instance_type="CPU:2+", image=my_image)
cluster.up_if_not()
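The internals of the Image class are not described in these notes. The toy class below (ImageSketch, a hypothetical name, not the Runhouse Image class) only illustrates the builder pattern the example relies on: each chained method records an ordered setup step and returns the object itself, so calls can be chained. Nothing is executed; it simply accumulates the steps.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ImageSketch:
    """Hypothetical toy stand-in for an image spec (not the Runhouse Image class)."""

    name: str
    steps: List[str] = field(default_factory=list)

    def install_packages(self, packages: List[str]) -> "ImageSketch":
        # Record a pip install step; nothing is executed here.
        self.steps.append("pip install " + " ".join(packages))
        return self  # returning self is what makes the calls chainable

    def set_env_vars(self, env_vars: Dict[str, str]) -> "ImageSketch":
        # Record one export per variable, in insertion order.
        for key, value in env_vars.items():
            self.steps.append(f"export {key}={value}")
        return self


image = (
    ImageSketch(name="base_image")
    .install_packages(["numpy", "pandas"])
    .set_env_vars({"OMP_NUM_THREADS": "1"})
)
print(image.steps)  # ['pip install numpy pandas', 'export OMP_NUM_THREADS=1']
```

Because steps are recorded in call order, the order of the chained calls is the order in which setup would be applied.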

Den Launcher

You can now manage clusters (up, teardown, and status checks) through the Runhouse launcher / control plane. This is especially useful when a local Sky database is not available, such as when bringing up a cluster in distributed workflows. To use this feature, set launcher="den" in the cluster factory or add launcher: den to your local ~/.rh/config.yaml.

You can also track cluster status, memory consumption, and view logs for Den launched clusters in the resources dashboard.

New Features

introduce DockerRegistrySecret for pulling from private or cloud provider registries

Updates

rename cluster launched_properties to compute_properties, and save the generated internal and external IPs there
set autosave to False by default (configurable in your rh config)
allow cluster name to be passed into runhouse server cli commands
add rh cli alias

Bugfixes

cluster factory to properly handle differences when loading ints and autostop_mins

Examples

updated examples to remove env dependency, and follow the new process/Image flow


13 Nov 15:04

jlewitt1

v0.0.36

ddb9dee


Highlights

Enrich CLI commands (and corresponding python APIs) for interacting with Runhouse clusters.

runhouse cluster: list, down, up, keep-warm, logs, status, ssh

runhouse server: start, restart, stop, status

New Features

Cluster list support (#1225, #1227, #1231, #1233, #1245)
Runhouse cluster & server CLI support (#1268, #1301)
Default ssh key to use for clusters (#1357, #1358, #1359, #1365)
Kubeconfig secret (#1346)
Distributed Pool - runhouse, Ray, PyTorch, and Dask (#1304, #1305, #1378, #1379)

Improvements

Cluster to reuse secrets keys instead of generating new secrets per cluster (#1338, #1344)
Log streaming for nested and multinode (#1375, #1377)

Bugfixes

Cluster reloading fixes (#1290, #1291)
Don't refresh when initializing on-demand clusters via Sky (#1258)
Fix notebook support (#1390)
Fix multinode K8s (#1376)

Deprecations

Python 3.7 support (#1281)
Replacing cluster num_instances with num_nodes (#1380, #1405)
Replace cluster address with head_ip (#1370)
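If you keep your own saved cluster configs or wrapper code that still uses the deprecated names, a small migration helper can rename them. This is a minimal sketch assuming both deprecations above are straight renames (num_instances to num_nodes, address to head_ip); the helper and its name are hypothetical and not part of Runhouse.

```python
import warnings
from typing import Any, Dict

# Deprecated name -> replacement, taken from the deprecation list above.
# Assumption: both are simple one-to-one renames.
RENAMED_CLUSTER_KWARGS = {"num_instances": "num_nodes", "address": "head_ip"}


def migrate_cluster_kwargs(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy with deprecated names replaced."""
    migrated = dict(kwargs)
    for old, new in RENAMED_CLUSTER_KWARGS.items():
        if old in migrated:
            if new in migrated:
                raise ValueError(f"Pass only one of {old!r} and {new!r}")
            warnings.warn(f"{old!r} is deprecated; use {new!r}", DeprecationWarning)
            migrated[new] = migrated.pop(old)
    return migrated
```

The input dict is never mutated, so the helper is safe to call on configs loaded from disk.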

Build

Pin skypilot to 0.7.0 for faster cluster start times

Examples

Running Flux1 Schnell on AWS EC2 (#1275)
Distributed Examples
- Distributed Pool (#1280)
- Pytorch HPO and ResNet (#1378)
- Dask LGBM Train (#1379)


18 Sep 03:52

carolineechen

v0.0.35

d41bece


Quick release to fix autostop loop for ondemand clusters (#1262). Also, adding Python 3.12 support and dropping Python 3.7 support (#1074).


12 Sep 11:45

dongreenberg

v0.0.34

4b5a6b3


Highlights

This release expands reporting and monitoring of cluster status, cloud properties, and utilization metrics (via runhouse status and the Den UI), and broadly improves multi-node and multi-cluster support.

Improvements

Cluster status, cloud info, and utilization reporting improvements (#1209, #1224, #1226, #1197, #1242, #1246, #1249)
Install default_env on all nodes (#1240)
Introduce cluster.a_up to launch clusters asynchronously or in parallel by @dongreenberg in #1247
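The notes describe cluster.a_up only as an asynchronous way to launch clusters. The sketch below assumes it is an awaitable coroutine method and uses a hypothetical FakeCluster class (simulating provisioning with asyncio.sleep) so it runs without cloud credentials. With real clusters, the same asyncio.gather pattern would await each cluster's a_up() concurrently.

```python
import asyncio
from typing import List


class FakeCluster:
    """Hypothetical stand-in; a real launch would provision cloud VMs."""

    def __init__(self, name: str, launch_seconds: float):
        self.name = name
        self.launch_seconds = launch_seconds

    async def a_up(self) -> str:
        await asyncio.sleep(self.launch_seconds)  # simulate slow provisioning
        return f"{self.name} up"


async def launch_all(clusters: List[FakeCluster]) -> List[str]:
    # gather runs the launches concurrently, so total time is roughly the
    # slowest single launch rather than the sum; results keep input order.
    return await asyncio.gather(*(c.a_up() for c in clusters))


if __name__ == "__main__":
    clusters = [FakeCluster("cpu-a", 0.2), FakeCluster("cpu-b", 0.3)]
    print(asyncio.run(launch_all(clusters)))  # ['cpu-a up', 'cpu-b up']
```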

Bugfixes

Install from explicit dest_path instead of relative path. by @rohinb2 in #1238
Temporary fix for pydantic version error with FastAPI. by @rohinb2 in #1237
Run ssh commands for docker clusters via ssh + docker exec as opposed to SSH Proxy. by @rohinb2 in (#1235, #1251)

Deprecations

Deprecate function.send_secrets (#1091)
Deprecate some old unused function methods by @dongreenberg in #1092
remove sagemaker github action by @jlewitt1 in #1244

Examples

Add Multicloud Airflow example by @py-rh in #1219

Full Changelog: 0.0.33...v0.0.34

Contributors

dongreenberg, jlewitt1, and 2 other contributors


04 Sep 17:42

jlewitt1

v0.0.33

bbbd641


Highlights

Docker dev & prod workflows

We released a new guide showing how you can use the same docker image on a Runhouse cluster for both local development and production.

Local Development: We now support passing in requirements to the Runhouse env object instead of preemptively sending the env to the function, ensuring that local changes are synced over.

dev_env = rh.env(name="dev_env", env_vars={"HF_TOKEN": "****"}, reqs=["diffusers", "transformers"])

# we pass in the env object to ensure any requirements specified in the env will be synced onto the cluster properly
dev_fn = rh.function(is_transformers_available).to(cluster, env=dev_env)

Production: The docker image holds all the packages and versions we want, ensuring reproducibility and reducing the ingress costs of installing dependencies on each production cluster.

Improvements

If no package version is specified, install the same version that is installed locally (#1058)
Convert folder to a Module (#995)
Support constructing an empty env with only a name (#1095)
Support loading an on-demand cluster from name locally in addition to Den (#1093)
Add option to skip loading resources from Den (#1096)
Update client connection flow to reduce check server calls (#1098, #1141)
Updating Folder APIs when interacting with a cluster (#1147, #1116, #1108)
Add launched properties to ondemand cluster config (#1139)
Stream logs in SSH setup commands (#1211)
Logging updates and improvements (#1177, #1178, #1204)
Update command running for Kubernetes, and support Docker images with Kubernetes clusters (#1173, #1174)
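The first improvement above (#1058) installs the locally installed version when no version is specified. A minimal sketch of that idea, assuming only bare package names are pinned; this is not Runhouse's implementation. It uses the standard library's importlib.metadata.version, and the version_lookup parameter is a hypothetical hook so the function can be exercised independently of what is installed.

```python
import re
from importlib import metadata
from typing import Callable


def pin_to_local_version(
    requirement: str,
    version_lookup: Callable[[str], str] = metadata.version,
) -> str:
    """Pin a bare package name to the locally installed version, if any."""
    # Leave anything with a version specifier, extras, URL, or path untouched.
    if not re.fullmatch(r"[A-Za-z0-9][A-Za-z0-9._-]*", requirement):
        return requirement
    try:
        return f"{requirement}=={version_lookup(requirement)}"
    except metadata.PackageNotFoundError:
        return requirement  # not installed locally: let the remote pip resolve it
```

For example, pin_to_local_version("numpy") returns "numpy==<your local version>" when NumPy is installed, while "pandas>=2.0" and "~/path/to/package" pass through unchanged.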

Clean up

Remove sshfs dependency (#1129)
Convert cluster rsync method to public (#1082)

Bug Fixes

Remove callable param from Folder APIs (#1101)
Fix folder construction in the Package (#1083)
importlib support for Python 3.7 (#1073)
Fix conda installation on cluster (#1176)
Have modules created in an env be put within that env (#1194)

Deprecations

Deprecate no longer maintained resources and functionality

Table (#1216)
SageMaker Cluster (#1222)
Provenance and Runs (#1221)
File and Blob (#1213)

Examples

Docker workflow guide (#1094)
FastAPI RAG app with custom vector embeddings (#1118)
Torch image classification example (#1086)
Airflow Torch training with Runhouse (#1086)
LoRA fine-tuning from a notebook example (#1086)


26 Jul 08:51

carolineechen

v0.0.32

3ad35d4


Highlights

Changes to reqs, packages, and env installation

More explicit syncing behavior between local and remote development, and more flexible support for package installations on the cluster.

working_dir env argument
Previously, you could specify a "working directory" as part of your Runhouse env to sync over to a cluster. By default, this working directory was based on the folder you ran your Runhouse Python code from (possibly unrelated to the location of the code you were sending over). This release changes that behavior: we now sync over the package to which the local class or function being sent to the cluster belongs. This eliminates unexpected sharp edges users encountered with the working_dir setting, and we plan to deprecate working_dir soon (this should not be disruptive). To specify folders to sync, pass local paths directly into the reqs of the Runhouse env.

env = rh.env(
    name="env_with_local_package",
    reqs=["pandas", "torch", "~/path/to/package"],
)

Local path detection
When sending over a function or module that imports local code to a remote cluster, we now automatically detect [1] the package on the filesystem corresponding to that code, add it to the env requirements, and sync it over to the cluster.

Local package detection
You can now pass the name of a local Python package, whether installed from source or editably, and we'll detect the correct directory to sync over. This is important for teams collaborating on shared repos, who may not have their clones saved in the same place.
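To illustrate how a package name can be resolved to a directory regardless of where each teammate cloned the repo, here is a sketch using the standard library's importlib.util.find_spec. The helper name find_local_package_dir is hypothetical, and Runhouse's own detection may differ (for example, for namespace packages or unusual editable-install hooks).

```python
import importlib.util
from pathlib import Path
from typing import Optional


def find_local_package_dir(package_name: str) -> Optional[Path]:
    """Hypothetical helper: return an importable package's directory, or None."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or not spec.submodule_search_locations:
        return None  # not importable, or a plain module rather than a package
    # For a regular package this is the folder containing __init__.py; in common
    # setups it points at the source tree for editable installs too.
    return Path(list(spec.submodule_search_locations)[0]).resolve()
```

For example, find_local_package_dir("json") returns the standard library's json folder, while find_local_package_dir("os") returns None because os is a single module, not a package.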

Increased package installation support
Pip install will now work properly for local folders, in addition to package strings. Including a requirement that is installed locally will also sync appropriately to the cluster.

Pip install method will correctly work for local folders as well (#998)
Importing from a given file will now detect [1] a package on the filesystem to sync based on that file, add that to the envs reqs, and then sync it (#914)

Updated Connection Flow

Improvements to detecting and checking the HTTP server connection. Improved and faster approach to fallbacks and retries when unable to reach the client. As part of the connection refactor, ondemand clusters are no longer automatically brought up when there is a call made to/on it. Rather, one needs to manually call up() or up_if_not() to bring up the cluster if it is not already running.

Deprecations

Remove support and clean up docs for features that are no longer maintained. See release notes below for complete list.

Release Notes

Improvements

Reqs, Packages & Envs Revamp

Automatically add module pointers and reqs to env if they don't exist (#914, #968)
Change pointers to store absolute path (#989)
Insert at the beginning of sys.path for packages (#990)
Including a req that is installed locally will now sync appropriately (#997)
Support pip installing folders (#998)

Connection Flow

Fix tunnel hanging at termination (#957, #959)
Convert cluster client to a property (#975)
Add wrapper for running client methods (#981)
Move check server logic into cluster client call (#982, #983)
Handle Ray start in runhouse restart more reliably (#1059)

Other

Expand supported Ray versions (#720)
Handle SSH secret syncing in login (#887)
Add log support for clusters with port forwarding (#928)
Remove use of pexpect from password cluster login (#940)
Improve and wrap conda install command on the cluster (#954)
Add runhouse ssh support to local unsaved clusters (#1012)
Check IPs for ondemand clusters when loading from Den (#1018)
Add memory percent to cluster status (#1030)
Don’t log full config (#1063)

Bug Fixes

Propagate self.env if it was set for a module (#916)
- Previously, if you passed reqs to a resource constructor as a list, they would get wiped when actually running .to
Update conda env run commands to handle bash commands (#920)
Update config comparison check in cluster factory (#955)
- Due to string v int mismatch in ips, sometimes cluster would reconstruct rather than load from Den
Quote pip installs correctly (#1032)
- Previously, package strings like numpy==1.24.6 wouldn't install correctly due to parsing
Pass through missing k8s cluster factory args (#1067)
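The quoting fix in #1032 reflects a general shell issue: a requirement like numpy>=1.24 passed unquoted to a shell command is read as numpy followed by a redirect to a file named =1.24. A minimal sketch of building a shell-safe pip command with the standard library's shlex.quote; build_pip_install_cmd is a hypothetical name, not Runhouse's code.

```python
import shlex
from typing import List


def build_pip_install_cmd(packages: List[str]) -> str:
    """Hypothetical helper: build a shell-safe pip install command."""
    # shlex.quote wraps any string containing shell metacharacters such as '>' or '<'
    # in single quotes, and leaves already-safe strings (e.g. pandas==2.2.2) as-is.
    quoted = " ".join(shlex.quote(pkg) for pkg in packages)
    return f"python -m pip install {quoted}"


print(build_pip_install_cmd(["numpy>=1.24", "pandas==2.2.2"]))
# python -m pip install 'numpy>=1.24' pandas==2.2.2
```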

BC-Breaking

Package & Envs

Locate working dir in a breadth-first approach (#908)
- Search for the module's location based on its file location, rather than the runhouse or current working directory
Change default working dir from "." to None (#915)
- Working directory is no longer synced over by default when sending to a cluster. Instead, it can be specified by passing in the corresponding Package in the env reqs
Selectively sync runhouse for default restart behavior (#1020)
- Only resync runhouse during restart server if it is a local editable package, or if it is explicitly set to True

cluster.run()

Disallow passing node when on cluster (#946)
Remove run name support for running via ssh (#936)
Remove port_forward argument (#934)

Deprecations

Remove telemetry APIs and otel requirements (#986)
Remove KVstore, mapper, tables, and queue (#994)
Remove Run from docs (#1011)
Deprecate system arg from function and module factories

Other

No longer automatically up an ondemand cluster when performing a call on it
- Previously, when making a call to or on an ondemand cluster that was not already running, Runhouse automatically brought up the cluster. This release changes that behavior: you now need to run cluster.up() or cluster.up_if_not() to start the ondemand cluster.

Doc & Examples

Add llama 3 fine tuning example (#939, #942)
Update module API example (#950)
Update architecture overview (#1036)
Add up_if_not() in examples and tutorials (#999, #1009)
Minor fixes in docs (#1008, #1040, #1057)

[1] The detection algorithm for what to sync when you have an importable function or module is as follows:

Start with the file that contains this importable function or module on the filesystem
Go up directories till you find any of the following config files ".git", "setup.py", "setup.cfg", "pyproject.toml", "requirements.txt"
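The two-step rule above can be sketched with pathlib. This is an approximation of the rule as stated, not Runhouse's source; find_package_root is a hypothetical name.

```python
from pathlib import Path
from typing import Optional

# Marker files or folders from the detection rule described above.
MARKERS = (".git", "setup.py", "setup.cfg", "pyproject.toml", "requirements.txt")


def find_package_root(source_file: str) -> Optional[Path]:
    """Walk up from the file's directory until a marker file or folder is found."""
    start = Path(source_file).resolve().parent
    for directory in (start, *start.parents):
        if any((directory / marker).exists() for marker in MARKERS):
            return directory  # the first (innermost) match wins
    return None  # reached the filesystem root without finding a marker
```

Because the search stops at the innermost match, a repo containing a nested pyproject.toml syncs only the nested project, not the whole repo.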


09 Jul 20:28

dongreenberg

v0.0.31

6861751


Quick release to allow passing Sky kwargs to ondemand_cluster (#978).

Full Changelog: v0.0.30...v0.0.31


18 Jun 21:52

carolineechen

v0.0.30

916c60a


Highlights

[Alpha] On-demand Docker Clusters

This release adds support for using a base Docker image in conjunction with an on-demand cluster. By specifying the image_id field in the format docker:<registry>/<image>:<tag> in the cluster factory, the corresponding Docker container will be downloaded when the cluster is launched. The Runhouse server is then started inside the Docker container, ensuring that anything that goes through Runhouse will be run inside the container environment.

For more information on usage, such as setting up environment variables for using private Docker registries, please refer to the User Guide.

docker_cluster = rh.ondemand_cluster(
    name="pytorch_cluster",
    image_id="docker:nvcr.io/nvidia/pytorch:23.10-py3",
    instance_type="CPU:2+",
    provider="aws",
)
docker_cluster.up_if_not()
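To make the image_id format docker:<registry>/<image>:<tag> concrete, here is a sketch that splits such a string into its parts. It parses strictly the format described above (so a Docker Hub reference without an explicit registry is rejected); the helper and its names are hypothetical, and Runhouse/SkyPilot do their own parsing.

```python
from typing import NamedTuple


class DockerImageRef(NamedTuple):
    registry: str
    image: str
    tag: str


def parse_docker_image_id(image_id: str) -> DockerImageRef:
    """Hypothetical helper: split docker:<registry>/<image>:<tag> into parts."""
    prefix = "docker:"
    if not image_id.startswith(prefix):
        raise ValueError(f"expected image_id to start with {prefix!r}: {image_id}")
    ref = image_id[len(prefix):]
    registry, _, rest = ref.partition("/")  # registry is everything before the first '/'
    image, sep, tag = rest.rpartition(":")  # tag is everything after the last ':'
    if not sep or not registry or not image or not tag:
        raise ValueError(f"expected docker:<registry>/<image>:<tag>, got {image_id}")
    return DockerImageRef(registry, image, tag)


print(parse_docker_image_id("docker:nvcr.io/nvidia/pytorch:23.10-py3"))
# DockerImageRef(registry='nvcr.io', image='nvidia/pytorch', tag='23.10-py3')
```

Splitting the registry on the first "/" and the tag on the last ":" keeps registries with ports (such as localhost:5000) intact.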

New Features

Docker cluster support (#803, #852, #830, #905)
Add support for running in a marimo notebook (#892)

Bug Fixes

Handle string system for package .to (#875)
Properly save config.yaml for default conda env cluster (#910)
Minor fixes (#874)


17 Jun 23:00

dongreenberg

v0.0.29

670bf1a


Highlights

This release improves autostop stability and robustness considerably, and introduces the ability to send an env or module to a specific node in a multinode cluster.

Improvements

Simplify and improve Autostop by @rohinb2 and @dongreenberg in #895, #894
Send env to a specific node_idx. by @rohinb2 in #835
Update secrets login flow to be more opt-in by @carolineechen in #880
Show information about active function calls in cluster.status() by @rohinb2 in #871 and #896

Bugfixes

[bug] Make disable_den_auth actually sync. by @rohinb2 in #865
Move config.yaml creation to restart server() by @BelSasha in #868
Bump SkyPilot Version to 0.6.0 and fix remote SkyPilot dependencies on Start by @dongreenberg in #855
Consolidate periodic loops into one function updating Den and updating autostop. by @rohinb2 in #873
Fix cluster factory bug with den_auth clusters not being saved. by @rohinb2 in #878
Remove resource conversion check for secrets by @carolineechen in #881

Docs

Clarify setup in docs and den quick start by @mkandler in #876
Update status docs by @BelSasha in #889
Llama 3 vLLM GCP example by @mkandler in #893
Fix bug in starting example code block by @mkandler in #884
Adds quotes to pip install in examples by @mkandler in #886
Update secrets login in api tutorial by @carolineechen in #882

Testing

Update multinode cluster fixtures. by @rohinb2 in #856
minor changes to cluster status tests by @BelSasha in #891
Group status tests together by @dongreenberg in #899
Reorganize default env tests and consolidate fixture into GCP fixture by @dongreenberg in #900
Stop overwriting local dotenv in tests. by @dongreenberg in #901
Consolidate static cluster fixtures into one by @dongreenberg in #902
Change AutostopServlet into AutostopHelper, and test properly by @dongreenberg in #897
cluster status scheduler tests by @BelSasha in #869

Full Changelog: v0.0.28...v0.0.29

Contributors

mkandler, dongreenberg, and 3 other contributors


30 May 18:47

jlewitt1

v0.0.28

61da772


Highlights

runhouse status: Improving visibility into cluster utilization and memory consumption

Improved Cluster Status

Runhouse now provides a more comprehensive view of the cluster's utilization and memory consumption, reflecting the true utilization across the head node and each worker node of the cluster.

Information surfaced includes: PID, CPU utilization, memory consumption, and GPU utilization (where relevant).
This data can be viewed as part of the runhouse status CLI command:

GPU Cluster

>> runhouse status
/sashab/rh-basic-gpu
😈 Runhouse Daemon is running 🏃
Runhouse v0.0.28
server pid: 29486
• server port: 32300
• den auth: True
• server connection type: ssh
• backend config:
  • resource subtype: OnDemandCluster
  • use local telemetry: False
  • domain: None
  • server host: 0.0.0.0
  • ips: ['35.171.157.49']
  • resource subtype: OnDemandCluster
  • autostop mins: autostop disabled
Serving 🍦 :
• _cluster_default_env (runhouse.Env)
  This environment has only python packages installed, if such provided. No resources were found.
• np_pd_env (runhouse.Env) | pid: 29672 | node: head (35.171.157.49)
  CPU: 0.0% | Memory: 0.13 / 16 Gb (0.85%)
  • /sashab/summer (runhouse.Function)
  • mult (runhouse.Function)
• sd_env (runhouse.Env) | pid: 29812 | node: head (35.171.157.49)
  CPU: 1.0% | Memory: 4.47 / 16 Gb (28.95%)
  GPU: 0.0% | Memory: 6.89 / 23 Gb (29.96%)
  • sd_generate (runhouse.Function)

CPU cluster

>> runhouse status

/sashab/rh-basic-cpu
😈 Runhouse Daemon is running 🏃
Runhouse v0.0.28
server pid: 29395
• server port: 32300
• den auth: True
• server connection type: ssh
• backend config:
  • resource subtype: OnDemandCluster
  • use local telemetry: False
  • domain: None
  • server host: 0.0.0.0
  • ips: ['52.207.212.159']
  • resource subtype: OnDemandCluster
  • autostop mins: autostop disabled
Serving 🍦 :
• _cluster_default_env (runhouse.Env)
  This environment has only python packages installed, if such provided. No resources were found.
• sd_env (runhouse.Env) | pid: 29716 | node: head (52.207.212.159)
  CPU: 0.0% | Memory: 0.13 / 8 Gb (1.65%)
  This environment has only python packages installed, if such provided. No resources were found.
• np_pd_env (runhouse.Env) | pid: 29578 | node: head (52.207.212.159)
  CPU: 0.0% | Memory: 0.13 / 8 Gb (1.71%)
  • /sashab/summer (runhouse.Function)
  • mult (runhouse.Function)

Improvements

Cluster status displays additional information. (#653)
Polling den with cluster status data (#806)
Prevent exposing user Runhouse API tokens on the cluster by saving a modified hashed API token (#797)
Use env vars in default env creation (#798)
Login flow improvements (#796)
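#797 says only that a modified, hashed API token is saved on the cluster instead of the raw one. The scheme below (an HMAC-SHA256 of the cluster name, keyed with the user's token) is an assumption chosen to illustrate the idea, not the actual Runhouse scheme; derive_cluster_token is a hypothetical name.

```python
import hashlib
import hmac


def derive_cluster_token(api_token: str, cluster_name: str) -> str:
    """Hypothetical helper: derive a per-cluster token from the user's API token."""
    # The cluster stores only this digest. Because the digest is keyed by the raw
    # token and differs per cluster, leaking it does not reveal the original token.
    return hmac.new(api_token.encode(), cluster_name.encode(), hashlib.sha256).hexdigest()
```

The derivation is deterministic, so a server that knows the raw token could recompute the same value to validate requests from a given cluster.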

Bug Fixes

Fix undefined path when pip installing a folder (#826)
Don't pass basic auth to password cluster HTTP calls (#823)
Fix env installations that contain a provider secret (#822)
Refresh sys.path upon loading a new module (#818)

Docs & Examples

Update domain docs (#812)
Add default env section to envs tutorial (#810)
Minor improvements to parallel embedding example (#795)


Releases: run-house/runhouse
