Releases: run-house/runhouse
v0.0.37
Highlights
⚠️ ⚠️ Envs are no longer supported ⚠️ ⚠️
The Runhouse Env class (rh.Env or rh.env) is no longer supported. Instead, we introduce the concept of a process to handle running on a specific worker, and the Image, which replaces the default_env of a cluster and specifies any setup steps to take on the cluster.
Process
To specify env vars, compute, conda envs, or anything else needed to run on a specific process, you can create a process with the cluster.ensure_process_created() function.
Instead of:
env = rh.env(name="my_env", reqs=["numpy", "pandas"], env_vars=MY_ENV_VARS, compute={"CPU": 0.5})
rh.function(local_fn).to(cluster, env=env)
You can do:
cluster.install_packages(["numpy", "pandas"])
process = cluster.ensure_process_created("my_process", env_vars=MY_ENV_VARS, compute={"CPU": 0.5})
rh.function(local_fn).to(cluster, process=process)
Image
Image is a new primitive that makes it easier to specify the environment and setup steps to start the cluster with. It replaces the previous default_env cluster argument. It supports installing packages, running commands, creating a conda env, loading from a machine/Docker image, and more.
import runhouse as rh

my_image = (
rh.Image(name="base_image")
.setup_conda_env(
conda_env_name="base_env",
conda_yaml={"dependencies": ["python=3.11"], "name": "base_env"},
)
.install_packages(["numpy", "pandas"])
.set_env_vars({"OMP_NUM_THREADS": 1})
)
cluster = rh.cluster("rh-cpu", instance_type="CPU:2+", image=my_image)
cluster.up_if_not()
Den Launcher
Now you can manage clusters (up, teardown, and status checks) via the Runhouse launcher / control plane. This is especially useful when a local SkyPilot database is not available, such as when upping a cluster from a distributed workflow. To use this feature, set launcher="den" in the cluster factory or include launcher: den in your local ~/.rh/config.yaml.
You can also track cluster status, memory consumption, and view logs for Den-launched clusters in the resources dashboard.
New Features
- Introduce DockerRegistrySecret for pulling from private or cloud provider registries
Updates
- Rename cluster launched_properties to compute_properties, and save the generated internal and external IPs there
- Set autosave to False by default, configurable in your rh config
- Allow cluster name to be passed into runhouse server CLI commands
- Add rh CLI alias
Bugfixes
- Fix the cluster factory to properly handle differences when loading ints and autostop_mins
Examples
- Update examples to remove the env dependency and follow the new process/Image flow
v0.0.36
Highlights
Enrich CLI commands (and corresponding Python APIs) for interacting with Runhouse clusters.
- runhouse cluster: list, down, up, keep-warm, logs, status, ssh
- runhouse server: start, restart, stop, status
New Features
- Cluster list support (#1225, #1227, #1231, #1233, #1245)
- Runhouse cluster & server CLI support (#1268, #1301)
- Default ssh key to use for clusters (#1357, #1358, #1359, #1365)
- Kubeconfig secret (#1346)
- Distributed Pool - runhouse, Ray, PyTorch, and Dask (#1304, #1305, #1378, #1379)
Improvements
- Clusters reuse secret keys instead of generating new secrets per cluster (#1338, #1344)
- Log streaming for nested and multinode (#1375, #1377)
Bugfixes
- Cluster reloading fixes (#1290, #1291)
- Don't refresh when initializing on-demand clusters via Sky (#1258)
- Fix notebook support (#1390)
- Fix multinode K8s (#1376)
Deprecations
- Python 3.7 support (#1281)
- Replace cluster num_instances with num_nodes (#1380, #1405)
- Replace cluster address with head_ip (#1370)
Build
- Pin skypilot to 0.7.0 for faster cluster start times
Examples
v0.0.35
v0.0.34
Highlights
This release expands reporting and monitoring of cluster status, cloud properties, and utilization metrics (via runhouse status and the Den UI), and broadly improves multi-node and multi-cluster support.
Improvements
- Cluster status, cloud info, and utilization reporting improvements (#1209, #1224, #1226, #1197, #1242, #1246, #1249)
- Install default_env on all nodes (#1240)
- Introduce cluster.a_up to parallelize or launch clusters asynchronously by @dongreenberg in #1247
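A minimal sketch of launching several clusters concurrently with asyncio, assuming cluster.a_up is an awaitable counterpart to up(). FakeCluster is a hypothetical stand-in (not part of Runhouse) so the pattern runs without cloud credentials; with real clusters you would gather their a_up() calls the same way.

```python
import asyncio


class FakeCluster:
    """Hypothetical stand-in for a Runhouse cluster; not part of the Runhouse API."""

    def __init__(self, name: str, launch_seconds: float) -> None:
        self.name = name
        self.launch_seconds = launch_seconds
        self.is_up = False

    async def a_up(self) -> str:
        # Simulate a slow, I/O-bound launch without blocking the event loop.
        await asyncio.sleep(self.launch_seconds)
        self.is_up = True
        return self.name


async def launch_all(clusters):
    # gather runs every launch concurrently and returns results in input order.
    return await asyncio.gather(*(c.a_up() for c in clusters))


if __name__ == "__main__":
    clusters = [FakeCluster("rh-cpu-1", 0.2), FakeCluster("rh-cpu-2", 0.1)]
    print(asyncio.run(launch_all(clusters)))  # ['rh-cpu-1', 'rh-cpu-2']
```

Because the launches are I/O-bound, total wall time is roughly that of the slowest launch rather than the sum.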
Bugfixes
- Install from explicit dest_path instead of relative path. by @rohinb2 in #1238
- Temporary fix for pydantic version error with FastAPI. by @rohinb2 in #1237
- Run SSH commands for Docker clusters via ssh + docker exec as opposed to SSH Proxy. by @rohinb2 in (#1235, #1251)
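The general shape of an ssh + docker exec invocation can be sketched as below. This is an illustrative, hypothetical helper (build_docker_exec_argv is not a Runhouse function); the default user, host, and container name are placeholders, and Runhouse's actual flags may differ. The inner command is quoted with shlex.quote because ssh joins its trailing arguments into one string for the remote shell.

```python
import shlex
from typing import List, Optional


def build_docker_exec_argv(
    host: str,
    container: str,
    command: str,
    user: str = "ubuntu",  # assumption: placeholder login user
    port: Optional[int] = None,
) -> List[str]:
    """Hypothetical helper: run `command` inside `container` on `host` over plain SSH."""
    argv = ["ssh"]
    if port is not None:
        argv += ["-p", str(port)]
    argv.append(f"{user}@{host}")
    # ssh concatenates the remaining arguments for the remote shell, so quote the
    # inner command to keep it a single argument to `bash -c` inside the container.
    argv += ["docker", "exec", container, "bash", "-c", shlex.quote(command)]
    return argv
```

For example, subprocess.run(build_docker_exec_argv("203.0.113.10", "my_container", "echo hi && ls")) would run both commands inside the container rather than on the host.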
Deprecations
- Deprecate function.send_secrets (#1091)
- Deprecate some old unused function methods by @dongreenberg in #1092
- Remove SageMaker GitHub action by @jlewitt1 in #1244
Examples
Full Changelog: 0.0.33...v0.0.34
v0.0.33
Highlights
Docker dev & prod workflows
We released a new guide showing how you can use the same docker image on a Runhouse cluster for both local development and production.
Local Development: We now support passing in requirements to the Runhouse env object instead of preemptively sending the env to the function, ensuring that local changes are synced over.
dev_env = rh.env(name="dev_env", env_vars={"HF_TOKEN": "****"}, reqs=["diffusers", "transformers"])
# we pass in the env object to ensure any requirements specified in the env will be synced onto the cluster properly
dev_fn = rh.function(is_transformers_available).to(cluster, env=dev_env)
Production: The Docker image holds all the packages and versions we want, ensuring reproducibility and reducing the ingress costs of installing dependencies on each production cluster.
Improvements
- If no package version is specified, install the same version that is installed locally (#1058)
- Convert folder to a Module (#995)
- Support constructing an empty env with only a name (#1095)
- Support loading an on-demand cluster from name locally in addition to Den (#1093)
- Add option to skip loading resources from Den (#1096)
- Update client connection flow to reduce check server calls (#1098, #1141)
- Updating Folder APIs when interacting with a cluster (#1147, #1116, #1108)
- Add launched properties to ondemand cluster config (#1139)
- Stream logs in SSH setup commands (#1211)
- Logging updates and improvements (#1177, #1178, #1204)
- Update command running for Kubernetes, and support Docker images with Kubernetes clusters (#1173, #1174)
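The idea behind #1058 (pinning an unversioned requirement to the locally installed version) can be sketched with the standard library's importlib.metadata. This is a hypothetical helper, not Runhouse's implementation; it only treats bare names (no specifier, extras, URL, or path) as unversioned, and the injectable lookup argument exists just to make it easy to test.

```python
import re
from importlib import metadata
from typing import Callable

# A bare distribution name such as "pandas" or "scikit-learn" (no version specifier).
_BARE_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*$")


def pin_to_local_version(req: str, lookup: Callable[[str], str] = metadata.version) -> str:
    """Return "name==<local version>" for a bare, locally installed package; else `req` unchanged."""
    if not _BARE_NAME.match(req):
        return req  # already has a specifier or extras, or is a path/URL
    try:
        return f"{req}=={lookup(req)}"
    except metadata.PackageNotFoundError:
        return req  # not installed locally, so let pip choose a version
```

With this approach, "numpy==1.24.6" passes through untouched, while a bare "pandas" becomes "pandas==<whatever version is installed locally>".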
Clean up
Bug Fixes
- Remove callable param from Folder APIs (#1101)
- Fix folder construction in the Package (#1083)
- importlib support for Python 3.7 (#1073)
- Fix conda installation on cluster (#1176)
- Have modules created in an env be put within that env (#1194)
Deprecations
Deprecate no longer maintained resources and functionality
Examples
v0.0.32
Highlights
Changes to reqs, packages, and env installation
More explicit syncing behavior between local and remote development, and more flexible support for package installations on the cluster.
working_dir env argument
Previously, you could specify a "working directory" as part of your Runhouse env to sync over to a cluster. By default, this working directory was the folder you ran your Runhouse Python code from (which may be completely disconnected from the location of the code you're sending over). This release changes that behavior: we now sync over the package to which the local class or function being sent to the cluster belongs. This eliminates the unexpected sharp edges users encountered with the working_dir setting, and we plan to deprecate working_dir soon (though this should not be disruptive). To specify folders to sync, you can pass local paths directly into the reqs of the Runhouse env.
env = rh.env(
name="env_with_local_package",
reqs=["pandas", "torch", "~/path/to/package"],
)
Local path detection
When sending over a function or module that imports local code to a remote cluster, we now automatically detect [1] the package on the filesystem corresponding to that code, add it to the env requirements, and sync it over to the cluster.
Local package detection
You can now pass the name of a local Python package, whether installed from source or editably, and we'll detect the correct directory to sync over. This is important for teams collaborating on shared repos, who may not have their clones saved in the same place.
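Resolving a package name to the directory to sync can be sketched with importlib.util.find_spec, which reports where an importable package lives on disk (assuming it imports in the current interpreter, whether installed normally or editably). local_package_dir is a hypothetical helper, not a Runhouse API, and Runhouse's own detection may differ.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def local_package_dir(package_name: str) -> Optional[Path]:
    """Hypothetical helper: directory containing an importable top-level package, or None."""
    spec = importlib.util.find_spec(package_name)
    if spec is None:
        return None  # not importable here
    if spec.submodule_search_locations:
        # Regular (or namespace) package: its first search location is the package directory.
        return Path(next(iter(spec.submodule_search_locations))).resolve()
    if spec.origin and spec.origin not in ("built-in", "frozen"):
        # Single-file module: fall back to the directory holding the .py file.
        return Path(spec.origin).resolve().parent
    return None  # built-in or frozen modules have nothing on disk to sync
```

Because the lookup goes through the import system rather than a fixed path, two collaborators with clones in different locations both get their own correct directory.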
Increased package installation support
Pip install will now work properly for local folders, in addition to package strings. Including a requirement that is installed locally will also sync appropriately to the cluster.
- Pip install method will correctly work for local folders as well (#998)
- Importing from a given file will now detect [1] a package on the filesystem to sync based on that file, add it to the env's reqs, and then sync it (#914)
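A minimal sketch of turning a reqs list that mixes package strings and local folders into a single pip command. It assumes an entry is a local folder if its user-expanded path is an existing directory; build_pip_install_command is a hypothetical name, and quoting each argument with shlex.quote is one reasonable way to keep specifiers intact, not necessarily what Runhouse does.

```python
import shlex
from pathlib import Path
from typing import List


def build_pip_install_command(reqs: List[str]) -> str:
    """Hypothetical helper: one `pip install` command for package strings and local folders."""
    args = []
    for req in reqs:
        path = Path(req).expanduser()
        if path.is_dir():
            # Local folder: install from its absolute path so the working directory doesn't matter.
            args.append(str(path.resolve()))
        else:
            # Package string such as "pandas" or "numpy>=1.24".
            args.append(req)
    # Quote each argument so specifiers like ">=" aren't treated as shell redirection.
    return "python -m pip install " + " ".join(shlex.quote(a) for a in args)
```

For example, build_pip_install_command(["pandas", "numpy>=1.24", "~/path/to/package"]) would quote the numpy specifier and expand the folder to an absolute path (if that folder exists).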
Updated Connection Flow
Improvements to detecting and checking the HTTP server connection, with a faster, improved approach to fallbacks and retries when unable to reach the client. As part of the connection refactor, on-demand clusters are no longer automatically brought up when a call is made to or on them. Instead, you need to manually call up() or up_if_not() to bring up the cluster if it is not already running.
Deprecations
Remove support and clean up docs for features that are no longer maintained. See release notes below for complete list.
Release Notes
Improvements
Reqs, Packages & Envs Revamp
- Automatically add module pointers and reqs to env if they don't exist (#914, #968)
- Change pointers to store absolute path (#989)
- Insert at the beginning of sys.path for packages (#990)
- Including a req that is installed locally will now sync appropriately (#997)
- Support pip installing folders (#998)
Connection Flow
- Fix tunnel hanging at termination (#957, #959)
- Convert cluster client to a property (#975)
- Add wrapper for running client methods (#981)
- Move check server logic into cluster client call (#982, #983)
- Handle Ray start in runhouse restart more reliably (#1059)
Other
- Expand supported Ray versions (#720)
- Handle SSH secret syncing in login (#887)
- Add log support for clusters with port forwarding (#928)
- Remove use of pexpect from password cluster login (#940)
- Improve and wrap conda install command on the cluster (#954)
- Add runhouse ssh support to local unsaved clusters (#1012)
- Check IPs for ondemand clusters when loading from Den (#1018)
- Add memory percent to cluster status (#1030)
- Don’t log full config (#1063)
Bug Fixes
- Propagate self.env if it was set for a module (#916)
- Previously, if you passed reqs to a resource constructor as a list, they would get wiped when actually running .to
- Update conda env run commands to handle bash commands (#920)
- Update config comparison check in cluster factory (#955)
- Due to a string vs. int mismatch in IPs, the cluster would sometimes be reconstructed rather than loaded from Den
- Quote pip installs correctly (#1032)
- Previously, package strings like numpy==1.24.6 wouldn't install correctly due to a parsing issue
- Pass through missing k8s cluster factory args (#1067)
BC-Breaking
Package & Envs
- Locate working dir with a breadth-first approach (#908)
- Search for a module's local package based on its file location, rather than the runhouse or current working directory
- Change the default working dir from "." to None (#915)
- Working directory is no longer synced over by default when sending to a cluster. Instead, it can be specified by passing in the corresponding Package in the env reqs
- Selectively sync runhouse for default restart behavior (#1020)
- Only resync runhouse during restart server if it is a local editable package, or if it is explicitly set to True
cluster.run()
- Disallow passing node when on cluster (#946)
- Remove run name support for running via ssh (#936)
- Remove port_forward argument (#934)
Deprecations
- Remove telemetry APIs and otel requirements (#986)
- Remove KVstore, mapper, tables, and queue (#994)
- Remove Run from docs (#1011)
- Deprecate system arg from function and module factories
Other
- No longer automatically up an ondemand cluster when performing a call on it
- Previously, when making a call to or on an ondemand cluster that was not already running, Runhouse automatically brought up the cluster. This behavior changes in this release, and you will need to run cluster.up() or cluster.up_if_not() to start the ondemand cluster.
Doc & Examples
- Add llama 3 fine tuning example (#939, #942)
- Update module API example (#950)
- Update architecture overview (#1036)
- Add up_if_not() in examples and tutorials (#999, #1009)
- Minor fixes in docs (#1008, #1040, #1057)
[1] The detection algorithm for what to sync when you have an importable function or module is as follows:
- Start with the file that contains this importable function or module on the filesystem
- Go up directories till you find any of the following config files ".git", "setup.py", "setup.cfg", "pyproject.toml", "requirements.txt"
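The detection steps above can be sketched with pathlib. This is an illustrative reimplementation of the listed rules, not Runhouse's code; find_sync_root is a hypothetical name, and returning None when no marker is found is an assumption the notes don't specify.

```python
import tempfile
from pathlib import Path
from typing import Optional, Union

# Config files/directories listed in the release notes as marking the root to sync.
MARKERS = (".git", "setup.py", "setup.cfg", "pyproject.toml", "requirements.txt")


def find_sync_root(source_file: Union[str, Path]) -> Optional[Path]:
    """Walk up from the file's directory until one contains a marker; return that directory."""
    start = Path(source_file).resolve().parent
    for directory in (start, *start.parents):
        if any((directory / marker).exists() for marker in MARKERS):
            return directory
    return None  # assumption: no marker anywhere means nothing to sync


if __name__ == "__main__":
    # Demo layout: <tmp>/project/pyproject.toml and <tmp>/project/pkg/mod.py
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp) / "project"
        (project / "pkg").mkdir(parents=True)
        (project / "pyproject.toml").touch()
        module = project / "pkg" / "mod.py"
        module.touch()
        print(find_sync_root(module) == project.resolve())  # True
```

The nearest marker wins, so a setup.py in a subfolder takes precedence over a .git directory higher up.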
v0.0.31
Quick release to allow passing Sky kwargs to ondemand_cluster (#978).
Full Changelog: v0.0.30...v0.0.31
v0.0.30
Highlights
[Alpha] On-demand Docker Clusters
This release adds support for using a base Docker image in conjunction with an on-demand cluster. By specifying the image_id field in the format docker:<registry>/<image>:<tag> in the cluster factory, the corresponding Docker container will be downloaded when the cluster is launched. The Runhouse server is then started inside the Docker container, ensuring that anything that goes through Runhouse will be run inside the container environment.
For more information on usage, such as setting up environment variables for using private Docker registries, please refer to the User Guide.
docker_cluster = rh.ondemand_cluster(
name="pytorch_cluster",
image_id="docker:nvcr.io/nvidia/pytorch:23.10-py3",
instance_type="CPU:2+",
provider="aws",
)
docker_cluster.up_if_not()
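To make the docker:<registry>/<image>:<tag> format concrete, here is a hypothetical parser (parse_docker_image_id is not a Runhouse function). Following the format above, it assumes the first path segment is always the registry, and it reports a missing tag as None rather than guessing a default.

```python
from typing import NamedTuple, Optional


class DockerImageRef(NamedTuple):
    registry: str
    image: str
    tag: Optional[str]


def parse_docker_image_id(image_id: str) -> DockerImageRef:
    """Split "docker:<registry>/<image>:<tag>" into its parts (hypothetical helper)."""
    prefix = "docker:"
    if not image_id.startswith(prefix):
        raise ValueError(f"expected an image_id starting with {prefix!r}: {image_id!r}")
    ref = image_id[len(prefix):]
    registry, sep, rest = ref.partition("/")
    if not sep:
        raise ValueError(f"expected <registry>/<image>[:<tag>], got {ref!r}")
    # The registry (which may include a port) was split off above,
    # so any colon left in `rest` separates the image from its tag.
    image, colon, tag = rest.rpartition(":")
    if not colon:
        return DockerImageRef(registry, rest, None)
    return DockerImageRef(registry, image, tag)
```

For the cluster above, parse_docker_image_id("docker:nvcr.io/nvidia/pytorch:23.10-py3") yields registry "nvcr.io", image "nvidia/pytorch", and tag "23.10-py3".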
New Features
Bug Fixes
v0.0.29
Highlights
This release improves autostop stability and robustness considerably, and introduces the ability to send an env or module to a specific node in a multinode cluster.
Improvements
- Simplify and improve Autostop by @rohinb2 and @dongreenberg in #895, #894
- Send env to a specific node_idx. by @rohinb2 in #835
- Update secrets login flow to be more opt-in by @carolineechen in #880
- Show information about active function calls in cluster.status() by @rohinb2 in #871 and #896
Bugfixes
- [bug] Make disable_den_auth actually sync. by @rohinb2 in #865
- Move config.yaml creation to restart server() by @BelSasha in #868
- Bump SkyPilot Version to 0.6.0 and fix remote SkyPilot dependencies on Start by @dongreenberg in #855
- Consolidate periodic loops into one function updating Den and updating autostop. by @rohinb2 in #873
- Fix cluster factory bug with den_auth clusters not being saved. by @rohinb2 in #878
- Remove resource conversion check for secrets by @carolineechen in #881
Docs
- Clarify setup in docs and den quick start by @mkandler in #876
- Update status docs by @BelSasha in #889
- Llama 3 vLLM GCP example by @mkandler in #893
- Fix bug in starting example code block by @mkandler in #884
- Adds quotes to pip install in examples by @mkandler in #886
- Update secrets login in api tutorial by @carolineechen in #882
Testing
- Update multinode cluster fixtures. by @rohinb2 in #856
- minor changes to cluster status tests by @BelSasha in #891
- Group status tests together by @dongreenberg in #899
- Reorganize default env tests and consolidate fixture into GCP fixture by @dongreenberg in #900
- Stop overwriting local dotenv in tests. by @dongreenberg in #901
- Consolidate static cluster fixtures into one by @dongreenberg in #902
- Change AutostopServlet into AutostopHelper, and test properly by @dongreenberg in #897
- cluster status scheduler tests by @BelSasha in #869
Full Changelog: v0.0.28...v0.0.29
v0.0.28
Highlights
runhouse status: Improving visibility into cluster utilization and memory consumption
Improved Cluster Status
Runhouse now provides a more comprehensive view of the cluster's utilization and memory consumption, providing more coverage over the true utilization numbers across each worker and head node of the cluster.
Information surfaced includes: PID, CPU utilization, memory consumption, and GPU utilization (where relevant).
This data can be viewed as part of the runhouse status CLI command:
GPU Cluster
>> runhouse status
/sashab/rh-basic-gpu
😈 Runhouse Daemon is running 🏃
Runhouse v0.0.28
server pid: 29486
• server port: 32300
• den auth: True
• server connection type: ssh
• backend config:
• resource subtype: OnDemandCluster
• use local telemetry: False
• domain: None
• server host: 0.0.0.0
• ips: ['35.171.157.49']
• resource subtype: OnDemandCluster
• autostop mins: autostop disabled
Serving 🍦 :
• _cluster_default_env (runhouse.Env)
This environment has only python packages installed, if such provided. No resources were found.
• np_pd_env (runhouse.Env) | pid: 29672 | node: head (35.171.157.49)
CPU: 0.0% | Memory: 0.13 / 16 Gb (0.85%)
• /sashab/summer (runhouse.Function)
• mult (runhouse.Function)
• sd_env (runhouse.Env) | pid: 29812 | node: head (35.171.157.49)
CPU: 1.0% | Memory: 4.47 / 16 Gb (28.95%)
GPU: 0.0% | Memory: 6.89 / 23 Gb (29.96%)
• sd_generate (runhouse.Function)
CPU cluster
>> runhouse status
/sashab/rh-basic-cpu
😈 Runhouse Daemon is running 🏃
Runhouse v0.0.28
server pid: 29395
• server port: 32300
• den auth: True
• server connection type: ssh
• backend config:
• resource subtype: OnDemandCluster
• use local telemetry: False
• domain: None
• server host: 0.0.0.0
• ips: ['52.207.212.159']
• resource subtype: OnDemandCluster
• autostop mins: autostop disabled
Serving 🍦 :
• _cluster_default_env (runhouse.Env)
This environment has only python packages installed, if such provided. No resources were found.
• sd_env (runhouse.Env) | pid: 29716 | node: head (52.207.212.159)
CPU: 0.0% | Memory: 0.13 / 8 Gb (1.65%)
This environment has only python packages installed, if such provided. No resources were found.
• np_pd_env (runhouse.Env) | pid: 29578 | node: head (52.207.212.159)
CPU: 0.0% | Memory: 0.13 / 8 Gb (1.71%)
• /sashab/summer (runhouse.Function)
• mult (runhouse.Function)
Improvements
- Cluster status displays additional information. (#653)
- Polling den with cluster status data (#806)
- Prevent exposing user Runhouse API tokens on the cluster by saving a modified hashed API token (#797)
- Use env vars in default env creation (#798)
- Login flow improvements (#796)
Bug Fixes
- Fix undefined path when pip installing a folder (#826)
- Don't pass basic auth to password cluster HTTP calls (#823)
- Fix env installations that contain a provider secret (#822)
- Refresh sys.path upon loading a new module (#818)