Add support for caching multiple workflows #4

Merged
merged 20 commits into from
Aug 14, 2024

Conversation

elvispy
Contributor

@elvispy elvispy commented Jul 26, 2024

This pull request addresses a need we have at CERN (high-energy physics), where we prepare Julia artifacts to support multiple workflows. The artifacts are prepared and then installed on CernVM-FS, a high-performance distributed read-only file system used by the LHC experiments (and many other scientific communities).

The changes made implement the following features:

  • Support for user-defined DEPOT_PATH destination.
  • Support for Project.toml files that are generated by the project dependencies of an application (which may not be a package). This support includes being able to create multiple subdirectories of the target depot path and copying only the relevant (.toml) files to the final destination.
  • Support for caching artifacts to support multiple workflows.
  • Added usage example in the README.md file.
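The second item above (copying only the relevant `.toml` files into a subdirectory of the target depot) can be illustrated with a minimal sketch. This is not the code from this PR: `copy_toml_files` and its arguments are hypothetical names chosen for illustration, and only Julia `Base` functions are used.

```julia
# Hypothetical helper (not part of DepotDelivery): copy only the .toml files
# found directly inside `project_dir` into `dest_root/subdir`.
function copy_toml_files(project_dir::AbstractString,
                         dest_root::AbstractString,
                         subdir::AbstractString)
    dest = joinpath(dest_root, subdir)
    mkpath(dest)                          # create the subdirectory if needed
    copied = String[]
    for file in readdir(project_dir)      # readdir returns sorted names
        src = joinpath(project_dir, file)
        if isfile(src) && endswith(file, ".toml")
            cp(src, joinpath(dest, file); force = true)
            push!(copied, file)
        end
    end
    return copied
end
```

Large data files that happen to sit next to `Project.toml` are left behind, which is the point of copying selectively for applications that are not packages.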

To make use of this package at CERN, we gather a list of directories that contain Project.toml files for a number of relevant workflows. Using DepotDelivery, we build all the dependencies into a single depot path inside CernVM-FS, with the new precompiled flag set to true. After publishing this depot path to CernVM-FS these files are visible to machines connected to the grid. This greatly reduces the startup time of applications running in our distributed computing infrastructure.
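As a rough sketch of the first step described above (gathering the directories that contain a `Project.toml`), something like the following could be used. `find_project_dirs` is a hypothetical helper, not part of DepotDelivery; how the resulting list is passed to `build` is shown in the README and is deliberately not reproduced here.

```julia
# Hypothetical helper: recursively collect every directory under `root`
# that contains a Project.toml file.
function find_project_dirs(root::AbstractString)
    dirs = String[]
    for (dir, _, files) in walkdir(root)
        if "Project.toml" in files
            push!(dirs, dir)
        end
    end
    return sort(dirs)   # stable order, independent of traversal order
end
```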

See the updated README for an example (current usage of the script is completely unaffected).

Thanks for creating such a useful script. With these extensions we can really nicely satisfy our use case.

Elvis Aguero and others added 9 commits July 24, 2024 17:13
When the Project.toml does not describe a package, the
"name" field may be missing.
For the use case of distributed systems with shared depot path
Distributed systems may benefit from having a depot path
with already precompiled cache objects
To avoid copying populated project folders, especially when the
.toml file is referring to a project and not a package dependency
Multiple users may benefit from sharing the depot path.
They can now populate the same folder by providing their workflows
into the build function via multiple dispatch.
Added docstrings for both versions of build(), and some other explanatory comments
Added example with precompilation and custom depot path
Added a few comments to explain the second README example
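For context on the first commit in this list: a `Project.toml` that only records an application's dependencies may have no `name` entry, so code that reads it needs a fallback. A minimal sketch, assuming a hypothetical helper `project_name` and an arbitrary default value (neither is taken from the PR):

```julia
using TOML  # standard library since Julia 1.6

# Hypothetical helper: return the project's name, or `default` when the
# Project.toml has no "name" entry (typical for non-package projects).
function project_name(project_file::AbstractString;
                      default::AbstractString = "unnamed_project")
    project = TOML.parsefile(project_file)
    return get(project, "name", default)
end
```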
Member

@joshday joshday left a comment

Thanks for the PR! I added a commit so that CI runs. I'll be greedy and ask for a test for the multi-project build method and then I'll merge when CI goes green.

src/DepotDelivery.jl Outdated Show resolved Hide resolved
Elvis Aguero and others added 11 commits July 26, 2024 17:13
To copy full packages into dev/
Added message to explain that build() copies everything that coexists
with the Project.toml file. This is to avoid copying large files in the
case of user-made projects/applications (not packages).
Added the argument force=true inside cp, because the first test
set is no longer compatible with the previous version and throws
the error: ArgumentError: "directory" exists. `force=true` is required
to remove "directory" before copying.
Added three Project.toml files, covering the packages
URIs, Unzip and BitFlags, to test the capabilities of the DepotDelivery.build
function.
Added two test sets to check that the packages were installed correctly and that the cache files exist on the host machine.
The test consists of installing and precompiling three packages:
URIs, Unzip and BitFlags, which are listed in different Project.toml files
inside the directory MultipleWorkflows/
Updated number of required precompiled files to zero
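The `force=true` change mentioned in the commits above reflects standard behaviour of `Base.cp`: copying onto an existing destination throws an `ArgumentError` unless `force=true` is passed. A small self-contained illustration using temporary directories (the paths are made up for the example):

```julia
# Create a source directory containing one file.
src = mktempdir()
write(joinpath(src, "Project.toml"), "[deps]\n")

# First copy succeeds because the destination does not exist yet.
dst = joinpath(mktempdir(), "directory")
cp(src, dst)

# A second copy onto the same destination fails without force=true.
threw = try
    cp(src, dst)
    false
catch err
    err isa ArgumentError
end

# With force=true the existing destination is removed and replaced.
cp(src, dst; force = true)
```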
Contributor Author

elvispy commented Aug 12, 2024

@joshday, I have implemented some changes and added a test for my use case. All tests seem to pass; let me know if anything else is needed.

Member

joshday commented Aug 14, 2024

Looks great! Thanks for the contribution.

@joshday joshday merged commit 9ca352c into JuliaComputing:main Aug 14, 2024
3 checks passed