Fix github actions for large file storage #224

Merged
merged 22 commits into from
Mar 21, 2024

CunliangGeng (Member) commented Mar 13, 2024

We used Git Large File Storage (LFS) to handle our large test datasets (zip files).

The GitHub action "actions/checkout" does not fetch LFS files by default, which led to failed tests, so we had to enable LFS explicitly (see the usage).

However, LFS is not free: it comes with storage and bandwidth limits (see the comment), which we want to avoid.

This PR stops using LFS and implements an alternative way to handle large files. The approach is to host the large files on Zenodo and download them before running tests if there is no local cache.
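For readers who want a concrete picture of the "download if there is no local cache" idea, here is a minimal sketch. It is not the code from this PR: the function name `fetch_if_missing`, the placeholder URL, and the injectable `fetch` argument are all assumptions made for illustration.

```python
import urllib.request
from pathlib import Path
from typing import Callable

# Placeholder only: the real Zenodo file URL is not given in this conversation.
DATASET_URL = "https://example.org/test-data.zip"


def fetch_if_missing(
    url: str,
    dest: Path,
    fetch: Callable[[str, str], object] = urllib.request.urlretrieve,
) -> Path:
    """Download `url` to `dest` unless a cached copy already exists."""
    dest = Path(dest)
    if dest.exists():
        return dest  # cache hit: skip the network entirely
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Download under a temporary name first, so an interrupted download
    # is never mistaken for a valid cached file on the next run.
    tmp = dest.with_name(dest.name + ".part")
    fetch(url, str(tmp))
    tmp.replace(dest)
    return dest
```

The `fetch` parameter defaults to the standard library's `urllib.request.urlretrieve`, and can be swapped for a stub so the caching logic is testable without network access.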

Major changes:

  • Stopped using LFS (instructions)
  • Split the tests into unit tests and integration tests
  • Uploaded the large test datasets to Zenodo
  • Implemented code to prepare test datasets and folders before running tests
  • Excluded the whole tests folder from the distribution (i.e. the PyPI package)
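As an illustration of what "preparing test datasets and folders before running tests" might involve, the sketch below extracts a cached zip archive into a clean working directory. The helper name `prepare_data_dir` and its behaviour (wiping the directory on each run) are assumptions, not the actual contents of the conftest.py files in this PR.

```python
import shutil
import zipfile
from pathlib import Path


def prepare_data_dir(archive: Path, data_dir: Path) -> Path:
    """Extract `archive` into a clean `data_dir` and return that directory."""
    data_dir = Path(data_dir)
    if data_dir.exists():
        shutil.rmtree(data_dir)  # assumed policy: start from a clean state each run
    data_dir.mkdir(parents=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(data_dir)
    return data_dir
```

In a pytest setup, a helper like this would typically be called once per test session, for example from a session-scoped fixture in a conftest.py file.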

CunliangGeng requested a review from gcroci2 on March 13, 2024 16:19
CunliangGeng self-assigned this on Mar 13, 2024
CunliangGeng marked this pull request as draft on March 14, 2024 07:10
CunliangGeng (Member, Author) commented

Got an email from GitHub warning about our LFS bandwidth usage (see below). It looks like LFS is not a great solution for us, so I converted this PR to draft and will explore other solutions.

You’ve used 100% of your data plan for Git LFS on the organization NPLinker. Please purchase additional data packs to cover your bandwidth and storage usage:

  https://github.com/organizations/NPLinker/billing/data/upgrade

Current usage as of 13 Mar 2024 06:50PM UTC:

  Bandwidth: 1.04 GB / 1 GB (104%)
  Storage: 0.21 GB / 1 GB (21%)

CunliangGeng marked this pull request as ready for review on March 18, 2024 07:37

gcroci2 (Contributor) left a comment

Great PR! Only minor comments and suggestions.

Review threads:
pyproject.toml (resolved)
.github/workflows/build.yml (resolved)
MANIFEST.in (resolved)
tests/integration/conftest.py (outdated, resolved)
tests/integration/conftest.py (resolved)
tests/integration/conftest.py (outdated, resolved)
tests/unit/conftest.py (outdated, resolved)
CunliangGeng and others added 2 commits March 21, 2024 09:57
Co-authored-by: Giulia Crocioni <[email protected]>
Co-authored-by: Giulia Crocioni <[email protected]>
CunliangGeng (Member, Author) commented Mar 21, 2024

Merge activity

CunliangGeng merged commit 03a7420 into dev on Mar 21, 2024
2 of 4 checks passed
CunliangGeng deleted the fix_actions_lfs branch on March 21, 2024 11:59