expose scan_cache table generation to python #2437

Merged on Aug 20, 2024 (16 commits)


@rsxdalv rsxdalv commented Aug 5, 2024

Allows for the functionality to be used with servers/gradio projects:

```python
from huggingface_hub import scan_cache_dir
from huggingface_hub.commands.scan_cache import get_table

hf_cache_info = scan_cache_dir()

table = get_table(0, hf_cache_info)
print(table)
```

@Wauplin Wauplin left a comment

Thanks @rsxdalv, makes sense to me. The changes look good to me. I left a minor comment regarding the helper signature.

In addition to this, could you also add

### scan_cache.get_table

[[autodoc]] huggingface_hub.commands.scan_cache.get_table

to the cache.md package reference? This would add it to the official documentation under https://huggingface.co/docs/huggingface_hub/package_reference/cache. You must also add a quick docstring to get_table that explains what it does, describes its inputs, and gives an example. You can take inspiration from this docstring for example. Thanks a lot in advance!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


rsxdalv commented Aug 12, 2024

Thanks for the information! I added a docstring and the reference to en/package_reference/cache.md

I was not sure about the ko/package_reference/cache.md, so I did not add it there.


@Wauplin Wauplin left a comment

Thanks for the changes @rsxdalv! I left minor comments mostly related to how the doc builder works.

Regarding the ko documentation, could you add

### scan_cache.get_table[[huggingface_hub.commands.scan_cache.get_table]]

[[autodoc]] huggingface_hub.commands.scan_cache.get_table

to it? (good catch, I forgot about it^^)

Other than that, we should be good to merge!


rsxdalv commented Aug 15, 2024

Thanks, I accepted all the changes and added the ko/cache.md reference.


@Wauplin Wauplin left a comment

Great! Thank you @rsxdalv, I think we are good to go now :) Let's wait for the CI to complete and then I'll merge.

EDIT: code quality seems to be complaining. To fix this, you must run make style locally which will fix the issues. Then you can run make quality to check everything's good. Finally, you can commit and push the changes.


Wauplin commented Aug 20, 2024

Hi @rsxdalv, I've had a look at the CI issues. It was quite annoying: the docs were failing because get_table was not a first-class citizen, but we still wanted it in the docs. In the end, I went for another solution that should also suit you. See changes in d958498 if you are interested.

Code example is now:

>>> from huggingface_hub.utils import scan_cache_dir
>>> hf_cache_info = scan_cache_dir()
HFCacheInfo(...)
>>> print(hf_cache_info.export_as_table())
REPO ID                                             REPO TYPE SIZE ON DISK NB FILES LAST_ACCESSED LAST_MODIFIED REFS LOCAL PATH
--------------------------------------------------- --------- ------------ -------- ------------- ------------- ---- --------------------------------------------------------------------------------------------------
roberta-base                                        model             2.7M        5 1 day ago     1 week ago    main C:\\Users\\admin\\.cache\\huggingface\\hub\\models--roberta-base
suno/bark                                           model             8.8K        1 1 week ago    1 week ago    main C:\\Users\\admin\\.cache\\huggingface\\hub\\models--suno--bark
t5-base                                             model           893.8M        4 4 days ago    7 months ago  main C:\\Users\\admin\\.cache\\huggingface\\hub\\models--t5-base
t5-large                                            model             3.0G        4 5 weeks ago   5 months ago  main C:\\Users\\admin\\.cache\\huggingface\\hub\\models--t5-large

Sorry about pushing changes to your branch, I hope that's fine for you. Once the CI is green, I'll merge it! 😄 EDIT: it's (finally) green! 🎉

@Wauplin Wauplin merged commit 359093f into huggingface:main Aug 20, 2024
12 of 14 checks passed

rsxdalv commented Aug 20, 2024

Thank you for carrying this code change through to its inclusion! I am currently developing a module for managing the HF cache within my project. Once it is clearer what works and what does not, I hope to make another PR.

One problem that I already know will need to be solved is distinguishing revisions from the files they contain. If a user sees that deleting revision 3fd77fe... will reclaim 10 GB, but those files are actually shared among 3 revisions that 'lock' them in, they will be confused. So for now the only solution is to educate users about what revisions are and how 'results may vary' when deleting them.

@rsxdalv rsxdalv deleted the patch-1 branch August 20, 2024 10:22

Wauplin commented Aug 20, 2024

Yes, I see what you mean. In the delete-cache command, each time an item is selected/unselected, we compute the actual size on disk that will be reclaimed. It's not as good as having it in the scan-cache command but it might help.
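
To make the shared-file point concrete, here is a rough, self-contained sketch of how the reclaimable size for a selection of revisions could be computed. It is not the library's implementation: `Blob`, `reclaimable_bytes`, and the revision names are hypothetical and exist only for illustration. The only assumption is that each file is stored once on disk and freed only when no remaining revision references it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Blob:
    """Hypothetical stand-in for a file stored once and shared by revisions."""
    blob_id: str
    size_bytes: int


def reclaimable_bytes(revisions: dict[str, set[Blob]], selected: set[str]) -> int:
    """Bytes actually freed if every revision in `selected` is deleted.

    A blob is freed only when no unselected revision still references it.
    """
    # Blobs still referenced by the revisions that are kept.
    kept: set[Blob] = set()
    for name, blobs in revisions.items():
        if name not in selected:
            kept |= blobs

    # Blobs referenced by the revisions to delete (the set deduplicates them).
    doomed: set[Blob] = set()
    for name in selected:
        doomed |= revisions[name]

    return sum(blob.size_bytes for blob in doomed - kept)


# Three revisions sharing one large weights file (illustrative sizes only).
weights = Blob("weights", 10_000_000_000)
config_v1 = Blob("config-v1", 1_000)
config_v2 = Blob("config-v2", 1_200)

revisions = {
    "rev_a": {weights, config_v1},
    "rev_b": {weights, config_v2},
    "rev_c": {weights, config_v2},
}

print(reclaimable_bytes(revisions, {"rev_a"}))                    # 1000
print(reclaimable_bytes(revisions, {"rev_a", "rev_b", "rev_c"}))  # 10000002200
```

Deleting `rev_a` alone frees only its private config file, because the weights are still referenced by the other two revisions. The full 10 GB is reclaimed only once all three are selected. In `huggingface_hub` itself, `HFCacheInfo.delete_revisions(...)` returns a strategy object that reports the expected freed size for the selected revisions.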


rsxdalv commented Aug 20, 2024

Actually that does help, thanks for letting me know.
