huggingface-cli upload - Validate README.md before file hashing #2451

Closed
hlky opened this issue Aug 14, 2024 · 2 comments · Fixed by #2452

Comments

@hlky
Contributor

hlky commented Aug 14, 2024

For large datasets file hashing can take some time.

Dataset Card validation happens after file hashing.

"It's better to fail early than to fail after all the files have been hashed."
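
To illustrate the ordering being asked for, here is a minimal sketch, not the actual `huggingface-cli` code. `prepare_upload`, `validate_card` and `hash_file` are hypothetical names; `validate_card` stands in for whatever performs the Dataset Card check and is assumed to raise on an invalid card.

```python
import hashlib
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def sha256_of(path: Path) -> str:
    """Hash a file in chunks; this is the slow step for large datasets."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def prepare_upload(
    paths: Iterable[Path],
    validate_card: Callable[[str], None],  # hypothetical validator, raises on error
    hash_file: Callable[[Path], str] = sha256_of,
) -> List[Tuple[Path, str]]:
    """Validate README.md first, then hash every file."""
    paths = [Path(p) for p in paths]

    # Cheap check first: fail before any file has been hashed.
    for p in paths:
        if p.name == "README.md":
            validate_card(p.read_text(encoding="utf-8"))

    # Expensive step only runs once the card is known to be valid.
    return [(p, hash_file(p)) for p in paths]
```

With this ordering, an invalid README.md raises before `hash_file` is called for any file.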

@Wauplin
Contributor

Wauplin commented Aug 15, 2024

Hi @hlky, thanks for noticing this! It would indeed be much better to validate the YAML before hashing.

For reference, the files are hashed when the CommitOperationAdd objects are defined here. One option is to call the /validate-yaml endpoint between the moment we have the list of paths and the moment we create the CommitOperationAdd objects. Alternatively, it should be possible to compute the upload_info attribute (which is when the hash is computed) only on demand (lazy-compute) instead of when initializing the object (see here).
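
As a rough sketch of the lazy-compute idea, assuming a simplified stand-in class rather than the real CommitOperationAdd: `LazyAddOperation` and the dict it returns are hypothetical, and only the `upload_info` attribute name comes from the discussion above. The hash is computed on first access and cached afterwards.

```python
import hashlib
from functools import cached_property
from pathlib import Path


class LazyAddOperation:
    """Hypothetical stand-in for an 'add file' commit operation."""

    def __init__(self, path_in_repo: str, local_path: str) -> None:
        # Construction is cheap: the file is not opened or read here.
        self.path_in_repo = path_in_repo
        self.local_path = Path(local_path)

    @cached_property
    def upload_info(self) -> dict:
        """Compute size and SHA-256 on first access, then reuse the result."""
        digest = hashlib.sha256()
        size = 0
        with self.local_path.open("rb") as f:
            for chunk in iter(lambda: f.read(1024 * 1024), b""):
                digest.update(chunk)
                size += len(chunk)
        return {"sha256": digest.hexdigest(), "size": size}
```

With this shape, building the list of operations costs nothing per file, so a README.md check can run in between and abort before any `upload_info` is touched.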

Would you like to try to work on a PR to improve this?

@hlky
Contributor Author

hlky commented Aug 15, 2024

Thanks for the reference. I've created a PR: #2452
