Unexpected keyword argument 'hf' when downloading CSV dataset from S3 #6598
Comments
I am facing a similar issue while reading a CSV file from S3. Wondering if somebody has found a workaround.
The same thing happens with other formats, such as Parquet.
I am facing a similar issue while reading a Parquet file from S3.
Redefining the `DownloadConfig` might work.
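A minimal sketch of what "redefining" could look like, assuming the `hf` entry is injected in `__post_init__` as the report below describes. `DownloadConfigStandIn` is a simplified, hypothetical stand-in rather than the real `datasets.DownloadConfig` (whose `__post_init__` signature and fields this sketch does not reproduce), and `S3DownloadConfig` is a hypothetical subclass that overrides `__post_init__` so the entry is never added. The `copy` method is also an assumption, modelled on the report's note that the config is copied before downloading; because it rebuilds the object through its own class, the override survives the copy.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class DownloadConfigStandIn:
    """Hypothetical, simplified stand-in for datasets.DownloadConfig."""

    token: Optional[str] = None
    storage_options: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Mimics the reported behaviour: an "hf" entry is always injected.
        if "hf" not in self.storage_options:
            self.storage_options["hf"] = {"token": self.token}

    def copy(self) -> "DownloadConfigStandIn":
        # Rebuilds the object through its own class, so __post_init__ runs again.
        return self.__class__(**{k: copy.deepcopy(v) for k, v in self.__dict__.items()})


class S3DownloadConfig(DownloadConfigStandIn):
    """Hypothetical redefinition that keeps storage_options S3-only."""

    def __post_init__(self) -> None:
        # Skip the parent's injection and drop any "hf" entry passed in explicitly.
        self.storage_options.pop("hf", None)


creds = {"key": "...", "secret": "...", "client_kwargs": {"endpoint_url": "..."}}
default_cfg = DownloadConfigStandIn(storage_options=dict(creds)).copy()
s3_cfg = S3DownloadConfig(storage_options=dict(creds)).copy()
print(sorted(default_cfg.storage_options))  # ['client_kwargs', 'hf', 'key', 'secret']
print(sorted(s3_cfg.storage_options))       # ['client_kwargs', 'key', 'secret']
```

Whether a subclass like this can be passed to the real library as `download_config=` depends on the installed `datasets` version; the sketch only illustrates the idea.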
This seemed to work for me.
Use pandas and then convert the result to a dataset.
I am currently facing the same issue while using a custom loading script with files located in a remote S3 instance. I was using the … As stated before, the library forces the existence of an `hf` key in `storage_options`, which fails with:
File ".../site-packages/s3fs/core.py", line 516, in set_session
    self.session = aiobotocore.session.AioSession(**self.kwargs)
TypeError: __init__() got an unexpected keyword argument 'hf'
Meanwhile, if my `storage_options` are only
{'key': '...',
 'secret': '...',
 'client_kwargs': {'endpoint_url': '...'}}
it works alright.
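The failure in the traceback above can be reproduced without S3. In this sketch, `FakeS3Session` is a hypothetical stand-in for a session class whose constructor accepts only fixed keyword arguments (the traceback shows s3fs calling `aiobotocore.session.AioSession(**self.kwargs)`), `without_hf` is a hypothetical helper that drops the Hub-specific entry before the options reach S3, and the shape of the `hf` value is an assumption.

```python
from typing import Any, Dict, Optional


class FakeS3Session:
    """Hypothetical stand-in for a session whose __init__ takes fixed keywords."""

    def __init__(
        self,
        key: Optional[str] = None,
        secret: Optional[str] = None,
        client_kwargs: Optional[Dict[str, Any]] = None,
    ) -> None:
        self.key = key
        self.secret = secret
        self.client_kwargs = client_kwargs or {}


def without_hf(storage_options: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of storage_options without the "hf" entry."""
    return {k: v for k, v in storage_options.items() if k != "hf"}


def try_create(storage_options: Dict[str, Any]) -> Optional[str]:
    """Return the TypeError message, or None if the session was created."""
    try:
        FakeS3Session(**storage_options)
    except TypeError as err:
        return str(err)
    return None


options = {
    "key": "...",
    "secret": "...",
    "client_kwargs": {"endpoint_url": "..."},
    "hf": {"token": None},  # the injected entry described in this thread
}
print(try_create(options))              # ... got an unexpected keyword argument 'hf'
print(try_create(without_hf(options)))  # None
```

The credentials-only dict succeeds for the same reason the commenter's plain `storage_options` work: every key matches a constructor parameter.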
Did anyone look into similar issues with model upload? Setting S3 for checkpointing returns …
Describe the bug
I receive an "unexpected keyword argument 'hf'" error when using `load_dataset` with the `"csv"` path and `data_files=s3://...`. I found a similar issue here: https://stackoverflow.com/questions/77596258/aws-issue-load-dataset-from-s3-fails-with-unexpected-keyword-argument-error-in
Full stacktrace:
Steps to reproduce the bug
Load the CSV file at `s3://bucket/data.csv` with `load_dataset`.
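A minimal reproduction sketch, assuming `datasets` (2.15.0 to 2.16.1) and `s3fs` are installed and `s3://bucket/data.csv` exists and is readable with the default AWS credentials. The function name `reproduce` is hypothetical; the import sits inside the function only so the snippet can be defined without the library.

```python
def reproduce(path: str = "s3://bucket/data.csv"):
    """Load a CSV from S3 with load_dataset, as described in this report."""
    # Imported lazily so defining this function does not require `datasets`.
    from datasets import load_dataset

    # Reported to raise: TypeError: __init__() got an unexpected keyword argument 'hf'
    return load_dataset("csv", data_files=path)


# Usage (needs network access and credentials):
# ds = reproduce()
```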
Encountered in version 2.16.1, but also reproduced in 2.16.0 and 2.15.0.

Note: I encountered this in a unit test using a `moto` mock for S3. However, since the error occurs before the session is instantiated, the mock should not be the cause.

Expected behavior
No exception is raised, the boto3 session is created successfully, and the CSV file is downloaded successfully and returned as a dataset.
===
After some research, I found that `DownloadConfig` has a `__post_init__` method that always forces an `hf` entry into its `storage_options`, even though for an S3 location the storage options are passed on to the S3 session, which does not expect this parameter. I assume this parameter is needed when reading from the Hugging Face Hub and should not be set in this context.

Unfortunately, there is nothing the user can do to work around it. Even if you manually remove the `hf` entry from the config's `storage_options`, the library will still reinsert it when `download_config = self.download_config.copy()` runs at line 418 of `download_manager.py` (`DownloadManager.download`). Therefore, `load_dataset` currently cannot be used to read a dataset in CSV format from an S3 location.

Environment info
- `datasets` version: 2.16.1
- `huggingface_hub` version: 0.20.2
- `fsspec` version: 2023.10.0
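The re-insertion described in the report can be illustrated with a simplified stand-in. `DownloadConfigStandIn` and its `copy` method are hypothetical, modelled on the report's description that `__post_init__` always adds the `hf` entry and that `DownloadManager.download` copies the config; the real `datasets` classes carry more fields. Deleting the key by hand does not survive the copy, because the copy goes back through the constructor.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class DownloadConfigStandIn:
    """Hypothetical, simplified stand-in for datasets.DownloadConfig."""

    token: Optional[str] = None
    storage_options: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Reported behaviour: always make sure an "hf" entry exists.
        if "hf" not in self.storage_options:
            self.storage_options["hf"] = {"token": self.token}

    def copy(self) -> "DownloadConfigStandIn":
        # Re-creating the object re-runs __post_init__.
        return self.__class__(**{k: copy.deepcopy(v) for k, v in self.__dict__.items()})


config = DownloadConfigStandIn(storage_options={"key": "...", "secret": "..."})
del config.storage_options["hf"]        # the manual workaround attempt
print("hf" in config.storage_options)   # False
copied = config.copy()                  # what the download step does, per the report
print("hf" in copied.storage_options)   # True
```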