Hi, so after resolving the encoding issue, I'm still getting a few errors with the most recent code on the following datasets:
Please try the following tasks later by running individual files: ['multi_news.py', 'reddit_tifu.py', 'search_qa.py', 'amazon_polarity.py', 'spider.py', 'jeopardy.py', 'gigaword.py', 'wiki_auto.py', 'wiki_bio.py', 'yahoo_answers_topics.py', 'yelp_review_full.py', 'dbpedia_14.py', 'definite_pronoun_resolution.py', 'kilt_wow.py']
When I try to run a few of them, they output the following:
(crossfit) > python multi_news.py
Using custom data configuration default
Downloading and preparing dataset multi_news/default (download: 245.06 MiB, generated: 667.74 MiB, post-processed: Unknown size, total: 912.80 MiB) to /home/ABCD/.cache/huggingface/datasets/multi_news/default/1.0.0/465b14e19b4d6a55c9bb9131ca1de642175872143c9b231bee1dce789311b449...
Traceback (most recent call last):
File "multi_news.py", line 32, in <module>
main()
File "multi_news.py", line 29, in main
train, dev, test = dataset.generate_k_shot_data(k=32, seed=seed, path="../data/")
File "/scratch/ABCD/CrossFit/tasks/fewshot_gym_dataset.py", line 79, in generate_k_shot_data
dataset = self.load_dataset()
File "multi_news.py", line 23, in load_dataset
return datasets.load_dataset('multi_news')
File "/ext3/miniconda3/envs/crossfit/lib/python3.6/site-packages/datasets/load.py", line 746, in load_dataset
use_auth_token=use_auth_token,
File "/ext3/miniconda3/envs/crossfit/lib/python3.6/site-packages/datasets/builder.py", line 579, in download_and_prepare
dl_manager=dl_manager, verify_infos=verify_infos, **download_and_prepare_kwargs
File "/ext3/miniconda3/envs/crossfit/lib/python3.6/site-packages/datasets/builder.py", line 639, in _download_and_prepare
self.info.download_checksums, dl_manager.get_recorded_sizes_checksums(), "dataset source files"
File "/ext3/miniconda3/envs/crossfit/lib/python3.6/site-packages/datasets/utils/info_utils.py", line 39, in verify_checksums
raise NonMatchingChecksumError(error_msg + str(bad_urls))
datasets.utils.info_utils.NonMatchingChecksumError: Checksums didn't match for dataset source files:
['https://drive.google.com/uc?export=download&id=1vRY2wM6rlOZrf9exGTm5pXj5ExlVwJ0C']
Running a curl on the URL yields:
(Google error page; HTML stripped) Error 400 (Bad Request): "The server cannot process the request because it is malformed. It should not be retried. That's all we know."
The README says that Google Drive has a daily download quota, but this error message looks like something else may be going on.
For these tasks, please try adding ignore_verifications=True to the load_dataset function. E.g., dataset = load_dataset("kilt_tasks", "wow", ignore_verifications=True). This will skip the checksum verification phase during dataset loading.
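A minimal sketch of that change applied inside a task file, based on the load_dataset method shown in the multi_news.py traceback above (the method name and call come from that traceback):

import datasets

def load_dataset(self):
    # Skip checksum verification so an outdated checksum recorded in the
    # datasets library does not abort the download.
    return datasets.load_dataset('multi_news', ignore_verifications=True)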
We suspect that some dataset owners have updated their files, which makes the checksums recorded in huggingface datasets outdated. (See #2.) Unfortunately, we don't have control over this. You may get data samples that differ slightly from those in our original paper, but we expect the impact of this to be small.
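If a task still fails after adding the flag, a corrupted file from an earlier failed attempt (for example, a Google Drive quota page saved in place of the real archive) may still be sitting in the cache directory shown in the traceback. A hedged sketch of a retry that forces a fresh download; download_mode is a standard load_dataset parameter, and the string value here is assumed to map onto the library's FORCE_REDOWNLOAD mode:

import datasets

# Re-download the source files from scratch instead of reusing the cached
# copy, and skip the (possibly outdated) checksum verification.
dataset = datasets.load_dataset(
    'multi_news',
    ignore_verifications=True,
    download_mode='force_redownload',
)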
Let me know if some tasks are still not working / missing.