Investigating: arXiv API flakiness #129
Comments
Good example of flakiness between identical versions/protocols: #132 (comment)
Good diagnosis for this issue. I guess there is not much we can do unless they fix the backend.
BTW I found arxiv treats requests differently for programmatic clients and real browsers. I suspect this flakiness is on purpose.
@liyucheng09 can you share any details on that investigation? In #127 I tried tweaking the user-agent.
I made about 300 attempts per hour today, more than 3000 in total. 0 out of 3000 succeeded.
Hello!
    USER_AGENT = "feedparser/%s +https://github.com/kurtmckee/feedparser/" % __version__
Does the developer find this funny? It would be much, MUCH better as:
    from os import environ
    # <...>
    USER_AGENT = environ.get('PYTHON_FEEDPARSER_USER_AGENT', "feedparser/%s +https://github.com/kurtmckee/feedparser/" % __version__)
    # thank you for your joke, I threw myself and my 2 days in the garbage trying to run my project that uses langchain and ArXiVLoader
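For readers who want to experiment with the user agent without patching feedparser, here is a minimal sketch using only the standard library. The default string, the helper name `build_request`, and the example URL are assumptions made for illustration; the environment variable name is the one proposed in the comment above, not something feedparser is known to read.

```python
import os
from urllib.request import Request

# Hypothetical default for illustration; feedparser defines its own default.
DEFAULT_USER_AGENT = "my-arxiv-client/0.1"


def build_request(url, env_var="PYTHON_FEEDPARSER_USER_AGENT"):
    """Build a urllib Request whose User-Agent can be overridden via env var."""
    # Fall back to the default when the variable is not set.
    user_agent = os.environ.get(env_var, DEFAULT_USER_AGENT)
    return Request(url, headers={"User-Agent": user_agent})


# Example (no network call is made until the request is opened):
# req = build_request("https://export.arxiv.org/api/query?search_query=all:electron")
```

As noted later in the thread, changing the user agent did not make requests succeed 100% of the time, so treat this as a diagnostic knob rather than a fix.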
@Ar4ikov I believe all currently-released versions of […]
I think the most robust change is to make the HTTP calls from […]
Nonetheless, my testing hasn't shown that updating the user agent makes the tests pass 100% of the time. Still searching, but I'll investigate this angle more.
Update: I published the major version release. If you find any issues with the new version unrelated to the API instability, please open separate issues for those! I rolled this release in a hurry.
The API seems much more stable now than it was over the weekend. CI is consistently succeeding locally. I'm going to close this issue for the time being. I'll reopen it in the future if I see similar instability (increased rate of unexpectedly empty first pages, |
I know this is closed, but I just wanted to add that over the last week or two I have started to experience this issue. The API calls occasionally return empty results erroneously.
@jaypantone yeah, lots of inbound issues about this. I don't work for arXiv, so I can't effect a change there directly. Don't overload them with requests, but you might consider describing your issue on the arXiv mailing list:
I've pinned this issue in the hopes that more people find it rather than creating new ones.
Description
The arXiv API seems to be degraded. I expect to see more bug reports about this until the underlying issue is resolved.
Behavior identified in #43 seems to have intensified or changed in character (e.g. increased clustering, such that retries are more likely to re-fail, perhaps because of cached bad responses).
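Until the backend stabilizes, callers can work around unexpectedly empty pages on their side. The sketch below is one possible approach, not this library's implementation: `fetch_with_retries` and its parameters are hypothetical names, and `fetch_page` stands in for whatever function performs the API call and returns a list of entries. The delay doubles after each empty response, on the assumption that spacing retries out gives a cached bad response time to expire.

```python
import time


def fetch_with_retries(fetch_page, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch_page() until it returns a non-empty list, with backoff.

    fetch_page is a caller-supplied (hypothetical) function returning a list
    of entries. An empty list is treated as a possibly spurious failure.
    """
    for attempt in range(max_attempts):
        entries = fetch_page()
        if entries:
            return entries
        # Wait longer after each empty response; skip the wait after the last try.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    # Every attempt came back empty; the query may genuinely have no results.
    return []
```

This cannot distinguish a spurious empty page from a query that truly has no results, so keep `max_attempts` small to avoid adding load on arXiv.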
Why can't you fix the API?
: I'm not affiliated with arXiv; I maintain a wrapper library for an API I don't administer. I've written the arxiv-api Google Group about this issue.
Why aren't you merging bug fixes?
: Some of the proposed changes here (e.g. consolidating on HTTPS, pinning a specific feedparser version, etc.) are probably good changes regardless of the API's stability. I'm hesitant to rush merging and releasing changes without having a strong sense, through integration tests, that they don't damage this library's behavior. That judgment is subject to change, esp. if this issue persists.
Steps to reproduce
Versions
python version: independent.
arxiv.py version: 1.*.*.
Additional context
PRs directly addressing the instability: