[error] [2025-01-13T13:46:08.126Z] Failed to run mwoffliner after [1s]: {
"stack": "Error: mwUrl [https://cyclowiki.org] is not valid.\n at file:///tmp/mwoffliner/lib/sanitize-argument.js:134:15\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async sanitize_mwUrl (file:///tmp/mwoffliner/lib/sanitize-argument.js:133:5)\n at async sanitize_all (file:///tmp/mwoffliner/lib/sanitize-argument.js:55:5)",
"message": "mwUrl [https://cyclowiki.org] is not valid."
}
[error] [2025-01-13T13:46:08.127Z]
**********
mwUrl [https://cyclowiki.org] is not valid.
**********
Explanation: the first check of mwUrl seems to be failing. This could be because Cloudflare is protecting this website. To be investigated.
Yeah, I saw this other issue, where we just skipped the test to solve it. Now that we have a repro, I suspect we might be able to do something by passing a proper User-Agent. At least this is what we managed to do in other scrapers. It is not bullet-proof, but a "bad" User-Agent triggers Cloudflare protections much more easily. By "bad", I mean something which does not look at all like a browser.
benoit74 changed the title from "cyclowiki is failing with mwUrl [https://cyclowiki.org] is not valid." to "Avoid being blocked by Cloudflare" on Jan 14, 2025
> Not bullet-proof, but a "bad" User-Agent triggers Cloudflare protections much more easily. By "bad", I mean something which does not look at all like a browser.
Worth a try indeed. If it works, we should create an option for that.
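To make the idea concrete, here is a rough sketch of what such an option could look like. It is not taken from the mwoffliner code base: `BROWSER_LIKE_UA`, `buildHeaders`, `looksLikeCloudflareBlock`, `checkMwUrl` and the `FetchLike` type are all hypothetical names, and the User-Agent string is only an illustrative browser-like value. The Cloudflare detection is a heuristic that assumes blocked responses come back as 403 or 503 with a `server: cloudflare` or `cf-mitigated: challenge` header, which may not hold for every site.

```typescript
// Hypothetical default: an illustrative browser-like User-Agent (not mwoffliner's actual value).
const BROWSER_LIKE_UA =
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36";

// Minimal shapes so the sketch does not depend on a specific HTTP client.
type MinimalResponse = {
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
};
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> }
) => Promise<MinimalResponse>;

// Build request headers; a non-empty user-supplied value overrides the default.
function buildHeaders(userAgent?: string): Record<string, string> {
  const ua = userAgent && userAgent.trim() !== "" ? userAgent : BROWSER_LIKE_UA;
  return { "User-Agent": ua };
}

// Heuristic only (assumption): guess whether Cloudflare rejected the request,
// as opposed to the wiki URL being genuinely wrong.
function looksLikeCloudflareBlock(res: MinimalResponse): boolean {
  const server = (res.headers.get("server") ?? "").toLowerCase();
  const mitigated = (res.headers.get("cf-mitigated") ?? "").toLowerCase();
  const fromCloudflare = server === "cloudflare" || mitigated === "challenge";
  return fromCloudflare && (res.status === 403 || res.status === 503);
}

// Check the wiki URL, distinguishing "blocked" from "invalid" so the error
// message can point at Cloudflare instead of a bad mwUrl.
async function checkMwUrl(
  mwUrl: string,
  fetchImpl: FetchLike,
  userAgent?: string
): Promise<"ok" | "blocked" | "invalid"> {
  try {
    const res = await fetchImpl(mwUrl, { headers: buildHeaders(userAgent) });
    if (res.ok) return "ok";
    return looksLikeCloudflareBlock(res) ? "blocked" : "invalid";
  } catch {
    // Network-level failure (DNS, TLS, timeout): treat as invalid.
    return "invalid";
  }
}
```

On Node 18+, something like `await checkMwUrl("https://cyclowiki.org", fetch)` should work with the built-in `fetch`. Passing the fetch implementation in as a parameter is only there to keep the sketch testable without network access; whether a browser-like User-Agent is actually enough for cyclowiki.org still needs to be verified.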
mwoffliner version: 1.14.0
Task: https://farm.openzim.org/pipeline/1e755f21-4805-4cf8-8fa1-63fd5a5dc9d5/debug
Recipe: https://farm.openzim.org/recipes/cyclowiki.org_rus_all
Request: openzim/zim-requests#9
Log: