[REGRESSION?] API end-point detection does not take care properly of command line argument #1935
Comments
This bug could be reproduced since apiUrlDirector was responsible for building the endpoint in downloader.query() here. The problem is that mwWikiPath wasn't taken into consideration while building requests. Regarding parameter lists:
That means that the default logic is to add /wiki/ to the URL automatically if no mwWikiPath is specified. This logic is missing in main, which leads to a side effect (at least for Wikimedia wikis): no /wiki/ is attached, but articles are still scraped because this param is not needed for them. That's why this bug went unnoticed previously. In fact, while regular scraping of Note the presence of the /wiki/ and /w/api.php parts. /wiki/ should not be there, but it is placed in this endpoint by default; I implemented this in PR #1939. So now, to scrape MediaWiki wikis correctly you need to explicitly specify the mwWikiPath param like this Also, I noticed that mwWikiPath, mwModulePath, and mwApiPath don't have test coverage at all, and this ticket brings this problem to light. |
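The default-path behaviour described above can be sketched as follows. This is a minimal illustration, not mwoffliner's actual code: the helper names (`buildUrl`, `apiEndpoint`, `articleUrl`), the `MwPaths` shape, and the two default values are assumptions made for the example.

```typescript
// Hypothetical sketch; names and defaults are assumptions, not mwoffliner's API.
interface MwPaths {
  apiPath?: string;  // e.g. "w/api.php"
  wikiPath?: string; // e.g. "wiki/"; an empty string means "no prefix"
}

const ASSUMED_DEFAULT_API_PATH = "w/api.php";
const ASSUMED_DEFAULT_WIKI_PATH = "wiki/";

// Strip leading and trailing slashes from a path segment.
function trimSlashes(segment: string): string {
  return segment.replace(/^\/+|\/+$/g, "");
}

// Join a base URL with path segments, skipping empty segments.
function buildUrl(base: string, ...segments: string[]): string {
  const cleaned = segments.map(trimSlashes).filter((s) => s.length > 0);
  return [base.replace(/\/+$/, ""), ...cleaned].join("/");
}

// An explicitly passed apiPath always wins over the assumed default.
function apiEndpoint(base: string, paths: MwPaths = {}): string {
  return buildUrl(base, paths.apiPath ?? ASSUMED_DEFAULT_API_PATH);
}

// wikiPath falls back to the default only when it is undefined,
// so an explicit empty string yields URLs without a /wiki/ prefix.
function articleUrl(base: string, title: string, paths: MwPaths = {}): string {
  const wikiPath = paths.wikiPath ?? ASSUMED_DEFAULT_WIKI_PATH;
  return buildUrl(base, wikiPath, encodeURIComponent(title));
}
```

With these assumptions, `apiEndpoint("https://minecraft.wiki", { apiPath: "api.php" })` yields the URL the issue expects, while omitting `apiPath` yields the `/w/api.php` form.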
Do you mean that If so, I'm not sure this is a good idea, see https://pokemon.fandom.com/:
|
@benoit74, if you try to run this command: But if you set You will get a valid Fandom wiki Action API endpoint to get site info: To get site info from the Minecraft wiki, you can try this command as well (only from #1939) Note: the scrape process will fail with an error anyway because the MediaWiki REST API (/rest.php) is still not supported in mwoffliner |
I clearly see a problem, which is that in Zimfarm we have no way to tell the difference between an empty string argument and an undefined argument. Usually we use the value "/index.php" for the wikipath when needed, but that sounds more like a workaround. |
Fandom wiki supports it (check https://pokemon.fandom.com/wiki/Special:Version). Example: https://pokemon.fandom.com/rest.php/v1/page/Volcarona/html Another way to get an article from a Fandom wiki is by using the Action API: |
Setting |
It will depend on the chosen renderer for the Fandom wiki, but probably not: both rest.php and action=parse don't need |
@kelson42 This is a blocker because I need to omit wikiPath somehow during the concatenation process in the URL builder in #1939 but, at the same time, set the default value equal to |
@VadimKovalenkoSNF then the default value of mwWikiPath has to be made an empty string. I see no other solution. |
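To illustrate why the empty-string versus undefined distinction matters here, the following sketch compares two ways of resolving the parameter. The function names and default values are hypothetical and only serve the example; they are not taken from mwoffliner or Zimfarm.

```typescript
// Hypothetical resolvers; the "wiki/" default is an assumption for illustration.
const ASSUMED_NON_EMPTY_DEFAULT = "wiki/";

// With ||, an explicit empty string is falsy and gets replaced by the default,
// so the caller cannot ask for "no wiki path".
function resolveWithOr(raw?: string): string {
  return raw || ASSUMED_NON_EMPTY_DEFAULT;
}

// With ??, only undefined/null fall back; an explicit "" is preserved.
// This only helps if the caller can actually send "" distinctly from "not set".
function resolveWithNullish(raw?: string): string {
  return raw ?? ASSUMED_NON_EMPTY_DEFAULT;
}

// If the default itself is the empty string, "" and undefined resolve
// to the same value, so the caller no longer needs to tell them apart.
function resolveWithEmptyDefault(raw?: string): string {
  return raw ?? "";
}
```

Under these assumptions, the last variant is the only one whose result is the same for an unset argument and an empty one, which is the property the empty-string default is meant to provide.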
@kelson42 isn't this something we should fix on the Zimfarm? I don't think it is a big effort to implement something which says that I'm more concerned about the fact that once 1.4.0 is out and we decide to use it on the farm (probably immediately) we will have to adjust the set of parameters: remove |
@benoit74 we have no tools for that. We've had such migrations in the past and I would run a custom script that connected to the mongo DB. I think tying offliner version and schema might bring more frustration given how organic the scrapers' development is. A history log of schedule modifications seems more useful, for instance, in a similar vein. I've long been a supporter of the farm remaining a tool for skilled farmers… with the addition of Wizards to help non-skilled people create/update recipes. |
@benoit74 I prefer to have an empty string as the default for this specific parameter |
See:
The URL tested is https://minecraft.wiki/w/api.php and it should be https://minecraft.wiki/api.php given the command line argument! Pretty sure this is a regression we have introduced.