-
-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filter ZIMs by humanid & aliases #808
base: main
Are you sure you want to change the base?
Conversation
Pretty sure that installgentoo_en_all_nopic_2019-09 is not the zim name metadata. It's the human friendly name. |
It's the filename (whatever it is on system, you can try changing a file name and see). |
@kelson42 Updated to use alias name. |
Codecov ReportBase: 70.51% // Head: 71.13% // Increases project coverage by
Additional details and impacted files@@ Coverage Diff @@
## master #808 +/- ##
==========================================
+ Coverage 70.51% 71.13% +0.62%
==========================================
Files 53 53
Lines 3653 3756 +103
Branches 2029 2089 +60
==========================================
+ Hits 2576 2672 +96
- Misses 1075 1082 +7
Partials 2 2
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here. ☔ View full report at Codecov. |
@juuz0 Thx, it's great if can now work with both filename and aliases. I will test it. Reading the source code, I have too remarks:
|
@juuz0 Any feedback? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No unit tests have been added!
src/server/internalServer.cpp
Outdated
@@ -119,6 +119,9 @@ Filter get_search_filter(const RequestContext& request, const std::string& prefi | |||
try { | |||
filter.rejectTags(kiwix::split(request.get_argument(prefix+"notag"), ";")); | |||
} catch (...) {} | |||
try { | |||
filter.aliasName(request.get_argument(prefix + "book")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My guess is that selection of books by alias/filename should not be combined with other filtering criteria. If so, then it rather must be done in InternalServer::selectBooks()
(similar to selection by id or name).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I didn't understand this fully. We need a similar way to filter using query string (just like lang/category/tag)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was not the point of my comment. Do the following queries make sense?
catalog/v2/entries?book=<bookid>&lang=eng
catalog/v2/entries?book=<bookid1>&book=<bookid2>&category=wikipedia
catalog/v2/entries?book=<bookid>&tag=nopic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@veloman-yunkan Not sure to understand your remark either (is that a principle remark about filtering by humanid or a just an implementation remark). Your example don't include any "human_id" argument? A request of the type catalog/v2/entries?book=<bookid>&lang=eng
can not fullfill the feature request because it is not stable over time. The goal here is to have a way to selection an arbitrary list of ZIM file over time. AFAIK, using the human ids and there corresponding aliases is the only way for the moment... and so far it has works fine for the Web URLS (for example library.kiwix.org/wikipedia_en_all_maxi/).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kelson42 In my examples <bookid>
is supposed to be the "human_id". The examples illustrate the following cases:
- selection by "human_id" is combined with filtering by language
- selection by "human_id" is combined with filtering by category
- selection by "human_id" is combined with filtering by tag
In my understanding, selection by human_id should be mutually exclusive with other ways of content filtering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@veloman-yunkan I understand you now. I'm not in favour of hardwiring exclusivity but yes, your examples don't really make sense. For example, if the humanid
is for a book in Armenian, then if lang=eng
then no result, but if lang=arm
then if will give one result. Etc... To me your remark left is mostly about where to implement things, and on that I have no opinion/fully trust you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kelson42 My vision is that human_id must not be combined with other selection criteria. My original proposal applies only to such a use model.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kelson42 In other words selection by humanid should work similar to selection by name or uuid (in which case other filtering criteria are ignored).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing fondamental against this, even if I would prefer to avoid a special treatment. @juuz0 good to you to fix the points noticed by @veloman-yunkan ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please get rid of the fixup commits for the next review.
std::string HumanReadableNameMapper::removeDateFromBookId(const std::string& bookId) { | ||
return replaceRegex(bookId, "", "_[[:digit:]]{4}-[[:digit:]]{2}$"); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please extract this function in a separate preparatory commit.
src/server/internalServer.cpp
Outdated
@@ -119,6 +119,9 @@ Filter get_search_filter(const RequestContext& request, const std::string& prefi | |||
try { | |||
filter.rejectTags(kiwix::split(request.get_argument(prefix+"notag"), ";")); | |||
} catch (...) {} | |||
try { | |||
filter.aliasName(request.get_argument(prefix + "book")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was not the point of my comment. Do the following queries make sense?
catalog/v2/entries?book=<bookid>&lang=eng
catalog/v2/entries?book=<bookid1>&book=<bookid2>&category=wikipedia
catalog/v2/entries?book=<bookid>&tag=nopic
TEST_F(LibraryTest, filterByAliasNames) | ||
{ | ||
// filtering for one book | ||
EXPECT_FILTER_RESULTS(kiwix::Filter().aliasNames({"zimfile"}), | ||
"Ray Charles" | ||
); | ||
|
||
// filerting for more than one book | ||
EXPECT_FILTER_RESULTS(kiwix::Filter().aliasNames({"zimfile", "example"}), | ||
"An example ZIM archive", | ||
"Ray Charles" | ||
); | ||
|
||
// filtering by alias name requires full text match | ||
EXPECT_FILTER_RESULTS(kiwix::Filter().aliasNames({"wrong_name"}), | ||
/* no results */ | ||
); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was rather expecting new tests in test/library_server.cpp
, that in particular would demonstrate how the new filter is expected to interact with other filters?
Extracts the duplicate code from publisherQuery() and nameQuery() into a new function parseQuery().
Adds mechanism to get a ZIM using its alias name. To make a search, one needs to visit: `TLD/?book=aliasNameHere`
Adds support for putting multiple `book` query parameter.
Adds test to check if filtering by alias name works.
151106d
to
2cd0579
Compare
I want to mention this new issue #828 Be sure that you are not assuming too much that we will use a |
This pull request has been automatically marked as stale because it has not had recent activity. It will be now be reviewed manually. Thank you for your contributions. |
@juuz0 Any feedback? |
Fixes #807