Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Translation of error pages #1032

Merged
merged 28 commits into from
Jan 29, 2024
Merged

Translation of error pages #1032

merged 28 commits into from
Jan 29, 2024

Conversation

veloman-yunkan
Copy link
Collaborator

Will fix #1019

Copy link

codecov bot commented Dec 5, 2023

Codecov Report

Attention: Patch coverage is 49.34211% with 77 lines in your changes missing coverage. Please review.

Project coverage is 39.34%. Comparing base (9c5f5c7) to head (dc3960c).
Report is 165 commits behind head on main.

Files with missing lines Patch % Lines
src/server/response.cpp 50.44% 4 Missing and 52 partials ⚠️
src/tools/otherTools.cpp 15.38% 1 Missing and 10 partials ⚠️
src/server/internalServer.cpp 44.44% 0 Missing and 5 partials ⚠️
src/server/request_context.cpp 70.00% 0 Missing and 3 partials ⚠️
src/server/i18n.cpp 50.00% 0 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1032      +/-   ##
==========================================
+ Coverage   38.95%   39.34%   +0.39%     
==========================================
  Files          58       58              
  Lines        3958     4061     +103     
  Branches     2181     2237      +56     
==========================================
+ Hits         1542     1598      +56     
- Misses       1086     1091       +5     
- Partials     1330     1372      +42     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@veloman-yunkan
Copy link
Collaborator Author

veloman-yunkan commented Dec 6, 2023

@mgautierfr This PR is still work in progress, however I would like to learn your feedback regarding the small enhancement that I made in mustache.hpp in order to reduce the amount of extra coding that I would need to do without it.

Conceptually, the backend part is complete (I only have to mark those error pages that should be dynamically-translatable and have KIWIX_RESPONSE_DATA in them). The frontend will only have to take advantage of the response parameter data included in the error pages and update the error page text accordingly.

@kelson42
Copy link
Collaborator

@mgautierfr We need a feedback here

@mgautierfr
Copy link
Member

mgautierfr commented Dec 19, 2023

I'm a bit puzzled with the idea to add new specific api feature to mustache.
It would need that we ask packagers to use a patched mustache library and it would be probably (and legitimately) refused.
We should at least propose a patch upstream (and make it accepted) before we go with this.

Why not using lambda (#751 (comment)) ?
It would make us:

  • adapt a bit the templates (to add the "call" to the translation lambda)
  • Create the lambda code but it would be pretty close from what you have done in evaluate function (maybe even simply has you would not have to traverse data objects)

Is there any traps I haven't seen ?


But I really like the separation of the response from the connection. It is a step in the direction of #740 (even if this may not be the purpose)

@kelson42
Copy link
Collaborator

We definitly should rely on unmodified versions of external dependencies... but indeed, we can do upstream PR.

@veloman-yunkan
Copy link
Collaborator Author

Why not using lambda (#751 (comment)) ?

@mgautierfr

Generation of the error page content is a two step process.

There is a shared template for error pages with parameters such as the page title & heading. This is the mandatory minimum needed to build an error page. However, optionally, a list of additional detail lines can be provided.

  1. The error details are generated in C++ code by using parameterized message facilities. Currently (outside this PR) the translation of this additional info happens eagerly in the backend and the translated/instantiated string goes into step 2.
  2. The HTML of the error page is produced by instantiating a mustache template with values for the page title, heading and details text.

Consider it on an example:

    return HTTP404Response(request)
           + noSuchBookErrorMsg(bookName);

First, noSuchBookErrorMsg(bookName) is executed and instantiates a i18n message template for the given book name. Then the resulting std::string (with any explicit information about the book name erased from it) participates in another level of mustache template instantiation. The idea of this PR was to preserve all the data used to generate the response HTML and embed that data in the response so that it can be rerendered in the front-end.

This two-step nature of error response generation was missing from the HTTPErrorResponse unit tests but now I added a commit illustrating it.

When I started writing this reply I didn't at all see how lambdas could be used to address that problem. Now I can sense some solution relying on lambdas, however it seems artificial and limited (it won't handle additional levels involved in making up the final text; suppose that the bookName parameter in the example above is not a terminal string, but some text obtained as a result of translating a parameterized message: bookName = getBookNameMsg(book, name) ).

I will think further, but will appreciate if you hint at the solution that you had in mind if it really is capable of addressing the described challenge.

@veloman-yunkan
Copy link
Collaborator Author

I also implemented the translation of the error page in the front-end. In the worst case (if the modification to mustache.hpp is not accepted) I can switch our ParameterizedMessage from kainjow::mustache::data to a relatively small class implemented in kiwix code.

@veloman-yunkan
Copy link
Collaborator Author

I completed the draft version of the solution that doesn't need mustache.hpp to be patched. I will clean up the PR tomorrow.

This should have been done in PR#997 in order to better guarantee
a lasting solution to issue#995.
This will simplify testing of Response utilities.
... whereupon `ContentResponseBlueprint::generateResponseObject()` (and
`ContentResponseBlueprint` as a whole) no longer needs to be
polymorphic.
ContentResponseBlueprint::m_data is now an opaque data member
implemented in the .cpp and ready to be switched from
kainjow::mustache::data to a different implementation.
Now when parameterized messages are added to an error response, they are
not immediately instantiated (translated). Instead the message id and
the parameters of the message are recorded. The instantiation of the
messages happens right before generating the final content of the
response.
Note that it is declared in stringTools.h but its definition remains in
otherTools.cpp (to minimize the diff).
- More familiar escape sequences for tab, newline and carriage return
  symbols.

- Quote symbol is escaped by default too, however that behaviour can
  be disabled for uses in HTML-related contexts where quotes should then
  be replaced with the character entity "
Now the data used to generate an error response can be made to be
embedded in the response as a JS object KIWIX_RESPONSE_DATA.
This is a shortcut change since it doesn't make sense to send the error
page template with every error response (the viewer can fetch it from
the server once but that's slightly more work).
Non-HTML-encoded HTML-template data causes problems in HTML
even when it appears inside JS string (resulting in the <script> tag being
closed by a </script> appearing inside a JS string).

Besides, the KIWIX_RESPONSE_DATA and KIWIX_RESPONSE_TEMPLATE variables
are set on the window object so that they can be accessed from the top
context.

This commit eliminates the need for the `escapeQuote` parameter in
`escapeForJSON()` (that was introduced earlier in this PR) since now it
is set to false in all call contexts. However from the consistency point
of view, the default and intuitive behaviour of `escapeForJSON()` should
be to escape the quote symbols, which justifies the existence of that
parameter.
@veloman-yunkan
Copy link
Collaborator Author

Clean-up is complete, yet there is still some work to be done on this PR. I will move forward once @mgautierfr approves the current solution. Note that to test it, you must use the last-but-one commit ("Demo of error page translation") - the extra commit was added to prove that this PR doesn't utterly break the CI.

Copy link
Member

@mgautierfr mgautierfr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good to me, but I'm not sure that Variant is fully supported on our compilers.

We will check that and if we are good... we are good.
Else we will have to find another solution (or discuss about our compilers toolchain)

typedef std::map<std::string, Data> Object;

private:
std::variant<std::string, bool, List, Object> data;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that variant is correctly supported ?

From https://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2017 it is in gcc 7.1 and we are using gcc 6.3 for aarch64 (because of kiwix/kiwix-build#634)

I will setup a build with this branch on kiwix-build to check that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And, sadly, this doesn't compile on aarch64 focal (https://github.com/kiwix/kiwix-build/actions/runs/7492497587/job/20396122209)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I provided a work-around.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good. So now I'm happy with your proposal. Have you finish it or you still have things to do ? (In #1032 (comment) you said that it is no)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still have to enable front-end side translation of error pages and fix the unit tests that will become broken by that change.

@veloman-yunkan veloman-yunkan changed the title [WIP] Translation of error pages Translation of error pages Jan 17, 2024
@veloman-yunkan veloman-yunkan marked this pull request as ready for review January 17, 2024 14:49
@veloman-yunkan
Copy link
Collaborator Author

The PR is now functionally ready, however some changes may need to be made depending on the answers to the following questions:

  1. Is sending the template data with every response that may need to be translated in the frontend acceptable? That was a shortcut intended to achieve the desired functionality with the least effort and will also simplify fixing Search results page is not translated #1028.
  2. Is it OK that translation data is also embedded into error responses that are not currently expected to be displayed by the viewer (e.g. responses to invalid requests to /catalog, /raw, /suggest, etc, endpoints)?

@kelson42
Copy link
Collaborator

@mgautierfr Review please?

@mgautierfr
Copy link
Member

Is sending the template data with every response that may need to be translated in the frontend acceptable? That was a shortcut intended to achieve the desired functionality with the least effort and will also simplify fixing
#1028.

I think we are good on this. We may in the future introduce a endpoint error_template to get the template from an error code (and giving this error code in the response) but for now, we can go without it. Template is a not of data, and it send only for error.

Is it OK that translation data is also embedded into error responses that are not currently expected to be displayed by the viewer (e.g. responses to invalid requests to /catalog, /raw, /suggest, etc, endpoints)?

Yes also here. This double the error response size, but error response are pretty small and it is only for error, so pretty rare.

Also, the test cases against vulnerabilities in kiwix-serve seem to suggest
that KIWIX_RESPONSE_DATA should be HTML-encoded too.

I totally agree.

test/server.cpp Outdated Show resolved Hide resolved
@veloman-yunkan
Copy link
Collaborator Author

Also, the test cases against vulnerabilities in kiwix-serve seem to suggest
that KIWIX_RESPONSE_DATA should be HTML-encoded too.

I totally agree.

@mgautierfr You better had objected in this case 😃 . I figured out that the issue is rather an edge case and provided a simpler fix.

This commit demonstrates front-end-side translation of an error page
for a URL like /viewer#INVALIDBOOK/whatever (where INVALIDBOOK should
be a book name NOT present in the library).

Known issues:

- This change breaks a couple of subtests in the
  ServerTest.Http404HtmlError unit test.

- Changing the UI language while an error page is displayed in the
  viewer doesn't retranslate it.
This undoes frontend-side translation of the demo case with the purpose
of having "clean" unit tests to support further work on this PR.
std::variant is not supported by the old version of gcc used under
aarch64.
But the question is do we need all of them to be translatable in the
frontend? Maybe only responses to /random, /content and /search endpoints (that
are displayed in the viewer) should be translatable?

Also, the test cases against vulnerabilities in kiwix-serve seem to suggest
that KIWIX_RESPONSE_DATA should be HTML-encoded too.
Added a test case demonstrating how a bad error response could be
generated if </script> appears inside KIWIX_RESPONSE_DATA. That seems to
be the only problematic interaction between HTML-like syntax inside
javascript code (hence the deleted XXX comments on the other two test
cases).
@mgautierfr
Copy link
Member

@mgautierfr You better had objected in this case 😃 . I figured out that the issue is rather an edge case and provided a simpler fix.

I had missed that, your comment opens my eyes.


We are good
(I've just rebase-fixup)

@mgautierfr mgautierfr merged commit d2f20db into main Jan 29, 2024
16 checks passed
@mgautierfr mgautierfr deleted the error_response_i18n branch January 29, 2024 09:58
@veloman-yunkan
Copy link
Collaborator Author

(I've just rebase-fixup)

Thank you. Note, however, that one of the fixup commits implied (and that was stated in its commit message) that the description of the commit-to-be-amended had to be corrected too. I have now added that information as a commit comment.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

UI language preference ignored for kiwix-serve error pages
3 participants