feat(windows): handle CRLF specific to windows #10697
Added a function to replace `\r\n` with `\n`, that is, U+000D U+000A with U+000A, and another to do the reverse, that is, insert `\r` in front of any `\n` characters to support CRLF on Windows. Supporting unit tests were also added. The functions have been integrated into the code and are used before the set_if_need Keyman Core call and then in the processing of actions returned from the Core.
The user tests do not appear to test anything relating to CRLF. Is that something we can do? |
This LGTM. Would be good to have a user test for the CRLF scenarios. Minor nits otherwise.
Correct, the current test was a good regression test to make sure nothing was broken. I was working on using an LDML keyboard with rules for the CRLF test. However, it seemed to fail, and the debugger and logging were giving some "interesting" results and proving a bit tricky to test, as it appears something else can strip the CR. Still investigating.
@mcdurdin I have just been doing some logging around the contexts and finding some surprising results.
Using Keyman Developer I was able to create a keyboard with |
And with that I guess this has to move back to draft |
Which compliant apps are you testing with? I am guessing some apps will treat each paragraph as a separate context, but others may inline the New thought: ideally, we track what the app gives us in the input context, and give the same pattern back. So we have a function (serious pseudocode ahead warning):
And, this could even be buried in Core, ideally, so that the Engine doesn't need to do this legwork. We could pass Seems like too much for 17.0 though! |
This puts more platform knowledge into the core, but that is OK as we keep it away from the core logic of the processors. I assume when we use the This would be more robust in handling any nuances from different platforms and apps; for example, it would also handle any macOS apps that also used just The platforms would need to make sure to set the platform default line break when loading the |
I have implemented the normalize/restore algorithm, but not in core at this late stage, as it would affect all platforms. |
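For readers following the thread, the normalize/restore idea discussed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: the names `LineBreak`, `detect_line_break`, `normalize_line_breaks` and `restore_line_breaks` are hypothetical, and the sketch assumes the app uses one line-break convention consistently within a context.

```cpp
#include <cstddef>
#include <string>

// Hypothetical names for illustration; not the identifiers used in the PR.
enum class LineBreak { LF, CR, CRLF };

// Guess the app's convention from the first line break found in its context.
// Falls back to `fallback` (e.g. the platform default) when there is none.
LineBreak detect_line_break(const std::wstring& text, LineBreak fallback) {
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (text[i] == L'\r') {
      return (i + 1 < text.size() && text[i + 1] == L'\n') ? LineBreak::CRLF
                                                           : LineBreak::CR;
    }
    if (text[i] == L'\n') return LineBreak::LF;
  }
  return fallback;
}

// Convert CRLF and lone CR to LF before handing the context to the core.
std::wstring normalize_line_breaks(const std::wstring& text) {
  std::wstring out;
  out.reserve(text.size());
  for (std::size_t i = 0; i < text.size(); ++i) {
    if (text[i] == L'\r') {
      out.push_back(L'\n');
      if (i + 1 < text.size() && text[i + 1] == L'\n') ++i;  // skip the LF of a CRLF
    } else {
      out.push_back(text[i]);
    }
  }
  return out;
}

// Convert each LF in core output back to the convention the app gave us.
std::wstring restore_line_breaks(const std::wstring& text, LineBreak kind) {
  std::wstring out;
  out.reserve(text.size() * 2);  // worst case: every character is an LF
  for (wchar_t ch : text) {
    if (ch != L'\n') {
      out.push_back(ch);
      continue;
    }
    switch (kind) {
      case LineBreak::LF:   out.push_back(L'\n'); break;
      case LineBreak::CR:   out.push_back(L'\r'); break;
      case LineBreak::CRLF: out.append(L"\r\n");  break;
    }
  }
  return out;
}
```

Using `std::wstring` for the output sidesteps manual buffer sizing; a fixed-buffer version would need the same worst-case reasoning made explicit.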
@bharanidharanj The testing of Word needs to use an installed desktop version. It can be 365, but it still needs to be an installed version. The test failed because you used the online version in a web browser, which doesn't support the Text Services Framework (TSF) that this test needs. Do you have a test machine with Word installed? If not, let me know and I will modify the test to suit. @keymanapp-test-bot retest TEST_NEWLINE_LF_ONLY |
@rc-swag I think it would be better to modify the test to suit. Thanks. |
@bharanidharanj I have updated the |
@rc-swag Okay, Thanks. I will do it. |
Testing showed a different approach was required and has since been implemented.
I am somewhat uncomfortable about this PR because it makes significant string handling changes very late in the beta cycle. I think we may need some more exercising of the boundary conditions, and I think we should be able to avoid truncation of buffers with a small rework per my notes?
size_t buf_length = wcslen(win_out_str) * 2; | ||
LPWSTR buf_string = new WCHAR[buf_length]; |
This doesn't allow space for final terminating nul if every char is doubled
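A minimal sketch of the sizing concern raised here, assuming the input has already been normalized to LF-only line breaks. The helper name `insert_cr_before_lf` is hypothetical and this is not the PR's code; it only shows the worst-case capacity with room for the terminator.

```cpp
#include <cstddef>
#include <cwchar>
#include <memory>

// Hypothetical helper: returns a newly allocated, NUL-terminated copy of `in`
// with a CR inserted before every LF. Assumes `in` contains LF-only breaks.
std::unique_ptr<wchar_t[]> insert_cr_before_lf(const wchar_t* in) {
  // Worst case: every character is an LF and doubles in size,
  // plus one extra slot for the terminating NUL.
  const std::size_t capacity = std::wcslen(in) * 2 + 1;
  std::unique_ptr<wchar_t[]> buf(new wchar_t[capacity]);

  wchar_t* out = buf.get();
  for (; *in != L'\0'; ++in) {
    if (*in == L'\n') *out++ = L'\r';
    *out++ = *in;
  }
  *out = L'\0';  // always fits because of the +1 above
  return buf;
}
```

Holding the buffer in a `std::unique_ptr<wchar_t[]>` also avoids a manual `delete[]` on every exit path.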
while (*in_ptr != L'\0') { | ||
if (*in_ptr == '\n') { |
Not consistently using the `L` prefix, here and elsewhere in this PR
break; | ||
case lbCR: | ||
*buf_ptr++ = '\r'; | ||
*in_ptr++; |
*in_ptr++; | |
in_ptr++; |
auto diff = final_length + 1 - win_out_size_in_chars; // +1 for null termination | ||
wcscpy_s(win_out_str, win_out_size_in_chars, buf_string + diff); |
This could break a surrogate pair
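To illustrate the surrogate-pair risk: when truncating from the front to keep the text nearest the caret, the cut can land between a high and a low surrogate. The sketch below is one possible guard, not the PR's implementation. The helper `keep_tail` is hypothetical, and it uses `char16_t` so that it behaves the same on platforms where `wchar_t` is not 16 bits wide.

```cpp
#include <cstddef>
#include <string>

inline bool is_high_surrogate(char16_t cu) { return cu >= 0xD800 && cu <= 0xDBFF; }
inline bool is_low_surrogate(char16_t cu)  { return cu >= 0xDC00 && cu <= 0xDFFF; }

// Hypothetical helper: keep at most `max_units` UTF-16 code units from the END
// of `text` (the end closest to the caret) without splitting a surrogate pair.
std::u16string keep_tail(const std::u16string& text, std::size_t max_units) {
  if (text.size() <= max_units) return text;

  std::size_t start = text.size() - max_units;  // always > 0 here
  // If the cut falls between the two halves of a pair, drop the orphaned
  // low surrogate too, so the result may be one unit shorter than max_units.
  if (is_low_surrogate(text[start]) && is_high_surrogate(text[start - 1])) {
    ++start;
  }
  return text.substr(start);
}
```

For example, with the text `a`, U+1F600, `b` (four code units), asking for the last two units yields just `b` rather than a lone low surrogate followed by `b`.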
} | ||
*buf_ptr = '\0'; // Null terminate the modified string | ||
|
||
// may now need to truncate the string preserving the end closest the caret. |
Instead of doing this, couldn't we just return the longer buffer?
|
||
BOOL | ||
context_char32_char16(const km_core_usv *core_output, LPWSTR win_out_str, uint32_t win_output_size_in_char) { | ||
if (core_output == nullptr || win_out_str == nullptr || win_output_size_in_char <= 0) { |
if (core_output == nullptr || win_out_str == nullptr || win_output_size_in_char <= 0) { | |
if (core_output == nullptr || win_out_str == nullptr || win_output_size_in_char == 0) { |
uint32_t cannot be <0
auto final_length = wcsnlen_s(buf, buf_length); | ||
if (final_length < win_output_size_in_char) { | ||
wcscpy_s(win_out_str, win_output_size_in_char, buf); | ||
} else { | ||
auto diff = final_length + 1 - win_output_size_in_char; // +1 for null termination | ||
wcscpy_s(win_out_str, win_output_size_in_char, buf + diff); | ||
} |
Same risk of splitting a surrogate pair
@@ -78,3 +78,58 @@ TEST(AppContext, AppContext_Delete) { | |||
testContext.Delete(); | |||
|
|||
} | |||
|
|||
// Test normalize and restore line break functions |
I think we need some tests for surrogate pair edge cases, especially boundary conditions
Based on this, I am going to close this and update issue #10471 for 18.0, but with the algorithm in the core (and a check of the developer debugger). That is where I ultimately wanted it to live, and it would then cover Linux and macOS also. This was just to get a reference solution over the line for LDML and version 17. With the buffer truncation in the core, I would have done it differently; however, when I looked at it, that was just kicking the problem down the road, because it had truncation in there anyway around a defined MAX_CONTEXT. Anyway, for the core I can revisit it, as it will sit on a different layer. |
This will be moved to version 18.0 as it is too late in the beta cycle. The logic will also be contained in the core. |
Agree that this can wait until 18.0; if necessary we can back-port to 17.0 stable. |
Fixes: #10471
Added a function to replace `\r\n` with `\n`, that is, U+000D U+000A with U+000A, and another to do the reverse, that is, insert `\r` in front of any `\n` characters to support CRLF on Windows. Supporting unit tests were also added.
This PR took a change after my first commit. Initially, I had a nice symmetrical pre_process_context post_process_context pair. Then I realised I needed to deal with the UTF-32 encoding. It was more efficient just to process that and insert the CR at the same time; otherwise we are just adding another parse and making an extra buffer we don't need. I have written comments in the code that if we remove the action queue then we can then make it more
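The single-pass approach described above (converting UTF-32 to UTF-16 and inserting the CR in the same loop) might look roughly like this. It is a sketch with a hypothetical function name, it assumes the input is NUL-terminated and holds only valid Unicode scalar values, and it uses `char32_t` and `std::u16string` in place of the Keyman and Windows types.

```cpp
#include <string>

// Hypothetical sketch: encode NUL-terminated UTF-32 as UTF-16 and insert a CR
// before each LF in the same pass, so no intermediate buffer is needed.
// Assumes every input value is a valid Unicode scalar value (no validation).
std::u16string utf32_to_utf16_crlf(const char32_t* in) {
  std::u16string out;
  for (; *in != U'\0'; ++in) {
    char32_t cp = *in;
    if (cp == U'\n') out.push_back(u'\r');  // LF becomes CR LF
    if (cp >= 0x10000) {
      // Supplementary plane: emit a high/low surrogate pair.
      cp -= 0x10000;
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    } else {
      out.push_back(static_cast<char16_t>(cp));
    }
  }
  return out;
}
```

Because the output grows as needed, the caller can decide afterwards whether and how to truncate, which keeps the surrogate-pair handling in one place.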
User Testing
TEST_HIEROGLYPHIC_WINDOWS
TEST_NEWLINE_LF_ONLY
Install the PR build of Keyman
Row lf
TEST_NEWLINE_CR_ONLY
Install the PR build of Keyman
https://www.editpad.org/
Row lf