Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(windows): handle CRLF specific to windows #10697

Closed
wants to merge 8 commits into from

Conversation

rc-swag
Copy link
Contributor

@rc-swag rc-swag commented Feb 13, 2024

Fixes: #10471

Added a function to replace \r\n with \n that is U+000D U+000A with U+000A. And another to do the reverse, that is insert \r in front of any \n characters to support the CRLF for Windows. Supporting unit tests were also added.

This PR took a change after my first commit. Initially, I had nice symmetrical pre_process_context post_process_context pair. Then I realised I needed to deal with the UTF-32 encoding. It was more efficient just to process that and insert the CR at the same time, otherwise we a just adding another parse and making an extra buffer we don't need. I have written comments in the code that when if we remove the action queue then we can then make it more

User Testing

  • TEST_HIEROGLYPHIC_WINDOWS

    1. Install Hieroglyphic Keyboard from keyman.com into Keyman for Windows.
    2. Open Wordpad.
    3. Select the Hieroglyphic keyboard
    4. In wordpad, type: wdispacebarspacebar
    5. Expected Output : w𓂞
    6. Type:backspace
    7. Expected Output: w
  • TEST_NEWLINE_LF_ONLY
    Install the PR build of Keyman

    1. Install the attached newline keyboard newline.zip
    2. Open Wordpad.
    3. In wordpad, type: aEnterc
    4. Expected Output : Row lf
  • TEST_NEWLINE_CR_ONLY
    Install the PR build of Keyman

    1. Install the attached newline keyboard newline.zip if not installed already
    2. Open Firefox Browser (It has to be Firefox as it supports TSF)
    3. In the bowser type url bar type https://www.editpad.org/
    4. In new edit document type: aEnterc
    5. Expected Output : Row lf

Added a function to replace `\r\n` with `\n` that is
U+000D U+000A with U+000A. And another to do the reverse
that is insert '\r` infront of any `\n` charcters to support
the CRLF for windows. Supporting unit tests also added.
The function have been integerated into the code and are used
before the set_if_need Keyman Core call and then in the processing
of actions returned from the Core.
@keymanapp-test-bot keymanapp-test-bot bot added the user-test-missing User tests have not yet been defined for the PR label Feb 13, 2024
@keymanapp-test-bot
Copy link

keymanapp-test-bot bot commented Feb 13, 2024

User Test Results

Test specification and instructions

  • TEST_HIEROGLYPHIC_WINDOWS (PASSED): Tested with the attached PR build (17.0.266-alpha-test-10697) on Windows 11 OS and here is my observation: 1. Opened Wordpad app. 2. Typed 'wdi spacebar spacebar' and verified that it showed the expected output.3. Pressed backspace key and showed the expected output. Seems to be working fine. (notes)
  • TEST_NEWLINE_LF_ONLY (PASSED): Retested as per Ross's suggestion in a WordPad document and here is my observation: 1. Followed the steps as mentioned in the test description. 2. Verified that 'Row If' text appears on the document after pressing the Enter key. (notes)
  • TEST_NEWLINE_CR_ONLY (PASSED): Tested with the attached PR build (keyman version 17.0.285-beta-test-10697) and here is my observation: 1. Opened the 'editpad' website. 2. Created a new document. 3. Followed step (iv) and verified that the text 'ROW If' appeared in the document. Seems to be working as expected. (notes)

Test Artifacts

@keymanapp-test-bot keymanapp-test-bot bot added has-user-test user-test-required User tests have not been completed and removed user-test-missing User tests have not yet been defined for the PR labels Feb 16, 2024
@rc-swag rc-swag self-assigned this Feb 16, 2024
@rc-swag rc-swag marked this pull request as ready for review February 16, 2024 04:16
@rc-swag rc-swag changed the base branch from master to beta February 16, 2024 05:31
@bharanidharanj
Copy link

Test Results

  • TEST_HIEROGLYPHIC_WINDOWS (PASSED): Tested with the attached PR build (17.0.266-alpha-test-10697) on Windows 11 OS and here is my observation: 1. Opened Wordpad app. 2. Typed 'wdi spacebar spacebar' and verified that it showed the expected output.3. Pressed backspace key and showed the expected output. Seems to be working fine.

..Result 1

..Result 2

@keymanapp-test-bot keymanapp-test-bot bot removed the user-test-required User tests have not been completed label Feb 16, 2024
@mcdurdin mcdurdin modified the milestones: B17S1, B17S2 Feb 17, 2024
@mcdurdin
Copy link
Member

The user tests do not appear to test anything relating to CRLF. Is that something we can do?

mcdurdin
mcdurdin previously approved these changes Feb 17, 2024
Copy link
Member

@mcdurdin mcdurdin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM. Would be good to have a user test for the CRLF scenarios. Minor nits otherwise.

windows/src/engine/keyman32/appcontext.h Outdated Show resolved Hide resolved
windows/src/engine/keyman32/appcontext.cpp Outdated Show resolved Hide resolved
windows/src/engine/keyman32/appcontext.cpp Outdated Show resolved Hide resolved
windows/src/engine/keyman32/kmprocessactions.cpp Outdated Show resolved Hide resolved
@rc-swag
Copy link
Contributor Author

rc-swag commented Feb 18, 2024

The user tests do not appear to test anything relating to CRLF. Is that something we can do?

Correct the current test was good regression to make sure nothing was broken. I was working on using a ldml keyboard with rules for the CRLF test. However, it seemed to fail but the debugger and logging was giving some "interesting" results, and proving a bit tricky to test, as it appears something else can strip the CR. Still investigating.

@rc-swag
Copy link
Contributor Author

rc-swag commented Feb 19, 2024

@mcdurdin I have just been doing some logging around the contexts and finding some surprising results.

  1. AITIP::ReadContext is returning the string with just \r and no `\n' . I have followed the call stack to see if this what it gets back from the TSF API call.
  2. Not using AITIP::ReadContext. With the LDML Then reading the cores version of the app context it is not keeping track of new lines at all but include the characters before the enter was pressed.
  3. A kmn keyboard without ReadContext the context is cleared when the Enter Key is pressed.
    Note to self investigate: GetLeftOfSelection in kmtip\inserttext.cpp

Using Keyman Developer I was able to create a keyboard with \r\n rules and when testing the the keyboard in Developer in matched those rules. However developer is clearly not the same as Windows application.

@rc-swag
Copy link
Contributor Author

rc-swag commented Feb 19, 2024

And with that I guess this has to move back to draft

@rc-swag rc-swag marked this pull request as draft February 19, 2024 05:29
@mcdurdin
Copy link
Member

Which compliant apps are you testing with? I am guessing some apps will treat each paragraph as a separate context, but others may inline the \r\n.

New thought: ideally, we track what the app gives us in the input context, and give the same pattern back. So we have a function (serious pseudocode ahead warning):

const LBType = { none, '\n', '\r\n', '\r' };
function normalizeLineBreaks(&str): LBType {
  // find first line break
  m = /(\r\n|\r|\n)/.match(str);
  if(!m) {
   return none;
  }
  str = str.replaceAll(/(\r\n|\r|\n)/g, '\n');
  return m.firstMatch;
}

function restoreLineBreaks(str, tp:LBType, defaulttp:LBType): string {
  if(tp == none) tp = defaulttp; // use platform default for line breaks if context had no line breaks
  if(tp == none) return str; // if platform has no default and context had no line breaks, do nothing
  return str.replaceAll(/\n/g, tp); // replace line breaks
}

And, this could even be buried in Core, ideally, so that the Engine doesn't need to do this legwork. We could pass defaultLineBreakType in as a parameter to Core...

Seems like too much for 17.0 though!

@rc-swag
Copy link
Contributor Author

rc-swag commented Feb 26, 2024

And, this could even be buried in Core, ideally, so that the Engine doesn't need to do this legwork. We could pass defaultLineBreakType in as a parameter to Core...

Seems like too much for 17.0 though!

This is putting a more of platform knowledge into the core, but it is ok as it we keep it away from the core logic of the processors. I assume when we use the core context to load the updated app context (stripping out the markers) we also run restoreLineBreaks

This would be more robust to handling any nuances from different platforms and apps. for example would also handle any MacOS apps that also used just \r I would still like to know why the Windows ReadContext has only `\r'.

The platforms would need to make sure to set the platform default line break when loading the keyboard state. This would be straightforward with the core environment array already part of km_core_state_create.

@rc-swag rc-swag modified the milestones: B17S2, 18.0 Feb 26, 2024
@rc-swag
Copy link
Contributor Author

rc-swag commented Mar 6, 2024

I have implemented the normalize/ restore algorithm but not in core at this late stage as it would effect all platforms.
I have created a issue #10954 to move it to core in version 18.0

@keymanapp-test-bot keymanapp-test-bot bot added the user-test-required User tests have not been completed label Mar 6, 2024
@rc-swag rc-swag marked this pull request as ready for review March 6, 2024 05:01
@rc-swag rc-swag requested a review from mcdurdin March 6, 2024 05:02
@bharanidharanj
Copy link

Test Results

  • TEST_NEWLINE_LF_ONLY (FAILED): Tested with the attached PR build (Keyman ) on Windows 10 OS and here is my observation: 1. Installed NewLine keyboard. 2. Opened WordPad document. 3. Switched to NewLine keyboard. 4. Typed as per the instructions. Verified it is showing the expected result. 5. Opened Microsoft office 365 Word online document. 6. Switched to NewLine Keyboard. 7. Typed as per instructions. Noticed that it is showing only the typed letters. Seems to be an issue.

..in WordPad document

..in Microsoft Office 365 word online document

@keymanapp-test-bot keymanapp-test-bot bot added user-test-failed and removed user-test-required User tests have not been completed labels Mar 6, 2024
@rc-swag
Copy link
Contributor Author

rc-swag commented Mar 8, 2024

@bharanidharanj The testing of Word needs to be for an installed desktop version. It can be 365 but still needs to be an installed version. It failed the test because you used the online version in a web browser that doesn't support the Text Services Framework(TSF). Which is needed for this test. Do you have a test machine with Word installed? If not let me know and I will modify the test to suit.

@keymanapp-test-bot retest TEST_NEWLINE_LF_ONLY

@keymanapp-test-bot keymanapp-test-bot bot added user-test-required User tests have not been completed and removed user-test-failed labels Mar 8, 2024
@bharanidharanj
Copy link

bharanidharanj commented Mar 8, 2024

@bharanidharanj The testing of Word needs to be for an installed desktop version. It can be 365 but still needs to be an installed version. It failed the test because you used the online version in a web browser that doesn't support the Text Services Framework(TSF). Which is needed for this test. Do you have a test machine with Word installed? If not let me know and I will modify the test to suit.

@keymanapp-test-bot retest TEST_NEWLINE_LF_ONLY

@rc-swag I think it would be better to modify the test to suit. Thanks.

@darcywong00 darcywong00 modified the milestones: B17S3, B17S4 Mar 16, 2024
@rc-swag
Copy link
Contributor Author

rc-swag commented Mar 16, 2024

@rc-swag I think it would be better to modify the test to suit. Thanks.

@bharanidharanj I have updated the TEST_NEWLINE_LF_ONLY test to only have Wordpad and added another for TEST_NEWLINE_CR_ONLY with Firefox so please do this tests

@bharanidharanj
Copy link

@rc-swag I think it would be better to modify the test to suit. Thanks.

@bharanidharanj I have updated the TEST_NEWLINE_LF_ONLY test to only have Wordpad and added another for TEST_NEWLINE_CR_ONLY with Firefox so please do this tests

@rc-swag Okay, Thanks. I will do it.

@bharanidharanj
Copy link

Test Results

  • TEST_NEWLINE_LF_ONLY (PASSED): Retested as per Ross's suggestion in a WordPad document and here is my observation: 1. Followed the steps as mentioned in the test description. 2. Verified that 'Row If' text appears on the document after pressing the Enter key.

  • TEST_NEWLINE_CR_ONLY (PASSED): Tested with the attached PR build (keyman version 17.0.285-beta-test-10697) and here is my observation: 1. Opened the 'editpad' website. 2. Created a new document. 3. Followed step (iv) and verified that the text 'ROW If' appeared in the document. Seems to be working as expected.

@keymanapp-test-bot keymanapp-test-bot bot removed the user-test-required User tests have not been completed label Mar 18, 2024
@rc-swag rc-swag dismissed mcdurdin’s stale review March 20, 2024 01:57

Testing showed a different approach was required and has since been implemented.

Copy link
Member

@mcdurdin mcdurdin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am somewhat uncomfortable about this PR because it makes significant string handling changes very late in the beta cycle. I think we may need some more exercising of the boundary conditions, and I think we should be able to avoid truncation of buffers with a small rework per my notes?

Comment on lines +221 to +222
size_t buf_length = wcslen(win_out_str) * 2;
LPWSTR buf_string = new WCHAR[buf_length];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't allow space for final terminating nul if every char is doubled

Comment on lines +227 to +228
while (*in_ptr != L'\0') {
if (*in_ptr == '\n') {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not consistently using L prefix, here and elsewhere in this PR

break;
case lbCR:
*buf_ptr++ = '\r';
*in_ptr++;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
*in_ptr++;
in_ptr++;

Comment on lines +253 to +254
auto diff = final_length + 1 - win_out_size_in_chars; // +1 for null termination
wcscpy_s(win_out_str, win_out_size_in_chars, buf_string + diff);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could break a surrogate pair

}
*buf_ptr = '\0'; // Null terminate the modified string

// may now need to truncate the string preserving the end closest the caret.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of doing this, couldn't we just return the longer buffer?


BOOL
context_char32_char16(const km_core_usv *core_output, LPWSTR win_out_str, uint32_t win_output_size_in_char) {
if (core_output == nullptr || win_out_str == nullptr || win_output_size_in_char <= 0) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (core_output == nullptr || win_out_str == nullptr || win_output_size_in_char <= 0) {
if (core_output == nullptr || win_out_str == nullptr || win_output_size_in_char == 0) {

uint32_t cannot be <0

Comment on lines +279 to +285
auto final_length = wcsnlen_s(buf, buf_length);
if (final_length < win_output_size_in_char) {
wcscpy_s(win_out_str, win_output_size_in_char, buf);
} else {
auto diff = final_length + 1 - win_output_size_in_char; // +1 for null termination
wcscpy_s(win_out_str, win_output_size_in_char, buf + diff);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same risk of splitting a surrogate pair

@@ -78,3 +78,58 @@ TEST(AppContext, AppContext_Delete) {
testContext.Delete();

}

// Test normalize and restore line break functions
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need some tests for surrogate pair edge cases, especially boundary conditions

@rc-swag
Copy link
Contributor Author

rc-swag commented Mar 20, 2024

I am somewhat uncomfortable about this PR because it makes significant string handling changes very late in the beta cycle.

Based on this I am going to close this and update the issue #10471 for 18.0 but for the algorithm to be in the core (and to check developer debugger). That is where I ultimately wanted it to live and it would then cover linux and macos also. This was just to get a reference solution over the line for the LDML and version 17. With the buffer in the core truncation I would have done it differently however when I looked at it was just kicking the problem down the road because it had truncation in there anyway around a defined MAX_CONTEXT. Anyway, for the core I can revisit it as it will sit on a different layer.
Finally, I am disappointed I missed the surrogates as that is one of the first edge cases I discovered in the Context::Buf method. when I started.

@rc-swag
Copy link
Contributor Author

rc-swag commented Mar 20, 2024

This will be moved to version 18.0 as it is too late in the beta cycle. The logic will also be contained in the core.

@rc-swag rc-swag closed this Mar 20, 2024
@rc-swag rc-swag deleted the feat/windows/10471/preprocess-u000d-u000A branch March 20, 2024 22:42
@mcdurdin
Copy link
Member

Agree that this can wait until 18.0; if necessary we can back-port to 17.0 stable.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

feat(windows): core needs to preprocess U+000D U+000A to U+000A in context before any other processing
4 participants