feat(windows): handle CRLF specific to windows #10697

rc-swag · 2024-02-13T08:26:04Z

Added a function to replace \r\n with \n that is U+000D U+000A with U+000A. And another to do the reverse, that is insert \r in front of any \n characters to support the CRLF for Windows. Supporting unit tests were also added.

This PR took a change after my first commit. Initially, I had nice symmetrical pre_process_context post_process_context pair. Then I realised I needed to deal with the UTF-32 encoding. It was more efficient just to process that and insert the CR at the same time, otherwise we a just adding another parse and making an extra buffer we don't need. I have written comments in the code that when if we remove the action queue then we can then make it more

User Testing

TEST_HIEROGLYPHIC_WINDOWS
1. Install Hieroglyphic Keyboard from keyman.com into Keyman for Windows.
2. Open Wordpad.
3. Select the Hieroglyphic keyboard
4. In wordpad, type: wdispacebarspacebar
5. Expected Output : w𓂞
6. Type:backspace
7. Expected Output: w
TEST_NEWLINE_LF_ONLY
Install the PR build of Keyman
1. Install the attached newline keyboard newline.zip
2. Open Wordpad.
3. In wordpad, type: aEnterc
4. Expected Output : Row lf
TEST_NEWLINE_CR_ONLY
Install the PR build of Keyman
1. Install the attached newline keyboard newline.zip if not installed already
2. Open Firefox Browser (It has to be Firefox as it supports TSF)
3. In the bowser type url bar type https://www.editpad.org/
4. In new edit document type: aEnterc
5. Expected Output : Row lf

Added a function to replace `\r\n` with `\n` that is U+000D U+000A with U+000A. And another to do the reverse that is insert '\r` infront of any `\n` charcters to support the CRLF for windows. Supporting unit tests also added. The function have been integerated into the code and are used before the set_if_need Keyman Core call and then in the processing of actions returned from the Core.

keymanapp-test-bot · 2024-02-13T08:26:10Z

User Test Results

Test specification and instructions

✅ TEST_HIEROGLYPHIC_WINDOWS (PASSED): Tested with the attached PR build (17.0.266-alpha-test-10697) on Windows 11 OS and here is my observation: 1. Opened Wordpad app. 2. Typed 'wdi spacebar spacebar' and verified that it showed the expected output.3. Pressed backspace key and showed the expected output. Seems to be working fine. (notes)
✅ TEST_NEWLINE_LF_ONLY (PASSED): Retested as per Ross's suggestion in a WordPad document and here is my observation: 1. Followed the steps as mentioned in the test description. 2. Verified that 'Row If' text appears on the document after pressing the Enter key. (notes)
✅ TEST_NEWLINE_CR_ONLY (PASSED): Tested with the attached PR build (keyman version 17.0.285-beta-test-10697) and here is my observation: 1. Opened the 'editpad' website. 2. Created a new document. 3. Followed step (iv) and verified that the text 'ROW If' appeared in the document. Seems to be working as expected. (notes)

Test Artifacts

Windows

bharanidharanj · 2024-02-16T11:34:10Z

Test Results

TEST_HIEROGLYPHIC_WINDOWS (PASSED): Tested with the attached PR build (17.0.266-alpha-test-10697) on Windows 11 OS and here is my observation: 1. Opened Wordpad app. 2. Typed 'wdi spacebar spacebar' and verified that it showed the expected output.3. Pressed backspace key and showed the expected output. Seems to be working fine.

..Result 1

..Result 2

mcdurdin · 2024-02-17T07:44:24Z

The user tests do not appear to test anything relating to CRLF. Is that something we can do?

mcdurdin

This LGTM. Would be good to have a user test for the CRLF scenarios. Minor nits otherwise.

windows/src/engine/keyman32/appcontext.h

windows/src/engine/keyman32/appcontext.cpp

windows/src/engine/keyman32/kmprocessactions.cpp

rc-swag · 2024-02-18T23:57:10Z

The user tests do not appear to test anything relating to CRLF. Is that something we can do?

Correct the current test was good regression to make sure nothing was broken. I was working on using a ldml keyboard with rules for the CRLF test. However, it seemed to fail but the debugger and logging was giving some "interesting" results, and proving a bit tricky to test, as it appears something else can strip the CR. Still investigating.

rc-swag · 2024-02-19T04:48:41Z

@mcdurdin I have just been doing some logging around the contexts and finding some surprising results.

AITIP::ReadContext is returning the string with just \r and no `\n' . I have followed the call stack to see if this what it gets back from the TSF API call.
Not using AITIP::ReadContext. With the LDML Then reading the cores version of the app context it is not keeping track of new lines at all but include the characters before the enter was pressed.
A kmn keyboard without ReadContext the context is cleared when the Enter Key is pressed.
Note to self investigate: GetLeftOfSelection in kmtip\inserttext.cpp

Using Keyman Developer I was able to create a keyboard with \r\n rules and when testing the the keyboard in Developer in matched those rules. However developer is clearly not the same as Windows application.

rc-swag · 2024-02-19T05:29:20Z

And with that I guess this has to move back to draft

mcdurdin · 2024-02-22T02:49:26Z

Which compliant apps are you testing with? I am guessing some apps will treat each paragraph as a separate context, but others may inline the \r\n.

New thought: ideally, we track what the app gives us in the input context, and give the same pattern back. So we have a function (serious pseudocode ahead warning):

const LBType = { none, '\n', '\r\n', '\r' };
function normalizeLineBreaks(&str): LBType {
  // find first line break
  m = /(\r\n|\r|\n)/.match(str);
  if(!m) {
   return none;
  }
  str = str.replaceAll(/(\r\n|\r|\n)/g, '\n');
  return m.firstMatch;
}

function restoreLineBreaks(str, tp:LBType, defaulttp:LBType): string {
  if(tp == none) tp = defaulttp; // use platform default for line breaks if context had no line breaks
  if(tp == none) return str; // if platform has no default and context had no line breaks, do nothing
  return str.replaceAll(/\n/g, tp); // replace line breaks
}

And, this could even be buried in Core, ideally, so that the Engine doesn't need to do this legwork. We could pass defaultLineBreakType in as a parameter to Core...

Seems like too much for 17.0 though!

rc-swag · 2024-02-26T00:41:27Z

And, this could even be buried in Core, ideally, so that the Engine doesn't need to do this legwork. We could pass defaultLineBreakType in as a parameter to Core...

Seems like too much for 17.0 though!

This is putting a more of platform knowledge into the core, but it is ok as it we keep it away from the core logic of the processors. I assume when we use the core context to load the updated app context (stripping out the markers) we also run restoreLineBreaks

This would be more robust to handling any nuances from different platforms and apps. for example would also handle any MacOS apps that also used just \r I would still like to know why the Windows ReadContext has only `\r'.

The platforms would need to make sure to set the platform default line break when loading the ~~keyboard~~ state. This would be straightforward with the core environment array already part of km_core_state_create.

rc-swag · 2024-03-06T05:00:52Z

I have implemented the normalize/ restore algorithm but not in core at this late stage as it would effect all platforms.
I have created a issue #10954 to move it to core in version 18.0

bharanidharanj · 2024-03-06T11:07:26Z

Test Results

TEST_NEWLINE_LF_ONLY (FAILED): Tested with the attached PR build (Keyman ) on Windows 10 OS and here is my observation: 1. Installed NewLine keyboard. 2. Opened WordPad document. 3. Switched to NewLine keyboard. 4. Typed as per the instructions. Verified it is showing the expected result. 5. Opened Microsoft office 365 Word online document. 6. Switched to NewLine Keyboard. 7. Typed as per instructions. Noticed that it is showing only the typed letters. Seems to be an issue.

..in WordPad document

..in Microsoft Office 365 word online document

rc-swag · 2024-03-08T04:46:16Z

@bharanidharanj The testing of Word needs to be for an installed desktop version. It can be 365 but still needs to be an installed version. It failed the test because you used the online version in a web browser that doesn't support the Text Services Framework(TSF). Which is needed for this test. Do you have a test machine with Word installed? If not let me know and I will modify the test to suit.

@keymanapp-test-bot retest TEST_NEWLINE_LF_ONLY

…s-u000d-u000A

bharanidharanj · 2024-03-08T06:58:45Z

@bharanidharanj The testing of Word needs to be for an installed desktop version. It can be 365 but still needs to be an installed version. It failed the test because you used the online version in a web browser that doesn't support the Text Services Framework(TSF). Which is needed for this test. Do you have a test machine with Word installed? If not let me know and I will modify the test to suit.

@keymanapp-test-bot retest TEST_NEWLINE_LF_ONLY

@rc-swag I think it would be better to modify the test to suit. Thanks.

rc-swag · 2024-03-16T22:16:53Z

@rc-swag I think it would be better to modify the test to suit. Thanks.

@bharanidharanj I have updated the TEST_NEWLINE_LF_ONLY test to only have Wordpad and added another for TEST_NEWLINE_CR_ONLY with Firefox so please do this tests

bharanidharanj · 2024-03-18T09:56:13Z

@rc-swag I think it would be better to modify the test to suit. Thanks.

@bharanidharanj I have updated the TEST_NEWLINE_LF_ONLY test to only have Wordpad and added another for TEST_NEWLINE_CR_ONLY with Firefox so please do this tests

@rc-swag Okay, Thanks. I will do it.

bharanidharanj · 2024-03-18T10:20:20Z

Test Results

TEST_NEWLINE_LF_ONLY (PASSED): Retested as per Ross's suggestion in a WordPad document and here is my observation: 1. Followed the steps as mentioned in the test description. 2. Verified that 'Row If' text appears on the document after pressing the Enter key.
TEST_NEWLINE_CR_ONLY (PASSED): Tested with the attached PR build (keyman version 17.0.285-beta-test-10697) and here is my observation: 1. Opened the 'editpad' website. 2. Created a new document. 3. Followed step (iv) and verified that the text 'ROW If' appeared in the document. Seems to be working as expected.

Testing showed a different approach was required and has since been implemented.

mcdurdin

I am somewhat uncomfortable about this PR because it makes significant string handling changes very late in the beta cycle. I think we may need some more exercising of the boundary conditions, and I think we should be able to avoid truncation of buffers with a small rework per my notes?

mcdurdin · 2024-03-20T09:38:44Z

windows/src/engine/keyman32/appcontext.cpp

+  size_t buf_length = wcslen(win_out_str) * 2;
+  LPWSTR buf_string = new WCHAR[buf_length];


This doesn't allow space for final terminating nul if every char is doubled

mcdurdin · 2024-03-20T09:39:10Z

windows/src/engine/keyman32/appcontext.cpp

+  while (*in_ptr != L'\0') {
+    if (*in_ptr == '\n') {


Not consistently using L prefix, here and elsewhere in this PR

mcdurdin · 2024-03-20T09:40:08Z

windows/src/engine/keyman32/appcontext.cpp

+          break;
+        case lbCR:
+          *buf_ptr++ = '\r';
+          *in_ptr++;


Suggested change

*in_ptr++;

in_ptr++;

mcdurdin · 2024-03-20T09:41:11Z

windows/src/engine/keyman32/appcontext.cpp

+    auto diff = final_length + 1 - win_out_size_in_chars;  // +1 for null termination
+    wcscpy_s(win_out_str, win_out_size_in_chars, buf_string + diff);


This could break a surrogate pair

mcdurdin · 2024-03-20T09:41:36Z

windows/src/engine/keyman32/appcontext.cpp

+  }
+  *buf_ptr = '\0';  // Null terminate the modified string
+
+  // may now need to truncate the string preserving the end closest the caret.


Instead of doing this, couldn't we just return the longer buffer?

mcdurdin · 2024-03-20T09:42:02Z

windows/src/engine/keyman32/appcontext.cpp

+
+BOOL
+context_char32_char16(const km_core_usv *core_output, LPWSTR win_out_str, uint32_t win_output_size_in_char) {
+  if (core_output == nullptr || win_out_str == nullptr || win_output_size_in_char <= 0) {


Suggested change

if (core_output == nullptr || win_out_str == nullptr || win_output_size_in_char <= 0) {

if (core_output == nullptr || win_out_str == nullptr || win_output_size_in_char == 0) {

uint32_t cannot be <0

mcdurdin · 2024-03-20T09:43:06Z

windows/src/engine/keyman32/appcontext.cpp

+  auto final_length = wcsnlen_s(buf, buf_length);
+  if (final_length < win_output_size_in_char) {
+    wcscpy_s(win_out_str, win_output_size_in_char, buf);
+  } else {
+    auto diff = final_length + 1 - win_output_size_in_char;  // +1 for null termination
+    wcscpy_s(win_out_str, win_output_size_in_char, buf + diff);
+  }


Same risk of splitting a surrogate pair

mcdurdin · 2024-03-20T09:46:33Z

windows/src/engine/keyman32/tests/keyman-engine-tests/appinttests.cpp

@@ -78,3 +78,58 @@ TEST(AppContext, AppContext_Delete) {
  testContext.Delete();

 }
+
+// Test normalize and restore line break functions


I think we need some tests for surrogate pair edge cases, especially boundary conditions

rc-swag · 2024-03-20T22:13:50Z

I am somewhat uncomfortable about this PR because it makes significant string handling changes very late in the beta cycle.

Based on this I am going to close this and update the issue #10471 for 18.0 but for the algorithm to be in the core (and to check developer debugger). That is where I ultimately wanted it to live and it would then cover linux and macos also. This was just to get a reference solution over the line for the LDML and version 17. With the buffer in the core truncation I would have done it differently however when I looked at it was just kicking the problem down the road because it had truncation in there anyway around a defined MAX_CONTEXT. Anyway, for the core I can revisit it as it will sit on a different layer.
Finally, I am disappointed I missed the surrogates as that is one of the first edge cases I discovered in the Context::Buf method. when I started.

rc-swag · 2024-03-20T22:37:28Z

This will be moved to version 18.0 as it is too late in the beta cycle. The logic will also be contained in the core.

mcdurdin · 2024-03-20T23:14:44Z

Agree that this can wait until 18.0; if necessary we can back-port to 17.0 stable.

keymanapp-test-bot bot added the user-test-missing User tests have not yet been defined for the PR label Feb 13, 2024

keymanapp-test-bot bot added this to the B17S1 milestone Feb 13, 2024

github-actions bot added windows/ windows/engine/ feat labels Feb 13, 2024

rc-swag added 2 commits February 16, 2024 13:26

feat(windows): combined post_process with process_uniChar

3a4095b

feat(windows): fix enum snake case format

4d1cddb

keymanapp-test-bot bot added has-user-test user-test-required User tests have not been completed and removed user-test-missing User tests have not yet been defined for the PR labels Feb 16, 2024

rc-swag self-assigned this Feb 16, 2024

rc-swag marked this pull request as ready for review February 16, 2024 04:16

rc-swag requested a review from ermshiperete as a code owner February 16, 2024 04:16

rc-swag changed the base branch from master to beta February 16, 2024 05:31

keymanapp-test-bot bot removed the user-test-required User tests have not been completed label Feb 16, 2024

mcdurdin modified the milestones: B17S1, B17S2 Feb 17, 2024

mcdurdin previously approved these changes Feb 17, 2024

View reviewed changes

rc-swag marked this pull request as draft February 19, 2024 05:29

rc-swag modified the milestones: B17S2, 18.0 Feb 26, 2024

feat(windows): add normalize and restore line breaks

8cb395c

rc-swag mentioned this pull request Mar 6, 2024

feat(core): Move normalize and restore line break functionality to Keyman Core. #10954

Open

keymanapp-test-bot bot added the user-test-required User tests have not been completed label Mar 6, 2024

rc-swag marked this pull request as ready for review March 6, 2024 05:01

Merge branch 'beta' into feat/windows/10471/preprocess-u000d-u000A

b263c03

rc-swag requested a review from mcdurdin March 6, 2024 05:02

keymanapp-test-bot bot added user-test-failed and removed user-test-required User tests have not been completed labels Mar 6, 2024

keymanapp-test-bot bot added user-test-required User tests have not been completed and removed user-test-failed labels Mar 8, 2024

rc-swag added 2 commits March 8, 2024 15:31

chore(windows): Merge branch 'beta' into feat/windows/10471/preproces…

a865eff

…s-u000d-u000A

feat(windows): remove debug messages add notes about

c9bd1e8

darcywong00 modified the milestones: B17S3, B17S4 Mar 16, 2024

keymanapp-test-bot bot removed the user-test-required User tests have not been completed label Mar 18, 2024

mcdurdin requested changes Mar 20, 2024

View reviewed changes

rc-swag mentioned this pull request Mar 20, 2024

feat(windows): core needs to preprocess U+000D U+000A to U+000A in context before any other processing #10471

Open

rc-swag closed this Mar 20, 2024

rc-swag deleted the feat/windows/10471/preprocess-u000d-u000A branch March 20, 2024 22:42

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(windows): handle CRLF specific to windows #10697

feat(windows): handle CRLF specific to windows #10697

rc-swag commented Feb 13, 2024 •

edited by mcdurdin

Loading

keymanapp-test-bot bot commented Feb 13, 2024 •

edited

Loading

bharanidharanj commented Feb 16, 2024

mcdurdin commented Feb 17, 2024

mcdurdin left a comment

rc-swag commented Feb 18, 2024 •

edited

Loading

rc-swag commented Feb 19, 2024 •

edited

Loading

rc-swag commented Feb 19, 2024

mcdurdin commented Feb 22, 2024

rc-swag commented Feb 26, 2024

rc-swag commented Mar 6, 2024

bharanidharanj commented Mar 6, 2024

rc-swag commented Mar 8, 2024 •

edited

Loading

bharanidharanj commented Mar 8, 2024 •

edited

Loading

rc-swag commented Mar 16, 2024

bharanidharanj commented Mar 18, 2024

bharanidharanj commented Mar 18, 2024

mcdurdin left a comment

mcdurdin Mar 20, 2024

mcdurdin Mar 20, 2024

mcdurdin Mar 20, 2024

mcdurdin Mar 20, 2024

mcdurdin Mar 20, 2024

mcdurdin Mar 20, 2024

mcdurdin Mar 20, 2024

mcdurdin Mar 20, 2024

rc-swag commented Mar 20, 2024 •

edited by mcdurdin

Loading

rc-swag commented Mar 20, 2024

mcdurdin commented Mar 20, 2024

		size_t buf_length = wcslen(win_out_str) * 2;
		LPWSTR buf_string = new WCHAR[buf_length];

		auto diff = final_length + 1 - win_out_size_in_chars; // +1 for null termination
		wcscpy_s(win_out_str, win_out_size_in_chars, buf_string + diff);

	if (core_output == nullptr \|\| win_out_str == nullptr \|\| win_output_size_in_char <= 0) {
	if (core_output == nullptr \|\| win_out_str == nullptr \|\| win_output_size_in_char == 0) {

feat(windows): handle CRLF specific to windows #10697

feat(windows): handle CRLF specific to windows #10697

Conversation

rc-swag commented Feb 13, 2024 • edited by mcdurdin Loading

User Testing

keymanapp-test-bot bot commented Feb 13, 2024 • edited Loading

User Test Results

Test Artifacts

bharanidharanj commented Feb 16, 2024

Test Results

mcdurdin commented Feb 17, 2024

mcdurdin left a comment

Choose a reason for hiding this comment

rc-swag commented Feb 18, 2024 • edited Loading

rc-swag commented Feb 19, 2024 • edited Loading

rc-swag commented Feb 19, 2024

mcdurdin commented Feb 22, 2024

rc-swag commented Feb 26, 2024

rc-swag commented Mar 6, 2024

bharanidharanj commented Mar 6, 2024

Test Results

rc-swag commented Mar 8, 2024 • edited Loading

bharanidharanj commented Mar 8, 2024 • edited Loading

rc-swag commented Mar 16, 2024

bharanidharanj commented Mar 18, 2024

bharanidharanj commented Mar 18, 2024

Test Results

mcdurdin left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

rc-swag commented Mar 20, 2024 • edited by mcdurdin Loading

rc-swag commented Mar 20, 2024

mcdurdin commented Mar 20, 2024

rc-swag commented Feb 13, 2024 •

edited by mcdurdin

Loading

keymanapp-test-bot bot commented Feb 13, 2024 •

edited

Loading

rc-swag commented Feb 18, 2024 •

edited

Loading

rc-swag commented Feb 19, 2024 •

edited

Loading

rc-swag commented Mar 8, 2024 •

edited

Loading

bharanidharanj commented Mar 8, 2024 •

edited

Loading

rc-swag commented Mar 20, 2024 •

edited by mcdurdin

Loading