Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(common/models): move core trie-compilation code into common/models/templates 📖 #12083

Merged
merged 6 commits into from
Aug 12, 2024

Conversation

jahorton
Copy link
Contributor

@jahorton jahorton commented Aug 2, 2024

Note: this only moves Trie construction code, rather than TrieModel construction code.

  • Wordlist file loading, parsing, etc are still part of kmc-model.
  • This does not write the results out to a file or provide the TrieModel structure and additional parameters, such as punctuation, search term keying, etc.

In support of #10973.

At runtime - be it within the worker or within a separate process hosted by the mobile apps - we'll want a way to collate and/or compile user-specific words and names into a model that we can use for correction and prediction. The simplest such model is a wordlist... and we already know how to build those.

The key detail here is that we need Trie compilation code, at a minimum, to become common code. I believe we can no longer limit its use to within Developer.

@keymanapp-test-bot skip

}
trieCollator.sort();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still calls sortTrie, just via the object's member function. (See line 162 of trie-builder.ts.)

@jahorton jahorton marked this pull request as draft August 2, 2024 05:32
…on/models/templates

We'll likely want to dynamically build a Trie to represent user-specific entries made available by the active OS.  We'll then blend _that_ with the 'static' distributed model.
@jahorton jahorton force-pushed the refactor/common/models/trie-construction branch from f28f7b9 to 54bdf73 Compare August 2, 2024 05:54
@jahorton jahorton changed the base branch from refactor/common/models/trie-model to refactor/developer/build-trie August 2, 2024 05:54
@jahorton jahorton marked this pull request as ready for review August 2, 2024 05:57
* Unicode will never assign semantics to these characters, as they are
* intended to be used internally as sentinel values.
*/
const INTERNAL_VALUE = '\uFDD0';
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This exactly matches the value of SENTINEL_CODE_UNIT in common/models/templates, so I replaced all migrated instances of it with SENTINEL_CODE_UNIT.

@@ -88,12 +98,26 @@ export class TrieTraversal implements LexiconTraversal {
return traversal;
}

private sortNodeIfNeeded(node: Node) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found this by referencing the original original implementation:

// The following trie implementation has been (heavily) derived from trie-ing
// by Conrad Irwin.

// See: https://github.com/ConradIrwin/trie-ing/blob/df55d7af7068d357829db9e0a7faa8a38add1d1d/LICENSE

This may seem "pulled up a tad bit too early", but there's a reason. If we 'construct' the Trie while live, modifications can make part of the Trie unsorted. We want to track that and correct for that.

The followup then changes this block, integrating it within the Trie traversal structure. Pulling it in early like this allows for code comparison between the two approaches.

@jahorton
Copy link
Contributor Author

jahorton commented Aug 2, 2024

Keyman Core test failures:

13:07:10   [developer/src/kmc-model] Skipping dependency common/models/templates for build
13:07:12   
13:07:12   > build
13:07:12   > tsc -b
13:07:12   
13:07:14   src/build-trie.ts(4,29): error TS2307: Cannot find module '@keymanapp/models-templates' or its corresponding type declarations.
13:07:14   npm error Lifecycle script `build` failed with error:
13:07:14   npm error code 1
13:07:14   npm error path /var/lib/TeamCity/work/99b311828f4ee7c/keyman/developer/src/kmc-model
13:07:14   npm error workspace @keymanapp/kmc-model
13:07:14   npm error location /var/lib/TeamCity/work/99b311828f4ee7c/keyman/developer/src/kmc-model
13:07:14   npm error command failed
13:07:14   npm error command sh -c tsc -b

Yeah, that tracks. I probably forgot to set a dependency link.

@darcywong00 darcywong00 modified the milestones: A18S7, A18S8 Aug 2, 2024
Copy link
Member

@mcdurdin mcdurdin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One important dependency change required (otherwise I suspect this will break at runtime)

developer/src/kmc-model/build.sh Show resolved Hide resolved
Base automatically changed from refactor/developer/build-trie to epic/user-dict August 9, 2024 02:23
@jahorton jahorton requested a review from mcdurdin August 9, 2024 02:27
Copy link
Member

@mcdurdin mcdurdin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jahorton jahorton merged commit 85e0157 into epic/user-dict Aug 12, 2024
14 of 15 checks passed
@jahorton jahorton deleted the refactor/common/models/trie-construction branch August 12, 2024 02:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants