-
-
Notifications
You must be signed in to change notification settings - Fork 111
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore(common): xml2js -> fast-xml-parser #12331
Conversation
- just the 'easy stuff' in this commit
- it compiles, but broken
- update numbers- skip all numbers
- reverted to fontsize being a string - updated the kvks schema to match current xml
- assert.doesNotThrow just truncates the exception with no benefit - assert early that there are no messages when checking for a load failure
User Test ResultsTest specification and instructions User tests are not required Test Artifacts
|
developer/src/common/web/utils/src/types/kpj/kpj-file-reader.ts
Outdated
Show resolved
Hide resolved
@SabineSIL FYI |
- need to use trimValues:false and a custom tagValueProcessor to preserve spaces Fixes: #12208
Making some good progress, it looks like the PUA issue has to do with NCR's, either not being par or being skipped if there's an error |
- difference in fast-xml-parser and xml2js handling here - part of the move to fast-xml-parser Fixes: #12208
The visual keyboard compiler was never finished in 17.0. This rewrites it to: 1. Use the kmxplus data rather than reading from xml directly 2. Fill in `visualkeyboard.header.kbdname` 3. Support modifiers 4. Handle encoded characters like `\u{1234}` 5. Handle string variables like `${one}`* Additional unit tests have been added to verify the behavior of the visual keyboard compiler in more detail. TODO-LDML: string variables appear to have a secondary bug -- they seem to be returning the string 'undefined'. I have disabled the related tests and will examine this separately, and enable those tests once fixed. TODO-LDML: we should probably add a compiler warning + unit test for `<layers formId="us"><layer id="base">`, because this pattern does not make sense: when using non-touch forms, the `<layer>` element should use `modifiers` attribute, and correspondingly, `modifiers` attribute should _not_ be used when `formId` is `touch`. Other fixes: 1. The LDML XML reader was relying on its input being a Node.js `Buffer` even though it was declared `Uint8Array`, as it implicitly used `Buffer.toString()` to do text conversion. (`Buffer` subclasses from `Uint8Array`). This breaks when using `Uint8Array` directly and means we had an implicit dependency on Node.js. See also #12331. 2. XML errors were not captured in the LDML XML reader. See also #12331. 3. The unused and unfinished touch-layout-compiler.ts and keymanweb-compiler.ts have been removed along with corresponding unit tests and fixtures. These are replaced by Core implementations; see #12291. Fixes: #12395
The visual keyboard compiler was never finished in 17.0. This rewrites it to: 1. Use the kmxplus data rather than reading from xml directly 2. Fill in `visualkeyboard.header.kbdname` 3. Support modifiers 4. Handle encoded characters like `\u{1234}` 5. Handle string variables like `${one}`* Additional unit tests have been added to verify the behavior of the visual keyboard compiler in more detail. * String variable tests will be enabled in next commit (which is a cherry-pick of #12404). Other fixes: 1. The LDML XML reader was relying on its input being a Node.js `Buffer` even though it was declared `Uint8Array`, as it implicitly used `Buffer.toString()` to do text conversion. (`Buffer` subclasses from `Uint8Array`). This breaks when using `Uint8Array` directly and means we had an implicit dependency on Node.js. See also #12331. 2. XML errors were not captured in the LDML XML reader. See also #12331. 3. The unused and unfinished touch-layout-compiler.ts and keymanweb-compiler.ts have been removed along with corresponding unit tests and fixtures. These will be replaced by Core implementations; see #12291. Fixes: #12395 Cherry-pick-of: #12402
turns out we didn't have a test for XML NCR in keyboards. Now we do. Fixes: #12208
I'd like to refactor the common parsing, etc as a followon PR after we make sure this is actually working due to the dependency change. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've started the review but have a significant change request which will, I hope, reduce the impact and scope. As it stands, I think the changes would have a number of negative cascading impacts on consumers of the data, and so I would prefer to have the data coming out of the parser match the current shape, so we don't have to change anything downstream from the parser.
@@ -2,7 +2,9 @@ import { MATCH_HEX_ESCAPE, CONTAINS_QUAD_ESCAPE, MATCH_QUAD_ESCAPE } from './con | |||
export { MATCH_HEX_ESCAPE, CONTAINS_QUAD_ESCAPE, MATCH_QUAD_ESCAPE }; | |||
|
|||
/** | |||
* xml2js will not place single-entry objects into arrays. Easiest way to fix | |||
* fast-xml-parser will not place single-entry objects into arrays. | |||
* We could set a list of items expected to be arrays, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this declarative approach viable? It seems like it would be cleaner
// may or may not have extended details | ||
if (o.code && o.line && o.msg) { | ||
return m(this.ERROR_InvalidXML, | ||
`Error reading XML at ${def(o.line)}:${def(o.col)}: ${def(o.code)} ${def(o.msg)}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
`Error reading XML at ${def(o.line)}:${def(o.col)}: ${def(o.code)} ${def(o.msg)}` | |
`Error reading XML at ${def(o.line)}${o.col ? ':' + def(o.col) : ''}: ${def(o.code)} ${def(o.msg)}` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is starting to get a little too 'formatted' for translators so we need to be careful
"dependencies": { | ||
"fast-xml-parser": "^4.4.1" | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't be in web-utils deps. web-utils is not part of Keyman Developer. It is a part of the KeymanWeb source base (see #12384 for its move to /web).
fast-xml-parser needs to be in package.json for:
- developer/src/common/web/utils
- developer/src/kmc-ldml
- developer/src/kmc-package
data = r as KPJFile; | ||
}); | ||
const parser = new XMLParser({ | ||
htmlEntities: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are not expecting html entities in a .kpj file -- it is pure XML
htmlEntities: true, | |
htmlEntities: false, |
const parser = new XMLParser({ | ||
htmlEntities: true, | ||
ignoreAttributes: false, // We'd like attributes, please | ||
attributeNamePrefix: '', // to avoid '@_' prefixes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's a good idea to mix attributes and children, can we use a '$' prefix for attributes? (same goes for .kvks)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See later comment for bigger request that makes this suggestion null
ignoreAttributes: false, // We'd like attributes, please | ||
attributeNamePrefix: '', // to avoid '@_' prefixes | ||
numberParseOptions: { | ||
skipLike: /(?:)/, // parse numbers as strings |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I take it this somewhat awkward pattern leaves numbers as strings? Shame FXP doesn't have a numberParse: false
option!
}, | ||
}); | ||
const raw = parser.parse(file.toString()); | ||
delete raw['?xml']; // XML prologue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does option ignoreDeclaration
work?
_: string; | ||
$: { URL: string }; | ||
'#text': string; | ||
URL: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an API change which I am not excited about -- I would prefer to keep the same interfaces where possible. Yes, the original interface does inherit the shape of xml2js, but it's what we have now. We don't know if there are other consumers outside the code base. Even internally, you've missed, for example:
keyman/developer/src/kmc-package/src/compiler/windows-package-installer-compiler.ts
Line 232 in e67aed1
setupInf += `${str.$?.Name}=${str.$?.Value}\n`; |
(kpj.schema.json would also need updating if we continue with this change)
Does this work to match existing shape?
attributesGroupName: '$',
attributesNamePrefix: '',
ignoreAttributes: false,
textNodeName: '_',
If we do go this way, we don't need to update any schema.json files, nor any consumers of the XML-as-js, AFAICT.
Some seem to be positive changes, but make some sense, basically Yeah I think I can do it. Not this sprint or at least not this day though |
Oh for sure, but negative in that they require lots of changes to consumers of the API, which is costly. |
/** | ||
* relationship between this package and the related package, "related" is default if not specified | ||
*/ | ||
Relationship?: "deprecates"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be
Relationship?: "deprecates"; | |
Relationship?: "deprecates"|"related"; |
if the default is "related"? (I didn't check if there are other possible values)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently only "deprecates"
is allowed:
keyman/common/schemas/kps/kps.xsd
Lines 108 to 109 in 4004449
<!-- "deprecates" is only valid value, if not present then means "related" --> | |
<xs:attribute name="Relationship" type="km-package-relationship" /> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok. The comment in kps.xsd
makes that much clearer.
/** | ||
* relationship between this package and the related package, "related" is default if not specified | ||
*/ | ||
Relationship?: "deprecates"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/** | |
* relationship between this package and the related package, "related" is default if not specified | |
*/ | |
Relationship?: "deprecates"; | |
/** | |
* relationship between this package and the related package, if not specified it means "related" | |
*/ | |
Relationship?: "deprecates"; |
/** | ||
* relationship between this package and the related package, "related" is default if not specified | ||
*/ | ||
Relationship?: "deprecates"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, ok. The comment in kps.xsd
makes that much clearer.
did you mean to close this? |
Yes, in favor of one based on #12482 (I guess I could reopen this once sorted) |
OK, np. Please make sure you document when you do actions like this because I was quite confused! Shall I'll go ahead and delete the branch? |
Not yet because I am using it to work from! I will either reopen it or open a new one |
replace xml2js with fast-xml-parser (much better support)
Fixes: #12208
@keymanapp-test-bot skip