Feature request: let's make this compatible with XPath's fn:matches and fn:replace #9

DrRataplan · 2019-12-18T13:21:32Z

The patterns used in XPath are very similar to the xs:pattern expressions used in the XSD spec, with the following changes:

Flags

s: the dot-all flag
i: the case insensitive flag
m: the multiline flag
x: the whitespace flag

Partial matches and matching the beginning and end of the input

XPath adds the ^ and $ characters: https://www.w3.org/TR/xpath-functions-31/#matching-start-and-end.
fn:matches should return true when any substring of the input matches the pattern: unless the pattern is anchored with ^ or $, it behaves as if there were an implicit .* before and after it.
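
To make the difference concrete, here is a minimal sketch that uses JavaScript's own RegExp as a stand-in engine. The helper names matchesWhole and matchesPartial are hypothetical and not part of this library, and JavaScript's regex syntax differs from XSD and XPath in several details, so this only illustrates the anchoring behaviour.

```typescript
// Hypothetical helpers for illustration only. JavaScript's RegExp stands in
// for a real XSD/XPath regex engine; the syntaxes are not identical.

// XSD-style matching: the pattern has to cover the entire input.
function matchesWhole(input: string, pattern: string): boolean {
  return new RegExp(`^(?:${pattern})$`).test(input);
}

// XPath-style matching: any substring may match. This is written out as an
// explicit "anything" before and after the pattern; [\s\S] is used instead
// of "." so that the padding also crosses line breaks.
function matchesPartial(input: string, pattern: string): boolean {
  return new RegExp(`^[\\s\\S]*(?:${pattern})[\\s\\S]*$`).test(input);
}

console.log(matchesWhole("abcd", "bc")); // false
console.log(matchesPartial("abcd", "bc")); // true
console.log(matchesPartial("abcd", "^bc")); // false
```

Because ^ and $ are zero-width assertions, they still pin the match to the start and end of the input even with the padding around them.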

Captured subexpressions

See https://www.w3.org/TR/xpath-functions-31/#captured-subexpressions. I think this could be implemented by recording the start and end of each subexpression in the whynot program and outputting those when execution is done. Thankfully there is no eagerness involved: returning any match is sufficient.
XPath also adds non-capturing subexpressions, the (?:) variant.

Reluctant quantifiers

See https://www.w3.org/TR/xpath-functions-31/#reluctant-quantifiers.

We only have to be able to parse them for fn:matches.
Actually resolving them only affects functions like fn:replace.

Backreferences

It already makes sense to parse backreferences, but we should just throw a readable not-supported error when we see them. They should not be syntax errors.
See https://www.w3.org/TR/xpath-functions-31/#back-references for more info. We could implement backreferences by letting them match anything and then filtering the different paths through the whynot execution to see whether there was an actual match.
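
As a rough sketch of the "readable error instead of syntax error" idea: the function below scans a pattern for a backslash followed by a digit 1-9 outside a character class and throws a descriptive error. The name rejectBackreferences and the error message are hypothetical. A real implementation would do this in the parser rather than in a separate scan.

```typescript
// Hypothetical pre-check, for illustration only; not this library's API.
// Throws a descriptive error when the pattern contains a backreference
// (\1 .. \9) outside of a character class.
function rejectBackreferences(pattern: string): void {
  let classDepth = 0; // > 0 while inside [...], including nested subtraction
  for (let i = 0; i < pattern.length; i++) {
    const c = pattern[i];
    if (c === "\\") {
      const next = pattern[i + 1];
      if (classDepth === 0 && next !== undefined && next >= "1" && next <= "9") {
        throw new Error(
          `Backreferences are not supported (found "\\${next}" at position ${i})`
        );
      }
      i++; // skip the escaped character so "\\" or "\[" is not misread
    } else if (c === "[") {
      classDepth++;
    } else if (c === "]" && classDepth > 0) {
      classDepth--;
    }
  }
}

try {
  rejectBackreferences("(a)\\1");
} catch (e) {
  console.log((e as Error).message);
}
```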

Unknown Unicode blocks

https://www.w3.org/TR/xpath-functions-31/#unicode-block-names: A regular expression that uses a Unicode block name that is not defined in the version(s) of Unicode supported by the processor (for example \p{IsBadBlockName}) is deemed to be invalid. We could switch this on the language level.

I think it would be cool to implement these features in this library, enabled when a language option is passed to the compile function for example.


bwrrp · 2019-12-20T15:05:45Z

Sounds great! Thanks for the overview and initial PR!

For backreferences and other difficult cases we can also consider adding an interpreter as an alternative to the whynot-based approach.

bwrrp · 2020-03-13T09:45:40Z

I just released 1.1.0 with your changes so far. That should be enough to implement fn:matches, except for the missing flags. For those, I think we can add a flags option for compile that sits next to the new language option: simply a string containing the letters of the flags to apply. It may make sense to try supporting them for both languages.
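
A minimal sketch of how such a flags string could be turned into structured options, assuming only the four letters listed in the issue (s, i, m, x). The RegexFlags interface and parseFlags function are hypothetical names, not the released API.

```typescript
// Hypothetical shape for parsed flags; names are illustrative only.
interface RegexFlags {
  dotAll: boolean; // s
  caseInsensitive: boolean; // i
  multiline: boolean; // m
  ignoreWhitespace: boolean; // x
}

// Parses a flags string such as "si" and rejects unknown letters.
function parseFlags(flags: string): RegexFlags {
  const result: RegexFlags = {
    dotAll: false,
    caseInsensitive: false,
    multiline: false,
    ignoreWhitespace: false,
  };
  for (let i = 0; i < flags.length; i++) {
    switch (flags[i]) {
      case "s":
        result.dotAll = true;
        break;
      case "i":
        result.caseInsensitive = true;
        break;
      case "m":
        result.multiline = true;
        break;
      case "x":
        result.ignoreWhitespace = true;
        break;
      default:
        throw new Error(`Unknown flag "${flags[i]}"`);
    }
  }
  return result;
}

console.log(parseFlags("si"));
```

Rejecting unknown letters up front keeps a typo in the flags string from being silently ignored.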

DrRataplan mentioned this issue Dec 18, 2019

Request for fn:matches FontoXML/fontoxpath#188

Closed

bwrrp mentioned this issue Jul 23, 2021

Adding support for fn:replace FontoXML/fontoxpath#400

Closed
