As experienced in #74, there is room for improvement in how fingerprint regular expression flags are handled.
Today, the flags attribute is parsed so that fingerprint authors can write regular expression flags in several different formats; at Recog test time, those variants are normalized to something that Recog, which is written in Ruby, can understand. The problem, as we saw in #74, is that if we support multiple ways of specifying the same flag (in this case, REG_MULTILINE is the Java/GNU/Perl-friendly spelling and MULTILINE is the Ruby-friendly one, per http://ruby-doc.org/core-2.1.1/Regexp.html#method-i-options), then either every product that consumes Recog content must support the same set of spellings, or Recog itself must provide the normalization mechanism.
Further proof of this is still lurking in IGNORECASE, which is still allowed in current Recog but would break any Java/GNU/Perl implementation, because REG_ICASE is the preferred spelling there.
I am thinking we should do one of the following:
1. Pick one set of regular expression flags, per the TODO from http://www.rubydoc.info/gems/recog/2.0.7/Recog/Fingerprint/RegexpFactory, so that products consuming Recog are responsible for translating Recog's options into their own. Ensure that the Recog tests properly catch a fingerprint with bad flags.
2. Ditch flags altogether and require that any regular expression "options" be specified in the regular expression itself. For example, rather than pattern="foo" flags="REG_ICASE", use pattern="(?i:foo)". We can automatically convert all of the existing fingerprints with some simple Ruby Regexp code that computes the new pattern with Regexp.new(old_pattern, old_flags).to_s
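To make the second option concrete, here is a rough sketch of what the conversion could look like. It is only an illustration: FLAG_MAP and inline_flags are hypothetical names, the map covers only the four spellings mentioned in this issue, and it assumes the flags attribute is a comma-separated list. The real Recog parsing may differ.

```ruby
# Hypothetical mapping from the flag spellings discussed in this issue to
# Ruby's Regexp option constants. Not taken from the Recog source.
FLAG_MAP = {
  'REG_ICASE'     => Regexp::IGNORECASE,
  'IGNORECASE'    => Regexp::IGNORECASE,
  'REG_MULTILINE' => Regexp::MULTILINE,
  'MULTILINE'     => Regexp::MULTILINE
}.freeze

# Fold a (assumed comma-separated) flags attribute into the pattern itself.
# Unknown flag names raise, so a fingerprint with bad flags fails loudly.
def inline_flags(pattern, flags)
  names = flags.to_s.split(',').map(&:strip).reject(&:empty?)
  options = names.reduce(0) do |acc, name|
    acc | FLAG_MAP.fetch(name) { raise ArgumentError, "unknown flag: #{name}" }
  end
  # Regexp#to_s returns the pattern with its options embedded inline.
  Regexp.new(pattern, options).to_s
end

puts inline_flags('foo', 'REG_ICASE') # prints (?i-mx:foo)
```

Note that Regexp#to_s emits the full option group, so the result is (?i-mx:foo) rather than the shorter (?i:foo). Also, Ruby's MULTILINE option makes "." match newlines, so the "m" in an inlined pattern may not mean the same thing to a non-Ruby consumer; that would need checking before converting fingerprints in bulk.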