- `clap` has been updated to version 4.0. The help output by `grex -h` now looks a little different.
- A bug in the grapheme segmentation was fixed that caused test cases which contain backslashes to produce incorrect regular expressions.
- The library can now be compiled to WebAssembly and be used in any JavaScript project. (#82)
- The supported character set for regular expression generation has been updated to the current Unicode Standard 14.0.
- `structopt` has been replaced with `clap`, providing much nicer help output for the command-line tool.
- The regular expression generation performance has been significantly improved, especially for generating very long expressions from a large set of test cases. This has been accomplished by reducing the number of memory allocations, removing deprecated code and applying several minor optimizations.
- Several bugs have been fixed that caused incorrect expressions to be generated in rare cases.
- anchors can now be disabled so that the generated expression can be used as part of a larger one (#30)
- the command-line tool can now be used within Unix pipelines (#45)
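
To illustrate why disabling anchors matters, here is a minimal sketch using Python's standard `re` module. The fragment below is a hypothetical stand-in for an expression generated without anchors; it is not actual grex output, and grex itself is not called.

```python
import re

# Hypothetical unanchored fragment, standing in for an expression that was
# generated with anchors disabled. It is NOT actual grex output.
fragment = r"(?:abc|xyz)"

# With anchors, the same expression only matches whole strings.
anchored = "^" + fragment + "$"

# Without anchors, the fragment can be embedded in a larger pattern.
larger = r"^id-" + fragment + r"-\d+$"


def matches(pattern, text):
    """Return True if the pattern matches anywhere in the text."""
    return re.search(pattern, text) is not None


print(matches(larger, "id-abc-42"))    # fragment used as part of a larger pattern
print(matches(anchored, "id-abc-42"))  # anchored form rejects the longer string
```

An anchored expression cannot be reused this way, because `^` and `$` inside the larger pattern would pin the fragment to the start and end of the input.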
- Additional methods have been added to `RegExpBuilder` in order to replace the enum `Feature` and make the library API more consistent. (#47)
- Under rare circumstances, the conversion of repetitions did not work. This has been fixed. (#36)
- verbose mode is now supported with the `--verbose` flag to produce regular expressions which are easier to read (#17)
- case-insensitive matching regexes are now supported with the `--ignore-case` command-line flag or with `Feature::CaseInsensitivity` in the library (#23)
- non-capturing groups are now the default; capturing groups can be enabled with the `--capture-groups` command-line flag or with `Feature::CapturingGroup` in the library (#15)
- a lower bound for the conversion of repeated substrings can now be set by specifying `--min-repetitions` and `--min-substring-length` or using the library methods `RegExpBuilder.with_minimum_repetitions()` and `RegExpBuilder.with_minimum_substring_length()` (#10)
- test cases can now be passed from a file within the library as well using `RegExpBuilder::from_file()` (#13)
- the rules for the conversion of test cases to shorthand character classes have been updated to be compliant with the newest Unicode Standard 13.0 (#21)
- the dependency on the unmaintained linked-list crate has been removed (#24)
- test cases starting with a hyphen are now correctly parsed on the command-line (#12)
- the common substring detection algorithm now uses optionality expressions where possible instead of redundant union operations (#22)
- new unit tests, integration tests and property tests have been added
- conversion to character classes `\d`, `\D`, `\s`, `\S`, `\w`, `\W` is now supported
- repetition detection now works with arbitrarily nested expressions. Input strings such as `aaabaaab` which were previously converted to `^(aaab){2}$` are now converted to `^(a{3}b){2}$`.
- optional syntax highlighting for the produced regular expressions can now be enabled using the `--colorize` command-line flag or with the library method `RegExpBuilder.with_syntax_highlighting()`
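
The two expressions quoted for the nested repetition example, `^(aaab){2}$` and `^(a{3}b){2}$`, describe the same language; only the notation is more compact. A small sketch with Python's standard `re` module checks this on a handful of sample strings. The samples are chosen for illustration, and grex is not invoked.

```python
import re

# Both patterns are quoted from the note above.
flat = re.compile(r"^(aaab){2}$")      # former output
nested = re.compile(r"^(a{3}b){2}$")   # output with nested repetition detection

# Illustrative samples: one that should match and several that should not.
samples = ["aaabaaab", "aaab", "aabaab", "aaabaaabaaab", ""]

# For each sample, record whether each pattern accepts it.
results = [(s, bool(flat.match(s)), bool(nested.match(s))) for s in samples]

# The patterns agree on every sample if both flags are equal each time.
equivalent = all(f == n for _, f, n in results)

for sample, f, n in results:
    print(repr(sample), f, n)
print("patterns agree on all samples:", equivalent)
```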
- new unit tests, integration tests and property tests have been added
- new property tests have been added that revealed new bugs
- entire rewrite of the repetition detection algorithm
- the former algorithm produced wrong regular expressions or even panicked for certain test cases
- property tests have been added using the proptest crate
- big thanks go to Christophe Biocca for pointing me to the concept of property tests in the first place and for writing an initial implementation of these tests
- some regular expression specific characters were not escaped correctly in the generated expression
- expressions consisting of a single alternation such as `^(abc|xyz)$` were missing the outer parentheses. This caused an erroneous match of strings such as `abc123` or `456xyz` because of precedence rules.
- the created DFA was wrong for repetition conversion in some corner cases. The input `a, aa, aaa, aaaa, aaab` previously returned the expression `^a{1,4}b?$` which erroneously matches `aaaab`. Now the correct expression `^(a{3}b|a{1,4})$` is returned.
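
Both fixes above can be checked directly against the quoted patterns. The sketch below uses Python's standard `re` module rather than grex itself; the pattern without parentheses, `^abc|xyz$`, is reconstructed from the description and is an assumption about what the faulty output looked like.

```python
import re


def search_hits(pattern, candidates):
    """Return the candidates in which the pattern finds a match."""
    return [s for s in candidates if re.search(pattern, s)]


# 1. Alternation precedence: without parentheses, `^` binds only to `abc`
#    and `$` only to `xyz`. (Assumed form of the faulty output.)
without_parens = r"^abc|xyz$"
with_parens = r"^(abc|xyz)$"
candidates = ["abc", "xyz", "abc123", "456xyz"]

loose = search_hits(without_parens, candidates)
strict = search_hits(with_parens, candidates)

# 2. Repetition conversion: the old expression over-generalizes.
inputs = ["a", "aa", "aaa", "aaaa", "aaab"]
old = r"^a{1,4}b?$"
new = r"^(a{3}b|a{1,4})$"

old_accepts_extra = re.search(old, "aaaab") is not None
new_accepts_extra = re.search(new, "aaaab") is not None
new_accepts_all_inputs = all(re.search(new, s) for s in inputs)

print(loose, strict)
print(old_accepts_extra, new_accepts_extra, new_accepts_all_inputs)
```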
- some minor documentation updates
- grex is now also available as a library
- escaping of non-ascii characters is now supported with the `-e` flag
- astral code points can be converted to surrogate pairs with the `--with-surrogates` flag
- repeated non-overlapping substrings can be converted to `{min,max}` quantifier notation using the `-r` flag
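
For readers unfamiliar with surrogates: an astral code point (above U+FFFF) is split into a high and a low surrogate using the standard UTF-16 arithmetic. The sketch below shows that arithmetic in Python. The helper name `to_surrogate_pair` is hypothetical, and how grex formats the resulting escapes is not shown here.

```python
def to_surrogate_pair(code_point):
    """Split an astral code point into its UTF-16 (high, low) surrogates.

    Hypothetical helper for illustration; not part of grex.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only astral code points have surrogate pairs")
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low


# U+1F4A9 is an astral code point.
high, low = to_surrogate_pair(0x1F4A9)
print(hex(high), hex(low))
```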
- many many many bug fixes :-O
- character classes are now supported
- input strings can now be read from a text file
- unicode characters are not escaped anymore by default
- the performance of the DFA minimization algorithm has been improved for large DFAs
- regular expressions are now always surrounded by anchors `^` and `$`
- fixed a bug that caused a panic when giving an empty string as input
This is the very first release of grex. It aims at simplifying the construction of regular expressions based on matching example input.
- literals
- detection of common prefixes and suffixes
- alternation using `|` operator
- optionality using `?` quantifier
- concatenation of all of the former