Just another day at the office, you write a .NET Regex like a boss,
and suddenly realize that you need to declare, say, a non-capturing group.
It's (?:pattern)
, right? Wait, or was it (?=pattern)
? No no, (?=pattern)
must be a positive lookahead or something. But if (?<=pattern)
is a positive lookbehind,
then maybe positive lookahead would be (?>=pattern)
?
"Aaargh! Now where's that Regex cheat sheet?.." And make sure to share it with your five colleagues who might be maintaining this code later. Also, remember to use comments inside the Regex pattern, and maybe a few third-party tools to be sure what the expression does.
"How did it come to this?"
Inspired by the Expression Trees feature in .NET, the RegexBuilder library provides a more verbose but more human-readable way of declaring regular expressions, using a language friendly to the .NET world instead of two lines of cryptic mess.
When it might be useful:
- When the expressions are complex and might be frequently changed.
- When you can tolerate 20 lines of understandable code instead of 1 hardly understandable.
- If you can spare a bit of CPU time and memory for constructing the Regex object for the sake of readability.
Let's say you want to make a simple HTML parser
and capture the value of every href
attribute from hyperlinks, like shown in the MSDN example.
The usual way:
Regex hrefRegex = new Regex("href\\s*=\\s*(?:[\"'](?<Target>[^\"']*)[\"']|(?<Target>\\S+))", RegexOptions.IgnoreCase);
With RegexBuilder:
const string quotationMark = "\"";
Regex hrefRegex = RegexBuilder.Build
(
RegexOptions.IgnoreCase,
// Regex structure declaration
RegexBuilder.Literal("href"),
RegexBuilder.MetaCharacter(RegexMetaChars.WhiteSpace, RegexQuantifier.ZeroOrMore),
RegexBuilder.Literal("="),
RegexBuilder.MetaCharacter(RegexMetaChars.WhiteSpace, RegexQuantifier.ZeroOrMore),
RegexBuilder.Alternate
(
RegexBuilder.Concatenate
(
RegexBuilder.NonEscapedLiteral(quotationMark),
RegexBuilder.Group
(
"Target",
RegexBuilder.NegativeCharacterSet(quotationMark, RegexQuantifier.ZeroOrMore)
),
RegexBuilder.NonEscapedLiteral(quotationMark)
),
RegexBuilder.Group
(
"Target",
RegexBuilder.MetaCharacter(RegexMetaChars.NonwhiteSpace, RegexQuantifier.OneOrMore)
)
)
);
See CustomRegexTests.cs for more examples.
RegexBuilder currently supports all regular expression language elements except substitution/replacement patterns.
The following elements are supported:
- Quantifiers
- Character escapes
- Character classes
- Anchors (atomic zero-width assertions)
- Grouping constructs
- Backreference constructs
- Alternation constructs
- Inline options and comments
Add a reference to YuriyGuts.RegexBuilder.dll in your project manually, or use NuGet Package Manager:
PM> Install-Package RegexBuilder
There are 3 classes you'll need. They all expose their functionality via static members and work statelessly.
RegexBuilder
: a factory class that produces and glues together different parts of a regular expression.RegexQuantifier
: produces quantifiers (?
,+
{4,}
, etc.) for regex parts that support them.RegexMetaChars
: named constants for character classes (word boundary, whitespace, tab, etc.).
Start with var regex = RegexBuilder.Build(...);
and replace ...
with the parts of your regular expression
by calling the corresponding methods of RegexBuilder
.
RegexBuilder uses MSTest for unit testing. To run or add unit tests in Visual Studio, please see the YuriyGuts.RegexBuilder.Tests
project.
The YuriyGuts.RegexBuilder.TestApp
project is a console application that can be used as a temporary testing workbench.
The source code is licensed under The MIT License.