Representing character classes
isums (integer sums, in DFA-construction/sets.lisp) are of the forms
Here are the POSIX character classes, and some definitions:
alnum: alpha ∪ digit
alpha: alpha
blank: {Space, Tab} *
cntrl: [0, 32) ∪ {127}
digit: digit
graph: Sigma \ (cntrl ∪ {Space}) *
lower: lower
print: graph ∪ {Space} *
punct: graph \ (alpha ∪ digit) * **
space: {Space, Return, Newline, Tab} *
upper: upper
xdigit: digit ∪ [A-Fa-f]
So the classes I can't represent as nice isums are alpha, digit, lower, and upper.
* These come from https://github.com/micromatch/posix-character-classes, which is obviously not normative for POSIX.
** Is 💩 punctuation?
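The derived classes above can be sketched as predicates on character codes. This is only an illustration: the function names are hypothetical and not taken from sets.lisp, and the primitive classes alpha and digit are delegated to the host Lisp's alpha-char-p and digit-char-p, which is an assumption (their behaviour outside ASCII is implementation-dependent).

```lisp
;; Hypothetical sketch; none of these names come from sets.lisp.

(defun cntrl-p (code)
  ;; cntrl: [0, 32) union {127}
  (or (<= 0 code 31) (= code 127)))

(defun graph-p (code)
  ;; graph: Sigma minus (cntrl union {Space}); Space is code 32.
  (not (or (cntrl-p code) (= code 32))))

(defun print-p (code)
  ;; print: graph union {Space}
  (or (graph-p code) (= code 32)))

(defun punct-p (code)
  ;; punct: graph minus (alpha union digit).
  ;; CODE-CHAR may return NIL for codes with no character; such codes
  ;; are treated here as neither alpha nor digit (an assumption).
  (let ((char (code-char code)))
    (and (graph-p code)
         (not (and char
                   (or (alpha-char-p char)
                       (digit-char-p char)))))))
```

Only cntrl is a plain union of integer ranges here. graph and print are built from it by complement and union, while punct needs the primitive classes alpha and digit.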
csums (character sums) are of the form
Unlike isums, csums have no explicit complements, but twiddling the class set allows for complements. For example,
To interpret the class part of a csum, a vector mapping character codes to their bit-set of classes may be used, and then (logbitp bit-set c) may be used to test membership in
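One possible reading of the lookup described above, as a minimal sketch. It assumes four primitive classes (alpha, digit, lower, upper), an ASCII-only table, and a class set c stored as a 16-bit integer with one bit per possible bit-set of classes. All names are hypothetical, and the actual csum layout may differ.

```lisp
;; Hypothetical sketch; the real csum representation may differ.

(defparameter *primitive-classes*
  (list #'alpha-char-p #'digit-char-p #'lower-case-p #'upper-case-p)
  "Primitive class predicates; position in the list is the class index.")

(defun class-bits (char)
  "Bit i is set when CHAR belongs to the i-th primitive class."
  (loop for predicate in *primitive-classes*
        for i from 0
        when (funcall predicate char)
          sum (ash 1 i)))

(defparameter *class-table*
  (let ((table (make-array 128 :element-type '(unsigned-byte 4))))
    (dotimes (code 128 table)
      (setf (aref table code) (class-bits (code-char code)))))
  "Vector mapping ASCII character codes to their bit-set of classes.")

(defun class-set (class-index)
  "Class set holding every bit-set that includes the class CLASS-INDEX."
  (loop for bit-set below 16
        when (logbitp class-index bit-set)
          sum (ash 1 bit-set)))

(defun class-set-complement (c)
  "Flip all 16 bits of the class set C."
  (logxor c #xFFFF))

(defun class-set-member-p (c code)
  "True when the character with code CODE is in the class set C."
  (logbitp (aref *class-table* code) c))
```

Under these assumptions, union and intersection of class sets are logior and logand, so alnum would be (logior (class-set 0) (class-set 1)), and complementing a class needs no separate form: (class-set-complement (class-set 0)) contains exactly the characters that are not alpha.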