[ fixed #119 ] latin1 encoding: each byte counts as 1 char #156
The computation of the length component of `AlexToken` was tailored to the utf8 encoding, and didn't work correctly for latin1.
This is fixed by a new flag `ALEX_LATIN1` in `templates/GenericTemplate.hs` that turns on code that increases the length by 1 for each byte, while for utf8 something more sophisticated is done.
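
To illustrate the difference between the two encodings, here is a minimal sketch, not the code from `templates/GenericTemplate.hs`. The function names `latin1Length` and `utf8Length` are hypothetical, and the utf8 variant uses one common technique (skipping continuation bytes), which is an assumption about what "more sophisticated" could look like rather than a description of the actual template code.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- | Latin-1: every byte encodes exactly one character,
-- so the character length equals the byte length.
latin1Length :: [Word8] -> Int
latin1Length = length

-- | UTF-8 (assumed technique, not taken from the template):
-- count only bytes that start a character, i.e. skip
-- continuation bytes of the form 10xxxxxx.
utf8Length :: [Word8] -> Int
utf8Length = length . filter (not . isContinuation)
  where
    isContinuation b = b .&. 0xC0 == 0x80
```

For example, the two bytes `0xC3 0xA4` are one character in utf8 but two characters when read as latin1, which is why a length computation tailored to utf8 undercounts latin1 input.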
The fix requires more template instances to be generated. To streamline
instance generation, all 2^4 = 16 template instances are now
generated for the 4 flags.
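
The count of 16 follows from each of the 4 flags being independently on or off. A small sketch of that enumeration, assuming placeholder flag names: only `ALEX_LATIN1` is named in this description, and `FLAG_A`, `FLAG_B`, `FLAG_C` are hypothetical stand-ins for the other three.

```haskell
import Data.List (subsequences)

-- Placeholder flag names; only ALEX_LATIN1 appears in the description,
-- the other three are hypothetical stand-ins.
flags :: [String]
flags = ["FLAG_A", "FLAG_B", "FLAG_C", "ALEX_LATIN1"]

-- Each template instance corresponds to one subset of enabled flags,
-- so 4 flags give 2^4 = 16 instances.
instances :: [[String]]
instances = subsequences flags
```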
To ensure consistent reference to the template instance, a function
residing both in `src/Main` and `gen-alex-sdist/Main`
needs to be kept consistent, should more dimensions be added to the template.
(Putting this function into a separate file that is included by both
modules could be an option, but seemed not enough in the spirit of
cabal-organized projects.)
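
As a rough sketch of what such a duplicated function might look like: the record `TemplateFlags`, the function `templateName`, and the naming scheme below are all hypothetical and not taken from `src/Main` or `gen-alex-sdist/Main`. The point is only that both copies must map the same flag setting to the same instance name.

```haskell
-- Hypothetical flag record; only `latin1` corresponds to a flag
-- named in the description (ALEX_LATIN1).
data TemplateFlags = TemplateFlags
  { flagA  :: Bool
  , flagB  :: Bool
  , flagC  :: Bool
  , latin1 :: Bool
  }

-- | Map a flag setting to a template instance name.
-- The naming scheme is made up for illustration.
templateName :: TemplateFlags -> String
templateName fs = "Template" ++ concatMap tag
  [ (flagA fs, "-a")
  , (flagB fs, "-b")
  , (flagC fs, "-c")
  , (latin1 fs, "-latin1")
  ]
  where
    -- Append the suffix only when the flag is enabled.
    tag (enabled, suffix) = if enabled then suffix else ""
```

Adding a fifth dimension would mean extending both the record and the suffix list, and doing so identically in every copy of the function.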