[ fixed #119 ] latin1 encoding: each byte counts as 1 char #156

Merged (1 commit) on Jan 27, 2020

Conversation

andreasabel
Member

The computation of the length component of AlexToken was tailored to
the utf8 encoding, and didn't work correctly for latin1.

This is fixed by a new flag ALEX_LATIN1 in
templates/GenericTemplate.hs that turns on code that increases the
length by 1 for each byte, while for utf8 something more sophisticated
is done.
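
To illustrate the difference between the two modes, here is a minimal standalone sketch. It is not the code in templates/GenericTemplate.hs; the names Encoding, startsChar and charLength are hypothetical, and the utf8 rule shown (count only bytes that are not continuation bytes) is an assumption about what the more sophisticated computation amounts to.

```haskell
import Data.Bits ((.&.))
import Data.Word (Word8)

-- Hypothetical stand-in for the ALEX_LATIN1 flag.
data Encoding = Latin1 | UTF8

-- | Does this byte begin a new character?
-- latin1: every byte is one character.
-- utf8 (assumed rule): continuation bytes have the bit pattern 10xxxxxx
-- and do not begin a character.
startsChar :: Encoding -> Word8 -> Bool
startsChar Latin1 _ = True
startsChar UTF8   b = b .&. 0xC0 /= 0x80

-- | Length in characters of a byte sequence under the given encoding.
charLength :: Encoding -> [Word8] -> Int
charLength enc = length . filter (startsChar enc)

main :: IO ()
main = do
  -- The five bytes of "café" when encoded as utf8.
  let bytes = [0x63, 0x61, 0x66, 0xC3, 0xA9]
  print (charLength UTF8 bytes)    -- 4
  print (charLength Latin1 bytes)  -- 5
```

Read as latin1, the same five bytes are five characters, which is the behaviour the title of this PR asks for.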

The fix requires more template instances to be generated. To streamline
the instance generation, all 2^4 = 16 template instances are now
generated, one for each combination of the 4 flags

  • ghc
  • latin1
  • nopred
  • debug

To ensure that template instances are referenced consistently, the function

  templateFileName

which is defined in both src/Main and gen-alex-sdist/Main, must be kept
in sync in both places should more dimensions be added to the template.

(Putting this function into a separate file that is included by both
modules could be an option, but seemed not enough in the spirit of
cabal-organized projects.)
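
As a rough sketch of how one function can name all 16 instances: the signature and the "AlexTemplate" naming scheme below are assumptions for illustration and may differ from the definitions in src/Main and gen-alex-sdist/Main; allTemplateFileNames is a hypothetical helper.

```haskell
import Data.List (intercalate)

-- | File name of the template instance for one combination of the
-- four flags. The naming scheme here is assumed, not taken from the PR.
templateFileName :: Bool -> Bool -> Bool -> Bool -> FilePath
templateFileName ghc latin1 nopred debug =
  intercalate "-" $ concat
    [ ["AlexTemplate"]
    , ["ghc"    | ghc]     -- each suffix appears only if its flag is set
    , ["latin1" | latin1]
    , ["nopred" | nopred]
    , ["debug"  | debug]
    ]

-- | All 2^4 = 16 instances, one per flag combination (hypothetical helper).
allTemplateFileNames :: [FilePath]
allTemplateFileNames =
  [ templateFileName ghc latin1 nopred debug
  | ghc    <- bools
  , latin1 <- bools
  , nopred <- bools
  , debug  <- bools
  ]
  where
    bools = [False, True]

main :: IO ()
main = do
  print (length allTemplateFileNames)  -- 16
  mapM_ putStrLn allTemplateFileNames
```

Adding a fifth flag would mean one more argument and one more generator, and the change would have to be made identically in both copies of the function.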

@simonmar
Member

Nice. Thanks!

@mtolly

mtolly commented Jan 10, 2021

Hi, it looks like this (and some other merges) were not included in the recent Alex 3.2.6 release. Understandable since it was a stopgap for a GHC release.

This fix to the Latin-1 mode would be helpful in order to fix a language-c (and thus c2hs) issue: visq/language-c#72

Any info on when a new release can happen with some of these PRs that have been merged since 3.2.5?

@Ericson2314
Collaborator

Yes, I suppose I should release another now that GHC is finally using 3.2.5. I did want to finish #174 first, I guess I should get on that.
