0.11.1

@maciejhirsz released this on 23 Apr 19:00

Manual token disambiguation

Previously, when two definitions could match the same input and had the same computed priority, Logos would make an arbitrary choice about which token to produce. Because this could produce unexpected results, such a conflict is now a compile error and is reported as such.
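The rule can be modeled in a few lines of plain Rust. This is a toy sketch, not Logos's implementation: the Definition type and check function are hypothetical, and each definition is restricted to a single character set repeated one or more times. For patterns of that shape, two definitions can match a common input exactly when their sets share a character.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified token definition: a character set repeated
/// one or more times (such as [abc]+), together with its priority.
struct Definition {
    variant: &'static str,
    chars: BTreeSet<char>,
    priority: usize,
}

impl Definition {
    fn new(variant: &'static str, chars: &str, priority: usize) -> Self {
        Definition {
            variant,
            chars: chars.chars().collect(),
            priority,
        }
    }
}

/// Reports an error when two definitions can match the same input and
/// have the same priority, instead of silently picking one of them.
fn check(a: &Definition, b: &Definition) -> Result<(), String> {
    // Two [set]+ patterns match a common string exactly when the sets intersect.
    let overlaps = a.chars.intersection(&b.chars).next().is_some();
    if overlaps && a.priority == b.priority {
        Err(format!(
            "A definition of variant `{}` can match the same input as another definition of variant `{}`.",
            a.variant, b.variant
        ))
    } else {
        Ok(())
    }
}

fn main() {
    let cde = Definition::new("Cde", "cde", 1);

    // Equal priorities and a shared character ('c'): ambiguous.
    let abc = Definition::new("Abc", "abc", 1);
    println!("{:?}", check(&abc, &cde));

    // Raising one priority breaks the tie.
    let abc = Definition::new("Abc", "abc", 2);
    println!("{:?}", check(&abc, &cde));
}
```

Disjoint sets, such as [xyz]+ against [cde]+, pass this check regardless of priority, since no input can match both.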

Example

Consider two regexes, one matching [abc]+ and the other matching [cde]+. Both have a computed priority of 1, and both can match any sequence of c characters, such as ccc.
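The computed priority of 1 can be reproduced with a small recursive function over a regex syntax tree. The sketch below is hypothetical: the Node enum stands in for Logos's real intermediate representation, and the scoring (2 per literal character, 1 per class, the sum for a concatenation, the minimum for an alternation, the inner score for +, and 0 for * and ?) is modeled on Logos's documented disambiguation rules rather than taken from its source.

```rust
/// Hypothetical, simplified regex syntax tree.
enum Node {
    /// Literal text such as "foo".
    Literal(&'static str),
    /// Any character class, such as [abc].
    Class,
    Concat(Vec<Node>),
    Alternation(Vec<Node>),
    /// x+ always contains at least one copy of x.
    OneOrMore(Box<Node>),
    // x* and x? may match nothing, so they contribute nothing.
    ZeroOrMore(Box<Node>),
    Optional(Box<Node>),
}

fn priority(node: &Node) -> usize {
    match node {
        Node::Literal(text) => 2 * text.chars().count(),
        Node::Class => 1,
        Node::Concat(parts) => parts.iter().map(priority).sum(),
        // The weakest branch decides, since any branch may be the one that matches.
        Node::Alternation(branches) => branches.iter().map(priority).min().unwrap_or(0),
        Node::OneOrMore(inner) => priority(inner),
        Node::ZeroOrMore(_) | Node::Optional(_) => 0,
    }
}

fn main() {
    // [abc]+ and [cde]+ both reduce to OneOrMore(Class).
    let abc = Node::OneOrMore(Box::new(Node::Class));
    println!("[abc]+ -> {}", priority(&abc)); // 1

    // (foo|hello)(bar)?
    let alt = Node::Concat(vec![
        Node::Alternation(vec![Node::Literal("foo"), Node::Literal("hello")]),
        Node::Optional(Box::new(Node::Literal("bar"))),
    ]);
    println!("(foo|hello)(bar)? -> {}", priority(&alt)); // 6
}
```

Because [abc]+ and [cde]+ both reduce to a one-or-more repetition of a single class, they tie at 1, which is what makes the pair ambiguous.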

For the [abc]+ and [cde]+ pair, Logos now reports a compile error with a hint for resolving the conflict:

    error: A definition of variant `Abc` can match the same input as another definition of variant `Cde`.

    hint: Consider giving one definition a higher priority: #[regex(..., priority = 2)]
       --> tests/tests/edgecase.rs:410:17
        |
    410 |         #[regex("[abc]+")]
        |                 ^^^^^^^^

    error: A definition of variant `Cde` can match the same input as another definition of variant `Abc`.

    hint: Consider giving one definition a higher priority: #[regex(..., priority = 2)]
       --> tests/tests/edgecase.rs:413:17
        |
    413 |         #[regex("[cde]+")]
        |                 ^^^^^^^^

    error: aborting due to 2 previous errors

Setting priority = 2 on either definition overrides its computed priority, which lets Logos disambiguate the two tokens.

Generic type parameters

Deriving Logos on an enum with type parameters like so:

    #[derive(Logos, Debug, PartialEq)]
    enum Token<S, N> {
        #[regex(r"[ \n\t\f]+", logos::skip)]
        #[error]
        Error,

        #[regex("[a-z]+")]
        Ident(S),

        #[regex("[0-9]+", |lex| lex.slice().parse())]
        Number(N)
    }

now produces the following errors:

    error: Generic type parameter without a concrete type

    Define a concrete type Logos can use: #[logos(type S = Type)]
       --> tests/tests/edgecase.rs:339:16
        |
    339 |     enum Token<S, N> {
        |                ^

    error: Generic type parameter without a concrete type

    Define a concrete type Logos can use: #[logos(type N = Type)]
       --> tests/tests/edgecase.rs:339:19
        |
    339 |     enum Token<S, N> {
        |                   ^

It's now possible to define concrete types for the generic type parameters:

    #[derive(Logos, Debug, PartialEq)]
    #[logos(
        type S = &str,
        type N = u64,
    )]
    enum Token<S, N> {
        // ... variants as above
    }

This derives the Logos trait for Token<&str, u64>. Reference types such as &str automatically use the lifetime of the source being lexed.

Other changes:

  • Callbacks can now be defined with the callback = ... syntax inside #[regex(...)] and #[token(...)] attributes, so the callback and priority can appear in either order after the pattern. All of these are now legal and equivalent:
    #[regex("[abc]+", my_callback, priority = 10)]
    #[regex("[abc]+", callback = my_callback, priority = 10)]
    #[regex("[abc]+", priority = 10, callback = my_callback)]
  • Priority is now computed from an intermediate representation of the regex, before the regex is compiled into the state machine graph.