0.11.1
Manual token disambiguation
Previously, when two definitions could match the same input and were assigned the same computed priority, Logos would make an arbitrary choice about which token to produce. Because this behavior could produce unexpected results, it is now considered a compile error and is reported as such.
Example
Consider two regexes, one matching [abc]+ while the other matches [cde]+. Both of those would have a computed priority of 1, and both could match any sequence of c.
In this case, Logos will now report a compile error with a hint on how to resolve it:
error: A definition of variant `Abc` can match the same input as another definition of variant `Cde`.
hint: Consider giving one definition a higher priority: #[regex(..., priority = 2)]
--> tests/tests/edgecase.rs:410:17
|
410 | #[regex("[abc]+")]
| ^^^^^^^^
error: A definition of variant `Cde` can match the same input as another definition of variant `Abc`.
hint: Consider giving one definition a higher priority: #[regex(..., priority = 2)]
--> tests/tests/edgecase.rs:413:17
|
413 | #[regex("[cde]+")]
| ^^^^^^^^
error: aborting due to 2 previous errors
Setting priority = 2 on either token will override the computed priority, allowing Logos to properly disambiguate the tokens.
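To make the rule concrete, here is a toy model in plain Rust. It is only a sketch and not how Logos works internally: Logos detects the conflict at compile time from the definitions alone, whereas this model checks a single input at run time. The names Definition, Variant, and disambiguate are hypothetical, and each character-class string stands in for a regex such as [abc]+.

```rust
// Toy model of priority-based disambiguation. NOT the Logos implementation.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Variant {
    Abc,
    Cde,
}

// Hypothetical stand-in for one #[regex("[...]+")] definition.
struct Definition {
    variant: Variant,
    chars: &'static str, // the character class, e.g. "abc" for [abc]+
    priority: usize,
}

impl Definition {
    // Models `[chars]+` matching the whole input: non-empty, every char in the class.
    fn matches(&self, input: &str) -> bool {
        !input.is_empty() && input.chars().all(|c| self.chars.contains(c))
    }
}

// Ok(Some(v)): exactly one matching definition has the highest priority.
// Ok(None):    no definition matches.
// Err(vs):     several matching definitions share the highest priority (ambiguous).
fn disambiguate(defs: &[Definition], input: &str) -> Result<Option<Variant>, Vec<Variant>> {
    let matching: Vec<&Definition> = defs.iter().filter(|d| d.matches(input)).collect();
    let top = match matching.iter().map(|d| d.priority).max() {
        Some(priority) => priority,
        None => return Ok(None),
    };
    let winners: Vec<Variant> = matching
        .iter()
        .filter(|d| d.priority == top)
        .map(|d| d.variant)
        .collect();
    if winners.len() == 1 {
        Ok(Some(winners[0]))
    } else {
        Err(winners)
    }
}
```

With both definitions at priority 1, disambiguate on the input "ccc" reports a tie between Abc and Cde. Raising Abc to priority 2 makes the same call resolve to Abc, which mirrors the fix suggested in the hint.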
Generic type parameters
Deriving Logos on an enum with type parameters like so:
#[derive(Logos, Debug, PartialEq)]
enum Token<S, N> {
    #[regex(r"[ \n\t\f]+", logos::skip)]
    #[error]
    Error,

    #[regex("[a-z]+")]
    Ident(S),

    #[regex("[0-9]+", |lex| lex.slice().parse())]
    Number(N)
}
This will now produce the following errors:
error: Generic type parameter without a concrete type
Define a concrete type Logos can use: #[logos(type S = Type)]
--> tests/tests/edgecase.rs:339:16
|
339 | enum Token<S, N> {
| ^
error: Generic type parameter without a concrete type
Define a concrete type Logos can use: #[logos(type N = Type)]
--> tests/tests/edgecase.rs:339:19
|
339 | enum Token<S, N> {
| ^
It's now possible to define concrete types for the generic type parameters:
#[derive(Logos, Debug, PartialEq)]
#[logos(
    type S = &str,
    type N = u64,
)]
enum Token<S, N> {
This will derive the Logos trait for Token<&str, u64>. All reference types (like &str here) will automatically use the lifetime of the source.
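As a rough illustration of what "the lifetime of the source" means for Token<&str, u64>, the sketch below writes a tiny tokenizer by hand in plain Rust. It does not use Logos, and the tokenize function is a hypothetical stand-in for the derived lexer. It also assumes whitespace-separated input, which is simpler than the regexes above.

```rust
// Generic token type with the same shape as the enum in these notes.
#[derive(Debug, PartialEq)]
enum Token<S, N> {
    Error,
    Ident(S),
    Number(N),
}

// Hand-written stand-in for a derived lexer, fixed to S = &str and N = u64.
// Every Ident borrows from `source`, so the tokens cannot outlive it.
fn tokenize<'source>(source: &'source str) -> Vec<Token<&'source str, u64>> {
    source
        .split_whitespace()
        .map(|word| {
            if word.bytes().all(|b| b.is_ascii_lowercase()) {
                // Like [a-z]+: keep a slice of the source, no allocation.
                Token::Ident(word)
            } else if word.bytes().all(|b| b.is_ascii_digit()) {
                // Like [0-9]+ with a parsing callback; overflow becomes Error.
                match word.parse::<u64>() {
                    Ok(n) => Token::Number(n),
                    Err(_) => Token::Error,
                }
            } else {
                Token::Error
            }
        })
        .collect()
}
```

Calling tokenize on a String such as "abc 42 x1" yields Ident("abc"), Number(42), and Error, and the borrow checker rejects any attempt to keep those tokens after the String is dropped.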
Other changes:
- It's now possible to define callbacks using callback = ... syntax inside #[regex(...)] and #[token(...)] attributes. This allows callback and priority to be placed in any order within the attribute. All of these are now legal and equivalent:
  #[regex("[abc]+", my_callback, priority = 10)]
  #[regex("[abc]+", callback = my_callback, priority = 10)]
  #[regex("[abc]+", priority = 10, callback = my_callback)]
- Priority is now computed from an intermediate representation of the regex, before parsing it to the state machine graph.