How to match string literals case insensitively #216

Closed
azul opened this issue Dec 8, 2019 · 4 comments

azul commented Dec 8, 2019

I'm working on upgrading a codebase from peg 0.5 to 0.6.

There's a bunch of rules like this one in there:

pub cmd_help -> SmtpCommand
        = "help"i s:strparam* NL
        { SmtpCommand::Help(s) }

In 0.6 the i modifier of "help"i no longer works. What's the recommended way to do this now?

azul changed the title from "How to match string literals in case insensitively" to "How to match string literals case insensitively" on Dec 8, 2019
Owner

kevinmehall commented Dec 11, 2019

The "str"i syntax got removed because as a procedural macro, the syntax now has to conform to Rust's tokenization rules. Having it built-in also brought up issues of what exactly "case-insensitive" means in the context of Unicode.
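
To illustrate why "case-insensitive" is ambiguous once Unicode is involved, here is a small sketch using only Rust's standard library. The three helper names are made up for this example and are not part of peg:

```rust
/// ASCII-only comparison: only the letters a-z / A-Z are folded.
fn ascii_eq(a: &str, b: &str) -> bool {
    a.eq_ignore_ascii_case(b)
}

/// Comparison after mapping both sides to lowercase with Unicode rules.
fn lowercase_eq(a: &str, b: &str) -> bool {
    a.to_lowercase() == b.to_lowercase()
}

/// Comparison after mapping both sides to uppercase with Unicode rules.
fn uppercase_eq(a: &str, b: &str) -> bool {
    a.to_uppercase() == b.to_uppercase()
}
```

With these, ascii_eq("HELP", "help") is true but ascii_eq("É", "é") is false, and the two Unicode variants disagree with each other: uppercase_eq("STRASSE", "straße") is true because ß uppercases to SS, while lowercase_eq("STRASSE", "straße") is false. A built-in modifier would have had to pick one of these behaviours.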

However, the addition of rule arguments allows you to define your own rule that matches a given literal string. Something like:

rule i(literal: &'static str)
    = input:$([_]*<{literal.len()}>)
      {? if input.eq_ignore_ascii_case(literal) { Ok(()) } else { Err(literal) } }
    
pub rule cmd_help() -> SmtpCommand
    = i("help") s:strparam()* NL() { SmtpCommand::Help(s) }

Breaking that down: [_] accepts any single character, [_]*<{literal.len()}> accepts a string with the same length as the literal, $() gets the corresponding slice of the input string, and the {? } block tests whether that slice matches the literal case-insensitively, returning the literal as the "Expected" error message if it does not. That should work for ASCII literals; handling Unicode would need some additional complexity.
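
For readers who want to see the check in isolation, the same logic can be sketched in plain Rust without peg. The function match_ignore_ascii_case is a hypothetical helper written for this illustration; it slices by bytes, which is only appropriate when the literal is ASCII:

```rust
/// Hypothetical helper (not part of peg) mirroring the check done by the `i` rule.
/// On success returns the unconsumed rest of the input; on failure returns the
/// literal, to be used as the "expected" message.
fn match_ignore_ascii_case<'a>(
    input: &'a str,
    literal: &'static str,
) -> Result<&'a str, &'static str> {
    // `get` yields None if the input is too short or the cut would fall
    // inside a multi-byte character, so this never panics.
    match input.get(..literal.len()) {
        Some(prefix) if prefix.eq_ignore_ascii_case(literal) => Ok(&input[literal.len()..]),
        _ => Err(literal),
    }
}
```

Calling match_ignore_ascii_case("HeLp me", "help") gives Ok(" me"), while an input that is too short or does not match, such as "hel" or "helo", gives Err("help").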

This is the kind of thing that I'd like to eventually get into a kind of "standard library" of common rules: #201.

@kevinmehall
Owner

Alternatively, if all the literals in your language are case insensitive, you could define the grammar over your own input type, a struct wrapping str or &[u8], and redefine "" literals to behave however you want. Literals compile down to calls to ParseLiteral::parse_string_literal, which you could implement for the custom input type using case-insensitive string comparison instead of bytewise comparison.
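
A rough sketch of the comparison such a wrapper might perform is below. It deliberately does not implement peg's traits: CaseInsensitive and match_literal are hypothetical names, and the Option<usize> return value is a simplification, so the real trait signatures and result type should be taken from the documentation of the peg version in use.

```rust
/// Hypothetical input wrapper; a real one would also implement the traits
/// peg requires of an input type.
struct CaseInsensitive<'a>(&'a str);

impl<'a> CaseInsensitive<'a> {
    /// Stand-in for the body of a `parse_string_literal` implementation:
    /// returns the position just past the literal if it matches at `pos`,
    /// ignoring ASCII case, and None otherwise.
    fn match_literal(&self, pos: usize, literal: &str) -> Option<usize> {
        let end = pos.checked_add(literal.len())?;
        // Compare raw bytes so an out-of-range or mid-character position
        // simply fails to match instead of panicking.
        let candidate = self.0.as_bytes().get(pos..end)?;
        if candidate.eq_ignore_ascii_case(literal.as_bytes()) {
            Some(end)
        } else {
            None
        }
    }
}
```

For example, CaseInsensitive("HELP me").match_literal(0, "help") yields Some(4), and the same call at position 5 yields None because the input ends before the literal could fit.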

Author

azul commented Dec 11, 2019

I still have to try this out, but it looks like it's going to work.
I'll close this issue for now and reopen it if I run into problems.

azul closed this as completed on Dec 11, 2019
Author

azul commented Dec 12, 2019

Worked like a charm for me.
