separated_by but keep them joined #576
-
So I am trying to implement a few of the email standards, and I'm currently parsing mailboxes. Part of the standard is dot-atom-text. You can't have two consecutive dots. Currently I have this:
/// [atext Defined in RFC 2822](https://datatracker.ietf.org/doc/html/rfc2822#section-3.2.4)
pub fn atext<'a>() -> impl Parser<'a, &'a str, char, ErrType<'a>> {
    choice((
        // Instead of having a choice inside of a choice call the parser directly
        one_of('a'..='z'),
        one_of('A'..='Z'),
        one_of('0'..='9'),
        one_of([
            '!', '#', '$', '%', '&', '\'', '*', '+', '-', '/', '=', '?', '^', '_', '`', '{', '|',
            '}', '~',
        ]),
    ))
}
pub fn atext_seg<'a, C>() -> impl Parser<'a, &'a str, C, ErrType<'a>>
where
    C: Container<char>,
{
    atext().repeated().at_least(1).collect::<C>()
}
/// \`\`\`ebnf
/// dot-atom-text = 1*atext *("." 1*atext)
/// \`\`\`
pub fn dot_atom_text<'a>() -> impl Parser<'a, &'a str, String, ErrType<'a>> {
    atext_seg::<String>()
        .separated_by(just('.'))
        // The grammar needs at least one segment; this also keeps `v[0]` below from panicking.
        .at_least(1)
        .collect::<Vec<_>>()
        .map(|v| {
            if v.len() == 1 {
                return v.into_iter().next().unwrap();
            }
            let mut s = String::with_capacity(v.iter().map(|v| v.len() + 1).sum::<usize>());
            s.push_str(&v[0]);
            for v in v[1..].iter() {
                s.push('.');
                s.push_str(v);
            }
            s
        })
}
I found Vec::join to be slower than what I have lol. Anyway, does Chumsky have a way of doing a separated_by that keeps the result joined? The only real reason I am asking is that I am bikeshedding performance for no reason. Lol
Replies: 1 comment 1 reply
-
To avoid allocation, you could write `atext_seg` as:
pub fn atext_seg<'a>() -> impl Parser<'a, &'a str, &'a str, ErrType<'a>> {
    atext().repeated().at_least(1).to_slice()
}
And, if I'm understanding correctly, `dot_atom_text` can be written in a similar way:
pub fn dot_atom_text<'a>() -> impl Parser<'a, &'a str, &'a str, ErrType<'a>> {
    atext_seg()
        .separated_by(just('.'))
        .to_slice()
}
Right now, you are manually rebuilding the input you parse, char-by-char, using `collect`. However, `to_slice` will return the slice of the input you parsed, for free, without any copying or allocation.
(This is the biggest strength of zero-copy parsing in chumsky 1.0!)
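To make the "slice of the input" idea concrete without depending on chumsky, here is a rough hand-written sketch using only the standard library. The `is_atext` helper and this version of `dot_atom_text` are hypothetical stand-ins for illustration, not chumsky's API and not the code from this thread. It follows the EBNF and requires at least one segment; as far as I can tell, `separated_by` accepts zero items unless `.at_least(1)` is added.

```rust
/// Hypothetical helper: true for characters allowed by `atext` (RFC 2822, section 3.2.4).
fn is_atext(c: char) -> bool {
    c.is_ascii_alphanumeric() || "!#$%&'*+-/=?^_`{|}~".contains(c)
}

/// Hand-rolled stand-in for `dot_atom_text = 1*atext *("." 1*atext)`.
/// Returns the matched prefix and the unparsed rest, both borrowed from `input`,
/// so nothing is copied or allocated.
fn dot_atom_text(input: &str) -> Option<(&str, &str)> {
    let bytes = input.as_bytes();
    let mut end = 0; // end of the last complete `1*atext` segment
    let mut pos = 0;
    loop {
        let start = pos;
        // Consume one `1*atext` segment (atext is ASCII-only, so bytes are enough).
        while pos < bytes.len() && is_atext(bytes[pos] as char) {
            pos += 1;
        }
        if pos == start {
            break; // empty segment: stop before the preceding dot, if any
        }
        end = pos;
        // Continue only if a '.' separator follows.
        if pos < bytes.len() && bytes[pos] == b'.' {
            pos += 1;
        } else {
            break;
        }
    }
    if end == 0 {
        None
    } else {
        Some((&input[..end], &input[end..]))
    }
}
```

Called on `"john.doe@example.com"`, this returns the pair `("john.doe", "@example.com")`, and the first element points into the original string rather than into a freshly built `String`. That is the same property `to_slice` gives you for the chumsky version.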