Implement LocatedSpan::get_line(). #66
Add a function to get the full input line containing the (start point of the) LocatedSpan. As suggested in fflorent#53.
Thanks!
Could you add some tests for this, checking edge cases? (first/last line, beginning/end of line, etc.)
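As a reference point for such edge-case tests, here is a minimal safe sketch. It assumes the caller still has the original input and the span's byte offset into it (on a char boundary). `line_at` is a hypothetical helper for illustration, not part of nom_locate:

```rust
/// Returns the full line (without its trailing '\n') that contains byte
/// `offset` of `input`. Hypothetical helper, not nom_locate API.
/// Assumes `offset` lies on a char boundary.
fn line_at(input: &str, offset: usize) -> &str {
    let offset = offset.min(input.len());
    // Line start: one past the last '\n' strictly before `offset`.
    let start = input[..offset].rfind('\n').map_or(0, |i| i + 1);
    // Line end: the next '\n' at or after `offset`, or the end of input.
    let end = input[offset..].find('\n').map_or(input.len(), |i| offset + i);
    &input[start..end]
}

#[test]
fn line_at_edge_cases() {
    let data = "One line of text\nFollowed by a second\nand a third\n";
    // First line: at its beginning and in its middle.
    assert_eq!(line_at(data, 0), "One line of text");
    assert_eq!(line_at(data, 4), "One line of text");
    // A '\n' belongs to the line it terminates.
    assert_eq!(line_at(data, 16), "One line of text");
    // Beginning of the second line.
    assert_eq!(line_at(data, 17), "Followed by a second");
    // Last line.
    assert_eq!(line_at(data, 38), "and a third");
    // End of input, after the final '\n': an empty last line.
    assert_eq!(line_at(data, data.len()), "");
}
```

The `'\n'`-at-end-of-line case is the one most likely to surprise: the newline byte is attributed to the line it ends, not to the next one.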
src/lib.rs
let self_bytes = self.fragment.as_bytes();
let self_ptr = self_bytes.as_ptr();
let offset = self.get_column() - 1;
let the_line = unsafe {
    assert!(
        offset <= isize::max_value() as usize,
        "offset is too big"
    );
    let line_start_ptr = self_ptr.offset(-(offset as isize));
    slice::from_raw_parts(line_start_ptr, offset + self_bytes.len())
};
I feel like this code could/should be shared with `get_columns_and_bytes_before`. Could you add a private (unsafe) function for this?
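A minimal sketch of what such a shared private helper could look like. It makes the same assumption as the snippet above: the `offset` bytes immediately before the fragment belong to the same original input. `unoffsetted_slice` is a hypothetical free function here, not the method that ended up in the PR:

```rust
use std::slice;

/// Extends `fragment` backwards by `offset` bytes.
///
/// # Safety
/// The `offset` bytes directly preceding `fragment` must be initialized and
/// part of the same allocation (e.g. `fragment` is a subslice of a larger
/// input that starts at least `offset` bytes earlier).
unsafe fn unoffsetted_slice(fragment: &[u8], offset: usize) -> &[u8] {
    // Mirrors the guard in the original snippet.
    assert!(offset <= isize::MAX as usize, "offset is too big");
    unsafe {
        let start = fragment.as_ptr().sub(offset);
        slice::from_raw_parts(start, offset + fragment.len())
    }
}

#[test]
fn recovers_bytes_before_fragment() {
    let input = b"abc\ndef";
    let fragment = &input[5..]; // "ef"
    // SAFETY: `fragment` starts 5 bytes into `input`, so the 1 byte
    // before it is part of the same array.
    let widened = unsafe { unoffsetted_slice(fragment, 1) };
    assert_eq!(widened, &b"def"[..]);
}
```

Both `get_line` and `get_columns_and_bytes_before` could then call this with their own offset, keeping the pointer arithmetic in one place.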
Yes, I'll write some tests and then I'll see if the unsafe code can be unified.
Actually, I'm not sure, but isn't …
Not quite, as …
Does `get_line` ever return `None`?
Actually, as I was writing tests, I found myself asking the same question. I don't think it does; the `Option` is accidental. I'll get rid of it.
Oh yeah, indeed.
It's getting late in my time zone, so I'll go to sleep now. I'll look at refactoring and any further questions later in the weekend.
Looks good, just need to deduplicate that code now.
I've been afk most of the day, it's time for bed in my tz, and I'm not entirely sober, but I've attempted to refactor the two similar unsafe blocks into one. Can look more at it in ten hours or so...
Don't worry, we're in no hurry.
I just realized that this code doesn't work if `LocatedSpan`s are sliced with a right bound.
For example, this fails:
#[test]
fn line_of_word_in_middle() {
    let data = "One line of text\
                \nFollowed by a second\
                \nand a third\n";
    // Start right after the first '\n', so the span begins on line 2
    // (the '\n' itself still belongs to line 1).
    let start = data.find('\n').unwrap() + 1;
    assert_eq!(
        StrSpan::new(data).slice(start..start + 4).get_line(),
        "Followed by a second".as_bytes(),
    );
}
because `self.get_unoffsetted_slice()` only returns the bytes before the `LocatedSpan` and the bytes in the `LocatedSpan` itself, but none of the bytes after, even if they are on the same line.
Unfortunately, I don't see a way out of this without including the size of the original `&[u8]` in every `LocatedSpan`; but it would increase the size of `LocatedSpan<_, ()>` from 20 to 28 bytes (probably 24 to 32 including the padding).
But this seems costly just to provide a convenience function that can already be implemented by users themselves with safe code.
What do you think?
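The following safe-code model of the problem is an assumption about the behaviour, not the PR's code. Anything computed only from the bytes before the span plus the span's own bytes can see the line only up to the span's right bound. `visible_line` is a hypothetical helper taking the span's byte range within the original input:

```rust
/// Models what a method built on the "unoffsetted" slice can see: the bytes
/// from the start of the line up to the *end of the fragment*, cut at the
/// first '\n'. `start..end` is the span's byte range within `input`.
/// Hypothetical helper for illustration only.
fn visible_line(input: &str, start: usize, end: usize) -> &str {
    let line_start = input[..start].rfind('\n').map_or(0, |i| i + 1);
    let visible = &input[line_start..end];
    // Stop at the first line break among the visible bytes, if any.
    match visible.find('\n') {
        Some(i) => &visible[..i],
        None => visible,
    }
}

#[test]
fn right_bound_truncates_the_line() {
    let data = "One line of text\nFollowed by a second\nand a third\n";
    // Span "Foll" (bytes 17..21): only the visible prefix of line 2.
    assert_eq!(visible_line(data, 17, 21), "Foll");
    // Span running to the end of input: the whole of line 2 is visible.
    assert_eq!(visible_line(data, 17, data.len()), "Followed by a second");
}
```

This is why a span that is "the rest of the input from a starting point" gets the full line, while a right-bounded slice does not.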
@@ -257,17 +257,24 @@ impl<T: AsBytes, X> LocatedSpan<T, X> {
        &self.fragment
    }

    fn get_columns_and_bytes_before(&self) -> (usize, &[u8]) {
    fn get_unoffsetted_slice(&self) -> &[u8] {
Hmm, the name doesn't make it clear what this does, but I can't think of a better one. :/
Also, a comment explaining what it does would be good.
Yes, I considered `get_original_size`, as that is pretty much the intent. But I didn't use it, since the function has no way of reconstructing the original length. So "unoffsetted" is what it actually does: it undoes the offset.
As for the larger question about a possibly missing tail of `get_line()`: yes, I was a bit worried about that myself, but decided that it's not a big problem in my main use case of reporting parse errors. In the cases I can think of that involve marking an interval (and not just a position) of a line, I think I would have two `LocatedSpan`s to combine, where each of them would (probably) be "the rest of input from a starting point".
But I agree that this should be explained somehow, both in a comment on `get_unoffsetted_slice` and in the docstring of `get_line`. I'll try to write something.
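For the error-reporting use case described here, a minimal sketch of marking an interval on a line might look like the following. It assumes the byte offsets of the two spans into the original input are already known (`start` from the first span, `end` from the second) and that the input is ASCII, so byte columns equal display columns. `render_error` is a hypothetical helper, not nom_locate API:

```rust
/// Renders a one-line error report: the line containing byte `start`, then
/// a row of '^' under bytes `start..end`, clamped to that line and at least
/// one caret wide. Assumes ASCII input and `start` within `input`.
fn render_error(input: &str, start: usize, end: usize) -> String {
    let line_start = input[..start].rfind('\n').map_or(0, |i| i + 1);
    let line_end = input[start..].find('\n').map_or(input.len(), |i| start + i);
    let line = &input[line_start..line_end];
    let col = start - line_start;
    let width = end.min(line_end).saturating_sub(start).max(1);
    format!("{}\n{}{}", line, " ".repeat(col), "^".repeat(width))
}

#[test]
fn marks_an_interval() {
    // A single stray '+' at byte 10.
    let report = render_error("let x = 1 +;\n", 10, 11);
    assert_eq!(report, format!("let x = 1 +;\n{}^", " ".repeat(10)));
}
```

Because the full line is taken from the original input here, the right-bound truncation discussed above does not arise; the trade-off is that the caller must keep that input around.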
On the other hand, the "original" address and length won't change while the parser creates all those subslices that are … But that is quite a big change...
Well, that's what I was thinking of, minus removing the …
What about adding a function (or method of … That way we don't make the struct larger and we still provide that feature (and as a bonus, it won't rely on …
That's what I do in rsass before starting to use …
Hmm yeah, good point. You could put a ref to the original string in the …
For now, I think the docstring on … Replacing the …
Yeah, you're right. Could you just rename …
Ok, done. Should I also squash the commits on this branch into one?
This test documents how `get_line_beginning()` differs from a hypothetical `get_line()` method.
I can squash it on my end :)
Add a function to get the full input line containing the (start point of the) LocatedSpan.
As suggested in #53.