Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RUST-1992 Convert raw deserializer to use raw document iteration #483

Merged
merged 19 commits into from
Jul 22, 2024

Conversation

abr-egn
Copy link
Contributor

@abr-egn abr-egn commented Jul 18, 2024

RUST-1992

This removes quite a bit of duplicated binary parsing code and brings us closer to having separation of concerns between binary parsing and serde deserialization. IMO, the deserializer and helper structs are now easier to understand as well - rather than a mutable "root deserializer" that's threaded downwards, each deserializer or access struct just gets an immutable ref to the thing it's mapping into serde.

RawBsonRef::Document(doc) => doc,
_ => {
return Err(Error::deserialization(
"internal error: unexpected non-document",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a few places where I had to add this kind of error; these should never happen in practice because the value() match is always immediately preceded by an element_type() check.

}
}

impl<'de, 'a, 'b> serde::de::Deserializer<'de> for &'b mut CodeWithScopeDeserializer<'de, 'a> {
impl<'a, 'de> serde::de::Deserializer<'de> for &'a CodeWithScopeAccess<'de> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the various *Access types that needed to be updated, I found it simpler to have them implement both the Access and Deserializer traits rather than splitting across two types.

@@ -256,20 +267,69 @@ impl<'a> RawElement<'a> {
})
}

pub(crate) fn value_utf8_lossy(&self) -> Result<Option<Utf8LossyBson<'a>>> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this could hypothetically become part of the public API, although we'd want to think about what it should return along the axes of (restricted enum, full enum) x (always owned, some kind of cow).

Personally, because parsing invalid utf8 is such a corner case, what I'd like to do for 3.0 is remove support for that from the Serde deserialization entirely and just have it exposed via a method like this on the raw document iteration. That would give an escape hatch for those few times it's needed and enable quite a bit of simplification in the deserializer. If that sounds reasonable, I'll file a bug for that.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be fine; we're using this to handle bad server responses (SERVER-24007) in the driver but iteration should be a fine drop-in replacement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed RUST-1998 for this.

) -> Result<V::Value>
/// Deserialize the element, using the type of the element along with the
/// provided hint to determine how to visit the data.
fn deserialize_hint<V>(&self, visitor: V, hint: DeserializerHint) -> Result<V::Value>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Github's diffing, unsurprisingly, got very confused with these changes. This can be directly compared with the previous version's deserialize_next method; in particular, the cases in the the main match in both are in the same order so it shouldn't be too hard to scan down them side-by-side.

@abr-egn abr-egn marked this pull request as ready for review July 19, 2024 14:04
@abr-egn abr-egn requested a review from isabelatkinson July 19, 2024 14:04
Copy link
Contributor

@isabelatkinson isabelatkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one small suggestion but changes look good otherwise! have you done any benchmarking on this? I can't imagine there would be any major regression but I think it would be worth a sanity check

src/de/raw.rs Outdated Show resolved Hide resolved
@@ -256,20 +267,69 @@ impl<'a> RawElement<'a> {
})
}

pub(crate) fn value_utf8_lossy(&self) -> Result<Option<Utf8LossyBson<'a>>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be fine; we're using this to handle bad server responses (SERVER-24007) in the driver but iteration should be a fine drop-in replacement.

Co-authored-by: Isabel Atkinson <[email protected]>
@abr-egn
Copy link
Contributor Author

abr-egn commented Jul 22, 2024

have you done any benchmarking on this? I can't imagine there would be any major regression but I think it would be worth a sanity check

Good call, and yeah, <1% variation in benchmarks 🙂

@abr-egn abr-egn requested a review from isabelatkinson July 22, 2024 16:58
Copy link
Contributor

@isabelatkinson isabelatkinson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<1% variation in benchmarks

great, thanks for checking!

@abr-egn abr-egn merged commit 39d90f6 into mongodb:main Jul 22, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants