-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RUST-1992 Convert raw deserializer to use raw document iteration #483
Conversation
RawBsonRef::Document(doc) => doc, | ||
_ => { | ||
return Err(Error::deserialization( | ||
"internal error: unexpected non-document", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a few places where I had to add this kind of error; these should never happen in practice because the value()
match is always immediately preceded by an element_type()
check.
} | ||
} | ||
|
||
impl<'de, 'a, 'b> serde::de::Deserializer<'de> for &'b mut CodeWithScopeDeserializer<'de, 'a> { | ||
impl<'a, 'de> serde::de::Deserializer<'de> for &'a CodeWithScopeAccess<'de> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the various *Access
types that needed to be updated, I found it simpler to have them implement both the Access
and Deserializer
traits rather than splitting across two types.
@@ -256,20 +267,69 @@ impl<'a> RawElement<'a> { | |||
}) | |||
} | |||
|
|||
pub(crate) fn value_utf8_lossy(&self) -> Result<Option<Utf8LossyBson<'a>>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Something like this could hypothetically become part of the public API, although we'd want to think about what it should return along the axes of (restricted enum, full enum) x (always owned, some kind of cow).
Personally, because parsing invalid utf8 is such a corner case, what I'd like to do for 3.0 is remove support for that from the Serde deserialization entirely and just have it exposed via a method like this on the raw document iteration. That would give an escape hatch for those few times it's needed and enable quite a bit of simplification in the deserializer. If that sounds reasonable, I'll file a bug for that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be fine; we're using this to handle bad server responses (SERVER-24007) in the driver but iteration should be a fine drop-in replacement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Filed RUST-1998 for this.
) -> Result<V::Value> | ||
/// Deserialize the element, using the type of the element along with the | ||
/// provided hint to determine how to visit the data. | ||
fn deserialize_hint<V>(&self, visitor: V, hint: DeserializerHint) -> Result<V::Value> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Github's diffing, unsurprisingly, got very confused with these changes. This can be directly compared with the previous version's deserialize_next
method; in particular, the cases in the the main match
in both are in the same order so it shouldn't be too hard to scan down them side-by-side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one small suggestion but changes look good otherwise! have you done any benchmarking on this? I can't imagine there would be any major regression but I think it would be worth a sanity check
@@ -256,20 +267,69 @@ impl<'a> RawElement<'a> { | |||
}) | |||
} | |||
|
|||
pub(crate) fn value_utf8_lossy(&self) -> Result<Option<Utf8LossyBson<'a>>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that would be fine; we're using this to handle bad server responses (SERVER-24007) in the driver but iteration should be a fine drop-in replacement.
Co-authored-by: Isabel Atkinson <[email protected]>
Good call, and yeah, <1% variation in benchmarks 🙂 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
<1% variation in benchmarks
great, thanks for checking!
RUST-1992
This removes quite a bit of duplicated binary parsing code and brings us closer to having separation of concerns between binary parsing and serde deserialization. IMO, the deserializer and helper structs are now easier to understand as well - rather than a mutable "root deserializer" that's threaded downwards, each deserializer or access struct just gets an immutable ref to the thing it's mapping into serde.