-
-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deserialization layer seems too permissive with regards to checking the actual types received #3868
Comments
This is a somewhat known issue, it's just something that happens somewhat rarely that no-one really cared about improving this case. There is quite a bit to note here: The first thing is that diesel/diesel/src/pg/types/array.rs Line 54 in bc263af
Now we could go and perform a check at a higher level as you suggested. I feel that might be problematic as well. You are right that we get the right oids for the result set as basically no cost. The problematic part is getting the expected oids from our side of types. That might require a database query for custom types, which is not cheap. (We would basically need to call this function: https://docs.diesel.rs/2.1.x/diesel/expression/trait.QueryMetadata.html#tymethod.row_metadata)
I'm aware of at least the following cases:
That's a good idea. We should do that. Overall its a unclear for me how exactly to fix the complete problem. Maybe we could approach it the other way around and provide a check function that allows to check the existing |
Side note: if we implement OID checking, we may still want to allow hacks where we put wrong types in the schema for types that can reasonably get coalesced: #3973 (comment) (Or not: if the database schema changes during a migration but non-tz-aware wasn't UTC, it is better to throw an error...) (The over-engineered solution would be to have OID-overriding SQL types to make the hack a feature.) |
When the database sends a type that doesn't match the one declared in the schema, we don't check that it matches, and may attempt deserializing from a wire representation of a different type, possibly returning incorrect data without generating an error.
I encountered this scenario while changing the type of a column from
Int4
toInt8
. I expected to see errors due to mismatches for the time it would take for my new service to deploy after the migration was done running, which I was fine with. (Or at least that it would deserialize the appropriate thing because everybody uses little endian, right?)What actually happened however is the line went through here:
diesel/diesel/src/deserialize.rs
Lines 538 to 550 in bc263af
Then through here:
diesel/diesel/src/pg/types/integers.rs
Lines 45 to 63 in bc263af
Note that in this second part:
debug_assert!
(I wouldn't want it to panic anyway, I would want a deserialization error instead)NetworkEndian
is actuallyBigEndian
(I just imagined that PG uses little endian since that's the one actually used on all machines so that's one more reason why I didn't expect this issue ^^')Anyway so of course, with big endian and ignoring the size mismatch, diesel ended up deserializing only the leading zeroes of my
Int8
into anInt4
, causing it to send a bunch of zeroes into my applicative code.That doesn't seem like the ideal behavior. Instead, one would expect that if the Rust
schema.rs
of the currently deployed service doesn't match the actual database schema, corresponding queries error.It seems that we could prevent that by checking the consistency between the database-provided OID:
diesel/diesel/src/pg/value.rs
Line 93 in bc263af
and our known OID for the type:
diesel/diesel/src/sql_types/mod.rs
Lines 100 to 101 in bc263af
There may be several ways to make that safety check cost zero performance, one of them seems to be to do it only once when building the
LoadIter
(or on first row, or within the connection implementation usingQueryMetadata
), since every row has the same set of oids.diesel/diesel/src/query_dsl/load_dsl.rs
Lines 107 to 128 in bc263af
Side note: could there be users that rely on a mismatch between their
schema.rs
and the actual OIDs provided by the database? (Like weird custom text types with case insensitive & co. being declared as justText
inschema.rs
?) If so (or even in any case), we should at least check the size of the things we attempt to deserialize. (An additional singleif value.len() != 4 { return <some deserialization error> }
would cost exactly nothing here since there has to be bounds checking going on inbytes.read_i32
already, which we could then avoid, or we could just check that bytes is empty after reading which would only be two instructions anyway so probably wouldn't show in any benchmark...)The text was updated successfully, but these errors were encountered: