Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

.eq_any([["foo"]]) on SqlType = Array<Text> causes runtime error on Postgres #4349

Open
Ten0 opened this issue Nov 19, 2024 · 4 comments
Open

Comments

@Ten0
Copy link
Member

Ten0 commented Nov 19, 2024

Setup

Versions

  • Diesel: ce6fa0b
  • Database: Postgres

Feature Flags

  • diesel: postgres

Problem Description

array.eq_any(array_of_array) doesn't work on Postgres, because Postgres doesn't have such a thing as "array of array". It has 2D array but "2-D array is a different concept".
This causes an error, so we should probably use the non-specialized ANSI IN (...) when dealing with arrays.

What are you trying to accomplish?

table! {
    posts (id) {
        id -> Int4,
        // ...
        tags -> Array<Text>,
    }
}

#[test]
#[cfg(feature = "postgres")]
fn filter_array_by_in() {
    use crate::schema::posts::dsl::*;

    let connection: &mut PgConnection = &mut connection();
    let tag_combinations_to_look_for: &[&[&str]] = &[&["foo"], &["foo", "bar"], &["baz"]];
    let result: Vec<i32> = posts
        .filter(tags.eq_any(tag_combinations_to_look_for))
        .select(id)
        .load(connection)
        .unwrap();
    assert_eq!(result, &[] as &[i32]);
}

What is the actual output?

called `Result::unwrap()` on an `Err` value: DatabaseError(Unknown, "could not find array type for data type text[]")

What is the expected output?

Either of:

  • test passes, serializing as IN ($1, $2, $3) with $1 = ["foo"]...
  • Doesn't compile, requiring an explicit writing (e.g. .eq_any(abc).ansi()) to tell it to use ANSI serialization

ATM AFAICT there's no way to even tell it to use ANSI serialization because of bounds here:

impl<T, U, DB> QueryFragment<DB, sql_dialect::array_comparison::AnsiSqlArrayComparison> for In<T, U>
where
DB: Backend
+ SqlDialect<ArrayComparison = sql_dialect::array_comparison::AnsiSqlArrayComparison>,

which forbids the use of the ANSI implementation on Pg Backend (which seems weird since ANSI should always work, no?)

(NB: All of this is useful only to the extent to which indexing will work and this behaves reasonably even if there are 500 different values in there. I recall that we did tend to prefer = ANY maybe because of this but I may be mistaken here, was it only for statement caching ?)

Steps to reproduce

Ten0@d6c93fd

or dedicated working branch to fixing this:

cd diesel
git remote add Ten0 https://github.com/Ten0/diesel.git
git checkout array_eq_any
cd diesel_tests
DATABASE_URL="..." cargo test --no-default-features --features "postgres" -- filter_array_by_in --nocapture
@Ten0 Ten0 added the bug label Nov 19, 2024
@Ten0 Ten0 changed the title .eq_any([["foo"]]) on SqlType = Array<Text> causes runtime error on Postgres .eq_any([["foo"]]) on SqlType = Array<Text> causes runtime error on Postgres Nov 19, 2024
@Ten0
Copy link
Member Author

Ten0 commented Nov 19, 2024

= ANY (VALUES (ARRAY['foo']), (ARRAY['bar'])) works, so it looks like as a VALUES subquery in the Many specialization would also work (and maybe generate query plans that match better the = ANY (ARRAY) case but not sure)...
The main question is: how to detect that you actually have an Array of Array.
I thought about:

  1. Using the type metadata (array_oid of array is zero): we can't ATM because that requires having a connection (for MetadataLookup), which we don't have for the ToSql AST pass (notably for debug reasons)
  2. Adding that information to the SqlType trait.
    a. We could have that as a type like is done for IsNullable but that may not be worth the additional complexity (and/or breaking changes), considering that "associated type defaults are unstable (Tracking issue for RFC 2532, "Associated type defaults" rust-lang/rust#29661)"... but maybe that's fine because that is mostly (always?) derived anyway?
    b. It doesn't necessarily need to be a type though: even if it's just an associated const we could use it in the QueryFragment implementation. If it's just an associated const we can't use that to restrict compilation of additional array nesting though, because "associated const equality is incomplete (Tracking Issue for associated const equality rust-lang/rust#92827)", so if it's meant to actually be a type then we probably shouldn't do this...

weiznich added a commit to weiznich/diesel that referenced this issue Nov 19, 2024
This commit fixes an issue that allowed to trigger a runtime error by
passing an array of arrays to `.eq_any()`. We rewrite queries containing
`IN` expressions to `= ANY()` on postgresql as that more
efficient (allows caching + allows binding all values at once). That's
not possible for arrays of arrays as we do not support nested arrays
yet.

The fix introduces an associated constant for the `SqlType` trait that
tracks if a SQL type is a `Array<T>` or not. This information is then
used to conditionally generate the "right" SQL. By defaulting to `false`
while adding this constant we do not break existing code. We use the
derive to set the right value based on the type name.
weiznich added a commit to weiznich/diesel that referenced this issue Nov 19, 2024
This commit fixes an issue that allowed to trigger a runtime error by
passing an array of arrays to `.eq_any()`. We rewrite queries containing
`IN` expressions to `= ANY()` on postgresql as that more
efficient (allows caching + allows binding all values at once). That's
not possible for arrays of arrays as we do not support nested arrays
yet.

The fix introduces an associated constant for the `SqlType` trait that
tracks if a SQL type is a `Array<T>` or not. This information is then
used to conditionally generate the "right" SQL. By defaulting to `false`
while adding this constant we do not break existing code. We use the
derive to set the right value based on the type name.
@Ten0
Copy link
Member Author

Ten0 commented Nov 19, 2024

A workaround that uses the VALUES (...), (...) writing:

use {
	diesel::{
		backend::Backend,
		expression::{
			array_comparison::{AsInExpression, MaybeEmpty},
			AsExpression, Expression, TypedExpressionType, ValidGrouping,
		},
		prelude::*,
		query_builder::{AstPass, QueryFragment, QueryId},
		serialize::ToSql,
		sql_types::{HasSqlType, SingleValue, SqlType},
	},
	std::marker::PhantomData,
};

/// Fixes `eq_any` usage with arrays of arrays:
/// https://github.com/diesel-rs/diesel/issues/4349
///
/// Usage: `column.eq_any(AsSingleColumnValueSubselect(iterator_of_arrays))`
pub struct AsSingleColumnValueSubselect<I>(pub I);
impl<I, T, ST> AsInExpression<ST> for AsSingleColumnValueSubselect<I>
where
	I: IntoIterator<Item = T>,
	T: AsExpression<ST>,
	ST: SqlType + TypedExpressionType,
{
	type InExpression = SingleColumnValuesSubselect<ST, T>;

	fn as_in_expression(self) -> Self::InExpression {
		SingleColumnValuesSubselect {
			values: self.0.into_iter().collect(),
			p: PhantomData,
		}
	}
}

#[derive(Debug, Clone)]
pub struct SingleColumnValuesSubselect<ST, I> {
	/// The values contained in the `= ANY ()` clause
	pub values: Vec<I>,
	p: PhantomData<ST>,
}

impl<ST, I, GB> ValidGrouping<GB> for SingleColumnValuesSubselect<ST, I>
where
	ST: SingleValue,
	I: AsExpression<ST>,
	I::Expression: ValidGrouping<GB>,
{
	type IsAggregate = <I::Expression as ValidGrouping<GB>>::IsAggregate;
}

// This implementation is fake, it's not an expression, but it's used to check consistency between the SQL types
// of both sides of the `= ANY ()` clause.
impl<ST, I> Expression for SingleColumnValuesSubselect<ST, I>
where
	ST: TypedExpressionType,
{
	type SqlType = ST;
}

impl<ST, I> MaybeEmpty for SingleColumnValuesSubselect<ST, I> {
	fn is_empty(&self) -> bool {
		self.values.is_empty()
	}
}

impl<ST, I, QS> SelectableExpression<QS> for SingleColumnValuesSubselect<ST, I>
where
	SingleColumnValuesSubselect<ST, I>: AppearsOnTable<QS>,
	ST: SingleValue,
	I: AsExpression<ST>,
	<I as AsExpression<ST>>::Expression: SelectableExpression<QS>,
{
}

impl<ST, I, QS> AppearsOnTable<QS> for SingleColumnValuesSubselect<ST, I>
where
	SingleColumnValuesSubselect<ST, I>: Expression,
	I: AsExpression<ST>,
	ST: SingleValue,
	<I as AsExpression<ST>>::Expression: SelectableExpression<QS>,
{
}

impl<ST, I, DB> QueryFragment<DB> for SingleColumnValuesSubselect<ST, I>
where
	DB: Backend + HasSqlType<ST>,
	ST: SingleValue,
	I: ToSql<ST, DB>,
{
	fn walk_ast<'b>(&'b self, mut out: AstPass<'_, 'b, DB>) -> QueryResult<()> {
		if self.values.is_empty() {
			return Err(diesel::result::Error::QueryBuilderError(
				"Cannot generate an empty `= ANY (VALUES (...), (...))` clause - avoid making a query in that case"
					.into(),
			));
		}
		out.unsafe_to_cache_prepared();
		out.push_sql("VALUES ");
		let mut first = true;
		for value in &self.values {
			if first {
				first = false;
			} else {
				out.push_sql(", ");
			}
			out.push_sql("(");
			out.push_bind_param(value)?;
			out.push_sql(")");
		}
		Ok(())
	}
}

impl<ST, I> QueryId for SingleColumnValuesSubselect<ST, I> {
	type QueryId = ();

	const HAS_STATIC_QUERY_ID: bool = false;
}

@weiznich
Copy link
Member

I've pushed #4350 with another approach to fix that problem, but I'm open to different solutions.

As a more general note: I wonder if it would be meaningful (and possible) to have some sort of fuzzer that takes the documentation and keeps generating queries via the DSL and then checks if they compile and if they do not return an syntax error while executing them.

@Ten0
Copy link
Member Author

Ten0 commented Nov 20, 2024

As a more general note: I wonder if it would be meaningful (and possible) to have some sort of fuzzer that takes the documentation and keeps generating queries via the DSL and then checks if they compile and if they do not return an syntax error while executing them.

This looks interesting. However I feel like it's probably fine if Diesel doesn't cover for mistakes people never make, and there ought to be some of those that the type system allows but the complexity of preventing isn't worth the probability of occurrence...
If this is true, then I'm not sure what would be a strategy to create a general test system that would test only "relevant" cases but still be able to infer them automatically.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants