-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Slimmer #includes behaviour #1
base: main
Are you sure you want to change the base?
Conversation
5642b37
to
96fd8d4
Compare
The goal of the commit is to make #includes slimmer - only join the table required to satisfy the referenced tables and preload all other associations independently. Context The Active Record #includes method results in different behaviour depending on the other parts of the query: it may execute multiple queries (one per included association) or build a single JOIN query to load everything at once. Before the commit, it was "all or nothing" behind an eager load flag - one referenced table was enough to transform the whole query into one big JOIN. However, it's not always necessary to join all the tables except the ones with conditions. Normally, we could avoid joining everything (which could be performance-heavy), and execute a few queries instead. Idea The main idea is to remove referenced tables (which are #references, #where, #order or copied by #and, #or, #merge) from the final join query. So we would LEFT OUTER JOINed only includes tables with conditions on the query. At the same time, we have to bring non-referenced tables back to separate query preloads. Backward compatibility 1) Includes and table names don't always match so we use reflections. We try to match all possible includes table names against references including inner and left outer joins. If there's an unreliable match (like no reflection for association), we're inclined to JOIN the associations in order to not break user applications. All current tests are green without changes inside them. 2) The commit certainly breaks user code with the plain string where or select: ``` Author.includes(:company, :books).where(company: { title: "The Office" }) .where("books.title = ?", "Design Patterns") .select("books.title") ``` We do not want to introduce any flags or scopes to support this behaviour because in docs there're mentions to prohibit string causes without references: https://api.rubyonrails.org/v7.0.4.2/classes/ActiveRecord/QueryMethods.html#method-i-includes https://api.rubyonrails.org/v7.0.4.2/classes/ActiveRecord/QueryMethods.html#method-i-references People who don't know it have to start using it. Future improvements 1) Includes are not always plain - they may contain several levels inside a hash. In the commit, we handle several includes as separate trees starting from the top-level root. If there's any reference to possible includes table, we'll mark this whole tree as needed to JOINed. It's a safe bet which could be improved in future with a more fine-grained reference marking algorithm. 2) The commit's tree traversal is average performant at best. Several intermediate structures and tree walks could be removed - we prefer a simple algorithm at the moment to show the idea.
96fd8d4
to
44372c2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Looks more sophisticated than I expected, but seems working. I left a few questions, let's discuss them and see how we can improve the solution.
@@ -539,7 +539,7 @@ def test_nested_has_many_through_with_conditions_on_through_associations | |||
def test_nested_has_many_through_with_conditions_on_through_associations_preload | |||
assert_empty Author.where("tags.id" => 100).joins(:misc_post_first_blue_tags) | |||
|
|||
author = assert_queries(2) { Author.includes(:misc_post_first_blue_tags).third } | |||
author = assert_queries(3) { Author.includes(:misc_post_first_blue_tags).third } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, this change looks suspicious; ideally, no existing tests should have changed. What's the source of the additional query?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm almost sure that it's a natural change - we're moving JOINs to separate preloads after all.
Before:
FROM
"posts"
LEFT OUTER JOIN "taggings" ON "taggings"."taggable_type" = ?
AND "taggings"."comment" = ?
AND "taggings"."taggable_id" = "posts"."id"
LEFT OUTER JOIN "tags" ON "tags"."name" = ?
AND "tags"."id" = "taggings"."tag_id"
LEFT OUTER JOIN "taggings" "taggings_tags_2" ON "taggings_tags_2"."tag_id" = "tags"."id"
After:
-- The first query
FROM
"posts"
LEFT OUTER JOIN "taggings" ON "taggings"."taggable_type" = ?
AND "taggings"."comment" = ?
AND "taggings"."taggable_id" = "posts"."id"
LEFT OUTER JOIN "tags" ON "tags"."name" = ?
AND "tags"."id" = "taggings"."tag_id"
-- The second query
FROM
"taggings"
LEFT OUTER JOIN "tags" ON "tags"."id" = "taggings"."tag_id"
LEFT OUTER JOIN "taggings" "taggings_tags" ON "taggings_tags"."tag_id" = "tags"."id"
So we're preloading the second taggings
instead of eager loading.
The change wasn't needed before because without the new tree traversal we preferred joining queries more frequently (but also we weren't able to optimize advanced things like this).
By the way, we don't observe so many failures like this in the current Rails tests as there're not so many tests with both includes
+ no further where
usage and a number of queries check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, it's nice that you've spotlighted this - it's strange that there's a JOIN taggings
to taggings
in the second query.
It's included in the long associations chain as the last includes
:
class Author < ActiveRecord::Base
has_many :misc_post_first_blue_tags, through: :misc_posts, source: :first_blue_tags
has_many :misc_posts, -> { where(posts: { title: ["misc post by bob", "misc post by mary"] }) }, class_name: "Post"
class Post < ActiveRecord::Base
has_many :first_blue_tags, -> { where tags: { name: "Blue" } }, through: :first_taggings, source: :tag
has_many :first_taggings, -> { where taggings: { comment: "first" } }, as: :taggable, class_name: "Tagging"
class Tagging < ActiveRecord::Base
belongs_to :tag, -> { includes(:tagging) }
but it's still strange - I'm investigating. Maybe, it's because of the known limitations with ambiguous table names in includes
and references
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small update - if we remove includes
in the last belongs_to
, the problem would be solved.
It's super strange as we don't change how includes
works - only filtering the current includes_values
.
Right now, my plan is to replicate this behaviour without our optimization. It'd show this as another Rails minor issue.
The goal of the commit is to make
#includes
slimmer - only join the table required to satisfy the referenced tables and preload all other associations independently.Motivation / Background
The Active Record #includes method results in different behaviour depending on the other parts of the query:
it may execute multiple queries (one per included association) or build a single JOIN query to load everything at once.
Before the commit, it was "all or nothing" behind an eager load flag - one referenced table was enough to transform the whole query into one big JOIN.
However, it's not always necessary to join all the tables except the ones with conditions.
Normally, we could avoid joining everything (which could be performance-heavy), and execute a few queries instead.
The following two statements would demonstrate the difference:
Detail
The main idea is to remove referenced tables
(which are #references, #where, #order or copied by #and, #or, #merge) from the final join query.
So we would LEFT OUTER JOINed only includes tables with conditions on the query. At the same time,
we have to bring non-referenced tables back to separate query preloads:
Backward compatibility
We try to match all possible includes table names against references including inner and left outer joins.
If there's an unreliable match (like no reflection for association), we're inclined to JOIN the associations in order to not break user applications.
All current tests are green without changes inside them (except a number of queries).
where
orselect
:We do not want to introduce any flags or scopes to support this behaviour because in docs there're mentions to prohibit string causes without references:
https://api.rubyonrails.org/v7.0.4.2/classes/ActiveRecord/QueryMethods.html#method-i-includes https://api.rubyonrails.org/v7.0.4.2/classes/ActiveRecord/QueryMethods.html#method-i-references
People who don't know it have to start using it.
Known limitations
includes
trees.Look at this example:
Instead of the expected separate preload for the
publishers
, it would include them in a single JOIN. It's because we cannot differentiate usage level in the plainreferences_values
at the moment.Moreover, using references to numbered table aliases (with the digits at the end) disables all slim optimizations completely.
Checklist
Before submitting the PR make sure the following are checked:
[Fix #issue-number]