Skip to content

Commit

Permalink
Slimmer #includes behaviour
Browse files Browse the repository at this point in the history
The goal of the commit is to make #includes slimmer -
only join the table required to satisfy the referenced tables and
preload all other associations independently.

Context

The Active Record #includes method results in different behaviour
depending on the other parts of the query:
it may execute multiple queries (one per included association) or
build a single JOIN query to load everything at once.

Before the commit, it was "all or nothing" behind an eager load flag -
one referenced table was enough to transform the whole query into one big JOIN.

However, it's not always necessary to join all the tables
except the ones with conditions.
Normally, we could avoid joining everything (which could be performance-heavy),
and execute a few queries instead.

Idea

The main idea is to remove referenced tables
(which are #references, #where, #order or copied by #and, #or, #merge)
from the final join query.
So we would LEFT OUTER JOINed only includes tables with conditions on the query.
At the same time,
we have to bring non-referenced tables back to separate query preloads.

Backward compatibility

1) Includes and table names don't always match so we use reflections.

We try to match all possible includes table names against references
including inner and left outer joins.
If there's an unreliable match (like no reflection for association),
we're inclined to JOIN the associations in order to not break user applications.

All current tests are green without changes inside them.

2) The commit certainly breaks user code with the plain string where or select:
```
Author.includes(:company, :books).where(company: { title: "The Office" })
                                 .where("books.title = ?", "Design Patterns")
                                 .select("books.title")
```
We do not want to introduce any flags or scopes to support this behaviour
because in docs there're mentions to prohibit string causes without references:
https://api.rubyonrails.org/v7.0.4.2/classes/ActiveRecord/QueryMethods.html#method-i-includes
https://api.rubyonrails.org/v7.0.4.2/classes/ActiveRecord/QueryMethods.html#method-i-references

People who don't know it have to start using it.

Future improvements

1) Includes are not always plain - they may contain several levels inside a hash.

In the commit, we handle several includes
as separate trees starting from the top-level root.
If there's any reference to possible includes table,
we'll mark this whole tree as needed to JOINed.

It's a safe bet which could be improved in future
with a more fine-grained reference marking algorithm.

2) The commit's tree traversal is average performant at best.
Several intermediate structures and tree walks could be removed -
we prefer a simple algorithm at the moment to show the idea.
  • Loading branch information
ardecvz committed Mar 13, 2023
1 parent 2eed4dc commit 44372c2
Show file tree
Hide file tree
Showing 7 changed files with 397 additions and 7 deletions.
31 changes: 31 additions & 0 deletions activerecord/CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,3 +1,34 @@
* Slimmer `#includes` behaviour.

The goal of the change is to make #includes slimmer - only join the table required to satisfy the referenced tables and preload all other associations independently.

The following two statements would demonstrate the difference BEFORE the change:

```sh
> Author.includes(:books, :post).to_a

Author Load (0.1ms) SELECT "authors".* FROM "authors"
Book Load (0.3ms) SELECT "books".* FROM "books" WHERE "books"."author_id" IN (?, ?, ?) [["author_id", 1], ["author_id", 2], ["author_id", 3]]
Post Load (0.2ms) SELECT "posts".* FROM "posts" WHERE "posts"."author_id" IN (?, ?, ?) [["author_id", 1], ["author_id", 2], ["author_id", 3]]

> Author.includes(:books, :post).where(books: { language: 'EN' }).to_a

SQL (0.3ms) SELECT "authors"."id" AS t0_r0, "authors"."name" AS t0_r1, "authors"."author_address_id" AS t0_r2, "authors"."author_address_extra_id" AS t0_r3, "authors"."organization_id" AS t0_r4, "authors"."owned_essay_id" AS t0_r5, "books"."id" AS t1_r0, "books"."author_id" AS t1_r1, "books"."format" AS t1_r2, "books"."format_record_id" AS t1_r3, "books"."format_record_type" AS t1_r4, "books"."name" AS t1_r5, "books"."status" AS t1_r6, "books"."last_read" AS t1_r7, "books"."nullable_status" AS t1_r8, "books"."language" AS t1_r9, "books"."author_visibility" AS t1_r10, "books"."illustrator_visibility" AS t1_r11, "books"."font_size" AS t1_r12, "books"."difficulty" AS t1_r13, "books"."cover" AS t1_r14, "books"."isbn" AS t1_r15, "books"."external_id" AS t1_r16, "books"."original_name" AS t1_r17, "books"."published_on" AS t1_r18, "books"."boolean_status" AS t1_r19, "books"."tags_count" AS t1_r20, "books"."created_at" AS t1_r21, "books"."updated_at" AS t1_r22, "books"."updated_on" AS t1_r23, "posts"."id" AS t2_r0, "posts"."author_id" AS t2_r1, "posts"."team_id" AS t2_r2, "posts"."title" AS t2_r3, "posts"."body" AS t2_r4, "posts"."type" AS t2_r5, "posts"."legacy_comments_count" AS t2_r6, "posts"."taggings_with_delete_all_count" AS t2_r7, "posts"."taggings_with_destroy_count" AS t2_r8, "posts"."tags_count" AS t2_r9, "posts"."indestructible_tags_count" AS t2_r10, "posts"."tags_with_destroy_count" AS t2_r11, "posts"."tags_with_nullify_count" AS t2_r12 FROM "authors" LEFT OUTER JOIN "books" ON "books"."author_id" = "authors"."id" LEFT OUTER JOIN "posts" ON "posts"."author_id" = "authors"."id" WHERE "books"."language" IS NULL

```
And this is AFTER the change:
```sh
> Author.includes(:books, :post).where(books: { language: 'EN' }).to_a

SQL (0.2ms) SELECT "authors"."id" AS t0_r0, "authors"."name" AS t0_r1, "authors"."author_address_id" AS t0_r2, "authors"."author_address_extra_id" AS t0_r3, "authors"."organization_id" AS t0_r4, "authors"."owned_essay_id" AS t0_r5, "books"."id" AS t1_r0, "books"."author_id" AS t1_r1, "books"."format" AS t1_r2, "books"."format_record_id" AS t1_r3, "books"."format_record_type" AS t1_r4, "books"."name" AS t1_r5, "books"."status" AS t1_r6, "books"."last_read" AS t1_r7, "books"."nullable_status" AS t1_r8, "books"."language" AS t1_r9, "books"."author_visibility" AS t1_r10, "books"."illustrator_visibility" AS t1_r11, "books"."font_size" AS t1_r12, "books"."difficulty" AS t1_r13, "books"."cover" AS t1_r14, "books"."isbn" AS t1_r15, "books"."external_id" AS t1_r16, "books"."original_name" AS t1_r17, "books"."published_on" AS t1_r18, "books"."boolean_status" AS t1_r19, "books"."tags_count" AS t1_r20, "books"."created_at" AS t1_r21, "books"."updated_at" AS t1_r22, "books"."updated_on" AS t1_r23 FROM "authors" LEFT OUTER JOIN "books" ON "books"."author_id" = "authors"."id" WHERE "books"."language" IS NULL
Post Load (0.2ms) SELECT "posts".* FROM "posts" WHERE "posts"."author_id" IN (?, ?) [["author_id", 2], ["author_id", 3]]

```
*Artur Petrov*
* Remove deprecated `Tasks::DatabaseTasks.schema_file_type`.
*Rafael Mendonça França*
Expand Down
1 change: 1 addition & 0 deletions activerecord/lib/active_record.rb
Original file line number Diff line number Diff line change
Expand Up @@ -111,6 +111,7 @@ module ActiveRecord
autoload :Calculations
autoload :Delegation
autoload :FinderMethods
autoload :IncludesTracker
autoload :PredicateBuilder
autoload :QueryMethods
autoload :SpawnMethods
Expand Down
4 changes: 2 additions & 2 deletions activerecord/lib/active_record/relation.rb
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,7 @@ class Relation
VALUE_METHODS = MULTI_VALUE_METHODS + SINGLE_VALUE_METHODS + CLAUSE_METHODS

include Enumerable
include FinderMethods, Calculations, SpawnMethods, QueryMethods, Batches, Explain, Delegation
include FinderMethods, Calculations, SpawnMethods, QueryMethods, Batches, IncludesTracker, Explain, Delegation

attr_reader :table, :klass, :loaded, :predicate_builder
attr_accessor :skip_preloading_value
Expand Down Expand Up @@ -838,7 +838,7 @@ def self.strict_loading_value

def preload_associations(records) # :nodoc:
preload = preload_values
preload += includes_values unless eager_loading?
preload += eager_loading? ? includes_values_non_referenced : includes_values
scope = strict_loading_value ? StrictLoadingScope : nil
preload.each do |associations|
ActiveRecord::Associations::Preloader.new(records: records, associations: associations, scope: scope).call
Expand Down
6 changes: 3 additions & 3 deletions activerecord/lib/active_record/relation/calculations.rb
Original file line number Diff line number Diff line change
Expand Up @@ -189,7 +189,7 @@ def async_sum(identity_or_column = nil)
# end
def calculate(operation, column_name)
if has_include?(column_name)
relation = apply_join_dependency
relation = apply_join_dependency(full_eager_loading: true)

if operation.to_s.downcase == "count"
unless distinct_value || distinct_select?(column_name || select_for_count)
Expand Down Expand Up @@ -251,7 +251,7 @@ def pluck(*column_names)
end

if has_include?(column_names.first)
relation = apply_join_dependency
relation = apply_join_dependency(full_eager_loading: true)
relation.pluck(*column_names)
else
klass.disallow_raw_sql!(column_names)
Expand Down Expand Up @@ -315,7 +315,7 @@ def ids
end

if has_include?(primary_key)
relation = apply_join_dependency.distinct
relation = apply_join_dependency(full_eager_loading: true).distinct
return relation.ids
end

Expand Down
5 changes: 3 additions & 2 deletions activerecord/lib/active_record/relation/finder_methods.rb
Original file line number Diff line number Diff line change
Expand Up @@ -407,9 +407,10 @@ def construct_relation_for_exists(conditions)
relation
end

def apply_join_dependency(eager_loading: group_values.empty?)
def apply_join_dependency(eager_loading: group_values.empty?, full_eager_loading: false)
selected_includes_values = full_eager_loading ? includes_values : includes_values_referenced
join_dependency = construct_join_dependency(
eager_load_values | includes_values, Arel::Nodes::OuterJoin
eager_load_values | selected_includes_values, Arel::Nodes::OuterJoin
)
relation = except(:includes, :eager_load, :preload).joins!(join_dependency)

Expand Down
87 changes: 87 additions & 0 deletions activerecord/lib/active_record/relation/includes_tracker.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,87 @@
# frozen_string_literal: true

module ActiveRecord
# Determine includes which are used / not used in refererences and joins.
module IncludesTracker # :nodoc:
def includes_values_referenced
select_includes_values_with_references(:any?, :present?)
end

def includes_values_non_referenced
select_includes_values_with_references(:all?, :blank?)
end

private
def select_includes_values_with_references(matcher, intersect_matcher)
all_references = (references_values + joins_values + left_outer_joins_values).map(&:to_s)

normalized_includes_values.select do |includes_value|
includes_tree = ActiveRecord::Associations::JoinDependency.make_tree(includes_value)

includes_values_reflections(includes_tree).public_send(matcher) do |reflection|
next true unless reliable_reflection_match?(reflection)

(possible_includes_tables(reflection) & all_references).public_send(intersect_matcher)
end
end
end

def includes_values_reflections(includes_tree)
includes_reflections = []

traverse_tree_with_model(includes_tree, self) do |association, model|
reflection = model.reflect_on_association(association)

includes_reflections << reflection

reliable_reflection_match?(reflection) ? reflection.klass : model
end

includes_reflections
end

def possible_includes_tables(reflection)
all_possible_includes_tables =
reflection.collect_join_chain.map(&:table_name) <<
reflection.alias_candidate(reflection.table_name) <<
reflection.name.to_s

if inferable_reflection_table_name?(reflection)
all_possible_includes_tables <<
reflection.join_table <<
join_table_with_postfix_alias(reflection)
end

all_possible_includes_tables
end

def reliable_reflection_match?(reflection)
reflection && inferable_reflection_klass?(reflection)
end

def inferable_reflection_klass?(reflection)
!reflection.polymorphic?
end

def inferable_reflection_table_name?(reflection)
!reflection.through_reflection?
end

def join_table_with_postfix_alias(reflection)
[reflection.join_table, reflection.alias_candidate(:join)].sort.join("_")
end

def normalized_includes_values
includes_values.map do |element|
element.is_a?(Hash) ? element.map { |key, value| { key => value } } : element
end.flatten
end

def traverse_tree_with_model(object, model, &block)
object.each do |key, value|
next_model = yield(key, model)
traverse_tree_with_model(value, next_model, &block) if next_model
end
end
end
end
Loading

0 comments on commit 44372c2

Please sign in to comment.