BUGFIX: Improve performance on find*Aggregates #5268

Merged


dlubitz
Contributor

@dlubitz commented Sep 27, 2024

The queries that determine the child or parent aggregates joined the hierarchy relation (hr) twice: once for the parent and once for the children. But the hr already contains both anchor points, so a second join is not needed.
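To make the redundancy concrete, here is a minimal sketch using Python's built-in sqlite3 module against a hypothetical, heavily simplified schema (only `relationanchorpoint`, `nodeaggregateid`, the two anchor columns and a dimension hash; the real tables have more columns). `TWO_JOINS` is an assumed shape of the old pattern, not the actual Neos query: it reaches the parent through a second hierarchy join, whereas `ONE_JOIN` reads `parentnodeanchor` straight from the child's incoming edge.

```python
import sqlite3

# Hypothetical, heavily simplified schema (not the real Neos tables).
SCHEMA = """
CREATE TABLE node (
    relationanchorpoint INTEGER PRIMARY KEY,
    nodeaggregateid TEXT NOT NULL
);
CREATE TABLE hierarchyrelation (
    parentnodeanchor INTEGER NOT NULL,
    childnodeanchor INTEGER NOT NULL,
    dimensionspacepointhash TEXT NOT NULL
);
"""

# Assumed redundant shape: a second hierarchy join (the parent's own
# incoming edges) is added although it contributes nothing new.
TWO_JOINS = """
SELECT p.nodeaggregateid
FROM node c
JOIN hierarchyrelation ch ON ch.childnodeanchor = c.relationanchorpoint
JOIN hierarchyrelation ph ON ph.childnodeanchor = ch.parentnodeanchor
JOIN node p ON p.relationanchorpoint = ph.childnodeanchor
WHERE c.nodeaggregateid = ?
"""

# Single join: the child's incoming edge already carries parentnodeanchor.
ONE_JOIN = """
SELECT p.nodeaggregateid
FROM node c
JOIN hierarchyrelation h ON h.childnodeanchor = c.relationanchorpoint
JOIN node p ON p.relationanchorpoint = h.parentnodeanchor
WHERE c.nodeaggregateid = ?
"""


def build_graph(dimensions):
    """root -> parent -> child, with every edge covered in each dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO node VALUES (?, ?)",
        [(0, "root"), (1, "parent"), (2, "child")],
    )
    for dsp in dimensions:
        conn.execute("INSERT INTO hierarchyrelation VALUES (0, 1, ?)", (dsp,))
        conn.execute("INSERT INTO hierarchyrelation VALUES (1, 2, ?)", (dsp,))
    return conn


def parent_rows(conn, query, child_id):
    """Return the parent aggregate id of every result row (duplicates kept)."""
    return [row[0] for row in conn.execute(query, (child_id,))]


conn = build_graph(["de", "en", "fr"])
one = parent_rows(conn, ONE_JOIN, "child")
two = parent_rows(conn, TWO_JOINS, "child")
print(set(one) == set(two), len(one), len(two))  # prints: True 3 9
```

Both queries find the same parent aggregate, but in this toy graph the redundant join multiplies the intermediate rows per covered dimension (3 vs. 9 rows for three dimensions). Whether the real queries blow up the same way depends on their additional join conditions; the replay timings in this PR are the actual measurement.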

Replay without this change:

Replaying events for projection "doctrineDbalContentGraph" of Content Repository "default" ...
 60440/60440 [============================] 100% 13 mins, 59 secs/13 mins, 59 secs 52.0 MiB

Replay with this change:

Replaying events for projection "doctrineDbalContentGraph" of Content Repository "default" ...
 60440/60440 [============================] 100%  4 mins, 13 secs/4 mins, 13 secs  52.0 MiB

Fixes: #5269

@dlubitz added the 9.0 label Sep 27, 2024
@dlubitz self-assigned this Sep 27, 2024
@github-actions bot added the Bug label Sep 27, 2024
@mhsdesign
Member

Sadly there are no tests for findParentNodeAggregates, and maybe not for the other find*Aggregates either, so the CI passing might not be a reliable indicator. I'd say we need tests here as well ._.

@dlubitz
Contributor Author

dlubitz commented Sep 27, 2024

> Sadly there are no tests for findParentNodeAggregates, and maybe not for the other find*Aggregates either, so the CI passing might not be a reliable indicator. I'd say we need tests here as well ._.

But they are covered by the Behat tests. If you mess something up there, a lot of tests fail.

Member

@skurfuerst left a comment


I think this is correct, but I'd love @nezaniel to have a second look over it :)

Member

@bwaidelich left a comment


Amazing, thank you!

@mhsdesign
Member

mhsdesign commented Sep 30, 2024

It seems we found the culprit: this obsolete join is a historical leftover. Originally, the node name was stored in the hierarchy relation table, but #5018 changed that, as stated:

Node names are now stored in the contentgraph's node table

The node name is now in the node table.

There are actually more hidden places that can seemingly be optimised. We found at least 4 places where a comment says the join exists for this legacy reason:

// we need to join with the hierarchy relation, because we need the node name.

@dlubitz marked this pull request as draft October 1, 2024 13:29
@dlubitz
Contributor Author

dlubitz commented Oct 1, 2024

This seems to be a bit more complex than expected. I need to check whether the tests cover all cases and whether the changes still make sense.

@dlubitz force-pushed the 90/bugfix/performance-find-parent-node-aggregate branch from eaaf7e7 to 4f058e3 October 2, 2024 18:37
@dlubitz marked this pull request as ready for review October 2, 2024 18:39
@dlubitz
Contributor Author

dlubitz commented Oct 2, 2024

I've split the changes. This PR now contains only my first change, which affects only the NodeAggregates queries. We can safely improve those because they don't join subtree tags.

The other changes are now in draft PR #5273. I need to find a way to test those queries, especially the resulting subtree tags, because for subtree tags it makes a difference whether I use the parent anchor point or the child anchor point to join the node data.
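A minimal sketch of why the anchor choice matters for subtree tags, again with Python's sqlite3 and a hypothetical toy schema. It assumes, as a simplification of this sketch, that `subtreetags` on a hierarchy edge describe the node at the child end of that edge; the tags are plain strings here rather than the real column format.

```python
import sqlite3

# Hypothetical toy schema; in this sketch an edge's subtreetags
# describe the node at the CHILD end of that edge.
SCHEMA = """
CREATE TABLE node (
    relationanchorpoint INTEGER PRIMARY KEY,
    nodeaggregateid TEXT NOT NULL
);
CREATE TABLE hierarchyrelation (
    parentnodeanchor INTEGER NOT NULL,
    childnodeanchor INTEGER NOT NULL,
    subtreetags TEXT NOT NULL
);
"""

# Join node data via the edge's child end: h is n's own incoming edge.
VIA_CHILD_ANCHOR = """
SELECT n.nodeaggregateid, h.subtreetags
FROM node n
JOIN hierarchyrelation h ON h.childnodeanchor = n.relationanchorpoint
WHERE n.nodeaggregateid = ?
"""

# Join node data via the edge's parent end: h is an edge to one of n's
# children, so h.subtreetags describe that child, not n.
VIA_PARENT_ANCHOR = """
SELECT n.nodeaggregateid, h.subtreetags
FROM node n
JOIN hierarchyrelation h ON h.parentnodeanchor = n.relationanchorpoint
WHERE n.nodeaggregateid = ?
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO node VALUES (?, ?)",
    [(0, "root"), (1, "parent"), (2, "child")],
)
conn.executemany(
    "INSERT INTO hierarchyrelation VALUES (?, ?, ?)",
    [
        (0, 1, ""),          # edge into "parent": parent itself is untagged
        (1, 2, "disabled"),  # edge into "child": child is tagged
    ],
)

via_child = conn.execute(VIA_CHILD_ANCHOR, ("parent",)).fetchall()
via_parent = conn.execute(VIA_PARENT_ANCHOR, ("parent",)).fetchall()
print(via_child)   # prints: [('parent', '')]
print(via_parent)  # prints: [('parent', 'disabled')]
```

Joined via the child anchor, the parent's own (empty) tags come back; joined via the parent anchor, the same node row carries the child's `disabled` tag instead. With an inner join on the parent anchor, a node without children would also produce no row at all.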

return $this->createQueryBuilder()
->select('n.*, h.contentstreamid, h.subtreetags, dsp.dimensionspacepoint AS covereddimensionspacepoint')
->from($this->tableNames->node(), 'n')
->innerJoin('n', $this->tableNames->hierarchyRelation(), 'h', 'h.parentnodeanchor = n.relationanchorpoint')
Member


FYI: the only difference from the previously used buildBasicNodeAggregateQuery is that we join on h.parentnodeanchor instead of h.childnodeanchor.

I had the feeling that introducing this might deduplicate code in another place, and it could:

In findParentNodeAggregateByChildOriginDimensionSpacePoint, we use the same logic apart from one additional WHERE clause, so I think it could be nicely deduplicated like this:

public function findParentNodeAggregateByChildOriginDimensionSpacePoint(NodeAggregateId $childNodeAggregateId, OriginDimensionSpacePoint $childOriginDimensionSpacePoint): ?NodeAggregate
{
    $subQueryBuilder = ...;

    $queryBuilder = $this->nodeQueryBuilder->buildParentNodeAggregateQuery()
        ->andWhere('n.nodeaggregateid = (' . $subQueryBuilder->getSQL() . ')')
        ->setParameters([
            'contentStreamId' => $this->contentStreamId->value,
            'childNodeAggregateId' => $childNodeAggregateId->value,
            'childOriginDimensionSpacePointHash' => $childOriginDimensionSpacePoint->hash,
        ]);

    // ...
}

Also, I don't fully understand how this utility is supposed to work (when to pass arguments and when to use setParameters), but because we don't return a final query, I think it should be named buildBasicParentNodeAggregateQuery.

Contributor Author

@dlubitz Oct 7, 2024


Is your comment only for clarification, or are you objecting to the partial duplication of the queries and want that changed?

Member

@mhsdesign Oct 7, 2024


I was just reviewing this thoroughly, partly investigating and trying to understand it. So I guess it's 50/50. I'll create a follow-up PR :D

-> #5276

Member

@mhsdesign left a comment


By reading: thanks for the effort, and also for trying to improve the other queries (after I sidetracked you into that idea, which turned out to be more complex ^^)

@mhsdesign merged commit c14a159 into neos:9.0 Oct 7, 2024
19 checks passed

Successfully merging this pull request may close these issues.

BUG: Very slow doctrineDbalContentGraph projection replay
4 participants