SimpleNeo4jRepository.findAll(Pageable) fails on big databases #2597
Thanks for reporting this. There is definitely a missing skip/limit in the initial query. If you don't have them already, here are the repository definitions for Spring snapshots and milestones to include in your build configuration.
I tested version 6.3.4-GH-2597-SNAPSHOT, but it still OOMs when asking for a page of 20 elements of a heavily interconnected graph with ~10M nodes. I don't really know whether the issue arises from this same line of code; I can't attach a debugger right now.
But the underlying problem is that there are a lot of relationships on the next level, if I understand you correctly.
Using the Movie graph as an example here, where a Movie has various Actors (let's say in your case > 1M ;) ): if you query for the Movies with a Pageable, you will only reduce the number of initial Movies getting fetched (with this patch), but you do not reduce the number of connected Actors. Applying the limit to the relationships as well would be counter-intuitive, because the Java objects would only end up partially hydrated.
I might have got your comment wrong, but if I am right, you should think about using a custom Cypher statement and using the Pageable information within your query, as described in the documentation: https://docs.spring.io/spring-data/neo4j/docs/current/reference/html/#faq.custom-queries-with-page-and-slice
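For illustration, a paginated custom query along the lines of the linked documentation section could look roughly like the sketch below. The Movie domain and the property used for ordering are assumptions borrowed from the example above, not code from this issue.

```java
import org.springframework.data.domain.Page;
import org.springframework.data.domain.Pageable;
import org.springframework.data.neo4j.repository.Neo4jRepository;
import org.springframework.data.neo4j.repository.query.Query;

public interface MovieRepository extends Neo4jRepository<Movie, Long> {

    // $skip and $limit are bound from the Pageable; the countQuery is required
    // so that Spring Data can build the Page metadata.
    @Query(value = "MATCH (m:Movie) RETURN m ORDER BY m.title SKIP $skip LIMIT $limit",
           countQuery = "MATCH (m:Movie) RETURN count(m)")
    Page<Movie> findMoviesPaged(Pageable pageable);
}
```

Because only m is returned here, the related Actors are not hydrated at all, which is the deliberate trade-off of going the custom-query route.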
Yeah, I believe you have understood my case perfectly; my workaround was indeed a custom query. I'll stick with it, since it is the only way to hydrate up to a limited depth.
It is worth noting, though, that my use case relies on Spring Data REST, which uses HATEOAS to resolve the depth problem on the JSON serialization side.
So I'm in a situation where the two Spring libraries don't integrate nicely until I add some custom queries. It looks a bit redundant, but I don't have a suggestion on how to resolve the redundancy.
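As an aside, "hydrating up to a limited depth" with a custom query generally means returning the root node plus the collected relationships and related nodes for exactly the levels you want mapped. A rough sketch, with entity, relationship, and property names that are assumptions rather than the poster's actual code:

```java
import java.util.Optional;

import org.springframework.data.neo4j.repository.Neo4jRepository;
import org.springframework.data.neo4j.repository.query.Query;
import org.springframework.data.repository.query.Param;

public interface MovieRepositoryWithDepthOne extends Neo4jRepository<Movie, Long> {

    // Returns the movie, its ACTED_IN relationships, and the actors, but nothing
    // beyond that level, so deeper parts of the graph stay out of memory.
    @Query("MATCH (m:Movie) WHERE id(m) = $id "
         + "OPTIONAL MATCH (m)<-[r:ACTED_IN]-(a:Person) "
         + "RETURN m, collect(r), collect(a)")
    Optional<Movie> findByIdWithActorsOnly(@Param("id") Long id);
}
```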
Before I merge the changes for the initial query and close this, I would like to talk about what you would expect.
This is my use case: at the moment the default behavior of Spring Data REST is to call the default findAll(Pageable). I read about the reasons behind SDN6 hydrating the whole model in memory, but it can be very limiting in many cases with heavily interconnected graph databases.
But you would always run into this issue with other SD modules as well. The fix above now limits the amount of root data; what happens on the next level of related entities will never be modified by the Pageable.
About this issue: I finally worked around it by overriding the queries. I need to understand more about how I can work with a huge, interconnected database with SDN, because the way I'm using it is very limiting at the moment. Is there any general way to write and read a single entity without having the whole database in memory?
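For the read side, the most basic form of such an override is a hand-written query that returns only the root node, so nothing else gets mapped or loaded. A minimal sketch, assuming a hypothetical Foo entity; this is not the poster's actual workaround:

```java
import java.util.Optional;

import org.springframework.data.neo4j.repository.Neo4jRepository;
import org.springframework.data.neo4j.repository.query.Query;
import org.springframework.data.repository.query.Param;

public interface FooRepository extends Neo4jRepository<Foo, Long> {

    // Only the root node is returned, so no related nodes are pulled into memory.
    @Query("MATCH (f:Foo) WHERE id(f) = $id RETURN f")
    Optional<Foo> findShallowById(@Param("id") Long id);
}
```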
I think that issue #2762 is exactly what is left over from the discussion here. The limit feature got merged in the context of another issue.
Invoking SimpleNeo4jRepository.findAll(Pageable) will cause an OOME on databases with many nodes, even with a small page size. After some debugging, I found the probable cause in the Neo4jTemplate.createNodesAndRelationshipsByIdStatementProvider() method:
spring-data-neo4j/src/main/java/org/springframework/data/neo4j/core/Neo4jTemplate.java, lines 1142 to 1144 in 402c0ba
The rootNodesStatement that is used to detect circular dependencies queries the database for all nodes of the given type, extracting their IDs. This is no big deal on small databases, but in my case I have a single database with over 10M nodes of the given type (e.g. Foo nodes), and the JVM starts downloading many millions of IDs from the database, eventually crashing with an OOME or failing the execution with a query timeout.
Is it essential to request the complete set of IDs for all of the Foo entities in the database, even when I'm just trying to fetch one page of roughly 20 elements?
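For context, a minimal reproduction is nothing more than the plain derived call; FooRepository and Foo below are placeholders for the reporter's actual repository and entity:

```java
import org.springframework.data.domain.Page;
import org.springframework.data.domain.PageRequest;

public class FindAllPageableRepro {

    private final FooRepository fooRepository;

    public FindAllPageableRepro(FooRepository fooRepository) {
        this.fooRepository = fooRepository;
    }

    public Page<Foo> loadFirstPage() {
        // Requests only 20 elements, yet (before the fix) the generated root-id
        // statement collects the ids of every Foo node, as described above.
        return fooRepository.findAll(PageRequest.of(0, 20));
    }
}
```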