-
Notifications
You must be signed in to change notification settings - Fork 58
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LOAD CSV improvements #887
Conversation
…troduction, minor fixes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments
Co-authored-by: Jessica Wright <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a structure that would more effectively cater for the needs of the majority is one like:
- Import csv files into neo4j
- Load from local paths (one example + saying where they are sourced by default)
- Load from remote locations
- Create constraints
- Handle large amounts of data
- Import relationships and datasets coming from relational databases (expand on the section you have, this is important. It's often a source of confusion how people are supposed to handle csv files, what they should do if they get a number of csvs and what if they get one csv with all nodes and relationships inside it)
- Process data ahead of import
- Cast CSV columns to relevant Neo4j types
- Split list values
- Create additional node labels
- LOAD CSV options and functions
- {all subsections from CSV file format}
- Performance recommendations
- A full example
- (Other ways of importing data into Neo4j) (just to give pointers to
neo4j-admin import
and drivers! Say that avoiding load csv and deferring the csv parsing+querying to your own app is a very valid choice (one that field team very often takes))
…ypher into LOAD-CSV-improvements
…ypher into LOAD-CSV-improvements
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work @rsill-neo4j!
I have a few editorial suggestions, and also 2 broader points:
- Maybe the "Uniqueness constraints" and "Handle large amounts of data" can go into a separate section at the end, called something along the lines of "Recommended/Best practices"?
- As discussed with @gem-neo4j , I think the "Performance" section could be cut from this page and perhaps be extended into its own tutorial.
Also, make sure to follow the new formating style of data types (e.g 'STRING
' instead of 'string' (and 'STRING
values' instead of 'strings').
Co-authored-by: Jens Pryce-Åklundh <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I think I have added the correct styling (judging from other pages in the docs)
Co-authored-by: Gem Lamont <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! just one comment now, will approve assuming you will deal with it!
From a graph perspective, these are nodes with different labels, so it takes different queries to load them. | ||
|
||
The example executes multiple passes of `LOAD CSV` on that one file, and each pass focuses on the creation of _one_ entity type. | ||
This is the most performant practice, see <<_separate_creation_of_nodes_and_relationships>>. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks the <<>> part isn't rendering correctly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch :) it refers to a section we commented out. gonna update
Thanks for the documentation updates. The preview documentation has now been torn down - reopening this PR will republish it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Great work @rsill-neo4j!
i think the first few (older) examples could be brought in line with the newer examples i added also, i'm pretty sure the tests will fail left and right - the expected query results were written by hand ✋ finally, the GitHub diff is not really pretty ( = readable). there ought to be a preview of the changes though --------- Co-authored-by: Jessica Wright <[email protected]> Co-authored-by: Stefano Ottolenghi <[email protected]> Co-authored-by: Jens Pryce-Åklundh <[email protected]> Co-authored-by: Gem Lamont <[email protected]>
i think the first few (older) examples could be brought in line with the newer examples i added
also, i'm pretty sure the tests will fail left and right - the expected query results were written by hand ✋
finally, the GitHub diff is not really pretty ( = readable). there ought to be a preview of the changes though