LOAD CSV improvements #887

Merged

rsill-neo4j merged 54 commits into neo4j:dev from rsill-neo4j:LOAD-CSV-improvements

Mar 8, 2024

Contributor

rsill-neo4j commented Feb 8, 2024

i think the first few (older) examples could be brought in line with the newer examples i added

also, i'm pretty sure the tests will fail left and right - the expected query results were written by hand ✋

finally, the GitHub diff is not really pretty ( = readable). there ought to be a preview of the changes though

rsill-neo4j added 7 commits

January 26, 2024 13:42


          stuff

0063d2d


          section restructuring

fca60b3


          reduced the new structure, added read-transform-create mnemonic to introduction, minor fixes


          decided on more structure and added best practices chapters

3069de7


          section headings, added a link

d92b1cd


          added missing section content

dd4b8f6


          changed structure again, added a lot of links, finished sections

8ec1019

rsill-neo4j requested review from gem-neo4j and stefano-ottolenghi

February 8, 2024 16:11

AlexicaWright reviewed

Contributor

AlexicaWright left a comment

Some comments

modules/ROOT/pages/clauses/load-csv.adoc (10 review threads, outdated and resolved)

rsill-neo4j commented

modules/ROOT/pages/clauses/load-csv.adoc (1 review thread, outdated and resolved)

rsill-neo4j and others added 5 commits

February 9, 2024 14:38


          Apply suggestions from code review

64bbae4

Co-authored-by: Jessica Wright <[email protected]>


          resolved merge conflicts

c77582e


          addressed more review suggestions

0a6057f


          added another link to a tutorial of how to import data from a relational database

46c979d


          removed a link as it's a duplicate and soonTM to be deprecated page

4c41523

stefano-ottolenghi reviewed

Contributor

stefano-ottolenghi left a comment

I think a structure that would more effectively cater for the needs of the majority is one like:

Import CSV files into Neo4j
  - Load from local paths (one example + saying where they are sourced by default)
  - Load from remote locations
  - Create constraints
  - Handle large amounts of data
  - Import relationships and datasets coming from relational databases (expand on the section you have, this is important. It's often a source of confusion how people are supposed to handle CSV files: what they should do if they get a number of CSVs, and what if they get one CSV with all nodes and relationships inside it)
Process data ahead of import
  - Cast CSV columns to relevant Neo4j types
  - Split list values
  - Create additional node labels
LOAD CSV options and functions
  - {all subsections from CSV file format}
Performance recommendations
A full example
(Other ways of importing data into Neo4j) (just to give pointers to neo4j-admin import and drivers! Say that avoiding LOAD CSV and deferring the CSV parsing + querying to your own app is a very valid choice, one that the field team very often takes)
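
The last point of the outline, deferring CSV parsing and querying to your own application, could look roughly like the sketch below. It is an illustration only and not taken from the PR: the sample data, the column names, the `Person` label, the batch size, and the helper names `read_rows` and `batched` are all hypothetical. The driver call appears only in a comment and assumes the official Neo4j Python driver.

```python
import csv
import io
from typing import Iterator

# Hypothetical sample data; a real app would read a file from disk instead.
SAMPLE_CSV = """id,name,born
1,Charlie Sheen,1965
2,Oliver Stone,1946
3,Michael Douglas,1944
"""

# Parameterized Cypher: one query per batch rather than one per row.
# The Person label and property names are assumptions for illustration.
MERGE_PEOPLE = """
UNWIND $rows AS row
MERGE (p:Person {id: row.id})
SET p.name = row.name, p.born = row.born
"""


def read_rows(text: str) -> Iterator[dict]:
    """Parse CSV text and cast columns to the types the graph should store."""
    for record in csv.DictReader(io.StringIO(text)):
        yield {
            "id": int(record["id"]),
            "name": record["name"],
            "born": int(record["born"]),
        }


def batched(rows: Iterator[dict], size: int) -> Iterator[list]:
    """Group rows into lists of at most `size` items."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


batches = list(batched(read_rows(SAMPLE_CSV), size=2))

# With the Neo4j Python driver, each batch would then be passed as a query
# parameter, along the lines of:
#   driver.execute_query(MERGE_PEOPLE, rows=batch)
```

Doing the type casting in the application keeps the Cypher query small, and batching rows through `UNWIND` avoids one round trip per CSV line.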

modules/ROOT/pages/clauses/load-csv.adoc (10 review threads, outdated and resolved)

rsill-neo4j and others added 8 commits

February 13, 2024 17:43


          apply suggestions from code review (part 1)

6c49c37


          hopefully fixed some xrefs

a71555d


          apply suggestions from review (part 2)

ab68bc2


          applied new structure, added zip section

025c5a9


          adapted sections from the getting started article, finished structure

24f8fac


          Merge branch 'dev' into LOAD-CSV-improvements

da35329


          added section about relational databases

07d5092


          Merge branch 'LOAD-CSV-improvements' of github.com:rsill-neo4j/docs-cypher into LOAD-CSV-improvements

073932d

rsill-neo4j marked this pull request as ready for review

February 22, 2024 09:05

stefano-ottolenghi added 4 commits

February 23, 2024 19:56


          first

b269624


          another pass

c0ed222


          final

9fa970e


          fixes

e54aa9a

rsill-neo4j and others added 2 commits

March 5, 2024 14:46


          Merge branch 'LOAD-CSV-improvements' of github.com:rsill-neo4j/docs-cypher into LOAD-CSV-improvements

7253bc5


          Merge branch 'dev' into LOAD-CSV-improvements

JPryce-Aklundh reviewed

Collaborator

JPryce-Aklundh left a comment

Great work @rsill-neo4j!
I have a few editorial suggestions, and also 2 broader points:

- Maybe the "Uniqueness constraints" and "Handle large amounts of data" sections can go into a separate section at the end, called something along the lines of "Recommended/Best practices"?
- As discussed with @gem-neo4j, I think the "Performance" section could be cut from this page and perhaps be extended into its own tutorial.

Also, make sure to follow the new formatting style for data types (e.g. 'STRING' instead of 'string', and 'STRING values' instead of 'strings').

modules/ROOT/pages/clauses/load-csv.adoc (10 review threads, outdated and resolved)

rsill-neo4j and others added 8 commits

March 5, 2024 16:22


          fixed queries and results

c613601


          adopted formatting style for string

34e6568


          Apply suggestions from code review

9cdfed1

Co-authored-by: Jens Pryce-Åklundh <[email protected]>


          added two tables and added return statements

fcb57fd


          fixed a bullet point list

9e778e7


          adjusted formatting on some queries

b8c730c


          rephrased an admonition text

9f0c8e5


          Merge branch 'dev' into LOAD-CSV-improvements

8a65e1b

gem-neo4j reviewed

Contributor

gem-neo4j left a comment

Okay, I think I have added the correct styling (judging from other pages in the docs)

modules/ROOT/pages/clauses/load-csv.adoc (10 review threads, outdated and resolved)

rsill-neo4j and others added 6 commits

March 6, 2024 16:59


          Apply suggestions from code review

fe8e025

Co-authored-by: Gem Lamont <[email protected]>


          Merge branch 'dev' into LOAD-CSV-improvements

887e9c9


          moved some sections to a top level recommendations section

0fe1dc5


          moved a section, addressed some review comments

834ffe0


          added some nicer tables for results

e3322a5


          correct quotes for a table

19b12cb

gem-neo4j approved these changes

Contributor

gem-neo4j left a comment

Looks great! just one comment now, will approve assuming you will deal with it!

modules/ROOT/pages/clauses/load-csv.adoc Outdated

+              From a graph perspective, these are nodes with different labels, so it takes different queries to load them.
+              The example executes multiple passes of `LOAD CSV` on that one file, and each pass focuses on the creation of _one_ entity type.
+              This is the most performant practice, see <<_separate_creation_of_nodes_and_relationships>>.

Contributor

gem-neo4j Mar 8, 2024

Looks like the <<>> part isn't rendering correctly

Contributor Author

rsill-neo4j Mar 8, 2024

good catch :) it refers to a section we commented out. gonna update

rsill-neo4j added 5 commits

March 8, 2024 14:41


          skipped a test, made results uniform wrt tables, added a note

384143d


          removed a reference

572ef79


          added csv filenames

ba66f90


          added quotes to a csv filename

12b098a


          fixed CREATE INDEX statements

975261a

Collaborator

neo-technology-commit-status-publisher commented Mar 8, 2024 (edited)

Thanks for the documentation updates.

The preview documentation has now been torn down - reopening this PR will republish it.

JPryce-Aklundh approved these changes

Collaborator

JPryce-Aklundh left a comment

Nice! Great work @rsill-neo4j!

JPryce-Aklundh added the cherry-pick-this-to-5.x label

rsill-neo4j merged commit c82783c into neo4j:dev

5 checks passed

rsill-neo4j added a commit that referenced this pull request


          LOAD CSV improvements (#887)

c205629

i think the first few (older) examples could be brought in line with the
newer examples i added

also, i'm pretty sure the tests will fail left and right - the expected
query results were written by hand ✋

finally, the GitHub diff is not really pretty ( = readable). there ought
to be a preview of the changes though

---------

Co-authored-by: Jessica Wright <[email protected]>
Co-authored-by: Stefano Ottolenghi <[email protected]>
Co-authored-by: Jens Pryce-Åklundh <[email protected]>
Co-authored-by: Gem Lamont <[email protected]>

rsill-neo4j added the cherry-picked label

Labels

cherry-pick-this-to-5.x cherry-picked