Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
refactor(flex): Replace the Adhoc csv reader with Arrow CSV reader (#…
…3154) 1. Use Arrow CSV Reader to replace current adhoc csv reader, to support more configurable options in `bulk_load.yaml`. 2. Introduce `CSVFragmentLoader`, `BasicFragmentLoader` for `MutablePropertyFragment`. With this PR merged, `MutablePropertyFragment` will support loading fragment from csv with options: - delimeter: default '|' - header_row: default true - quoting: default false - quoting_char: default '"' - escaping: default false - escaping_char: default'\\' - batch_size: the batch size of when reading file into memory, default 1MB. - batch_reader: default false. If set to true, `arrow::csv::StreamingReader` will be used to parse the input file. Otherwise, `arrow::TableReader` will be used. With this PR merged, the performance of graph loading will be improved. The Adhoc Reader denote the current implemented csv parser, 1,2,4,8 denotes the parallelism of graph loading, i.e. how many labels of vertex/edge are concurrently processed. Note that TableReader is around 10x faster than StreamingReader. The possible reason could be the multi-threading is used. See [arrow-csv-doc](https://arrow.apache.org/docs/cpp/csv.html) for details. ### SF30 | Reader | Phase | 1 | 2 | 4 | 8 | | --------- | -------------- | ---------- |---------- |---------- |---------- | | Adhoc Reader | ReadFile\+LoadGraph |805s| 468s| 349s| 313s| | Adhoc Reader | Serialization | 126s| 126s| 126s| 126s| | Adhoc Reader | **Total** |931s| 594s| 475s| 439s| | Table Reader | ReadFile | 9s |9s |9s| 9s| | Table Reader | LoadGraph |455s| 280s| 211s| 182s| | Table Reader |Serialization |126s| 126s| 126s| 126s| | Table Reader | **Total** | 600s| 415s| 346s| 317s| | Streaming Reader | ReadFile |91s| 91s| 91s| 91s| | Streaming Reader | LoadGraph | 555s| 289s| 196s| 149s| | Streaming Reader | Serialization |126s| 126s| 126s| 126s| | Streaming Reader | **Total** | 772s| 506s| 413s| 366s| ### SF100 | Reader | Phase | 1 | 2 | 4 | 8 | | --------- | -------------- | ---------- |---------- |---------- |---------- | | Adhoc Reader | ReadFile\+LoadGraph |2720s| 1548s| 1176s| 948s| | Adhoc Reader | Serialization | 409s| 409s| 409s| 409s| | Adhoc Reader | **Total** | 3129s| 1957s| 1585s| 1357s| | Table Reader | ReadFile |24s| 24s| 24s| 24s| | Table Reader | LoadGraph |1576s| 949s| 728s| 602s| | Table Reader |Serialization |409s| 409s| 409s| 409s| | Table Reader | **Total** | 2009s| 1382s| 1161s| 1035s| | Streaming Reader | ReadFile |300s| 300s| 300s| 300s| | Streaming Reader | LoadGraph | 1740s| 965s| 669s| 497s| | Streaming Reader | Serialization | 409s| 409s| 409s| 409s| | Streaming Reader | **Total** | 2539s| 1674s| 1378s| 1206s| ### SF300 | Reader | Phase | 1 | 2 | 4 | 8 | | --------- | -------------- | ---------- |---------- |---------- |---------- | | Adhoc Reader | ReadFile\+LoadGraph | 8260s| 4900s |3603s |2999s| | Adhoc Reader | Serialization | 1201s | 1201s| 1201s |1201s| | Adhoc Reader | **Total** | 9461s| 6101s | 4804s |4200s| | Table Reader | ReadFile | 73s |73s| 96s| 96s| | Table Reader | LoadGraph |4650s| 2768s| 2155s |1778s| | Table Reader |Serialization | 1201s | 1201s| 1201s |1201s| | Table Reader | **Total** | 5924s| 4042s| 3452s| 3075s| | Streaming Reader | ReadFile | 889s |889s | 889s| 889s| | Streaming Reader | LoadGraph | 5589s| 3005s| 2200s| 1712s| | Streaming Reader | Serialization | 1201s| 1201s| 1201s |1201s | | Streaming Reader | **Total** | 7679s | 5095s |4290s| 3802s| FIx #3116
- Loading branch information