[Meta-issue] Optimize pipeline (python/add_data.sh etc.) #76

Open
12 of 23 tasks
anthonyfok opened this issue Mar 19, 2021 · 1 comment

Comments

@anthonyfok
Member

anthonyfok commented Mar 19, 2021

Goals include:

  • Reduce download time, build time, disk usage...
  • Increase robustness / resilience (e.g. recovering from interrupted download)
  • ... (to be continued)
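One possible way to approach the "recovering from interrupted download" goal is sketched below. It is only an illustration: `retry` and `fetch_with_resume` are hypothetical helper names that do not exist in python/add_data.sh, and the attempt count and delay are arbitrary. It relies on `wget --continue`, which resumes a partially downloaded file when the server supports it.

```shell
# Sketch only: retry and fetch_with_resume are hypothetical helpers,
# not functions from python/add_data.sh.

# Run a command up to $1 times, sleeping $2 seconds between failed attempts.
retry() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "Giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "Attempt $n failed, retrying in ${delay}s: $*" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}

# Download URL $1 into directory $2 (default: current directory).
# --continue resumes a partial file left behind by an interrupted run.
fetch_with_resume() {
  local url="$1" dir="${2:-.}"
  retry 5 10 wget --continue --directory-prefix="$dir" "$url"
}
```

A call might then look like `fetch_with_resume https://opendrr.eccp.ca/file/OpenDRR/opendrr-boundaries.dump`, using the URL from the benchmark in this issue.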

Future tasks (that have yet to be turned into GitHub issues):

  • Use e.g. /usr/bin/time -v for profiling
  • Use docker-compose logs -f -t to get logs with timestamps
  • Add some kind of DEBUG variable? E.g. pass the psql flag -a (--echo-all) only in DEBUG mode, so that the log is more concise otherwise.
  • Add an option to delete downloaded *.gpkg and *.csv files as soon as they have been imported, to save disk space
  • etc.
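Two of the ideas above (the DEBUG-dependent psql flag and deleting files once imported) could be sketched roughly as follows. This is an assumption-laden illustration, not existing pipeline code: the variable names `DEBUG` and `DELETE_AFTER_IMPORT` and the helpers `psql_echo_flag`, `run_psql` and `import_then_cleanup` are all made up for this example.

```shell
# Sketch only: all names below are hypothetical, not part of add_data.sh.

# Print --echo-all only when DEBUG is set to a non-empty value.
psql_echo_flag() {
  if [ -n "${DEBUG:-}" ]; then
    echo "--echo-all"
  fi
}

# Wrapper around psql that echoes all input only in DEBUG mode.
run_psql() {
  # Intentionally unquoted: an empty result must not add an empty argument.
  # shellcheck disable=SC2046
  psql $(psql_echo_flag) "$@"
}

# Usage: import_then_cleanup FILE COMMAND [ARGS...]
# Runs COMMAND ARGS... FILE, and deletes FILE only if the command succeeded
# and DELETE_AFTER_IMPORT is set to a non-empty value.
import_then_cleanup() {
  local file="$1"
  shift
  "$@" "$file" || return 1
  if [ -n "${DELETE_AFTER_IMPORT:-}" ]; then
    rm -f -- "$file"
  fi
}
```

Keeping the file when the import command fails means a re-run does not need to download it again.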

Maybe in Round 2 of refactoring? Or this round? Need to discuss with Drew first:

  • Leave the model-factory/scripts/* files where they are instead of copying them?
  • Use e.g. _build and _data directories to separate our code from downloaded data and temporary build files?
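If the _build / _data split were adopted, the setup step might look something like this minimal sketch. The function name `init_work_dirs` and the variables `DATA_DIR` and `BUILD_DIR` are assumptions for illustration; the actual layout is still to be discussed.

```shell
# Sketch only: init_work_dirs, DATA_DIR and BUILD_DIR are hypothetical names.

# Create <root>/_data for downloaded files and <root>/_build for temporary
# build files, keeping both apart from the checked-in code.
init_work_dirs() {
  local root="${1:-.}"
  DATA_DIR="$root/_data"
  BUILD_DIR="$root/_build"
  mkdir -p "$DATA_DIR" "$BUILD_DIR"
}
```

Downloads would then be written under "$DATA_DIR" and intermediate files under "$BUILD_DIR", so that either directory can be cleaned without touching the scripts.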

Random ideas, questions, etc.

@anthonyfok
Member Author

anthonyfok commented Mar 31, 2021

[Edited] See #88 (comment) for a more complete benchmark (March 19 vs April 27)

Benchmark (in progress, to be edited)

Before:

Duration        Command
2s              git clone https://github.com/OpenDRR/model-factory.git --depth 1
4m58s           [Download] git clone https://github.com/OpenDRR/boundaries.git --depth 1
3m08s           [Import] ogr2ogr run on the 9 .gpkg files from git clone of OpenDRR/boundaries
...             ...

After:

Duration        Command
2s              git clone https://github.com/OpenDRR/model-factory.git --depth 1
43s to 1m20s    wget https://opendrr.eccp.ca/file/OpenDRR/opendrr-boundaries.dump
...             ...
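Durations like the ones in these tables could be collected with a small wrapper along the following lines. This is a sketch, not how the numbers above were measured: `format_duration` and `timed` are hypothetical helpers, and `date +%s` only gives whole-second resolution.

```shell
# Sketch only: format_duration and timed are hypothetical helpers.

# Format a whole number of seconds in the style used above, e.g. 2s or 4m58s.
format_duration() {
  local total="$1"
  local minutes=$((total / 60))
  local seconds=$((total % 60))
  if [ "$minutes" -gt 0 ]; then
    printf '%dm%02ds\n' "$minutes" "$seconds"
  else
    printf '%ds\n' "$seconds"
  fi
}

# Run a command, then print "<duration> <command>" to stderr.
# The command's own exit status is passed through.
timed() {
  local start end status
  start=$(date +%s)
  "$@"
  status=$?
  end=$(date +%s)
  echo "$(format_duration $((end - start))) $*" >&2
  return "$status"
}
```

For example, `timed git clone https://github.com/OpenDRR/model-factory.git --depth 1` would run the clone and then log one "Duration Command" line to stderr.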

anthonyfok pinned this issue Apr 26, 2021
anthonyfok added this to the Sprint 33 milestone Apr 26, 2021
anthonyfok self-assigned this Apr 26, 2021
anthonyfok changed the title from "[Meta-issue] Optimize python/add_data.sh" to "[Meta-issue] Optimize pipeline (python/add_data.sh etc.)" Apr 28, 2021
jvanulde removed this from the Sprint 33 milestone May 6, 2021