Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up database writes with synchronous_commit=off (and full_page_write=off and fsync=off?) #77

Open
anthonyfok opened this issue Mar 20, 2021 · 4 comments · May be fixed by #94
Open

Comments

@anthonyfok
Copy link
Member

anthonyfok commented Mar 20, 2021

2021-11-22 update: This issue was essentially fixed in PR #105 (Pipeline optimization, Sprint 34–36) merged in June 2021; see [https://github.com//issues/77#issuecomment-975802148](reasons why I am keeping this issue open for the time being]


docker-compose up --build currently takes several hours to run. In particular, importing data into PostgreSQL database seems somewhat slow to me, apparently synchronizing to the physical disk for each single operation.

It turns out there are PostgreSQL settings to optimize its speed, and some users report like 10x speed improvement for e.g. CI test runs.

@anthonyfok
Copy link
Member Author

anthonyfok commented Mar 20, 2021

Hi @drotheram, please assign me to this issue (and add labels, link to project, etc.) when you have time. Many thanks!

Tasks:

  • Benchmark, before and after. (with time or with docker-compose logs?)
  • Relevant fsync=off, synchronous_commit=off, full_page_write=off
  • Shutdown postgis/postgis and restart it with just synchronous_commit=off when we are done?
  • (optional) Use pg_dump to backup the databases and store them somewhere, probably with a README.md or machine-readable file

Relevant links:


Related links (to be sorted):

eatmydata (no, it doesn't work with PostgreSQL

Misc:

@anthonyfok
Copy link
Member Author

One place to test this during psql: Running Create_all_tables_update.sql...:

Without synchronous_commit=off etc.:

python-opendrr_1         | 2021-05-20T18:13:20.046554364Z UPDATE exposure.canada_exposure SET geom = st_setsrid(st_makepoint(SauidLon,SauidLat),4326);
python-opendrr_1         | 2021-05-20T18:22:10.956967658Z UPDATE 2237515

With synchronous_commit=off etc.:

[TODO]

anthonyfok added a commit to anthonyfok/opendrr-api that referenced this issue May 21, 2021
@anthonyfok anthonyfok modified the milestones: Sprint 34, Sprint 35 May 25, 2021
@anthonyfok
Copy link
Member Author

Reminder (to self): Of the three parameters, only synchronous_commit can be set with ALTER DATABASE, and that was done in add_data.sh in PR #105:

$ psql opendrr -a
psql (13.3 (Debian 13.3-1))
Type "help" for help.

opendrr=> ALTER DATABASE opendrr SET fsync TO off;
ERROR:  parameter "fsync" cannot be changed now
opendrr=> ALTER DATABASE opendrr SET full_page_writes TO off;
ERROR:  parameter "full_page_writes" cannot be changed now
opendrr=> ALTER DATABASE opendrr SET synchronous_commit TO off;
ALTER DATABASE
opendrr=> 

So, if we want to set fsync and full_page_writes to off too, that will need to be done via #94, pending more benchmark and ways to ensure the database is fully written to the disk at the end of the docker-compose run.

@anthonyfok anthonyfok modified the milestones: Sprint 39, Sprint 40 Aug 5, 2021
@anthonyfok anthonyfok modified the milestones: Sprint 39, Sprint 41, Sprint 42 Sep 9, 2021
@anthonyfok anthonyfok modified the milestones: Sprint 42, Sprint 43 Sep 23, 2021
@anthonyfok anthonyfok modified the milestones: Sprint 43, Sprint 44 Oct 13, 2021
@anthonyfok anthonyfok modified the milestones: Sprint 44, Sprint 45 Oct 25, 2021
@anthonyfok anthonyfok modified the milestones: Sprint 45, Sprint 46 Nov 8, 2021
@anthonyfok anthonyfok removed this from the Sprint 46 milestone Nov 22, 2021
@anthonyfok anthonyfok changed the title postgis/postgis: Speed up database writes with synchronous_commit=off or more Speed up database writes with synchronous_commit=off (and full_page_write=off and fsync=off?) Nov 22, 2021
@anthonyfok
Copy link
Member Author

Issue status

synchronous_commit = off has been implemented in the set_synchronous_commit() function in from add_data.sh:

Why am I keeping this issue open:

  • Lacking proper (unit) benchmark
  • full_page_write=off and fsync=off can or cannot be done from add_data.sh?
  • Earlier "draft" Pull Request Speed up PostGIS database with with synchronous_commit=off and more #94 that introduces a one-line change to docker-compose.yml, but which I didn't end up using as I do not yet know how to turn the settings back on for data integrity reasons.

To revisit in year 2022. :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants