specified neither primary_conninfo nor restore_command on restore from scratch #803

Hannsre opened this issue Nov 11, 2024 · 2 comments

Hannsre commented Nov 11, 2024

Hey everyone,

sorry to be back yet again. This is more of a report, though, as I already got it working. But since it's unpredictable behavior, I'd like to let you know.

I destroyed my previous cluster to rebuild it from scratch from a backup. This eventually worked, but it took a total of 4 runs to get it fully restored and started, without changing any settings in between.

First I got this on node 3, then, on another try, on node 2 (both are replicas):

2024-11-11 13:34:06.748 UTC [38313] LOG:  restored log file "000000020000000000000021" from archive
2024-11-11 13:34:06.751 UTC [38313] LOG:  consistent recovery state reached at 0/1E000138
2024-11-11 13:34:06.751 UTC [38313] LOG:  recovery stopping after reaching consistency
2024-11-11 13:34:06.752 UTC [38310] LOG:  shutdown at recovery target
2024-11-11 13:34:06.753 UTC [38311] LOG:  shutting down
2024-11-11 13:34:06.755 UTC [38310] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.

This happened during the task "Wait for the PostgreSQL start command to complete".

I could mitigate it by running sudo -u postgres /usr/lib/postgresql/16/bin/pg_ctl start -D /var/lib/postgresql/16/main -o '-c hot_standby=off'. After that I got "server started", but the playbook had already failed, so I had to start over.

There was nothing else in the logs, so there's nothing more to post. Ansible failed because the exit code was non-zero.
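
For what it's worth, a quick way to see what state the instance was left in after the shutdown at the recovery target is pg_controldata (just a diagnostic sketch, using the same Debian/PGDG paths as in the pg_ctl command above):

# diagnostic sketch: show the cluster state recorded in the control file
sudo -u postgres /usr/lib/postgresql/16/bin/pg_controldata /var/lib/postgresql/16/main | grep 'Database cluster state'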

On my third run this part worked fine on the replicas, but another issue came up on the leader:

2024-11-11 14:12:09.731 UTC [58328] LOG:  listening on IPv6 address "::1", port 5432
2024-11-11 14:12:09.731 UTC [58328] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2024-11-11 14:12:09.731 UTC [58328] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-11-11 14:12:09.734 UTC [58331] LOG:  database system was shut down at 2024-11-11 13:52:02 UTC
2024-11-11 14:12:09.734 UTC [58331] WARNING:  specified neither primary_conninfo nor restore_command
2024-11-11 14:12:09.734 UTC [58331] HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2024-11-11 14:12:09.734 UTC [58331] LOG:  entering standby mode
2024-11-11 14:12:09.734 UTC [58331] LOG:  invalid resource manager ID in checkpoint record
2024-11-11 14:12:09.734 UTC [58331] PANIC:  could not locate a valid checkpoint record
2024-11-11 14:12:09.735 UTC [58328] LOG:  startup process (PID 58331) was terminated by signal 6: Aborted
2024-11-11 14:12:09.735 UTC [58328] LOG:  aborting startup due to startup process failure
2024-11-11 14:12:09.735 UTC [58328] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.

I guess the recovery config wasn't properly populated/passed to the restore step, so neither primary_conninfo nor restore_command was set?
Ansible then got stuck waiting for port 8008 to become available.
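
One way to check that guess (just a sketch, assuming the data directory from above) is to look at which recovery settings and signal files actually ended up on disk before the start attempt:

# sketch: show the recovery settings PostgreSQL would pick up
sudo -u postgres grep -E 'restore_command|recovery_target|primary_conninfo' /var/lib/postgresql/16/main/postgresql.conf /var/lib/postgresql/16/main/postgresql.auto.conf
# standby/recovery mode is triggered by these marker files
ls -l /var/lib/postgresql/16/main/recovery.signal /var/lib/postgresql/16/main/standby.signal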

My "fix" was to add

recovery_target_action = 'shutdown'
restore_command = '/usr/bin/pgbackrest --stanza=test-stanza archive-get %f "%p"'
recovery_target = 'immediate'
recovery_target_timeline = 'current'

at the end of /var/lib/postgresql/16/main/postgresql.conf, then run sudo -u postgres /usr/lib/postgresql/16/bin/pg_ctl start -D /var/lib/postgresql/16/main -o '-c hot_standby=off' -w -t 180.
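
Spelled out as a single sequence, that workaround is essentially (same settings and command as above; adjust paths and stanza to your setup):

# append the recovery settings from above to postgresql.conf
sudo tee -a /var/lib/postgresql/16/main/postgresql.conf <<'EOF'
recovery_target_action = 'shutdown'
restore_command = '/usr/bin/pgbackrest --stanza=test-stanza archive-get %f "%p"'
recovery_target = 'immediate'
recovery_target_timeline = 'current'
EOF
# start PostgreSQL without hot_standby and wait up to 180 seconds
sudo -u postgres /usr/lib/postgresql/16/bin/pg_ctl start -D /var/lib/postgresql/16/main -o '-c hot_standby=off' -w -t 180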

This led to

2024-11-11 14:13:55.272 UTC [59310] LOG:  restored log file "00000003000000000000001E" from archive
2024-11-11 14:13:55.276 UTC [59310] LOG:  invalid resource manager ID in checkpoint record
2024-11-11 14:13:55.276 UTC [59310] PANIC:  could not locate a valid checkpoint record
2024-11-11 14:13:55.277 UTC [59307] LOG:  startup process (PID 59310) was terminated by signal 6: Aborted
2024-11-11 14:13:55.277 UTC [59307] LOG:  aborting startup due to startup process failure
2024-11-11 14:13:55.277 UTC [59307] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.

So at least some progress? At this point I decided to run the playbook again, because the steps above that had worked on the replicas did not work here.

That 4th run then finished without issues and the cluster was back up; our app is also working and not complaining.

Config is basically the same as in #770; I only changed the ansible command to ansible-playbook deploy_pgcluster.yml -e '@pgcn.yml' --ask-vault-pass -v.

Here's what I set for pgBackRest in vars/main.yml:

# https://patroni.readthedocs.io/en/latest/replica_bootstrap.html#bootstrap
patroni_cluster_bootstrap_method: "pgbackrest"  # or "wal-g", "pgbackrest", "pg_probackup"

# https://patroni.readthedocs.io/en/latest/replica_bootstrap.html#building-replicas
patroni_create_replica_methods:
  - pgbackrest
#  - wal_g
#  - pg_probackup
  - basebackup

pgbackrest:
  - { option: "command", value: "{{ pgbackrest_patroni_cluster_restore_command }}" }
  - { option: "keep_data", value: "True" }
  - { option: "no_params", value: "True" }
wal_g:
  - { option: "command", value: "{{ wal_g_patroni_cluster_bootstrap_command }}" }
  - { option: "no_params", value: "True" }
basebackup:
  - { option: "max-rate", value: "1000M" }
  - { option: "checkpoint", value: "fast" }
#  - { option: "waldir", value: "{{ postgresql_wal_dir }}" }
pg_probackup:
  - { option: "command", value: "{{ pg_probackup_restore_command }}" }
  - { option: "no_params", value: "true" }

# "restore_command" written to recovery.conf when configuring follower (create replica)
# postgresql_restore_command: ""
# postgresql_restore_command: "{{ wal_g_path }} wal-fetch %f %p"  # restore WAL-s using WAL-G
postgresql_restore_command: "/usr/bin/pgbackrest --stanza={{ pgbackrest_stanza }} archive-get %f %p"  # restore WAL-s using pgbackrest
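
As a sanity check, the restore_command can also be exercised by hand for a single WAL segment (sketch; the segment name is just one taken from the logs above, and /tmp/wal-test is a throwaway destination path):

# fetch one WAL segment manually via pgBackRest, same stanza as above
sudo -u postgres /usr/bin/pgbackrest --stanza=test-stanza archive-get 00000002000000000000001E /tmp/wal-test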

Let me know if you need any more information to debug this.

vitabaks (Owner) commented

@Hannsre Thanks.

Please attach /tmp/pg_recovery_<date>.log files.

You can also try running the playbook in debug mode (-v or -vvv) to get more details.
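
For example (a sketch based on the command you mentioned; restore_run.log is just an arbitrary file name):

ansible-playbook deploy_pgcluster.yml -e '@pgcn.yml' --ask-vault-pass -vvv 2>&1 | tee restore_run.log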

Hannsre (Author) commented Nov 12, 2024

Hey,

sure, here's the log; it's the same on both replicas, despite the different timestamps:

waiting for server to start....2024-11-11 13:34:06.086 UTC [38310] LOG:  starting PostgreSQL 16.4 (Debian 16.4-1.pgdg120+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-11-11 13:34:06.086 UTC [38310] LOG:  listening on IPv6 address "::1", port 5432
2024-11-11 13:34:06.086 UTC [38310] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2024-11-11 13:34:06.087 UTC [38310] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-11-11 13:34:06.089 UTC [38313] LOG:  database system was shut down in recovery at 2024-11-11 13:31:24 UTC
2024-11-11 13:34:06.089 UTC [38313] LOG:  starting point-in-time recovery to earliest consistent point
2024-11-11 13:34:06.094 P00   INFO: archive-get command begin 2.54.0: [00000002.history, pg_wal/RECOVERYHISTORY] --archive-async --archive-get-queue-max=1GiB --exec-id=38315-51b721fd --log-level-console=info --log-level-file=detail --log-path=/var/log/pgbackrest --pg1-path=/var/lib/postgresql/16/main --process-max=4 --repo1-path=/pgbackrest --repo1-s3-bucket=test --repo1-s3-endpoint=minio.test/bucket --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=sip-backup-01 --repo1-s3-uri-style=path --repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=test-stanza
2024-11-11 13:34:06.112 P00   INFO: found 00000002.history in the repo1: 16-1 archive
2024-11-11 13:34:06.112 P00   INFO: archive-get command end: completed successfully (19ms)
2024-11-11 13:34:06.113 UTC [38313] LOG:  restored log file "00000002.history" from archive
2024-11-11 13:34:06.119 P00   INFO: archive-get command begin 2.54.0: [00000002000000000000001E, pg_wal/RECOVERYXLOG] --archive-async --archive-get-queue-max=1GiB --exec-id=38317-f44b2a19 --log-level-console=info --log-level-file=detail --log-path=/var/log/pgbackrest --pg1-path=/var/lib/postgresql/16/main --process-max=4 --repo1-path=/pgbackrest --repo1-s3-bucket=test --repo1-s3-endpoint=minio.test/bucket --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=sip-backup-01 --repo1-s3-uri-style=path --repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=test-stanza
2024-11-11 13:34:06.722 P00   INFO: found 00000002000000000000001E in the archive asynchronously
2024-11-11 13:34:06.722 P00   INFO: archive-get command end: completed successfully (604ms)
2024-11-11 13:34:06.723 UTC [38313] LOG:  restored log file "00000002000000000000001E" from archive
2024-11-11 13:34:06.727 UTC [38313] LOG:  redo starts at 0/1E000028
2024-11-11 13:34:06.732 P00   INFO: archive-get command begin 2.54.0: [00000002000000000000001F, pg_wal/RECOVERYXLOG] --archive-async --archive-get-queue-max=1GiB --exec-id=38326-c8402c23 --log-level-console=info --log-level-file=detail --log-path=/var/log/pgbackrest --pg1-path=/var/lib/postgresql/16/main --process-max=4 --repo1-path=/pgbackrest --repo1-s3-bucket=test --repo1-s3-endpoint=minio.test/bucket --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=sip-backup-01 --repo1-s3-uri-style=path --repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=test-stanza
2024-11-11 13:34:06.732 P00   INFO: found 00000002000000000000001F in the archive asynchronously
2024-11-11 13:34:06.733 P00   INFO: archive-get command end: completed successfully (2ms)
2024-11-11 13:34:06.733 UTC [38313] LOG:  restored log file "00000002000000000000001F" from archive
2024-11-11 13:34:06.740 P00   INFO: archive-get command begin 2.54.0: [000000020000000000000020, pg_wal/RECOVERYXLOG] --archive-async --archive-get-queue-max=1GiB --exec-id=38330-f8e33a0d --log-level-console=info --log-level-file=detail --log-path=/var/log/pgbackrest --pg1-path=/var/lib/postgresql/16/main --process-max=4 --repo1-path=/pgbackrest --repo1-s3-bucket=test --repo1-s3-endpoint=minio.test/bucket --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=sip-backup-01 --repo1-s3-uri-style=path --repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=test-stanza
2024-11-11 13:34:06.740 P00   INFO: found 000000020000000000000020 in the archive asynchronously
2024-11-11 13:34:06.740 P00   INFO: archive-get command end: completed successfully (1ms)
2024-11-11 13:34:06.741 UTC [38313] LOG:  restored log file "000000020000000000000020" from archive
2024-11-11 13:34:06.748 P00   INFO: archive-get command begin 2.54.0: [000000020000000000000021, pg_wal/RECOVERYXLOG] --archive-async --archive-get-queue-max=1GiB --exec-id=38332-382c40c4 --log-level-console=info --log-level-file=detail --log-path=/var/log/pgbackrest --pg1-path=/var/lib/postgresql/16/main --process-max=4 --repo1-path=/pgbackrest --repo1-s3-bucket=test --repo1-s3-endpoint=minio.test/bucket --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=sip-backup-01 --repo1-s3-uri-style=path --repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=test-stanza
2024-11-11 13:34:06.748 P00   INFO: found 000000020000000000000021 in the archive asynchronously
2024-11-11 13:34:06.748 P00   INFO: archive-get command end: completed successfully (2ms)
2024-11-11 13:34:06.748 UTC [38313] LOG:  restored log file "000000020000000000000021" from archive
2024-11-11 13:34:06.751 UTC [38313] LOG:  consistent recovery state reached at 0/1E000138
2024-11-11 13:34:06.751 UTC [38313] LOG:  recovery stopping after reaching consistency
2024-11-11 13:34:06.752 UTC [38310] LOG:  shutdown at recovery target
2024-11-11 13:34:06.753 UTC [38311] LOG:  shutting down
2024-11-11 13:34:06.755 UTC [38310] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server
Examine the log output.

Note: I replaced the S3 endpoint and stanza name.

And on the Leader:

2024-11-11 14:12:09.731 UTC [58328] LOG:  listening on IPv6 address "::1", port 5432
2024-11-11 14:12:09.731 UTC [58328] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2024-11-11 14:12:09.731 UTC [58328] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-11-11 14:12:09.734 UTC [58331] LOG:  database system was shut down at 2024-11-11 13:52:02 UTC
2024-11-11 14:12:09.734 UTC [58331] WARNING:  specified neither primary_conninfo nor restore_command
2024-11-11 14:12:09.734 UTC [58331] HINT:  The database server will regularly poll the pg_wal subdirectory to check for files placed there.
2024-11-11 14:12:09.734 UTC [58331] LOG:  entering standby mode
2024-11-11 14:12:09.734 UTC [58331] LOG:  invalid resource manager ID in checkpoint record
2024-11-11 14:12:09.734 UTC [58331] PANIC:  could not locate a valid checkpoint record
2024-11-11 14:12:09.735 UTC [58328] LOG:  startup process (PID 58331) was terminated by signal 6: Aborted
2024-11-11 14:12:09.735 UTC [58328] LOG:  aborting startup due to startup process failure
2024-11-11 14:12:09.735 UTC [58328] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server

As well as

waiting for server to start....2024-11-11 14:13:40.993 UTC [59165] LOG:  starting PostgreSQL 16.4 (Debian 16.4-1.pgdg120+2) on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit
2024-11-11 14:13:40.993 UTC [59165] LOG:  listening on IPv6 address "::1", port 5432
2024-11-11 14:13:40.993 UTC [59165] LOG:  listening on IPv4 address "127.0.0.1", port 5432
2024-11-11 14:13:40.994 UTC [59165] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2024-11-11 14:13:40.996 UTC [59168] LOG:  database system was shut down at 2024-11-11 13:52:02 UTC
2024-11-11 14:13:40.996 UTC [59168] LOG:  entering standby mode
2024-11-11 14:13:41.001 P00   INFO: archive-get command begin 2.54.0: [00000003.history, pg_wal/RECOVERYHISTORY] --archive-async --archive-get-queue-max=1GiB --exec-id=59170-489ccfba --log-level-console=info --log-level-file=detail --log-path=/var/log/pgbackrest --pg1-path=/var/lib/postgresql/16/main --process-max=4 --repo1-path=/pgbackrest --repo1-s3-bucket=bucket --repo1-s3-endpoint=minio.test/bucket --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=sip-backup-01 --repo1-s3-uri-style=path --repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=sip-psql-dev
2024-11-11 14:13:41.021 P00   INFO: unable to find 00000003.history in the archive
2024-11-11 14:13:41.021 P00   INFO: archive-get command end: completed successfully (22ms)
2024-11-11 14:13:41.026 P00   INFO: archive-get command begin 2.54.0: [00000003000000000000001E, pg_wal/RECOVERYXLOG] --archive-async --archive-get-queue-max=1GiB --exec-id=59172-b7f3b91a --log-level-console=info --log-level-file=detail --log-path=/var/log/pgbackrest --pg1-path=/var/lib/postgresql/16/main --process-max=4 --repo1-path=/pgbackrest --repo1-s3-bucket=bucket --repo1-s3-endpoint=minio.test/bucket --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=sip-backup-01 --repo1-s3-uri-style=path --repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=sip-psql-dev
2024-11-11 14:13:41.628 P00   INFO: found 00000003000000000000001E in the archive asynchronously
2024-11-11 14:13:41.628 P00   INFO: archive-get command end: completed successfully (603ms)
2024-11-11 14:13:41.629 UTC [59168] LOG:  restored log file "00000003000000000000001E" from archive
2024-11-11 14:13:41.634 UTC [59168] LOG:  invalid resource manager ID in checkpoint record
2024-11-11 14:13:41.634 UTC [59168] PANIC:  could not locate a valid checkpoint record
2024-11-11 14:13:41.634 UTC [59165] LOG:  startup process (PID 59168) was terminated by signal 6: Aborted
2024-11-11 14:13:41.634 UTC [59165] LOG:  aborting startup due to startup process failure
2024-11-11 14:13:41.635 UTC [59165] LOG:  database system is shut down
 stopped waiting
pg_ctl: could not start server

I ran Ansible with single verbosity (-v) but unfortunately didn't save the output. It basically just said it stopped because the previous command exited with a non-zero exit code; that command's output is the pg_recovery_2024-11-11.log I posted above.

Edit/Addition:
While doing some failover testing I also noticed the following in the PostgreSQL log, because the former leader node did not come back online after a forced failure (abruptly killing the VM in this case):

2024-11-12 12:54:06 UTC [8003-1]  FATAL:  could not start WAL streaming: ERROR:  requested timeline 5 is not in this server's history
2024-11-12 12:54:06.705 P00   INFO: archive-get command begin 2.54.0: [00000008.history, pg_wal/RECOVERYHISTORY] --archive-async --archive-get-queue-max=1GiB --exec-id=8013-fc40b444 --log-level-console=info --log-level-file=detail --log-path=/var/log/pgbackrest --pg1-path=/var/lib/postgresql/16/main --process-max=4 --repo1-path=/pgbackrest --repo1-s3-bucket=test --repo1-s3-endpoint=minio-backup.scootershop.dev/test --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=sip-backup-01 --repo1-s3-uri-style=path --repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=sip-psql-dev
2024-11-12 12:54:06.740 P00   INFO: unable to find 00000008.history in the archive
2024-11-12 12:54:06.740 P00   INFO: archive-get command end: completed successfully (37ms)
2024-11-12 12:54:06 UTC [5182-40]  LOG:  waiting for WAL to become available at 0/31000040
2024-11-12 12:54:06 UTC [8015-1] [local] postgres@postgres FATAL:  the database system is starting up
2024-11-12 12:54:06 UTC [8017-1] [local] postgres@postgres FATAL:  the database system is starting up

Both initial replicas are fine.
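
To see how the failed node's own timeline history compares with what is in the repository, something like this can help (sketch; binary/data paths as above, stanza name as configured):

# timeline the failed node last checkpointed on
sudo -u postgres /usr/lib/postgresql/16/bin/pg_controldata /var/lib/postgresql/16/main | grep -i timeline
# backups and archive info known to the pgBackRest repository
sudo -u postgres /usr/bin/pgbackrest --stanza=test-stanza info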

Another Update:

On another restore run everything worked fine until it tried to bring the leader up after restoring. This is from postgresql-Tue.log:

2024-11-12 13:51:14.738 P00   INFO: archive-get command begin 2.54.0: [00000004.history, pg_wal/RECOVERYHISTORY] --archive-async --archive-get-queue-max=1GiB --exec-id=41867-3ac485e4 --log-level-console=info --log-level-file=detail --log-path=/var/log/pgbackrest --pg1-path=/var/lib/postgresql/16/main --process-max=4 --repo1-path=/pgbackrest --repo1-s3-bucket=bucket --repo1-s3-endpoint=minio.test/bucket --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=sip-backup-01 --repo1-s3-uri-style=path --repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=sip-psql-dev
2024-11-12 13:51:14.758 P00   INFO: found 00000004.history in the repo1: 16-1 archive
2024-11-12 13:51:14.758 P00   INFO: archive-get command end: completed successfully (22ms)
2024-11-12 13:51:14 UTC [41865-2]  LOG:  restored log file "00000004.history" from archive
2024-11-12 13:51:14.764 P00   INFO: archive-get command begin 2.54.0: [00000005.history, pg_wal/RECOVERYHISTORY] --archive-async --archive-get-queue-max=1GiB --exec-id=41869-f5d57f3c --log-level-console=info --log-level-file=detail --log-path=/var/log/pgbackrest --pg1-path=/var/lib/postgresql/16/main --process-max=4 --repo1-path=/pgbackrest --repo1-s3-bucket=bucket --repo1-s3-endpoint=minio.test/bucket --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=sip-backup-01 --repo1-s3-uri-style=path --repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=sip-psql-dev
2024-11-12 13:51:14.783 P00   INFO: unable to find 00000005.history in the archive
2024-11-12 13:51:14.783 P00   INFO: archive-get command end: completed successfully (20ms)
2024-11-12 13:51:14 UTC [41865-3]  LOG:  entering standby mode
2024-11-12 13:51:14.789 P00   INFO: archive-get command begin 2.54.0: [00000004.history, pg_wal/RECOVERYHISTORY] --archive-async --archive-get-queue-max=1GiB --exec-id=41871-d25a7a5e --log-level-console=info --log-level-file=detail --log-path=/var/log/pgbackrest --pg1-path=/var/lib/postgresql/16/main --process-max=4 --repo1-path=/pgbackrest --repo1-s3-bucket=bucket --repo1-s3-endpoint=minio.test/bucket --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=sip-backup-01 --repo1-s3-uri-style=path --repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=sip-psql-dev
2024-11-12 13:51:14.809 P00   INFO: found 00000004.history in the repo1: 16-1 archive
2024-11-12 13:51:14.809 P00   INFO: archive-get command end: completed successfully (21ms)
2024-11-12 13:51:14 UTC [41865-4]  LOG:  restored log file "00000004.history" from archive
2024-11-12 13:51:14.816 P00   INFO: archive-get command begin 2.54.0: [00000003000000000000001E, pg_wal/RECOVERYXLOG] --archive-async --archive-get-queue-max=1GiB --exec-id=41873-59555ee7 --log-level-console=info --log-level-file=detail --log-path=/var/log/pgbackrest --pg1-path=/var/lib/postgresql/16/main --process-max=4 --repo1-path=/pgbackrest --repo1-s3-bucket=bucket --repo1-s3-endpoint=minio.test/bucket --repo1-s3-key=<redacted> --repo1-s3-key-secret=<redacted> --repo1-s3-region=sip-backup-01 --repo1-s3-uri-style=path --repo1-type=s3 --spool-path=/var/spool/pgbackrest --stanza=sip-psql-dev
2024-11-12 13:51:14 UTC [41880-1] 192.168.88.221(38122) dbu1_scp_dev@db_scp_dev FATAL:  the database system is starting up
2024-11-12 13:51:15.418 P00   INFO: found 00000003000000000000001E in the archive asynchronously
2024-11-12 13:51:15.418 P00   INFO: archive-get command end: completed successfully (603ms)
2024-11-12 13:51:15 UTC [41865-5]  LOG:  restored log file "00000003000000000000001E" from archive
2024-11-12 13:51:15 UTC [41865-6]  LOG:  invalid resource manager ID in checkpoint record
2024-11-12 13:51:15 UTC [41865-7]  PANIC:  could not locate a valid checkpoint record
2024-11-12 13:51:15 UTC [41855-6]  LOG:  startup process (PID 41865) was terminated by signal 6: Aborted
2024-11-12 13:51:15 UTC [41855-7]  LOG:  aborting startup due to startup process failure
2024-11-12 13:51:15 UTC [41855-8]  LOG:  database system is shut down

And in Ansible:

TASK [patroni : Wait for port 8008 to become open on the host] ***************************************************************************************************************************************************************************************************
ok: [192.168.88.101] => {"changed": false, "elapsed": 10, "match_groupdict": {}, "match_groups": [], "path": null, "port": 8008, "search_regex": null, "state": "started"}
FAILED - RETRYING: [192.168.88.101]: Check PostgreSQL is started and accepting connections (1000 retries left).
FAILED - RETRYING: [192.168.88.101]: Check PostgreSQL is started and accepting connections (999 retries left).
FAILED - RETRYING: [192.168.88.101]: Check PostgreSQL is started and accepting connections (998 retries left).
FAILED - RETRYING: [192.168.88.101]: Check PostgreSQL is started and accepting connections (997 retries left).
FAILED - RETRYING: [192.168.88.101]: Check PostgreSQL is started and accepting connections (996 retries left).
FAILED - RETRYING: [192.168.88.101]: Check PostgreSQL is started and accepting connections (995 retries left).

and it keeps counting down.

vitabaks self-assigned this Nov 13, 2024