Barman has an errors when I restore base #1047

Open
hazard2005 opened this issue Dec 20, 2024 · 1 comment

@hazard2005

Hi.
Sometimes I need to restore Postgres databases from the prod environment to the dev environment. The database is managed by Patroni/Consul and runs in Nomad.
After a restore I get 2 errors on the Barman server when I run barman check all:

  1. replication slot: FAILED (slot 'barman' not initialised: is 'receive-wal' running?)
  2. archiver errors: FAILED (duplicates: 11)
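
To see which configured server each failure belongs to, the output of barman check all can be grouped per server. The following is a minimal sketch, not part of Barman: it assumes the output consists of a "Server NAME:" header followed by indented "check name: OK" or "check name: FAILED (detail)" lines, as in the two lines quoted above. The function name failed_checks and the server name pg-dev in the usage note are hypothetical.

```python
import re

# Assumed output shape: a "Server <name>:" header, then indented check lines.
SERVER_LINE = re.compile(r"^Server (?P<name>\S+):\s*$")
CHECK_LINE = re.compile(
    r"^\s+(?P<check>[^:]+): (?P<status>OK|FAILED)(?: \((?P<detail>.*)\))?\s*$"
)


def failed_checks(output: str) -> dict[str, list[tuple[str, str]]]:
    """Group FAILED checks by server name; servers without failures are omitted."""
    failures: dict[str, list[tuple[str, str]]] = {}
    server = None
    for line in output.splitlines():
        header = SERVER_LINE.match(line)
        if header:
            server = header.group("name")
            continue
        check = CHECK_LINE.match(line)
        if check and server is not None and check.group("status") == "FAILED":
            failures.setdefault(server, []).append(
                (check.group("check"), check.group("detail") or "")
            )
    return failures
```

Feeding it the captured text of barman check all (for example from subprocess.run) would return something like {"pg-dev": [("replication slot", "..."), ("archiver errors", "duplicates: 11")]}, which makes it clear whether the prod or the dev server is the one failing.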

I use the following migration scheme: I run a migration job in Nomad, which creates a new service in Consul and restores the backup to that host. Then I switch the connection service from my old database to this new service. When the data migration to the "old" database hosts has finished, I turn off the migration job and one of those hosts becomes the leader.
After that I have to change the leader value in conninfo and streaming_conninfo, and error 1 disappears.
To solve error 2 I delete all files in the errors directory. Sometimes that helps, but sometimes it doesn't.
As far as I understand (I'm a newbie with Barman), the result of SELECT timeline_id FROM pg_control_checkpoint(); doesn't match Barman's checkpoint.
So my questions are: how can I quickly fix the archiver error, or is there a more correct way to restore the database?
I saw a similar question in issue #897, but there is no answer there.

@martinmarques
Contributor

Hi,

Your message above is not very clear about which actions you have taken, or even about the end result.

On the actions side, you mention cloning (or restoring) production databases to lower dev environments. Then later you talk about a "migration scheme". We need more clarity on how Barman fits into this cloning/migration. It would be better if you gave us the step-by-step commands you execute.

Given the "restore" and the later "errors", it looks like you may be backing up both environments (prod and dev) with Barman. Is this true? Which clusters are being backed up?

On the result side, you mention errors when running barman check all, but that command checks all configured servers. Which servers are failing? Did you check whether the slot is present on the server you are connected to? And regarding the duplicate WAL files, did you verify that they are legitimate duplicates? Check whether they are already in the wals directory and whether they have the same sha256 hash (meaning they are the same files).
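
The duplicate check described above could be scripted roughly as follows. This is a sketch under assumptions, not a Barman tool: it assumes the files in the errors directory are named WALNAME.TIMESTAMP.duplicate, that the archived copy sits somewhere below the wals directory under its plain WAL name, and that archived WALs are not compressed (a compressed archive copy would never hash the same as the raw duplicate). The helper names sha256_of and classify_duplicates are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_duplicates(errors_dir: Path, wals_dir: Path) -> dict[str, str]:
    """Map each *.duplicate file name to 'identical', 'different' or 'missing'."""
    results: dict[str, str] = {}
    for dup in sorted(errors_dir.glob("*.duplicate")):
        # Assumption: everything before the first dot is the WAL segment name.
        wal_name = dup.name.split(".", 1)[0]
        # Assumption: the archived WAL keeps its plain name below wals_dir.
        archived = [p for p in wals_dir.rglob(wal_name) if p.is_file()]
        if not archived:
            results[dup.name] = "missing"
        elif sha256_of(archived[0]) == sha256_of(dup):
            results[dup.name] = "identical"
        else:
            results[dup.name] = "different"
    return results
```

Files reported as identical are the same segment archived twice. Files reported as different carry the same WAL name with other content, which is what one would expect if a restored cluster is archiving into a server definition that already holds WALs from another cluster, so they deserve a closer look before anything is deleted.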

Also, why are you changing the conninfo? Does that have anything to do with Patroni failing over, so that Barman needs to switch to taking backups and streaming from a different node? That would explain the first error you have.
