Barman has errors when I restore a database #1047

hazard2005 · 2024-12-20T14:35:29Z

Hi.
Sometimes I need to restore Postgres databases from prod to a dev environment. The database is managed by Patroni/Consul and runs in Nomad.
After a restore I get 2 errors on the Barman server from barman check all:

replication slot: FAILED (slot 'barman' not initialised: is 'receive-wal' running?)
archiver errors: FAILED (duplicates: 11)

I'm using a migration scheme: I run a migration job in Nomad, which creates a new service in Consul and restores the backup to this host. Then I switch the connection service from my old database to this service. When the data migration to the "old" database hosts has finished, I turn off the migration job and one of the hosts becomes the leader.
So I need to change the leader value in conninfo and streaming_conninfo. After that, error 1 disappears.
To solve error 2 I delete all files in the errors directory. Sometimes it helps, but sometimes it doesn't.
As I understand it (I'm a newbie with Barman), the value returned by SELECT timeline_id FROM pg_control_checkpoint(); doesn't match Barman's checkpoint.
So the question is: how can I quickly fix the archiver error, or is there a more correct way to restore the database?
I saw a similar question in issue #897, but there is no answer.
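
On the timeline point above: a WAL segment file name is 24 hexadecimal digits, and the first 8 of them are the timeline ID, so the timelines of the files sitting in the errors directory can be read straight from their names. The following is a minimal Python sketch of that idea, not something taken from Barman itself. The helper names and the sample file names are hypothetical, and the ".duplicate" suffix is an assumption about how rejected files are named.

```python
import re

# A WAL segment name is 24 hex digits: 8 for the timeline,
# 8 for the log number and 8 for the segment number.
WAL_NAME = re.compile(r"^([0-9A-F]{8})([0-9A-F]{8})([0-9A-F]{8})")


def wal_timeline(file_name):
    """Return the timeline ID encoded in a WAL file name, or None."""
    match = WAL_NAME.match(file_name.upper())
    return int(match.group(1), 16) if match else None


def count_by_timeline(file_names):
    """Count file names per timeline; unparseable names are counted under None."""
    counts = {}
    for name in file_names:
        timeline = wal_timeline(name)
        counts[timeline] = counts.get(timeline, 0) + 1
    return counts


if __name__ == "__main__":
    # Hypothetical listing of an errors directory.
    sample = [
        "000000010000000000000005.20241220T143529Z.duplicate",
        "000000010000000000000006.20241220T143530Z.duplicate",
        "000000020000000000000006.20241220T143531Z.duplicate",
    ]
    for timeline, count in sorted(count_by_timeline(sample).items()):
        print(f"timeline {timeline}: {count} file(s)")
```

Comparing these timelines with the timeline_id reported by pg_control_checkpoint() on the current leader shows whether the rejected files come from the timeline the cluster is on now or from an earlier one.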


martinmarques · 2025-01-17T14:58:30Z

Hi,

Your message above is not very clear about what actions you have taken, or even about the end result.

On the actions side, you mention cloning (or restoring) production databases to lower dev environments. Then later you talk about a "migration scheme". We need more clarity on how Barman fits into this cloning/migration. It would be better if you gave us the step-by-step commands you executed.

Given the "restore" and later "errors", it looks like you may be backing up both environments (prod and dev) with Barman. Is this true? Which clusters are being backed up?

On the result side, you mention errors when running barman check all, but that command checks all configured servers. Which servers are failing? Did you check whether the slot is present on the server you are connected to? And regarding the duplicate WAL files, did you verify that they are legitimate duplicates (check that they are already in the wals directory and that they have the same sha256 hash, meaning they are the same files)?
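
The duplicate check described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a Barman tool: the function names are hypothetical, the directory paths must be supplied by you, and it assumes that a file in the errors directory is named after the WAL segment followed by a dot-separated suffix. If the archived copy in the wals directory is stored compressed, a raw hash comparison will report "different" even for the same segment, so decompress first in that case.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_duplicates(errors_dir, wals_dir):
    """Map each file in errors_dir to 'identical', 'different' or 'missing'.

    'identical': a file with the same WAL name and the same hash is archived.
    'different': the WAL name is archived, but the content differs.
    'missing':   no archived file with that WAL name was found.
    """
    # Index archived files by name, searching subdirectories as well.
    archived = {p.name: p for p in Path(wals_dir).rglob("*") if p.is_file()}
    result = {}
    for candidate in sorted(Path(errors_dir).iterdir()):
        if not candidate.is_file():
            continue
        # Assumption: everything after the first dot is a suffix
        # added when the file was moved to the errors directory.
        wal_name = candidate.name.split(".", 1)[0]
        original = archived.get(wal_name)
        if original is None:
            result[candidate.name] = "missing"
        elif sha256_of(original) == sha256_of(candidate):
            result[candidate.name] = "identical"
        else:
            result[candidate.name] = "different"
    return result
```

Files reported as "identical" are true duplicates of segments that are already archived. Files reported as "different" share a name with an archived segment but carry other content, which would be consistent with two different clusters, or two timelines of the same cluster, archiving to the same Barman server.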

Also, why are you changing the conninfo? Does that have anything to do with Patroni failing over, so that Barman needs to switch to taking backups and streaming from a different node? That would explain the first error you have.

barthisrael added triage moreinfo and removed triage labels Jan 17, 2025
