Decommission puppetmaster.theforeman.org #1805

Closed
2 tasks done
ekohl opened this issue Jan 13, 2023 · 13 comments

@ekohl
Member

ekohl commented Jan 13, 2023

Via #1777, #1686 and #1685 most things were moved elsewhere.

Currently this remains:

  • Backups (/srv/backups)
  • Secrets (/srv/secretsgit)

Backups

The backup host needs to be accessible from our various servers (CI, Discourse, Puppet, Foreman, Redmine). In picking a host we need to consider network access (IPv4, IPv6) and location. If we host the backup on the same physical server as the machines it backs up, there's no point: if the hardware burns down, the backups are lost too.

It may be wise to consider multiple hosts as destinations.

Secrets

Similar to backups, there is the concern of network access. In addition, everyone with access needs to update their git remote. It's possible to make puppetmaster a CNAME to the new host, but I'd prefer to retire that name instead.

theforeman/theforeman-rel-eng#171 is where we document the secret storage, but it's good to reach out to everyone with access individually.

While doing so we can also remove people's access when they've moved on to different projects.

@evgeni
Member

evgeni commented Jan 14, 2023

How much storage is the backup?

@ekohl
Member Author

ekohl commented Jan 14, 2023

# du -csh backups/*
17G	backups/ci
12G	backups/ci-jenkins
352M	backups/discourse
133M	backups/puppetmaster
1.2G	backups/redmine
12K	backups/ssh
30G	total

backups/ci looks like a full disk backup (I see /lib, /boot, etc). For our Jenkins controller I think it's sufficient to back up /var/lib/jenkins, which is actually what we have in backups/ci-jenkins. Even that looks large: within a single backup, the jobs directory accounts for 7 GB out of 8 GB. Diving deeper:

33M	smart-proxy-openscap-test
62M	smart-proxy-openscap-pull-request
66M	test_hammer_cli_foreman_pull_request
185M	test_3_3_stable
188M	foreman_host_extra_validator-pull-request
188M	test_3_4_stable
190M	foreman_default_hostgroup-pull-request
190M	foreman_kubevirt-pull-request
190M	test_3_5_stable
191M	foreman_setup-pull-request
370M	foreman_templates-pull-request
375M	puppetdb_foreman-pull-request
533M	foreman_bootdisk-pull-request
554M	foreman_openscap-pull-request
732M	foreman_discovery-pull-request
3.0G	test_plugin_matrix
7.0G	total

Looking into those directories I see a lot of workspace*/foreman directories, each of which is a full Foreman checkout weighing in at about 189M.

So I think there's a lot of room to improve here.
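
To see how much of the jobs backup is taken up by such checkouts, something like the following could be run on the controller. It is a minimal sketch: the default path /var/lib/jenkins/jobs and the function name list_workspace_checkouts are assumptions for illustration, and it relies on GNU du and sort for the -h option.

```shell
# List every workspace*/foreman checkout below a jobs directory together with
# its size, largest first.
# The default directory is an assumption; pass the real path as the first argument.
list_workspace_checkouts() {
    jobs_dir="${1:-/var/lib/jenkins/jobs}"
    # -prune stops find from descending into a checkout once it has matched,
    # so each checkout is reported exactly once.
    find "$jobs_dir" -type d -path '*/workspace*/foreman' -prune \
        -exec du -sh {} + | sort -rh
}
```

Calling `list_workspace_checkouts /path/to/jobs` prints one line per checkout, which makes it easy to judge what an exclude rule would save.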

@ekohl
Member Author

ekohl commented Jan 14, 2023

Thinking more about this: why are these executed on the controller at all? We should be building on nodes.

@ekohl
Member Author

ekohl commented Jan 17, 2023

I do see we exclude /jobs/**/builds/ for ci-jenkins, but if we don't store the builds then what's the point of backing up the jobs in the first place? We have the configs in jenkins-jobs, so should we exclude /jobs/?

@ekohl
Member Author

ekohl commented Jan 17, 2023

I ended up reverting #1808 since dirvish is neither packaged nor maintained. Switching over was part of #695 but we never really resolved that.

For my own servers I use restic. Any objection to going that route? Suggestions for alternatives?

@evgeni
Member

evgeni commented Jan 17, 2023

I too use restic, so strong 👍

@ekohl
Member Author

ekohl commented Jan 17, 2023

Secrets have been moved. theforeman/theforeman-rel-eng#171 still needs to be merged and I've emailed everyone to update their git remotes.

As for restic: I have puppetized my own setup a bit (not 100%) so I'll try to publish that module so we can reuse it.

@ekohl
Member Author

ekohl commented Jan 17, 2023

In ekohl/puppet-restic#1 I made a start with this.

@ekohl
Member Author

ekohl commented Apr 14, 2023

I've deployed an initial version and after a few bumps it's been deployed on Redmine.

TODO items:

  • Set AmbientCapabilities=CAP_DAC_READ_SEARCH in [Service] (or otherwise deal with file permissions)
  • Add backups.theforeman.org SSH key as global known host via Puppet
  • Look into more native SQL backups - right now it's relying on the DB dump within /var/lib/redmine but with restic you can read stdin. The cron schedule may also not match up now, so if anything it should be a PreExec command to create the dump.
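
The stdin idea in the last item could look roughly like this. It is a sketch under assumptions: Redmine's database is taken to be PostgreSQL with a database named redmine (neither is stated above), the function name backup_db_dump is made up, and the restic repository and password are expected to come from the environment (RESTIC_REPOSITORY, RESTIC_PASSWORD_FILE). In a systemd unit, the "PreExec" alternative corresponds to an ExecStartPre= line.

```shell
# Stream a database dump straight into restic instead of backing up a dump
# file that a separate cron job left on disk.
# Assumptions: PostgreSQL, a database called "redmine" by default, and the
# restic repository settings already exported in the environment.
backup_db_dump() {
    db_name="${1:-redmine}"
    # --stdin makes restic read the data from standard input;
    # --stdin-filename sets the file name it is stored under in the snapshot.
    # Note: the pipeline's exit status is restic's, so a failing dump has to
    # be detected separately.
    pg_dump "$db_name" | restic backup --stdin --stdin-filename "${db_name}.sql"
}
```

This removes the dependency on the dump's cron schedule, because the dump is produced at the moment the backup runs.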

After that, backing up Jenkins is a good next step. Today /etc on puppetmaster is also backed up, but that doesn't make much sense since on the new server it's all Puppetized. It may make sense to back up foreman01's DB.

@ekohl
Member Author

ekohl commented Apr 14, 2023

> Set AmbientCapabilities=CAP_DAC_READ_SEARCH in [Service] (or otherwise deal with file permissions)

voxpupuli/puppet-systemd#329 would be a prerequisite. Then puppet-restic can use it.
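
For reference, the resulting drop-in would be small. The sketch below just prints one; the function name print_restic_dropin and the unit name and path mentioned in the note after it are hypothetical, and it assumes the backup service runs as an unprivileged user, since root would not need the capability.

```shell
# Print a systemd drop-in that lets an unprivileged backup service read files
# and traverse directories regardless of their permission bits.
print_restic_dropin() {
    cat <<'EOF'
[Service]
AmbientCapabilities=CAP_DAC_READ_SEARCH
EOF
}
```

Writing that output to a file such as /etc/systemd/system/restic-backup.service.d/capabilities.conf (a hypothetical unit name) and running systemctl daemon-reload would apply it by hand; with Puppet the same content would be managed by the module instead.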

@ekohl
Member Author

ekohl commented Apr 18, 2023

6d54532 takes care of the CAP_DAC_READ_SEARCH capability. Redmine is now backing up daily.

Next step is to verify the backups contain good content and can be restored. Once that's done, manage the global SSH known host entry and apply it to Jenkins.
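
A restore check along these lines might look as follows. This is only a sketch: the function name verify_restore is invented here, it assumes the repository and password are supplied through restic's environment variables, and it restores the latest snapshot into a scratch directory for manual inspection rather than comparing contents automatically.

```shell
# Check the repository and restore the latest snapshot into a scratch
# directory so its contents can be inspected by hand.
verify_restore() {
    # Scratch directory to restore into; the caller inspects and removes it.
    target="$(mktemp -d)" || return 1
    # Verify the repository structure first
    # (adding --read-data would also verify the stored pack files).
    restic check >&2 || return 1
    # Restore the most recent snapshot into the scratch directory.
    restic restore latest --target "$target" >&2 || return 1
    # Print where the restored files ended up.
    printf '%s\n' "$target"
}
```

If several hosts share one repository, adding --host to the restore command narrows "latest" to the snapshots of a single machine.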

@ekohl
Member Author

ekohl commented Apr 20, 2023

> Add backups.theforeman.org SSH key as global known host via Puppet

8465158

> Next step is to verify the backups contain good content and can be restored.

The exclude path was wrong, so it also backed up all git repositories: b7bf7cc

This decreased the backup from 2 GB to 632 MB.

> apply it to Jenkins.

#1838 includes that.

@ekohl
Member Author

ekohl commented Apr 25, 2023

Last week Jenkins was added. Yesterday I looked at Discourse. Turns out the Discourse backups we did make never contained the real files. So even though I hadn't verified the backups were made correctly, yesterday I turned off the server.

2bbdb33 & 0e68fdb mostly worked for Discourse. Just needs #1842 which I already did locally to verify the backups were correct. So now we will have good backups.

Needs some further iteration, but that'll be captured in new issues.

@ekohl ekohl closed this as completed Apr 25, 2023
@ekohl ekohl moved this to Done in Infrastructure Feb 22, 2024