[Bug]: Launchpad can not upgrade itself #19

Open
2 tasks done
rswrz opened this issue Dec 17, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@rswrz
Member

rswrz commented Dec 17, 2024

Bug Description

When a change is made to the virtual_machine_scale_set in the Launchpad module, the Launchpad inadvertently updates itself. This is a problem because the update is applied by Terraform running on the Launchpad's own private GitHub runner.

In many cases, this triggers a rollout of updates to the VMSS instance that hosts the Launchpad runner currently executing Terraform. The GitHub runner is then terminated mid-execution, causing Terraform to fail. This failure leaves behind:

  • A locked Terraform state
  • An incomplete and inconsistent Terraform state

This behavior disrupts workflows and requires manual intervention to recover the environment.

Terraform-Version

any

Relevant log output

Relevant Error Messages

Additional Information

Root Cause and Suggested Solution

The primary issue lies in the upgrade_mode setting of the VMSS. It is currently set to Automatic, so when Terraform runs on the Launchpad itself, any update to the Launchpad VMSS causes the instance executing Terraform to terminate itself.

To resolve this issue, the following changes are recommended:

  1. Set upgrade_mode to Manual (the default) to prevent automatic updates.
  2. Configure the azurerm provider feature flag reimage_on_manual_upgrade to false. This ensures that manual upgrades of the VMSS do not re-image the currently running instances (the GitHub Runner).

With these changes, the Launchpad VMSS will no longer automatically re-image itself during Terraform execution.
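
A minimal sketch of the two settings above, not the module's actual code. It assumes the scale set is an azurerm_linux_virtual_machine_scale_set (the issue only says virtual_machine_scale_set); the resource name "launchpad" is hypothetical, and all other required arguments are omitted:

```hcl
provider "azurerm" {
  features {
    virtual_machine_scale_set {
      # Do not re-image running instances when the scale set model
      # changes while upgrade_mode is "Manual".
      reimage_on_manual_upgrade = false
    }
  }
}

# Assumption: Linux scale set; "launchpad" is a placeholder name.
resource "azurerm_linux_virtual_machine_scale_set" "launchpad" {
  # ... existing arguments of the Launchpad module stay unchanged ...

  # "Manual" is the provider default; set explicitly here to replace "Automatic".
  upgrade_mode = "Manual"
}
```

With this configuration, a model change would be recorded on the scale set, but existing instances would keep running the old model until they are upgraded or re-imaged in a separate step.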

Next Steps

We need a method for rolling out VMSS upgrades without manual intervention. This rollout should not be handled by Terraform running on the private GitHub runner (the Launchpad) itself.

Public runners are unsuitable due to the strict and private nature of the Storage Account holding the Terraform Remote State.

A potential solution could involve using the az CLI in a dedicated GitHub Actions job to re-image the Launchpad VMSS instances after the Terraform Apply job completes successfully.

Privacy Statement

  • I agree

Code of Conduct

  • I agree to follow this repository's Code of Conduct
@rswrz rswrz added the bug Something isn't working label Dec 17, 2024
@rswrz rswrz assigned rswrz and unassigned rswrz Dec 17, 2024
@rswrz
Member Author

rswrz commented Feb 13, 2025

Another potential solution could involve using VMSS Instance Scale Protection
