Multi cell database/mq adoption #746

Open
bogdando wants to merge 8 commits into main from multi_cell_database

@bogdando bogdando commented Nov 25, 2024

  1. Fix OSPDo specifics for single vs multi cells

Declare RUN_OVERRIDES before it is used.
Use env vars instead of docs generation conditions to reuse the same
code in tests:

  • Add MARIADB_RUN_OVERRIDES to cover all overrides and client annotations
  • Add missing definitions for rhoso/ospd namespace specific vars
  • Use env TRIPLEO_PASSWORDS for all cases as OSPDo still deploys
    tripleo
  • Define and use NAMESPACE (default openstack) instead of
    RHOSO18_NAMESPACE or OSPDO_NAMESPACE. Remove unused rhoso18 ns value
    (only in this guide).
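
A minimal sketch of the NAMESPACE default described in the last bullet. The fallback expression is an assumption about how the default could be applied, not a quote from the scripts in this change:

```shell
# Use NAMESPACE from the environment if it is set,
# otherwise fall back to the documented default "openstack".
NAMESPACE="${NAMESPACE:-openstack}"
echo "Adopting into namespace: ${NAMESPACE}"
```

With this pattern the same snippet can serve both RHOSO and OSPDo cases: only the exported NAMESPACE value differs.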
  1. Refactor comments in commands into native AsciiDoc

Illustrate how commands in scripts could have comments
that become (almost as is) native AsciiDoc footnotes.

When copying code into docs, only minimal adjustments will
be needed, like adding a '$' prefix (or '>' for multiline commands).
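
To illustrate the kind of adjustment meant above, here is a small sketch that adds a '$' prefix to the first line of a command and '>' to its continuation lines. The helper name `to_doc_prompt` is hypothetical and is not part of this change:

```shell
# Hypothetical helper: read one (possibly multiline) command on stdin and
# print it the way it would appear in the docs.
to_doc_prompt() {
    awk 'NR == 1 { print "$ " $0; next } { print "> " $0 }'
}

# Example: a command split over two lines with a trailing backslash.
printf 'oc get pods \\\n  -n openstack\n' | to_doc_prompt
```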

Provide a static multi-cell config for databases and messaging
for the adoption guide and tests, which comprises 3 cells.

  1. Keep renaming 'default' cell consistent for single and multi cells:

  • Default becomes cellX (or it can be imported as is, for a multi-cell
    case only)
  • cell1 becomes mapped to openstack-cell1 osdp node set
  • cell2 becomes mapped to openstack-cell2 osdp node set, etc.
  • cellX (X=3 here) becomes mapped to openstack-cell3. Alternatively,
    default cell retains its name for the openstack-default osdpns
    mapping

Evaluate podified MariaDB passwords for cells from osp-secret
to align the tests with documented commands. Remove no longer
needed podified DB password variable.
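
The renaming rules above can be sketched as a small shell mapping. The function names and the RENAMED_DEFAULT variable are hypothetical and used only for illustration; "cell3" reflects the X=3 case mentioned in the description:

```shell
# Target name for the source 'default' cell. Set it to "default" to keep the
# name as is (the multi-cell-only alternative described above).
RENAMED_DEFAULT="${RENAMED_DEFAULT:-cell3}"

# Map a source (OSP/TripleO) cell name to its target cell name.
target_cell() {
    case "$1" in
        default) echo "$RENAMED_DEFAULT" ;;
        *)       echo "$1" ;;
    esac
}

# Derive the node set name for a source cell.
nodeset_for() {
    echo "openstack-$(target_cell "$1")"
}

for cell in default cell1 cell2; do
    echo "$cell -> $(target_cell "$cell") -> $(nodeset_for "$cell")"
done
```

For a single-cell deployment the same sketch would be run with RENAMED_DEFAULT=cell1, so that 'default' maps to openstack-cell1.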

  1. Make ansible and shell variables compute cells aware.

  2. Rework vars and secrets YAML values for the source and edpm
    nodes to avoid confusing the different naming schemes for cells
    in OSP/TripleO and RHOSO.

  3. Remove cached fact for pulled OSP configuration as it can no longer
    be generated in a multi-cell setup, where related shell variables
    become bash arrays.

  4. Simplify ENV headers management by collecting in a single place.

  5. Adjust storage/storageRequests values to better fit
    multi-cell test scenarios. Also provide values in docs and
    add a comment to adjust them as needed.

  6. Remove source_db_root_password as it is directly evaluated from
    tripleo passwords into an env var.

  7. Run mysql commands in individual pods.
    Finished pods take time to terminate; this avoids errors where
    subsequent mysql commands fail because the old and new pods use the
    same name.

  8. Rename nodesets to openstack-cell1, which is needed for adoption of
    remaining multi-cell aware services in a follow up.

  9. Make edpm_nodes input multi-cell aware.

Assume a single cell1 for now.

Remove edpm_computes and computes env var
from tests as it is not multi-cell aware and should no longer be
needed. The docs still use that env var; it will be removed in a
multi-cell adoption follow up, where we also cover EDPM multi-cell
adoption.

This is required as the rdo-jobs dependency introduces that
change for edpm_nodes and provides a common base for this and future
multi-cell follow ups.
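
One way to approach the "run mysql commands in individual pods" item above is to give every invocation its own pod name, so a new pod never collides with one that is still terminating. This is a sketch under assumptions: the helper name, the naming scheme, and the POD_SEQ counter are hypothetical and not taken from the scripts in this change:

```shell
# Counter used to make each client pod name distinct within one run.
POD_SEQ=0

# Set POD_NAME to a fresh name for the given cell.
# The result is returned in a variable rather than on stdout so that the
# counter update is not lost in a command-substitution subshell.
next_client_pod_name() {
    POD_SEQ=$((POD_SEQ + 1))
    POD_NAME="mariadb-client-${1}-${POD_SEQ}"
}

next_client_pod_name cell1; first="$POD_NAME"
next_client_pod_name cell1; second="$POD_NAME"
echo "$first $second"

# Illustrative use only (requires a cluster, so it is left commented out):
#   oc run "$POD_NAME" -n "$NAMESPACE" --image "$MARIADB_IMAGE" \
#     -i --rm --restart=Never -- mysql ...
```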

  1. Unify org_namespace defaults and reference by env var

Closes: #184
Depends-On: openstack-k8s-operators/install_yamls#985
Depends-On: https://review.rdoproject.org/r/c/rdo-jobs/+/53192

Jira: #OSPRH-6548

@bogdando bogdando requested a review from jistr November 25, 2024 12:48
@bogdando bogdando force-pushed the multi_cell_database branch from 3323bbf to a1207eb Compare November 25, 2024 12:49
@bogdando bogdando changed the title Multi cell database Multi cell database/mq adoption Nov 25, 2024
@bogdando bogdando mentioned this pull request Nov 25, 2024
@bogdando bogdando added the check-before-merge/depends-on Don't forget to check depends-on before merging label Nov 25, 2024

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/ff2fa0079d2d4d0f9d107db215021a87

✔️ noop SUCCESS in 0s
adoption-standalone-to-crc-ceph FAILURE in 1h 34m 52s
adoption-standalone-to-crc-no-ceph FAILURE in 1h 40m 01s
✔️ adoption-docs-preview SUCCESS in 1m 17s

@bogdando bogdando force-pushed the multi_cell_database branch from 0ba8ce7 to cd5b8f9 Compare November 26, 2024 12:54

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/2bf5c26d9ce042488b038c87505a0f82

✔️ noop SUCCESS in 0s
adoption-standalone-to-crc-ceph FAILURE in 1h 38m 37s
adoption-standalone-to-crc-no-ceph FAILURE in 1h 44m 16s
✔️ adoption-docs-preview SUCCESS in 1m 31s

@bogdando bogdando force-pushed the multi_cell_database branch 3 times, most recently from c392e11 to 006e247 Compare November 27, 2024 13:21

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/ef997d9bd22a422fa0a81831c10ce76f

✔️ noop SUCCESS in 0s
adoption-standalone-to-crc-ceph POST_FAILURE in 1h 40m 25s
adoption-standalone-to-crc-no-ceph RETRY_LIMIT in 10m 21s
✔️ adoption-docs-preview SUCCESS in 1m 21s

@bogdando bogdando force-pushed the multi_cell_database branch from 006e247 to 55aec9f Compare November 29, 2024 15:36

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/221b8eaa480445db948ba454fc1dc6d9

✔️ noop SUCCESS in 0s
adoption-standalone-to-crc-ceph FAILURE in 1h 51m 13s
adoption-standalone-to-crc-no-ceph FAILURE in 53m 45s
adoption-docs-preview FAILURE in 1m 15s

@bogdando bogdando commented Dec 2, 2024

recheck

@bogdando bogdando force-pushed the multi_cell_database branch from 55aec9f to ee3defd Compare December 2, 2024 12:27

This change depends on a change that failed to merge.

Change https://review.rdoproject.org/r/c/rdo-jobs/+/53192 is needed.

@bogdando bogdando force-pushed the multi_cell_database branch from ee3defd to 03ce37c Compare December 4, 2024 13:04

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/28720608befb44c49d0f76f95466684b

✔️ noop SUCCESS in 0s
adoption-standalone-to-crc-ceph RETRY_LIMIT in 48m 27s
adoption-standalone-to-crc-no-ceph RETRY_LIMIT in 48m 06s
adoption-docs-preview FAILURE in 1m 15s

@bogdando bogdando force-pushed the multi_cell_database branch 3 times, most recently from 505e4e8 to 8d81b48 Compare December 6, 2024 13:59
MARIADB_IMAGE=registry.redhat.io/rhosp-dev-preview/openstack-mariadb-rhel9:18.0
endif::[]
SOURCE_MARIADB_IP=172.17.0.2
$ PASSWORD_FILE="$HOME/overcloud-passwords.yaml"
@bogdando bogdando force-pushed the multi_cell_database branch from 8d81b48 to cdff544 Compare December 6, 2024 14:20

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/513df93417f943f8bc8b6d3849192150

✔️ noop SUCCESS in 0s
adoption-standalone-to-crc-ceph FAILURE in 1h 42m 09s
adoption-standalone-to-crc-no-ceph FAILURE in 1h 44m 20s
✔️ adoption-docs-preview SUCCESS in 1m 28s

@bogdando bogdando force-pushed the multi_cell_database branch from c1a9693 to 3b6c0c3 Compare December 12, 2024 15:21

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/bb124871501c4f128f85c5ba04239944

✔️ noop SUCCESS in 0s
adoption-standalone-to-crc-ceph FAILURE in 2h 22m 42s
adoption-standalone-to-crc-no-ceph FAILURE in 2h 13m 27s
✔️ adoption-docs-preview SUCCESS in 1m 16s

Contributor

@klgill klgill left a comment

Please review my comments and suggestions.
It would really help me to look at the downstream preview to check for any formatting issues.

$ chmod 0600 ~/.source_cloud_exported_variables*
----
+
<1> If `neutron-sriov-nic-agent` agents are running in your {OpenStackShort} deployment, get the configuration to use for the data plane adoption
Contributor

Suggested change
<1> If `neutron-sriov-nic-agent` agents are running in your {OpenStackShort} deployment, get the configuration to use for the data plane adoption
<1> If `neutron-sriov-nic-agent` agents are running in your {OpenStackShort} deployment, get the configuration to use for the data plane adoption.

Where do you get this configuration?

Contributor Author

This is probably a formatting change only. I did not introduce this statement in the first place, so I cannot answer that question (@karelyatin, perchance?). We could address that improvement to the SR-IOV adoption guide in a follow up.

docs_user/modules/proc_stopping-openstack-services.adoc (outdated)
----
ifeval::["{build}" == "downstream"]
<1> Replace `<path_to_SSH_key>` with the path to your SSH key.
<2> Replace `<controller-X IP>` with IP addresses of all controllers.
Contributor

Suggested change
<2> Replace `<controller-X IP>` with IP addresses of all controllers.
<2> Replace `<controller-1 IP>` with the IP addresses of all Controllers.

The value to replace here should match the line of code.

Contributor Author

@bogdando bogdando Dec 17, 2024

In code there is

ifeval::["{build}" == "downstream"]
CONTROLLER1_SSH="ssh -i *<path to SSH key>* root@*<controller-1 IP>*" <2>
CONTROLLER2_SSH="ssh -i *<path to SSH key>* root@*<controller-2 IP>*"
CONTROLLER3_SSH="ssh -i *<path to SSH key>* root@*<node3 IP>*"
# ...
endif::[]

Doesn't that match <controller-X IP>? (Or shall we fix it to node3?)

Contributor Author

@klgill sorry, I meant this renders OK for me

Contributor

@klgill klgill Dec 17, 2024

Ah, I see. You used <controller-X IP> as a "catch-all" for each value.
Typically, our style is to list each env var. But you could probably simplify it by using only one bullet point, for example:

  • Replace <controller-1 IP>, <controller-2 IP>, and <node3 IP> with the IP addresses of each Controller node.

@bogdando bogdando force-pushed the multi_cell_database branch from 3b6c0c3 to 658849b Compare December 16, 2024 13:50

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/813ed9ca7fb3499e9c8fedf30ef0f316

✔️ noop SUCCESS in 0s
adoption-standalone-to-crc-ceph NODE_FAILURE Node request 100-0007705163 failed in 0s
adoption-standalone-to-crc-no-ceph NODE_FAILURE Node request 100-0007705164 failed in 0s
✔️ adoption-docs-preview SUCCESS in 1m 15s

@bogdando bogdando force-pushed the multi_cell_database branch 2 times, most recently from 7e46a13 to 8416c99 Compare December 17, 2024 17:10

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/ac1f2c5cea9b4e63a9759e67b769b052

✔️ noop SUCCESS in 0s
adoption-standalone-to-crc-ceph FAILURE in 2h 57m 12s
adoption-standalone-to-crc-no-ceph FAILURE in 2h 13m 26s
✔️ adoption-docs-preview SUCCESS in 1m 16s

Contributor

@pinikomarov pinikomarov Dec 19, 2024

Correct, this variable is not used. I'm using "mysql_client_override" instead, supplied by the job definitions file from ci-fmw-jobs:
https://gitlab.cee.redhat.com/ci-framework/ci-framework-jobs/-/blob/main/playbooks/adoption/files/secrets_ospdo.j2?ref_type=heads#L24

@pinikomarov
Contributor

Looks ok to me


This change depends on a change that failed to merge.

Change openstack-k8s-operators/install_yamls#985 is needed.

Contributor

@gibizer gibizer left a comment

I see nothing that is clearly wrong in this change. But I'm also not comfortable approving it, as I got lost in the scripts. It is probably easier to review the result of the adoption to verify that this procedure is correct than to look at the procedure alone. If the end result is a properly adopted DB, then I'm OK to land this.

bogdando and others added 8 commits December 20, 2024 14:40
Provide a static multi-cell config for databases and messaging
for the adoption guide and tests, which comprises 3 cells.

Keep renaming 'default' cell consistent for single and multi cells:

* Default becomes cellX (or it can be imported as is, for a multi-cell
  case only)
* cell1 becomes mapped to openstack-cell1 osdp node set
* cell2 becomes mapped to openstack-cell2 osdp node set, etc.
* cellX (X=3 here) becomes mapped to openstack-cell3. Alternatively,
  default cell retains its name for the openstack-default osdpns
  mapping

Evaluate podified MariaDB passwords for cells from osp-secret
to align the tests with documented commands. Remove no longer
needed podified DB password variable.

Make ansible and shell variables compute cells aware.

Rework vars and secrets YAML values for the source and edpm
nodes to avoid confusing the different naming schemes for cells
in OSP/TripleO and RHOSO.

Remove cached fact for pulled OSP configuration as it can no longer
be generated in a multi-cell setup, where related shell variables
become bash arrays.

Simplify ENV headers management by collecting in a single place.

Adjust storage/storageRequests values to better fit
multi-cell test scenarios. Also provide values in docs and
add a comment to adjust them as needed.

Remove source_db_root_password as it is directly evaluated from
tripleo passwords into an env var.

Run mysql commands in individual pods.
Finished pods take time to terminate; this avoids errors where
subsequent mysql commands fail because the old and new pods use the
same name.

Rename nodesets to openstack-cell1, which is needed for adoption of
remaining multi-cell aware services in a follow up.

Signed-off-by: Bohdan Dobrelia <[email protected]>

Fix

Signed-off-by: Bohdan Dobrelia <[email protected]>
Declare RUN_OVERRIDES before it is used.

Use env vars instead of docs generation conditions to reuse the same
code in tests:
* Add MARIADB_RUN_OVERRIDES to cover all overrides and client annotations
* Add missing definitions for rhoso/ospd namespace specific vars
* Use env TRIPLEO_PASSWORDS for all cases as OSPDo still deploys
  tripleo
* Define and use NAMESPACE (default openstack) instead of
  RHOSO18_NAMESPACE or OSPDO_NAMESPACE. Remove unused rhoso18 ns value
  (only in this guide).

Signed-off-by: Bohdan Dobrelia <[email protected]>
Illustrate how commands in scripts could have comments
that become (almost as is) native AsciiDoc footnotes.

When copying code into docs, only minimal adjustments will
be needed, like adding a '$' prefix (or '>' for multiline commands).

Signed-off-by: Bohdan Dobrelia <[email protected]>
Those will be added back in a follow up, which completes
the guide and tests for extra cell2 and cell3.

Signed-off-by: Bohdan Dobrelia <[email protected]>
Assume a single cell1 for now.

Remove edpm_computes and computes env var
from tests as it is not multi-cell aware and should no longer be
needed. The docs still use that env var; it will be removed in a
multi-cell adoption follow up, where we also cover EDPM multi-cell
adoption.

This is required as the rdo-jobs dependency introduces that
change for edpm_nodes and provides a common base for this and future
multi-cell follow ups.

Signed-off-by: Bohdan Dobrelia <[email protected]>
Signed-off-by: Bohdan Dobrelia <[email protected]>
@bogdando bogdando force-pushed the multi_cell_database branch from 0945a95 to 84f57aa Compare December 20, 2024 13:40

Build failed (check pipeline). Post recheck (without leading slash)
to rerun all jobs. Make sure the failure cause has been resolved before
you rerun jobs.

https://softwarefactory-project.io/zuul/t/rdoproject.org/buildset/5285d7fd1fc94b238838ed4c2293032e

✔️ noop SUCCESS in 0s
adoption-standalone-to-crc-ceph FAILURE in 1h 47m 11s
adoption-standalone-to-crc-no-ceph FAILURE in 1h 50m 34s
✔️ adoption-docs-preview SUCCESS in 1m 21s

Labels
check-before-merge/depends-on Don't forget to check depends-on before merging
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Nova multi-cell adoption requires different renaming technics for cells databases during importing it
5 participants