Update telemetry adoption guide
yadneshk committed May 15, 2024
1 parent faba01b commit 9527471
Showing 17 changed files with 331 additions and 74 deletions.
102 changes: 102 additions & 0 deletions docs_user/modules/openstack-stop_remaining_services.adoc
@@ -0,0 +1,102 @@
[id="stopping-infrastructure-management-and-compute-services_{context}"]

//:context: stopping-infrastructure-management
//kgilliga: This module might be converted to an assembly, or a procedure as a standalone chapter.

= Stopping infrastructure management and Compute services

Before you start the EDPM adoption, stop the Compute, libvirt, load balancing,
messaging, and database services on the source cloud. You must also disable the repositories for modular libvirt daemons on the Compute hosts.

After this step, you can decommission the control plane of the source cloud,
which means taking down only the cloud controller, database, and messaging nodes.
Nodes that run the compute, storage, or networker roles (in terms of the
composable roles covered by TripleO Heat Templates) must remain functional.

== Variables

Define the shell variables used in the steps below, including the map of Compute node name and IP address pairs.
The values are illustrative and refer to a single-node standalone director deployment. Use values that are correct for your environment:

[subs=+quotes]
----
ifeval::["{build}" != "downstream"]
EDPM_PRIVATEKEY_PATH="~/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa"
endif::[]
ifeval::["{build}" == "downstream"]
EDPM_PRIVATEKEY_PATH="*<path to SSH key>*"
endif::[]
declare -A computes
computes=(
["standalone.localdomain"]="192.168.122.100"
# ...
)
----
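The map above can be consumed as follows. This is a minimal sketch, assuming the same `computes` map and key path as in the listing; `ssh_cmd_for` is a hypothetical helper name that is not part of the original procedure, and the loop only prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Illustrative values only; replace them with values from your environment.
EDPM_PRIVATEKEY_PATH="$HOME/install_yamls/out/edpm/ansibleee-ssh-key-id_rsa"
declare -A computes=(
  ["standalone.localdomain"]="192.168.122.100"
)

# Hypothetical helper: print the SSH command prefix for one compute host.
ssh_cmd_for() {
  local host="$1"
  echo "ssh -i ${EDPM_PRIVATEKEY_PATH} root@${computes[$host]}"
}

# Dry run: show what would be executed on each compute host.
for host in "${!computes[@]}"; do
  echo "$(ssh_cmd_for "$host") sudo systemctl is-active tripleo_nova_compute.service"
done
```

Printing the commands first makes it easy to confirm that every name in the map resolves to the intended address before anything is stopped.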

The steps use these variables with plain `ssh` commands instead of Ansible so that the instructions do not depend on where they are run. If you are on a suitable host, you can use Ansible commands to achieve the same result, for example, to stop a service:

----
. stackrc
ansible -i $(which tripleo-ansible-inventory) Compute -m shell -a "sudo systemctl stop tripleo_nova_virtqemud.service" -b
----

== Stopping remaining services

Remove the conflicting repositories and packages from all Compute hosts if you
use a development setup that is based on Standalone TripleO. This is required to
install the libvirt packages when these hosts are adopted as External DataPlane
Managed (EDPM) nodes, where the modular libvirt daemons no longer run in podman
containers.

You can automate these steps with a simple script that relies on the
environment variables and function that you defined previously:

----
ComputeServicesToStop=(
"tripleo_nova_compute.service"
"tripleo_nova_libvirt.target"
"tripleo_nova_migration_target.service"
"tripleo_nova_virtlogd_wrapper.service"
"tripleo_nova_virtnodedevd.service"
"tripleo_nova_virtproxyd.service"
"tripleo_nova_virtqemud.service"
"tripleo_nova_virtsecretd.service"
"tripleo_nova_virtstoraged.service"
"tripleo_ceilometer_agent_compute.service"
"tripleo_ceilometer_agent_ipmi.service"
"tripleo_collectd.service")
PacemakerResourcesToStop=(
"galera-bundle"
"haproxy-bundle"
"rabbitmq-bundle")
echo "Disabling systemd units and cleaning up for compute services"
for i in "${!computes[@]}"; do
    SSH_CMD="ssh -i $EDPM_PRIVATEKEY_PATH root@${computes[$i]}"
    for service in "${ComputeServicesToStop[@]}"; do
        echo "Stopping the $service in compute $i"
        if ${SSH_CMD} sudo systemctl is-active "$service"; then
            ${SSH_CMD} sudo systemctl disable --now "$service"
            # Mask the unit only if no unit file exists in /etc/systemd/system
            ${SSH_CMD} test -f "/etc/systemd/system/$service" '||' sudo systemctl mask "$service"
        fi
    done
done

echo "Stopping pacemaker services"
for i in {1..3}; do
    SSH_CMD=CONTROLLER${i}_SSH
    # ${!SSH_CMD} expands the variable whose name is stored in SSH_CMD
    if [ -n "${!SSH_CMD}" ]; then
        echo "Using controller $i to run pacemaker commands"
        for resource in "${PacemakerResourcesToStop[@]}"; do
            if ${!SSH_CMD} sudo pcs resource config "$resource"; then
                ${!SSH_CMD} sudo pcs resource disable "$resource"
            fi
        done
        # One reachable controller is enough to manage cluster resources
        break
    fi
done
----
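The controller loop in the script relies on Bash indirect expansion (`${!SSH_CMD}`) to pick the first `CONTROLLER<n>_SSH` variable that is set. The following is a minimal sketch of that selection, assuming that those variables hold complete SSH command prefixes; `first_controller_ssh` is a hypothetical helper name, and the key path and addresses are placeholders.

```shell
#!/usr/bin/env bash
# Placeholder values; in the real procedure these hold SSH command prefixes.
CONTROLLER1_SSH=""
CONTROLLER2_SSH="ssh -i /path/to/key root@192.168.122.101"
CONTROLLER3_SSH="ssh -i /path/to/key root@192.168.122.102"

# Hypothetical helper: print the first non-empty CONTROLLER<n>_SSH value.
first_controller_ssh() {
  local i var
  for i in 1 2 3; do
    var="CONTROLLER${i}_SSH"
    # ${!var} expands to the value of the variable whose name is stored in var.
    if [ -n "${!var}" ]; then
      echo "${!var}"
      return 0
    fi
  done
  return 1
}

if selected="$(first_controller_ssh)"; then
  echo "Pacemaker commands would run through: ${selected}"
else
  echo "No controller SSH command is defined" >&2
fi
```

Because Pacemaker resources are cluster-wide, a single working controller is sufficient, which is why the original loop stops at the first match.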
84 changes: 50 additions & 34 deletions docs_user/modules/proc_adopting-autoscaling.adoc
@@ -1,6 +1,6 @@
[id="adopting-autoscaling_{context}"]

= Adopting autoscaling
= Adopting Autoscaling

Adopting autoscaling means that an existing `OpenStackControlPlane` custom resource (CR), where Aodh services are supposed to be disabled, should be patched to start the service with the configuration parameters provided by the source environment.

@@ -20,9 +20,9 @@ should be already adopted.
. Patch the `OpenStackControlPlane` CR to deploy autoscaling services:
+
----
cat << EOF > aodh_patch.yaml
oc patch openstackcontrolplane openstack --type=merge --patch '
spec:
autoscaling:
telemetry:
enabled: true
prometheus:
deployPrometheus: false
@@ -31,18 +31,6 @@ spec:
[DEFAULT]
debug=true
secret: osp-secret
ifeval::["{build}" != "downstream"]
apiImage: "quay.io/podified-antelope-centos9/openstack-aodh-api:current-podified"
evaluatorImage: "quay.io/podified-antelope-centos9/openstack-aodh-evaluator:current-podified"
notifierImage: "quay.io/podified-antelope-centos9/openstack-aodh-notifier:current-podified"
listenerImage: "quay.io/podified-antelope-centos9/openstack-aodh-listener:current-podified"
endif::[]
ifeval::["{build}" == "downstream"]
apiImage: "registry.redhat.io/rhosp-dev-preview/openstack-aodh-api-rhel9:18.0"
evaluatorImage: "registry.redhat.io/rhosp-dev-preview/openstack-aodh-evaluator-rhel9:18.0"
notifierImage: "registry.redhat.io/rhosp-dev-preview/openstack-aodh-notifier-rhel9:18.0"
listenerImage: "registry.redhat.io/rhosp-dev-preview/openstack-aodh-listener-rhel9:18.0"
endif::[]
passwordSelectors:
databaseUser: aodh
databaseInstance: openstack
@@ -58,10 +46,41 @@ os-diff diff /tmp/collect_tripleo_configs/aodh/etc/aodh/aodh.conf aodh_patch.yam
+
For more information, see xref:reviewing-the-openstack-control-plane-configuration_{context}[Reviewing the {rhos_prev_long} control plane configuration].

. Patch the `OpenStackControlPlane` CR to deploy Aodh services:
. Create a `Subscription` to install the cluster-observability-operator:
+
----
oc patch openstackcontrolplane openstack --type=merge --patch-file aodh_patch.yaml
oc create -f - <<EOF
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: cluster-observability-operator
  namespace: openshift-operators
spec:
  channel: development
  installPlanApproval: Automatic
  name: cluster-observability-operator
  source: redhat-operators
  sourceNamespace: openshift-marketplace
EOF
----

. Wait for the installation to succeed:
+
----
oc wait --for jsonpath="{.status.phase}"=Succeeded csv --namespace=openshift-operators -l operators.coreos.com/cluster-observability-operator.openshift-operators
----
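`oc wait` gives up when its timeout expires (30 seconds by default, adjustable with `--timeout`), and it can fail immediately if no matching ClusterServiceVersion exists yet. The following is a minimal sketch of a retry wrapper for that situation; `retry_until` is a hypothetical helper name, and the attempt count and delay are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# Usage: retry_until <attempts> <delay_seconds> <command> [args...]
retry_until() {
  local attempts="$1" delay="$2" n=1
  shift 2
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "Giving up after ${attempts} attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Example (not executed here): retry up to 20 times, pausing 30 seconds between attempts.
# retry_until 20 30 oc wait --for jsonpath='{.status.phase}'=Succeeded csv \
#   --namespace=openshift-operators \
#   -l operators.coreos.com/cluster-observability-operator.openshift-operators
```

The wrapper returns a non-zero status when all attempts fail, so it can gate the next step of the procedure in a script.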

. Enable the metrics storage back end:
+
----
oc patch openstackcontrolplane openstack --type=merge --patch '
spec:
  telemetry:
    enabled: true
    template:
      metricStorage:
        enabled: true
'
----

.Verification
@@ -77,30 +96,27 @@ oc exec -t $AODH_POD -c aodh-api -- cat /etc/aodh/aodh.conf
+
----
openstack endpoint list | grep aodh
| 6a805bd6c9f54658ad2f24e5a0ae0ab6 | regionOne | aodh | network | True | public | http://aodh-public-openstack.apps-crc.testing |
| b943243e596847a9a317c8ce1800fa98 | regionOne | aodh | network | True | internal | http://aodh-internal.openstack.svc:9696 |
| f97f2b8f7559476bb7a5eafe3d33cee7 | regionOne | aodh | network | True | admin | http://192.168.122.99:9696 |
| d05d120153cd4f9b8310ac396b572926 | regionOne | aodh | alarming | True | internal | http://aodh-internal.openstack.svc:8042 |
| d6daee0183494d7a9a5faee681c79046 | regionOne | aodh | alarming | True | public | http://aodh-public.openstack.svc:8042 |
----

. Create sample resources. You can test whether you can create alarms:
+
----
openstack alarm create \
--name low_alarm \
--type gnocchi_resources_threshold \
--metric cpu \
--resource-id b7ac84e4-b5ca-4f9e-a15c-ece7aaf68987 \
--threshold 35000000000 \
--comparison-operator lt \
--aggregation-method rate:mean \
--granularity 300 \
--evaluation-periods 3 \
--alarm-action 'log:\\' \
--ok-action 'log:\\' \
--resource-type instance
oc get pods -l alertmanager=metric-storage
NAME READY STATUS RESTARTS AGE
alertmanager-metric-storage-0 2/2 Running 0 17h
alertmanager-metric-storage-1 2/2 Running 0 17h
oc get pods -l prometheus=metric-storage
NAME READY STATUS RESTARTS AGE
prometheus-metric-storage-0 3/3 Running 0 17h
----
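The pod listings above can also be checked mechanically. The following is a minimal sketch, assuming the default `oc get pods` column layout (NAME, READY, STATUS, and so on); `all_pods_ready` is a hypothetical helper, and the sample text is copied from the listing above rather than taken from a live query.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if every listed pod is Running and fully ready.
# Reads `oc get pods` style output (with a header line) on standard input.
all_pods_ready() {
  awk '
    NR > 1 {
      split($2, ready, "/")
      if (ready[1] != ready[2] || $3 != "Running") bad = 1
    }
    END { exit (NR < 2 || bad) ? 1 : 0 }
  '
}

# Sample text copied from the listing above, not a live query.
sample='NAME READY STATUS RESTARTS AGE
alertmanager-metric-storage-0 2/2 Running 0 17h
alertmanager-metric-storage-1 2/2 Running 0 17h'

if printf '%s\n' "$sample" | all_pods_ready; then
  echo "All metric-storage pods are ready"
else
  echo "Some pods are not ready" >&2
fi
# Live usage (not executed here): oc get pods -l alertmanager=metric-storage | all_pods_ready
```

The helper treats an empty listing as a failure, so a label selector that matches nothing does not pass silently.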

.Autoscaling template adoption

* You must use the `PrometheusAlarm` alarm type instead of `GnocchiAggregationByResourcesAlarm`.

//=== (TODO)

//* Include adopted autoscaling heat templates
//* Include adopted Aodh alarm create commands of type prometheus
@@ -319,6 +319,7 @@ spec:
- nova-compute-extraconfig
- ovn
- neutron-metadata
- telemetry
env:
- name: ANSIBLE_CALLBACKS_ENABLED
value: "profile_tasks"
44 changes: 12 additions & 32 deletions docs_user/modules/proc_adopting-telemetry-services.adoc
@@ -20,35 +20,16 @@ This guide also assumes that:
// TODO(jistr): There are still some quay.io images in the downstream build.
+
----
cat << EOF > ceilometer_patch.yaml
oc patch openstackcontrolplane openstack --type=merge --patch '
spec:
ceilometer:
telemetry:
enabled: true
template:
ifeval::["{build}" != "downstream"]
centralImage: quay.io/podified-antelope-centos9/openstack-ceilometer-central:current-podified
computeImage: quay.io/podified-antelope-centos9/openstack-ceilometer-compute:current-podified
customServiceConfig: |
[DEFAULT]
debug=true
ipmiImage: quay.io/podified-antelope-centos9/openstack-ceilometer-ipmi:current-podified
nodeExporterImage: quay.io/prometheus/node-exporter:v1.5.0
notificationImage: quay.io/podified-antelope-centos9/openstack-ceilometer-notification:current-podified
secret: osp-secret
sgCoreImage: quay.io/infrawatch/sg-core:v5.1.1
endif::[]
ifeval::["{build}" == "downstream"]
centralImage: registry.redhat.io/rhosp-dev-preview/openstack-ceilometer-central-rhel9:18.0
computeImage: registry.redhat.io/rhosp-dev-preview/openstack-ceilometer-compute-rhel9:18.0
customServiceConfig: |
[DEFAULT]
debug=true
ipmiImage: registry.redhat.io/rhosp-dev-preview/openstack-ceilometer-ipmi-rhel9:18.0
nodeExporterImage: quay.io/prometheus/node-exporter:v1.5.0
notificationImage: registry.redhat.io/rhosp-dev-preview/openstack-ceilometer-notification-rhel9:18.0
secret: osp-secret
sgCoreImage: quay.io/infrawatch/sg-core:v5.1.1
endif::[]
ceilometer:
customServiceConfig: |
[DEFAULT]
debug=true
secret: osp-secret
'
----

@@ -60,11 +41,6 @@ os-diff diff /tmp/collect_tripleo_configs/ceilometer/etc/ceilometer/ceilometer.c
+
For more information, see xref:reviewing-the-openstack-control-plane-configuration_{context}[Reviewing the {rhos_prev_long} control plane configuration].

. Patch the `OpenStackControlPlane` CR to deploy Ceilometer services:
+
----
oc patch openstackcontrolplane openstack --type=merge --patch-file ceilometer_patch.yaml
----

.Verification

@@ -99,8 +75,12 @@ sources:
- volume.size
- image.size
- cpu
- memory
- memory.usage
EOF
----

. Update the Ceilometer configuration with the new pollsters:
+
----
oc patch secret ceilometer-config-data --patch="{\"data\": { \"polling.yaml\": \"$(base64 -w0 polling.yaml)\"}}"
----
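The patch above embeds the whole `polling.yaml` file as a single base64 string, which is the form that Kubernetes `Secret` data requires. The following is a minimal sketch for checking the payload before you apply it, assuming GNU `base64` (for `-w0` and `-d`); the file content is illustrative only and is not the full file from this procedure.

```shell
#!/usr/bin/env bash
set -euo pipefail

workdir="$(mktemp -d)"
# Illustrative content only; use your real polling.yaml.
cat > "${workdir}/polling.yaml" <<'EOF'
---
sources:
    - name: pollsters
      interval: 300
      meters:
        - cpu
        - memory.usage
EOF

# Encode the file without line wrapping and build the JSON merge patch.
encoded="$(base64 -w0 "${workdir}/polling.yaml")"
patch="{\"data\": {\"polling.yaml\": \"${encoded}\"}}"

# Decode the value again and confirm that it matches the original file.
decoded="$(printf '%s' "$encoded" | base64 -d)"
if [ "$decoded" = "$(cat "${workdir}/polling.yaml")" ]; then
  echo "Payload round-trips correctly"
fi
# To apply (not executed here): oc patch secret ceilometer-config-data --patch="$patch"
```

Without `-w0`, GNU `base64` wraps its output at 76 characters, and the embedded newlines would produce invalid JSON in the patch.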
@@ -74,6 +74,8 @@ spec:
authEncryptionKey: HeatAuthEncryptionKey
database: HeatDatabasePassword
service: HeatPassword
rabbitMqClusterName: rabbitmq
serviceUser: heat
'
----

20 changes: 14 additions & 6 deletions docs_user/modules/proc_deploying-backend-services.adoc
@@ -302,13 +302,21 @@ spec:
spec:
type: LoadBalancer
ceilometer:
telemetry:
enabled: false
template: {}
autoscaling:
enabled: false
template: {}
template:
ceilometer:
enabled: false
template: {}
autoscaling:
enabled: false
template: {}
metricStorage:
enabled: false
template: {}
logging:
enabled: false
template: {}
EOF
----

13 changes: 12 additions & 1 deletion docs_user/modules/proc_stopping-openstack-services.adoc
@@ -80,7 +80,14 @@ environmental variables and function:

----
# Update the services list to be stopped
ServicesToStop=("tripleo_horizon.service"
ServicesToStop=("tripleo_aodh_api.service"
"tripleo_aodh_api_cron.service"
"tripleo_aodh_evaluator.service"
"tripleo_aodh_listener.service"
"tripleo_aodh_notifier.service"
"tripleo_ceilometer_agent_central.service"
"tripleo_ceilometer_agent_notification.service"
"tripleo_horizon.service"
"tripleo_keystone.service"
"tripleo_barbican_api.service"
"tripleo_barbican_worker.service"
@@ -90,7 +97,11 @@ ServicesToStop=("tripleo_horizon.service"
"tripleo_cinder_scheduler.service"
"tripleo_cinder_volume.service"
"tripleo_cinder_backup.service"
"tripleo_collectd.service"
"tripleo_glance_api.service"
"tripleo_gnocchi_api.service"
"tripleo_gnocchi_metricd.service"
"tripleo_gnocchi_statsd.service"
"tripleo_manila_api.service"
"tripleo_manila_api_cron.service"
"tripleo_manila_scheduler.service"
2 changes: 2 additions & 0 deletions tests/roles/autoscaling_adoption/meta/main.yaml
@@ -0,0 +1,2 @@
dependencies:
- role: common_defaults