Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

doc: document ARO-HCP Clusters creation in CS with Managed Identities #858

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

miguelsorianod
Copy link
Collaborator

@miguelsorianod miguelsorianod commented Nov 20, 2024

What this PR does

We document how to create the N needed Managed Identities when creating ARO-HCP Clusters directly against CS.

As an important remark, this assumes 4.17.x clusters creation. The set of required managed identities can differ between OCP versions. For now assuming 4.17.x is good enough but in the future we might want to improve on this.

As an important note, similar documentation / scripts should be created for the frontend side.

Special notes for your reviewer

@@ -353,8 +355,44 @@ Then register it with the Maestro Server
```
az network vnet subnet update -g <resource-group> -n <subnet-name> --vnet-name <vnet-name> --network-security-group <nsg-name>
```
- Generate a random alphanumeric string used as a suffix for the User-Assigned Managed Identities of the operators of the cluster
Copy link
Collaborator Author

@miguelsorianod miguelsorianod Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although for now we document this, I think the whole step 2, which is creation of prerequisites, could be automated or partially automated by having a script that executes the commands.

We can work on that in the near future if we want.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 that would be great.

a different managed identity, and different clusters must use different managed identities, even for the same
operators.
```
az identity create -n ${USER}-cp-cloud-controller-manager-${OPERATORS_UAMIS_SUFFIX} -g <resource-group>
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An example of a name of a managed identity is:

msoriano-cp-cloud-controller-manager-8ffe3f

operators.
```
az identity create -n ${USER}-cp-cloud-controller-manager-${OPERATORS_UAMIS_SUFFIX} -g <resource-group>
az identity create -n ${USER}-cp-ingress-${OPERATORS_UAMIS_SUFFIX} -g <resource-group>
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice how the control plane side managed identities contain a "cp" part. The same happens for the data plane ones with "dp". This is intentional and the reason is that the same operator can be installed in both the control plane and the data plane but we still need different managed identities for it so we add cp/dp to allow having different names for them

@miguelsorianod
Copy link
Collaborator Author

cc @bennerv , @machi1990, @geoberle as relevant people too

@miguelsorianod
Copy link
Collaborator Author

I think we could already require people to do this even if not fully used by CS, to start preparing and getting them used to this.

Before though we would need all the pieces in the frontend side too so the information can be passed from user -> frontend -> cs

- Generate a random alphanumeric string used as a suffix for the User-Assigned Managed Identities of the operators of the cluster
> NOTE: The random suffix used has to be different for each cluster to be created
```
export OPERATORS_UAMIS_SUFFIX=$(openssl rand -hex 3)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should use the future clusters name as the suffix instead of a random string. Easier for traceability. Wdyt?

Copy link
Collaborator Author

@miguelsorianod miguelsorianod Nov 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I added randomness is because:

  • Two different people could create a cluster with the same name
  • The same person could create two clusters with the same name, in different RGs
  • In the Azure portal UI you see the name of the managed identity

We could add the name of the cluster on top of that at some point too

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 on adding some element that could enable us to associate the MIs to the cluster being created.

Copy link
Collaborator Author

@miguelsorianod miguelsorianod Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the MIs creation to also include the desired CS Cluster name.

I am not super convinced about it because:

  • Different people or even the same person can have the same cluster name for different clusters, which doesn't really help to differentiate then
  • Name length limit concerns. If we do this here people will tend to try do to the same or similar when writing automated tests. In those cases reaching the length limit is quite more probable because names formed there usually concatenate several parts adding quickly. MIs have a max length of 128 characters. But right now in this name we already have: username (unbounded), operator name (unbounded), cluster name (54 characters), other extra characters like hyphens, ...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Miguel. I really expect authors of the automated tests to take name limits onto account and not copy and paste the command as it or its equivalent.

@miguelsorianod
Copy link
Collaborator Author

@geoberle I also added a cleaning up a cluster section for CS

@miguelsorianod
Copy link
Collaborator Author

Hi @machi1990 , @geoberle, @JameelB, @bennerv please take a look. Also including the threads with my replies.

Thanks.

@@ -391,7 +457,52 @@ Then register it with the Maestro Server
"tenant_id": "$TENANTID",
"managed_resource_group_name": "$MANAGEDRGNAME",
"subnet_resource_id": "$SUBNETRESOURCEID",
"network_security_group_resource_id":"$NSG"
"network_security_group_resource_id":"$NSG",
"operators_authentication": {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this operators_authentication if you're already splitting by control_plane_operators inside it? Also, why the nested resource_id for each operator? Are we expecting additional attributes inside (if so, which attributes?)? Would simplifying a bit like this make sense?

{
  ...
  "managed_identities": {
    "control_plane_operators": {
      "cloud-controller-manager": "$CP_CCM_UAMI",
      "ingress": "$CP_INGRESS_UAMI",
      ...
    }
  }
}

Copy link
Collaborator Author

@miguelsorianod miguelsorianod Dec 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this operators_authentication if you're already splitting by control_plane_operators inside it?

Having operators_authentication and managed_identities within it is intentional. The reasons are because:

  • We have this intermediate operators_authentication section is so we keep it open for extension. It is not unreasonable that in the future we have other related information not specific to managed identities. This allows us for example to have other types of operators authentication in the future (although not limited to that) if desired/needed (SP based one for example). In case that happens that avoid having a breaking change or adding it in a “parallel” attribute. This also allows us avoiding to be in a scenario where we have to put an alternative at the same top level together with unrelated configurations, causing confusion. A clear example is with AWS based clusters where we have aws.sts and also aws.access_key, aws.secret_key, aws.private_hosted_zone_id all in the same level
  • The managed identities information is specific to the context of operators authentication. There could be other managed identities related data that is not used for operators authentication
  • It gives additional context

Because of this we decided to go for this format.

Also, why the nested resource_id for each operator? Are we expecting additional attributes inside (if so, which attributes?)

Yes. The information is not only the resource id. We are going to also provide the client id and the principal id for most of them. This is also another example on why it is a good idea to in some cases keep it open for extension even if not needed now and it is not unreasonable to expect additional needs on it. Imagine for example another scenario where we had made an attribute a string because that's the only thing we needed then. Think for example a "status" field that we decided to make a string, and in the future we needed more information that would have needed a breaking change which we cannot perform once the application is GA.

Regarding the names. Within managed_identities the following names were chosen:

  • managed_identities_data_plane_identity_url instead of data_plane_identity_url : We are aware that the managed_identities part is redundant in here. The decision taken was to still have it in this attribute. The reason is because managed identities in this context is the managed identities data plane identity url and not managed identities themselves
  • control_plane_operators_managed_identities and data_plane_operators_managed_identities instead of control_plane_operators and data_plane_operators: This was also intentional, and the reasoning back then was that you are not defining the operators that will be used. These operators will be there independently on whether you define them in here or not. You are defining the identities information for them.

Although some information is "repeated" the reasoning was that it gave clarity about it.

All these reasonings, the API definitions and also the DB level definitions, with the discarded alternatives and the reasons why were written down in: https://docs.google.com/document/d/1wj3Q7SLGeVjIcekLdGpeXGnsE-Wg7oKDUYPKAvQkz5M/edit

An example on how a cluster creation API payload is:

{
  "name": "ms-aro-hcp-1",
  "product": {
    "id": "aro"
  },
  "ccs": {
    "enabled": true
  },
  "region": {
    "id": "westus3"
  },
  "hypershift": {
    "enabled": true
  },
  "multi_az": true,
  "azure": {
    "subscription_id": "YYYYY",
    "resource_group_name": "ms-aro-hcp-clusters-rg-1",
    "resource_name": "ms_resource_name_1",
    "tenant_id": "XXXXX",
    "managed_resource_group_name": "ms-aro-hcp-mrg-1",
    "subnet_resource_id": "/subscriptions/YYYYY/resourceGroups/ms-aro-hcp-byovnet-rg-1/providers/Microsoft.Network/virtualNetworks/ms-aro-hcp-vnet-1/subnets/ms-aro-hcp-subnet-1",
    "network_security_group_resource_id": "/subscriptions/YYYYY/resourceGroups/ms-aro-hcp-byovnet-rg-1/providers/Microsoft.Network/networkSecurityGroups/ms-aro-hcp-nsg-1",
    "operators_authentication": {
      "managed_identities": {
        "managed_identities_data_plane_identity_url": "https://dummyurl.com",
        "control_plane_operators_managed_identities": {
          "cloud-controller-manager": {
            "resource_id": "/subscriptions/YYYYY/resourcegroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<cp-ccm-uami-name>"
          },
          "ingress": {
            "resource_id": "/subscriptions/YYYYY/resourcegroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<cp-ingress-uami-name>"
          },
          "disk-csi-driver": {
            "resource_id": "/subscriptions/YYYYY/resourcegroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<cp-disk-csi-driver-uami-name>"
          },
          "file-csi-driver": {
            "resource_id": "/subscriptions/YYYYY/resourcegroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<cp-file-csi-driver-uami-name>"
          },
          "image-registry": {
            "resource_id": "/subscriptions/YYYYY/resourcegroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<cp-image-registry-uami-name>"
          },
          "cloud-network-config": {
            "resource_id": "/subscriptions/YYYYYb/resourcegroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<cp-cnc-uami-name>"
          }
        },
        "data_plane_operators_managed_identities": {
          "disk-csi-driver": {
            "resource_id": "/subscriptions/YYYYY/resourcegroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<dp-disk-csi-driver-uami-name>"
          },
          "image-registry": {
            "resource_id": "/subscriptions/YYYYY/resourcegroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<dp-image-registry-uami-name>"
          },
          "file-csi-driver": {
            "resource_id": "/subscriptions/YYYYY/resourcegroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<-dp-file-csi-driver-uami-name>"
          },
          "ingress": {
            "resource_id": "/subscriptions/YYYYY/resourcegroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<dp-ingress-uami-name>"
          },
          "cloud-network-config": {
            "resource_id": "/subscriptions/YYYYY/resourcegroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<dp-cnc-uami-name>"
          }
        },
        "service_managed_identity": {
          "resource_id": "/subscriptions/YYYYY/resourcegroups/<resource-group>/providers/Microsoft.ManagedIdentity/userAssignedIdentities/<service-managed-identity-uami-name>"
        }
      }
    }
  },
}

You will see that only the resoure id is not specified. That's because that's the only element passed at creation time. There are other attributes (client id, principal id) that will be returned with GET at some point (they are readonly)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the explanation @miguelsorianod - sounds good to me.

Copy link

Please rebase pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants