-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: properly support the PXE and ISO machines in the secure tokens flow #974
base: main
Are you sure you want to change the base?
Conversation
The unique token flow was reworked to support machines running from PXE and ISO. As they do not support META persistence, Omni doesn't enforce secure tokens for them. But to distinguish machines and make the UUID conflict resolution to work, Omni now calculates the node fingerprints out of the mac addresses of all physical interfaces on the node. So now each unique token consists of two parts: - fingerprint. - a random string. Omni detects Talos installation on the machine in the following way: - check if the pending machine status exists and it detected the system disk. - overwrite the previous check if the existing link was labeled with the Talos being installed. - lastly if the `MachineStatus` exists, overwrite all checks with the installed label from it (ensures bare-metal provider workflow which goes to installed to not installed and PXE booted). Then when a machine joins Omni with some token, Omni checks if the random part is equal. If it is equal, the machine is immediately accepted. If the random part is different and the fingerprint matches: - if Talos is installed - reject the machine and log the warning in the logs. - if Talos is not installed - replace the existing link with the new one (only if the request has a valid join token). Then if nothing matches, the UUID conflict resolution kicks in. Provisioner creates a `PendingMachine` which is marked with UUID conflict label and `PendingMachineStatus` controller generates a random UUID for the node. Signed-off-by: Artem Chernyshev <[email protected]>
ca8362a
to
f77cd97
Compare
hasValidNodeUniqueToken bool | ||
nodeUniqueTokensEnabled bool | ||
forceValidNodeUniqueToken bool | ||
supportsSecureJoinTokens bool | ||
} | ||
|
||
// isAuthorized returns true if the request has valid creds for at least one of the auth flows. | ||
func (pc *provisionContext) isAuthorized() bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe can split to smaller logical parts related to each action we do
} | ||
} | ||
|
||
machineStatus, err := safe.ReaderGetByID[*system.ResourceLabels[*omni.MachineStatus]](ctx, h.state, req.NodeUuid) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use InfraMachineStatus last wipe time?
Change the infra machine status resource to have the last wiped token value.
Populate it this way Link -> InfraMachine -> InfraMachineStatus.
// - has matching fingerprint and Talos is not installed. | ||
// | ||
//nolint:gocognit | ||
func ensureResource[T res](ctx context.Context, logger *zap.Logger, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to split it to parts
if linkToken.IsSameFingerprint(provisionContext.requestNodeUniqueToken) { | ||
logger.Warn("machine connection rejected: the machine has the correct fingerprint, but the random token part doesn't match") | ||
|
||
return nil, status.Error(codes.PermissionDenied, "unauthorized") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can move this part inside the ensureResource method probably.
continue | ||
} | ||
|
||
macAddresses = append(macAddresses, string(link.TypedSpec().PermanentAddr)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can be empty, if not set use HardwareAddr
.
macAddresses := make([]string, 0, links.Len()) | ||
|
||
for link := range links.All() { | ||
if !link.TypedSpec().Physical() { | ||
continue | ||
} | ||
|
||
macAddresses = append(macAddresses, string(link.TypedSpec().PermanentAddr)) | ||
} | ||
|
||
slices.Sort(macAddresses) | ||
|
||
hash := sha256.New() | ||
|
||
for _, addr := range macAddresses { | ||
hash.Write([]byte(addr)) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
macAddresses := make([]string, 0, links.Len()) | |
for link := range links.All() { | |
if !link.TypedSpec().Physical() { | |
continue | |
} | |
macAddresses = append(macAddresses, string(link.TypedSpec().PermanentAddr)) | |
} | |
slices.Sort(macAddresses) | |
hash := sha256.New() | |
for _, addr := range macAddresses { | |
hash.Write([]byte(addr)) | |
} | |
macAddresses := make([][]byte, 0, links.Len()) | |
for link := range links.All() { | |
if !link.TypedSpec().Physical() { | |
continue | |
} | |
macAddresses = append(macAddresses, link.TypedSpec().PermanentAddr) | |
} | |
slices.SortFunc(macAddresses. bytes.Compare) | |
hash := sha256.New() | |
for _, addr := range macAddresses { | |
hash.Write(addr) | |
} |
pointer.SafeDeref(req.TalosVersion), | ||
) | ||
} | ||
|
||
return h.updateLink(ctx, provisionContext) | ||
return h.provision(ctx, provisionContext) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sanitize errors returned here.
@@ -198,7 +199,7 @@ func HandleInput[T generic.ResourceWithRD, S generic.ResourceWithRD](ctx context | |||
|
|||
// GetTalosClient for the machine id. | |||
// Automatically pick secure or insecure client. | |||
func GetTalosClient[T generic.ResourceWithRD](ctx context.Context, r controller.Reader, address string, machine T) (*client.Client, error) { | |||
func GetTalosClient[T generic.ResourceWithRD](ctx context.Context, r controller.Reader, address string, machineResource T) (*client.Client, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
func GetTalosClient[T generic.ResourceWithRD](ctx context.Context, r controller.Reader, address string, machineResource T) (*client.Client, error) { | |
func GetTalosClient[T interface { | |
*V | |
generic.ResourceWithRD | |
}, V any](ctx context.Context, r controller.Reader, address string, machineResource T) (*client.Client, error) { |
@@ -212,11 +213,20 @@ func GetTalosClient[T generic.ResourceWithRD](ctx context.Context, r controller. | |||
)...) | |||
} | |||
|
|||
if reflect.ValueOf(machine).IsNil() { | |||
if reflect.ValueOf(machineResource).IsNil() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if reflect.ValueOf(machineResource).IsNil() { | |
if machineResource == nil { |
The unique token flow was reworked to support machines running from PXE and ISO.
As they do not support META persistence, Omni doesn't enforce secure tokens for them.
But to distinguish machines and make the UUID conflict resolution to work, Omni now calculates the node fingerprints out of the mac addresses of all physical interfaces on the node.
So now each unique token consists of two parts:
Omni detects Talos installation on the machine in the following way:
MachineStatus
exists, overwrite all checks with the installed label from it (ensures bare-metal provider workflow which goes to installed to not installed and PXE booted).Then when a machine joins Omni with some token, Omni checks if the random part is equal. If it is equal, the machine is immediately accepted.
If the random part is different and the fingerprint matches:
Then if nothing matches, the UUID conflict resolution kicks in. Provisioner creates a
PendingMachine
which is marked with UUID conflict label andPendingMachineStatus
controller generates a random UUID for the node.Fixes: #944