Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use configure-aws-credentials workflow instead of passing secret_access_key #1361

Closed
wants to merge 19 commits into from

Conversation

vudiep411
Copy link

Summary

This PR fixes #1346 where we can get rid of the long term credentials by using OpenID Connect. OpenID Connect (OIDC) allows your GitHub Actions workflows to access resources in Amazon Web Services (AWS), without needing to store the AWS credentials as long-lived GitHub secrets.

Changes

We can remove these secrets that were passed in previously:

token: ${{ secrets.GITHUB_TOKEN }}
      bucket: ${{ secrets.AWS_S3_BUCKET }}
      access_key_id: ${{ secrets.AWS_S3_ACCESS_KEY_ID }}
      secret_access_key: ${{ secrets.AWS_S3_ACCESS_KEY }}

Instead we only need the role-to-assume arn. For more information OIDC.

Prerequisites

Before merging this PR, we need to make sure to set up the proper Identity providers on the production AWS account. Follow this guides.
Quick guide:

  1. Create an Identity provider with OpenID Connect.
    Provider url: https://token.actions.githubusercontent.com

    Audience: sts.amazonaws.com
Screenshot 2024-11-26 at 1 02 23 PM
  1. Assign the role into this new provider with this trust policy like below
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "arn:aws:iam::<aws_account_id>:oidc-provider/token.actions.githubusercontent.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringLike": {
                    "token.actions.githubusercontent.com:sub": [
                        "repo:valkey-io/valkey:ref:refs/heads/unstable",
                        "repo:valkey-io/valkey:ref:refs/tags/*"
                    ],
                    "token.actions.githubusercontent.com:aud": "sts.amazonaws.com"
                }
            }
        }
    ]
}
  1. In Github secrets, Add these secrets:
Screenshot 2024-11-26 at 2 11 02 PM

Results

Github action run:
Screenshot 2024-11-26 at 2 06 17 PM

Files in S3 Dev env:
Screenshot 2024-11-26 at 2 09 42 PM

Note: the failed workflow can be found in this issue #1343 which will be in a separate PR.

enjoy-binbin and others added 13 commits November 26, 2024 15:03
When a node role changes, we should brocast the change to notify other nodes.
For example, one primary and one replica, after a failover, the replica became
a new primary, the primary became a new replica.

And then we trigger a second cluster failover for the new replica, the
new replica will send a MFSTART to its primary, ie, the new primary.

But the new primary may reject the MFSTART due to this logic:
```
    } else if (type == CLUSTERMSG_TYPE_MFSTART) {
        if (!sender || sender->replicaof != myself) return 1;
```

In the new primary views, sender is still a primary, and sender->replicaof
is NULL, so we will return. Then the manual failover timedout.

Another possibility is that other primaries refuse to vote after receiving
the FAILOVER_AUTH_REQUEST, since in their's views, sender is still a primary,
so it refuse to vote, and then manual failover timedout.
```
void clusterSendFailoverAuthIfNeeded(clusterNode *node, clusterMsg *request) {
    ...
        if (clusterNodeIsPrimary(node)) {
            serverLog(LL_WARNING, "Failover auth denied to...
```

The reason is that, currently, we only update the node->replicaof information
when we receive a PING/PONG from the sender. For details, see clusterProcessPacket.
Therefore, in some scenarios, such as clusters with many nodes and a large
cluster-ping-interval (that is, cluster-node-timeout), the role change of the node
will be very delayed.

Added a DEBUG DISABLE-CLUSTER-RANDOM-PING command, send cluster ping
to a random node every second (see clusterCron).

Signed-off-by: Binbin <[email protected]>
Signed-off-by: vudiep411 <[email protected]>
…alkey-io#1249)

These commands are all administrator commands. If they are operated
incorrectly, serious consequences may occur. Print the full client
info by using catClientInfoString, the info is useful when we want
to identify the source of request.

Since the origin client info is very large and might complicate the
output, we added a catClientInfoShortString function, it will only
print some basic fields, we want these fields that are useful to
identify the client. These fields are:
- id
- addr
- laddr
- connection info
- name
- user
- lib-name
- lib-ver

And also used it to replace the origin client info where it has the
same purpose. Some logging is changed from full client info to short
client info:
- CLUSTER FAILOVER
- FAILOVER / PSYNC
- REPLICAOF NO ONE
- SHUTDOWN

Signed-off-by: Binbin <[email protected]>
Signed-off-by: vudiep411 <[email protected]>
valkey-io#1334)

We have a replicationEmptyDbCallback, it is a callback used by emptyData
while flushing away old data. Previously, we did not add this callback
logic for function, in case of abuse, there may be a lot of functions,
and also to make the code consistent, we add the same callback logic
for function.

Changes around this commit:
1. Extend emptyData / functionsLibCtxClear to support passing callback
   when flushing functions.
2. Added disklessLoad function create and discard helper function, just
   like disklessLoadInitTempDb and disklessLoadDiscardTempDb), we wll
   always flush the temp function in a async way to avoid any block.
3. Cleanup around discardTempDb, remove the callback pointer since in
   async way we don't need the callback.
4. Remove functionsLibCtxClear call in readSyncBulkPayload, because we
   called emptyData in the previous lines, which also empty functions.

We are doing this callback in replication is because during the flush,
replica may block a while if the flush is doing in the sync way, to
avoid the primary to detect the replica is timing out, replica will use this
callback to notify the primary (we also do this callback when loading
a RDB). And in the async way, we empty the data in the bio and there is
no slw operation, so it will ignores the callback.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: vudiep411 <[email protected]>
Fast_float is a C++ header-only library to parse doubles using SIMD
instructions. The purpose is to speed up sorted sets and other commands
that use doubles. A single-file copy of fast_float is included in this
repo. This introduces an optional dependency on a C++ compiler.

The use of fast_float is enabled at compile time using the make variable
`USE_FAST_FLOAT=yes`. It is disabled by default.

Fixes valkey-io#1069.

---------

Signed-off-by: Parth Patel <[email protected]>
Signed-off-by: Parth <[email protected]>
Signed-off-by: Madelyn Olson <[email protected]>
Signed-off-by: Viktor Söderqvist <[email protected]>
Co-authored-by: Roshan Swain <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
Co-authored-by: Viktor Söderqvist <[email protected]>
Signed-off-by: vudiep411 <[email protected]>
…io#1342)

Optimize sdscatrepr by reducing realloc calls, furthermore, we can reduce memcpy calls by
batch processing of consecutive printable characters.

Signed-off-by: Ray Cao <[email protected]>
Co-authored-by: Ray Cao <[email protected]>
Signed-off-by: vudiep411 <[email protected]>
…g modified (valkey-io#1347)

Apparently on Mac, sleep will modify errno to ETIMEDOUT, and then it
prints the misleading message: Operation timed out.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: vudiep411 <[email protected]>
The code is ok before 2de544c,
but now we will set server.repl_transfer_fd right after dfd was
initiated, and in here we have a double close error since dfd and
server.repl_transfer_fd are the same fd.

Also move the declaration of dfd/maxtries to a small scope to avoid
the confusion since they are only used in this code.

Signed-off-by: Binbin <[email protected]>
Signed-off-by: vudiep411 <[email protected]>
This PR introduces a consistent tagging system for dual-channel logs.
The goal is to improve log readability and filterability, making it
easier for operators to manage and analyze log entries.

Resolves valkey-io#986

---------

Signed-off-by: naglera <[email protected]>
Signed-off-by: vudiep411 <[email protected]>
Signed-off-by: vudiep411 <[email protected]>
@vudiep411 vudiep411 closed this Nov 26, 2024
@vudiep411 vudiep411 deleted the aws-creds-workflow branch November 26, 2024 23:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use configure-aws-credentials instead of passing secret_access_key
6 participants