Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KBS: Optimize performance and memory usage #258

Merged
merged 2 commits into from
Jan 3, 2024

Conversation

Xynnn007
Copy link
Member

Currently we are using a global Mutex to protect the only one attestation service client from unsafe thread sync & send. This brings performance bottle neck.

This commit will create a temporary client object to handle the current attestation request, which can break the mutex.

Fixes #256

@Xynnn007 Xynnn007 marked this pull request as ready for review December 12, 2023 01:48
Copy link
Member

@fitzthum fitzthum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is good but see one comment

@@ -25,9 +25,6 @@ pub(crate) async fn attestation_policy(
}

attestation_service
.0
.lock()
.await
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any concerns about not locking here given that this will change the state of the AS?

Copy link
Member Author

@Xynnn007 Xynnn007 Dec 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out this. IMO, the lock/synchronization mechanism should be token cared of on the server side, like in CoCo-AS we use a read-write lock to handle the requests.

The client side code just call the API and the server can response in some order. That means, if users want to use a new policy to do attestation. In their code should call set_policy of KBS and wait for the successful return. Then call attestation API.

Among current three implementations of AS client, built-in-coco-as seems to have some trouble for synchronization. This is because AS is just a lib and KBS is actually the server. Let me add aRwlock on the client side as a workaround.

@Xynnn007 Xynnn007 force-pushed the fix-mutex branch 2 times, most recently from 506a748 to 98e03cc Compare December 20, 2023 05:33
@Xynnn007 Xynnn007 added the do-not-merge Do not merge due to some reason label Dec 20, 2023
@Xynnn007
Copy link
Member Author

Xynnn007 commented Dec 20, 2023

We are doing some tests together with @Lu-Biao to check if this works.

@Xynnn007
Copy link
Member Author

With much help from @Lu-Biao , we found some bottle necks:

Throughput

  1. KBS uses session to save the context of an RCAR handshake protocol. This session is currently saved globally using a Mutex-modified HashMap, which has become the current concurrency performance bottleneck.
  2. KBS uses a single gRPC client globally to connect to gRPC-CoCo-AS, and all requests will be processed serially.
  3. A Mutex is used for the AS object during KBS programming. In fact, it is only mutually exclusive when modifying the policy operation.

Memory usage

  1. Currently, the global Hashmap that manages sessions only records new entries. Old expired session records are actually useless, but they will not be eliminated, resulting in a one-way increase in memory usage.

We did some optimizations

  1. Replace the HashMap that manages sessions with a concurrently safe HashMap, abandon Mutex, and improve concurrent throughput.
  2. Create a new gRPC client connection pool mechanism. On the one hand, it ensures that attestation requests can be processed concurrently and connections are reused. On the other hand, it prevents too many connections from being created at the same time and exhausting the temporary ports at the bottom of the operating system.
  3. Encapsulate concurrency security inside the Attestation object and change Mutex to read-write lock RwLock
  4. Add a coroutine (green thread in Tokio) to regularly release expired sessions in the session map

@Xynnn007
Copy link
Member Author

The following are the test results before and after optimization

image
Throughput without the patch
image
Throughput with the patch

Throughput increased by ~5x

Memory usage will never increase endlessly.

image
Memory usage without the patch ( ~7min)
image
Memory usage with the patch ( ~1hour)

@Xynnn007 Xynnn007 changed the title KBS: Fix Mutex performance bottle neck KBS: Optimize performance and memory usage Dec 29, 2023
Copy link
Member

@fitzthum fitzthum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great. I really like the new SessionStatus enum (even though the variants share some of the same fields) and the macro for its fields. Ofc the results are great too.

I have one minor comment but generally seems ready.

kbs/src/api/src/lib.rs Outdated Show resolved Hide resolved
@Xynnn007 Xynnn007 force-pushed the fix-mutex branch 3 times, most recently from 1378394 to c6445c7 Compare January 2, 2024 02:06
1. Add a more detailed error information for failures of attestation
2. remove unwrap and use an error handling

Signed-off-by: Xynnn007 <[email protected]>
Before this commit, we never free the memory of the sessions that is
expired, which will let the kbs process memory usage monotonically
increase.

Signed-off-by: Biao Lu <[email protected]>
Signed-off-by: Xynnn007 <[email protected]>
@Xynnn007 Xynnn007 merged commit 0e45446 into confidential-containers:main Jan 3, 2024
8 checks passed
@Xynnn007 Xynnn007 deleted the fix-mutex branch January 3, 2024 05:35
@Xynnn007 Xynnn007 removed the do-not-merge Do not merge due to some reason label Jan 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

KBS: Abondon the Mutex to promote the concurrency performance
2 participants