Blog about io_thread performance contribution #98
Conversation
Signed-off-by: Dan Touitou <[email protected]>
There are a lot of interesting things in this blog post, but it needs a little more work before it's ready.
A few general comments:
- The preferred format is one sentence per line (it makes reviews easier).
- There are a few places where you talk about the AWS customer, but this post could use a little zooming out to talk about how this benefits the project.
# Each author corresponds to a biography file (more info later in this document)
authors= [ "dantouitou", "uriyagelnik"]
+++
## AWS to Contribute Efficiency Improvements for Valkey 8
No need for this line. It makes a redundant heading.
From simple in-memory caching implementations to complex job queues, real-time collaboration, and leaderboards applications, we at AWS are continually amazed by how innovatively users employ Valkey.

Clearly, this is just the tip of the iceberg. As more use cases and industries want to benefit from the speed, low latency, and cost reduction advantages of in-memory processing as introduced by Valkey, we are fully committed to this vision.
I'm not sure I understand what "this vision" means.
Maybe we just need to modify the wording to better indicate that the vision is performance.
@@ -0,0 +1,67 @@
+++
# `title` is how your post will be listed and what will appear at the top of the post
title= "AWS to Contribute Efficiency Improvements to Valkey 8 "
'AWS to Contribute' indicates that it hasn't yet contributed this functionality. If this is the case, better to release this blog when there is some contribution that it can link to.
We have the PR now here: valkey-io/valkey#758.
I also don't think this is a great title. "AWS contributes significant performance improvement to Valkey 8" seems a lot more exciting.
### Our Commitment to Efficiency

One of our primary goals at AWS is to ensure our customers receive the most efficient services. Efficiency not only leads to lower costs, better latency and a greener environment, but also enhances resilience.
"greener environment" is used here and in the summary. Can you back this up?
The main thread orchestrates all the jobs spawned to the io_threads, ensuring that no race conditions occur. Io_threads can be easily added and removed by the main thread based on the current load to ensure efficient utilization of the underlying hardware. Despite the dynamic nature of io_threads, the main thread attempts to maintain thread affinity, ensuring that the same io_thread will handle IO for the same client to improve memory access locality.

Before executing commands, the main thread performs a new procedure, prefetch-commands-keys, which aims to reduce the number of external memory accesses needed when executing the commands on the main dictionary. A detailed explanation of the technique used in that procedure will be described in our next blog.
'A detailed explanation...' I would just take that sentence out. Someone interested in the subject will find it later.
I would consider dropping this whole paragraph, since it's missing from the current set of PRs. We can produce blogs as necessary talking about our performance improvements.
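(Aside for readers of this thread: the batched-prefetch idea can be sketched as below. This is a minimal illustration under assumed names; `prefetch_commands_keys`, `command`, and the toy dictionary here are hypothetical, not Valkey's actual implementation.)

```c
#include <stddef.h>

#define NBUCKETS 4096

/* Toy stand-in for the main dictionary's bucket array. */
static void *dict_buckets[NBUCKETS];

/* A parsed command together with the keys it will access. */
typedef struct {
    const char **keys;
    size_t nkeys;
} command;

static size_t bucket_index(const char *key) {
    size_t h = 5381;
    while (*key) h = h * 33 + (unsigned char)*key++;
    return h % NBUCKETS;
}

/* Before executing a batch of parsed commands, touch the bucket slots
 * their keys hash to, so the memory loads overlap instead of the main
 * thread stalling on one cache miss at a time during execution. */
static void prefetch_commands_keys(const command *cmds, size_t ncmds) {
    for (size_t i = 0; i < ncmds; i++)
        for (size_t k = 0; k < cmds[i].nkeys; k++)
            __builtin_prefetch(&dict_buckets[bucket_index(cmds[i].keys[k])]);
}
```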
Socket polling system calls, such as epoll_wait, are expensive procedures. When executed solely by the main thread, epoll_wait consumes more than 20 percent of the time. Therefore, we decided to offload epoll_wait execution to the io_threads in the following way: to avoid race conditions, at any time, at most one thread, either an io_thread or the main thread, is executing epoll_wait. Io_threads never sleep on epoll, and whenever there are pending IO operations or commands to be executed, epoll_wait calls are scheduled to the io_threads by the main thread. In all other cases, the main thread executes the epoll_wait with the waiting time as in the original Valkey implementation.
`epoll_wait` should be in backticks for all mentions (maybe also `epoll`).
### Future Enhancements
I think this section can be removed. What should the reader of this blog post do next? Can they read the PR? Can they download the development branch and try it out?
github: uriyage
---

Uri is a software engineer at AWS
Uri needs a more complete bio if possible.
---
title: Uri Yagelnik
extra:
  photo: '/assets/media/authors/uriyagelnik.png'
A real photo would be preferred, but not required.
Agree with a lot of what Kyle said, I added some more concrete suggestions in a bunch of sections.
![io_threads high level design](/assets/media/pictures/io_threads.png)

### High Level Design
The above diagram depicts the io_threads implementation from a high-level perspective. Io_threads are stateless worker threads that receive jobs to execute from the main thread. In Valkey 8, a job can involve reading and parsing a command from a client, writing back responses to the client, polling for IO events on TCP connections, or de-allocating memory. This leaves the main thread with more time to execute commands.
Suggested change:
The above diagram depicts the high-level design of how IO threading processes work in Valkey 8.
IO threads are worker threads that receive jobs to execute from the main thread.
A job can involve reading and parsing a command from a client, writing back responses to the client, polling for IO events on TCP connections, or de-allocating memory.
While IO threads are busy handling IO, the main thread is able to spend more time executing commands.
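(Aside for readers: a minimal sketch of the dispatch pattern described above, assuming a hypothetical per-thread job queue. The type and function names are illustrative, not Valkey's actual API.)

```c
#include <pthread.h>
#include <stddef.h>

/* Kinds of work the main thread can hand off to an I/O thread. */
typedef enum { JOB_READ, JOB_WRITE, JOB_POLL, JOB_FREE } job_type;

typedef struct job {
    job_type type;
    void *client;       /* connection the job operates on */
    struct job *next;
} job;

/* One queue per I/O thread; the main thread is the only producer,
 * which keeps the synchronization simple. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    job *head, *tail;
} job_queue;

static void job_queue_push(job_queue *q, job *j) {
    j->next = NULL;
    pthread_mutex_lock(&q->lock);
    if (q->tail) q->tail->next = j; else q->head = j;
    q->tail = j;
    pthread_cond_signal(&q->nonempty);
    pthread_mutex_unlock(&q->lock);
}
```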
The main thread orchestrates all the jobs spawned to the io_threads, ensuring that no race conditions occur. Io_threads can be easily added and removed by the main thread based on the current load to ensure efficient utilization of the underlying hardware. Despite the dynamic nature of io_threads, the main thread attempts to maintain thread affinity, ensuring that the same io_thread will handle IO for the same client to improve memory access locality.
Suggested change:
The main thread orchestrates all the jobs spawned to the I/O threads, ensuring that no race conditions occur.
The number of active I/O threads can be changed by the main thread based on the current load to ensure efficient utilization of the underlying hardware.
Despite the dynamic nature of I/O threads, the main thread attempts to maintain thread affinity, ensuring that the same I/O thread will handle I/O for the same client to improve memory access locality.
"IO threads" or "I/O threads"?
Wikipedia says
input/output (I/O, i/o, or informally io or IO)
Sure, I'll update all my suggestions to I/O threads.
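(Aside: the thread-affinity point can be illustrated with a one-line sketch; `pick_io_thread` is a hypothetical name, not the actual implementation. Hashing a stable client id keeps a client's buffers warm in one thread's cache, and if the main thread changes the number of active I/O threads, clients are simply remapped on the next dispatch.)

```c
#include <stdint.h>

/* Map a client to the same I/O thread on every dispatch so its reads
 * and writes stay on one worker and benefit from cache locality. */
static unsigned pick_io_thread(uint64_t client_id, unsigned active_io_threads) {
    return (unsigned)(client_id % active_io_threads);
}
```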
Socket polling system calls, such as epoll_wait, are expensive procedures. When executed solely by the main thread, epoll_wait consumes more than 20 percent of the time. Therefore, we decided to offload epoll_wait execution to the io_threads in the following way: to avoid race conditions, at any time, at most one thread, either an io_thread or the main thread, is executing epoll_wait. Io_threads never sleep on epoll, and whenever there are pending IO operations or commands to be executed, epoll_wait calls are scheduled to the io_threads by the main thread. In all other cases, the main thread executes the epoll_wait with the waiting time as in the original Valkey implementation.
Suggested change:
Socket polling system calls, such as epoll_wait, are expensive procedures.
When executed solely by the main thread, epoll_wait consumes more than 20 percent of the CPU time of the process.
Therefore, we offload `epoll_wait` execution to the I/O thread when necessary by scheduling an epoll job from the main thread to an I/O thread.
To avoid race conditions, the main thread will no longer execute `epoll_wait` until the poll job has completed, ensuring that only one thread is executing the `epoll_wait` at a given time.
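(Aside: a minimal sketch of the "at most one poller" rule this suggestion describes. The flag and function names are hypothetical, not Valkey's actual code; the main thread would set `poll_in_flight` when it schedules an epoll job to an I/O thread.)

```c
#include <stdatomic.h>
#include <sys/epoll.h>

/* Set by the main thread when it hands an epoll job to an I/O thread;
 * cleared by the I/O thread when the job completes. */
static atomic_bool poll_in_flight;

/* Main thread: poll (possibly blocking) only when no I/O thread owns epoll. */
static int main_thread_poll(int epfd, struct epoll_event *ev, int maxev, int timeout_ms) {
    if (atomic_load(&poll_in_flight)) return 0;  /* an I/O thread is polling */
    return epoll_wait(epfd, ev, maxev, timeout_ms);
}

/* I/O thread: run the scheduled poll job without ever sleeping on epoll. */
static int io_thread_poll_job(int epfd, struct epoll_event *ev, int maxev) {
    int n = epoll_wait(epfd, ev, maxev, 0);      /* timeout 0: never block */
    atomic_store(&poll_in_flight, false);        /* release ownership */
    return n;
}
```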
### Performance Without Compromising Simplicity

Implementing multi-threading can be a complex task. Over the years, contributors have been careful to maintain Redis’ and now Valkey’s simplicity. This ensures an API that can continuously evolve without the need to use complex synchronization and avoid race conditions.
Suggested change:
Implementing multi-threading can be a complex task.
Valkey strives to stay simple by executing as much code in a single thread as possible.
This ensures an API that can continuously evolve without the need to use complex synchronization and avoid race conditions.
From simple in-memory caching implementations to complex job queues, real-time collaboration, and leaderboards applications, we at AWS are continually amazed by how innovatively users employ Valkey.
Suggested change:
From simple in-memory caching implementations to complex job queues, real-time collaboration, and leaderboard applications, we at AWS are continually amazed by how innovatively users employ Valkey.
### Our Commitment to Efficiency
Suggested change:
### Our Commitment to Performance Efficiency

We have two efficiency threads, memory density and performance; I think we should be clear about which one we're optimizing here.
We are excited about the new Linux Foundation sponsorship for Valkey and are taking a bigger step, contributing our major performance improvements and expertise. Starting with version 8, Valkey users will benefit from a breakthrough in performance, thanks to a new multi-threading implementation that can considerably boost performance on a wide range of hardware types.

For caching workloads, Valkey users will be able to increase maximum requests per second to over 1 million on multi-core machines such as AWS EC2 r7g.4xl.
We need to couch this statement as well, because in OSS we talk a lot about pipeline performance, and Valkey can already do 1M+ RPS per process with batching.
I couldn't update the current PR due to permission issues, so I addressed the PR comments in a new PR: #102.
### Description
Addressed PR comments for I/O threads blog post. Previous PR: #98

### Issues Resolved
-

### Check List
- [x] Commits are signed per the DCO using `--signoff`

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.

Signed-off-by: Dan Touitou <[email protected]>
Signed-off-by: Uri Yagelnik <[email protected]>
Co-authored-by: Dan Touitou <[email protected]>
Co-authored-by: Madelyn Olson <[email protected]>
Description
Adding a blog describing at a high level the performance contribution for Valkey 8.

Issues Resolved

Check List
- Commits are signed per the DCO using `--signoff`

By submitting this pull request, I confirm that my contribution is made under the terms of the BSD-3-Clause License.