Hardware: Intel(R) Xeon(R) CPU E3-1225 V2 @ 3.20GHz
Software: Windows 10, MSVC 2017, MinGW GCC 7.2.0
Time unit: milliseconds (unless explicitly specified)
Iterations | Queue size | Event count | Event Types | Listener count | Time of single threading | Time of multi threading |
---|---|---|---|---|---|---|
100k | 100 | 10M | 100 | 100 | 401 | 1146 |
100k | 1000 | 100M | 100 | 100 | 4012 | 11467 |
100k | 1000 | 100M | 1000 | 1000 | 4102 | 11600 |
Given `eventpp::EventQueue<size_t, void (size_t), Policies>`, where `Policies` is either single threading or multi threading, the benchmark adds `Listener count` listeners to the queue; each listener is an empty lambda. Then the benchmark starts timing. It loops `Iterations` times. In each loop, the benchmark enqueues `Queue size` events, then processes the event queue.
There are `Event types` kinds of event. `Event count` is `Iterations * Queue size`.
The EventQueue is processed in one thread. "Single threading" and "multi threading" in the table refer to the threading policy used, not to the number of threads.
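The loop described above can be sketched as follows. This is a minimal stand-in, not the actual benchmark: `ToyQueue`, `BenchResult`, and `runQueueBenchmark` are hypothetical names, the queue is a plain `std::vector` of pending events rather than `eventpp::EventQueue`, and spreading the listeners round-robin across event types is an assumption the text does not state.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for an event queue; NOT eventpp::EventQueue.
class ToyQueue {
public:
    using Listener = std::function<void(std::size_t)>;

    explicit ToyQueue(std::size_t eventTypes) : listeners_(eventTypes) {
        assert(eventTypes > 0);
    }

    void appendListener(std::size_t event, Listener listener) {
        listeners_[event % listeners_.size()].push_back(std::move(listener));
    }

    void enqueue(std::size_t event) { pending_.push_back(event); }

    // Dispatch every pending event to the listeners of its type.
    // Returns the number of events processed.
    std::size_t process() {
        std::vector<std::size_t> batch;
        batch.swap(pending_);
        for (std::size_t event : batch) {
            for (auto& listener : listeners_[event % listeners_.size()]) {
                listener(event);
            }
        }
        return batch.size();
    }

private:
    std::vector<std::vector<Listener>> listeners_;
    std::vector<std::size_t> pending_;
};

struct BenchResult {
    std::size_t eventCount;
    long long milliseconds;
};

BenchResult runQueueBenchmark(std::size_t iterations, std::size_t queueSize,
                              std::size_t eventTypes, std::size_t listenerCount) {
    ToyQueue queue(eventTypes);
    // Assumption: listeners are spread round-robin over the event types.
    for (std::size_t i = 0; i < listenerCount; ++i) {
        queue.appendListener(i % eventTypes, [](std::size_t) {});
    }

    std::size_t processed = 0;
    const auto start = std::chrono::steady_clock::now();  // timing starts after setup
    for (std::size_t it = 0; it < iterations; ++it) {
        for (std::size_t i = 0; i < queueSize; ++i) {
            queue.enqueue(i % eventTypes);
        }
        processed += queue.process();
    }
    const auto elapsed = std::chrono::steady_clock::now() - start;
    return {processed,
            std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count()};
}
```

Calling `runQueueBenchmark(100000, 100, 100, 100)` would mirror the parameters of the first table row, and the returned `eventCount` is then 100000 * 100 = 10M. Timings from this toy queue are not comparable to the figures in the table.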
Mutex | Enqueue threads | Process threads | Event count | Event Types | Listener count | Time |
---|---|---|---|---|---|---|
std::mutex | 1 | 1 | 10M | 100 | 100 | 2283 |
SpinLock | 1 | 1 | 10M | 100 | 100 | 1692 |
std::mutex | 1 | 3 | 10M | 100 | 100 | 3446 |
SpinLock | 1 | 3 | 10M | 100 | 100 | 3025 |
std::mutex | 2 | 2 | 10M | 100 | 100 | 4000 |
SpinLock | 2 | 2 | 10M | 100 | 100 | 3076 |
std::mutex | 4 | 4 | 10M | 100 | 100 | 1971 |
SpinLock | 4 | 4 | 10M | 100 | 100 | 1755 |
std::mutex | 16 | 16 | 10M | 100 | 100 | 928 |
SpinLock | 16 | 16 | 10M | 100 | 100 | 2082 |
There are `Enqueue threads` threads enqueuing events to the queue, and `Process threads` threads processing the events. The total event count is `Event count`. `Mutex` is the mutex type used to protect the data.
The multi threading version is slower than the previous single threading version because locking the mutex costs time.
When the number of threads is close to the number of CPU cores (4 here), `eventpp::SpinLock` performs better than `std::mutex`. But when there are many more threads than CPU cores (here, 16 enqueue threads and 16 process threads), `eventpp::SpinLock` performs worse than `std::mutex`.
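To make the comparison concrete, here is a minimal sketch of a spin lock next to `std::mutex`. `ToySpinLock` and `countWithLock` are hypothetical names, and the lock is a textbook `std::atomic_flag` loop, not the source of `eventpp::SpinLock`. One common explanation for the 16-thread result, offered here as an assumption rather than something the benchmark measured, is that a spinning thread burns its whole time slice while the thread holding the lock may be descheduled, whereas a blocked `std::mutex` waiter yields the core.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Illustrative spin lock; an assumption, not eventpp::SpinLock itself.
class ToySpinLock {
public:
    void lock() {
        // Busy-wait until the flag was previously clear.
        while (flag_.test_and_set(std::memory_order_acquire)) {
        }
    }

    void unlock() { flag_.clear(std::memory_order_release); }

private:
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Increment a shared counter from several threads, guarding it with Lock.
// Works with any type that provides lock() and unlock().
template <typename Lock>
long countWithLock(int threadCount, int incrementsPerThread) {
    assert(threadCount > 0);
    Lock lock;
    long counter = 0;
    std::vector<std::thread> threads;
    for (int t = 0; t < threadCount; ++t) {
        threads.emplace_back([&] {
            for (int i = 0; i < incrementsPerThread; ++i) {
                std::lock_guard<Lock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& thread : threads) {
        thread.join();
    }
    return counter;
}
```

Both `countWithLock<ToySpinLock>(4, 2000)` and `countWithLock<std::mutex>(4, 2000)` return 8000; only the waiting strategy differs, which is what the table above compares.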
The benchmark loops 100K times. In each loop it appends 1000 empty callbacks to a CallbackList, then removes all 1000 callbacks. That makes 100M append/remove operations in total.
The total benchmarked time is about 21000 milliseconds, that is, roughly 5000 append/remove operations per millisecond.
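The append/remove loop and the throughput arithmetic can be sketched as below. This is an illustration under stated assumptions: a `std::list` of `std::function` stands in for CallbackList, one append plus its matching remove is counted as a single operation, and `appendRemoveLoop`, `AppendRemoveResult`, and `operationsPerMillisecond` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <list>
#include <vector>

struct AppendRemoveResult {
    std::size_t operations;  // append/remove pairs performed
    std::size_t remaining;   // callbacks left in the list afterwards
};

// Stand-in for the CallbackList benchmark; std::list is an assumption.
AppendRemoveResult appendRemoveLoop(std::size_t iterations,
                                    std::size_t callbacksPerIteration) {
    using List = std::list<std::function<void()>>;
    List callbacks;
    std::vector<List::iterator> handles;
    handles.reserve(callbacksPerIteration);

    std::size_t operations = 0;
    for (std::size_t it = 0; it < iterations; ++it) {
        handles.clear();
        // Append empty callbacks and remember a handle to each one.
        for (std::size_t i = 0; i < callbacksPerIteration; ++i) {
            handles.push_back(callbacks.emplace(callbacks.end(), [] {}));
        }
        // Remove every callback that was just appended.
        for (List::iterator handle : handles) {
            callbacks.erase(handle);
            ++operations;
        }
    }
    return {operations, callbacks.size()};
}

double operationsPerMillisecond(double operations, double milliseconds) {
    assert(milliseconds > 0.0);
    return operations / milliseconds;
}
```

With the figures quoted above, `operationsPerMillisecond(100e6, 21000.0)` evaluates to about 4762, which the text rounds to 5000.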
Iterations: 100,000,000
Function | Compiler | Native invoking | CallbackList single threading | CallbackList multi threading |
---|---|---|---|---|
Inline global function | MSVC 2017 | 217 | 1501 | 6921 |
Inline global function | GCC 7.2 | 187 | 1489 | 4463 |
Non-inline global function | MSVC 2017 | 241 | 1526 | 6544 |
Non-inline global function | GCC 7.2 | 233 | 1488 | 4787 |
Function object | MSVC 2017 | 194 | 1498 | 6433 |
Function object | GCC 7.2 | 212 | 1485 | 4951 |
Member virtual function | MSVC 2017 | 207 | 1533 | 6558 |
Member virtual function | GCC 7.2 | 212 | 1485 | 4489 |
Member non-virtual function | MSVC 2017 | 214 | 1533 | 6390 |
Member non-virtual function | GCC 7.2 | 211 | 1486 | 4872 |
Member non-inline virtual function | MSVC 2017 | 206 | 1522 | 6578 |
Member non-inline virtual function | GCC 7.2 | 182 | 1666 | 4593 |
Member non-inline non-virtual function | MSVC 2017 | 206 | 1491 | 6992 |
Member non-inline non-virtual function | GCC 7.2 | 205 | 1486 | 4490 |
All functions | MSVC 2017 | 1374 | 10951 | 29973 |
All functions | GCC 7.2 | 1223 | 9770 | 22958 |
Testing functions
#if defined(_MSC_VER)
#define NON_INLINE __declspec(noinline)
#else
// gcc
#define NON_INLINE __attribute__((noinline))
#endif

volatile int globalValue = 0;

void globalFunction(int a, const int b)
{
    globalValue += a + b;
}

NON_INLINE void nonInlineGlobalFunction(int a, const int b)
{
    globalValue += a + b;
}

struct FunctionObject
{
    void operator() (int a, const int b)
    {
        globalValue += a + b;
    }

    virtual void virFunc(int a, const int b)
    {
        globalValue += a + b;
    }

    void nonVirFunc(int a, const int b)
    {
        globalValue += a + b;
    }

    NON_INLINE virtual void nonInlineVirFunc(int a, const int b)
    {
        globalValue += a + b;
    }

    NON_INLINE void nonInlineNonVirFunc(int a, const int b)
    {
        globalValue += a + b;
    }
};

#undef NON_INLINE
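For orientation, the sketch below shows how the "native invoking" and "CallbackList" columns might differ in how a function is called. It is an assumption about the measurement shape, not the benchmark's code: a `std::vector` of `std::function` stands in for CallbackList, and `sketchValue`, `sketchFunction`, `timeMilliseconds`, `timeNative`, and `timeThroughList` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// volatile keeps the compiler from optimizing the additions away.
volatile int sketchValue = 0;

void sketchFunction(int a, const int b)
{
    sketchValue = sketchValue + a + b;
}

// Run body once and return the elapsed wall-clock time in milliseconds.
template <typename F>
long long timeMilliseconds(F&& body)
{
    const auto start = std::chrono::steady_clock::now();
    body();
    const auto elapsed = std::chrono::steady_clock::now() - start;
    return std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count();
}

// "Native invoking": call the function directly in a loop.
long long timeNative(int iterations)
{
    assert(iterations >= 0);
    return timeMilliseconds([=] {
        for (int i = 0; i < iterations; ++i) {
            sketchFunction(1, 2);
        }
    });
}

// Stand-in for invoking through a callback list: every call goes
// through a type-erased std::function stored in a container.
long long timeThroughList(int iterations)
{
    assert(iterations >= 0);
    std::vector<std::function<void(int, int)>> callbacks;
    callbacks.push_back(sketchFunction);
    return timeMilliseconds([&] {
        for (int i = 0; i < iterations; ++i) {
            for (auto& callback : callbacks) {
                callback(1, 2);
            }
        }
    });
}
```

The constant arguments `(1, 2)` are chosen so that even 100,000,000 iterations stay below the `int` limit; the absolute timings depend on compiler and optimization level and are not meant to reproduce the table.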