-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Copy more efficiently in Vector::realloc. #26
base: master
Are you sure you want to change the base?
Conversation
} else { | ||
for (std::size_t i = 0; i < size_; ++i) { | ||
new (&new_objs[i]) T(objs_[i]); | ||
} | ||
} | ||
for (std::size_t i = 0; i < size_; ++i) { | ||
objs_[i].~T(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This loop can be moved inside the else clause above.
If a class is trivially copyable, it must have a trivial destructor:
Requirements
- Trivial non-deleted destructor
https://en.cppreference.com/w/cpp/named_req/TriviallyCopyable
Trivial destructors are no-op:
Objects with trivial destructors don't require a delete-expression and may be disposed of by simply deallocating their storage.
https://en.cppreference.com/w/cpp/language/destructor#Trivial_destructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are quite a few other ~T
calls, so it seems inconsistent to avoid it here, but not the other places.
The loop is optimized away with -O1
or higher anyway: https://godbolt.org/z/-zH4gd
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a bit confused by the destructor calls in the trivially copyable case. It made me stop, question my understanding of C++, and look up the definition of trivially copyable.
#else | ||
const bool trivially_copyable = false; | ||
#endif | ||
if (trivially_copyable) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do this with SFINAE we can avoid both the trivially_copyable
bool and including <cstring>
for C++98.
private:
#if __cplusplus >= 201103L
template <typename T,
std::enable_if<std::is_trivially_copyable<T>::value, int>::type = 0>
static void realloc_move(T *dest, const T *src, std::size_t size) {
std::memcpy(reinterpret_cast<void*>(dest),
reinterpret_cast<const void*>(src),
sizeof(T) * size);
}
template <typename T,
std::enable_if<!std::is_trivially_copyable<T>::value, int>::type = 0>
#endif
static void realloc_move(T *dest, const T *src, std::size_t size) {
for (std::size_t i = 0; i < size; ++i) {
new (&dest[i]) T(src[i]);
}
for (std::size_t i = 0; i < size; ++i) {
src[i].~T();
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This greatly increases the level of C++ knowledge required to understand the code. All to avoid one #include
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I find the template easier to read but it's subjective.
As this code already uses placement-new and calls destructors explicitly I think we can assume that the reader has to be an expert C++ user :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
More concretely: I don't understand std::enable_if well enough to want to use it in code I write, and I wrote this. :-)
@@ -231,8 +235,19 @@ class Vector { | |||
MARISA_DEBUG_IF(new_buf.get() == NULL, MARISA_MEMORY_ERROR); | |||
T *new_objs = reinterpret_cast<T *>(new_buf.get()); | |||
|
|||
for (std::size_t i = 0; i < size_; ++i) { | |||
new (&new_objs[i]) T(objs_[i]); | |||
#if __cplusplus >= 201103L |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this just require C++11?
It looks is_trivially_copyable
has been implemented by libstdc++
for 6 years and libc++
for a similar amount of time.
I'm not sure about "Copying via memcpy is significantly faster than iteratively copying individual elements" in actual cases. Do you have any data about this?
|
We did get about a 0.5-1% speedup on a large benchmark that uses marisa-trie as a small part. Microbenchmarking wasn't done or was inadequately documented, but could be done.
Lack of reuse is probably part of the problem, but harder to solve because higher level APIs wrapping marisa-trie might not operate on the Agent paradigm. Initially, I thought it would be overkill, but now I'm thinking a nicer solution would be to rewrite this using Similarly, |
const bool trivially_copyable = false; | ||
#endif | ||
if (trivially_copyable) { | ||
std::memcpy(reinterpret_cast<void*>(new_objs), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be memmove
.
I was just playing around with godbolt.org and noticed that std::copy_n
, uninitialized_copy_n
and uninitialized_move_n
all compile to memmove
for char
. This definitely makes the uninitialized_move_n
plus compatibility wrapper seem nicer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
memcpy
is usually faster than memmove
(but requires the memory regions to not overlap).
I'm surprised about std::copy_n
, it should compile to memcpy
as it also forbids overlap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I just assumed that the optimizer was doing the most efficient thing, since it usually does, so got memcpy
and memmove
switched in my head.
My initial example was actually insufficient, because my pointers could have overlapped. When one comes from new
, memcpy
is used.
-R, --reuse-on reuse agents (default) -r, --reuse-off don't reuse agents Ref: #26
The above non-trivial types have user-defined copy constructors and assignment operators, but they only copy member variables. |
Hmm, I could not confirm significant difference among for-loop, std::memcpy, and std::copy, even if agent reuse is disabled... |
The only two Vectors in grimoire::trie::State have char and History as their value types, which are both trivially copyable. Copying via memcpy is significantly faster than iteratively copying individual elements.
I saw We are seeing a few percent faster other benchmarks, though. Maybe the optimizer is doing different things between the different benchmarks and it's hard to reproduce. Would you accept that there's some situation where memcpy is faster? Would you merge a version that used something like the |
Sync to the HEAD
Merge recent changes from master into the patch-1 branch.
The only two Vectors in grimoire::trie::State have char and History as their value types, which are both trivially copyable. Copying via memcpy is significantly faster than iteratively copying individual elements.