Add C++17 Port #84
Conversation
Nice! Do you have any benchmark numbers, and how does it compare to other languages?
The benchmarking infrastructure is currently blocked by a student who wants to submit their PhD as soon as possible. So, coming soon... I am also still working on some odd performance numbers compared to the other two C++ versions. So, yeah, soon...
Very interesting, thank you. I just ran your version of the benchmark and mine on my Windows 11 machine (Intel Core i3-1215U 1200MHz, 8GB RAM) with MinGW x64 12.2. Here are the results: Both perform nearly the same by geomean. Json and Havlak likely do additional copies in your version, but I didn't check the source code yet.
I did look a little at Havlak, but couldn't really see much in a basic time profiler.
My major changes to the Vector implementation compared to Java were that the default constructor no longer does an initial allocation (so creating Vectors becomes cheaper, especially when embedded in classes by value), and that operator= does no extra allocations. Also note that I used my original version of the code (i.e. the main branch), not the version where I replaced the dynamic arrays by static ones (see #80 (comment)). EDIT: the hash function is likely insignificant; my version uses a functor passed by template argument, so my implementation makes a function call and yours (presumably) a method call; I thus avoid the dispatch, which is very fast in C++.
Hm, I am specifically looking at this: https://github.com/rochus-keller/Are-we-fast-yet/blob/main/Cpp/som/Vector.h#L34-L46 So here, I am concerned about the following:
If we aim to expose the compilers/optimizers to the same challenges, these changes seem problematic to me, because I would naively assume that they change what the programs are doing. |
I would assume that the only performance-relevant point is 4) the initial size of 0 vs. 50, which needlessly calls the default constructor 50 times if elements are by value. This adds up when Vector is used by value in other classes. The other points likely make less than 10% difference. But I will eventually go over my code again, as soon as your version is official, so that we can compare "modern" with "modern modern" C++. What is actually your official target C++ version (didn't check yet)?
Indeed, changing this and removing the extra +50 on the growth brings the two versions roughly on par. Though, the "no need" discussion is, in my mind, a separate discussion from rule compliance that we may want to have. Same with the value "50": I have no recollection of where it comes from, and it seems high. So, one could think about changing all implementations to not allocate storage initially, or to use a conservative size of 3 or 10. But I think that's something that would need to be applied to all languages.
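To make the point concrete, here is a minimal, hypothetical sketch (not the repository's actual som/Vector.h; the class name is invented) of a Vector that starts with no allocation and grows by doubling, handling the empty case explicitly instead of adding a constant like +50:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch, not the repository's som/Vector.h.
// The default constructor performs no allocation; growth doubles the
// capacity and covers the empty case with a small non-zero start,
// instead of the Java original's initial capacity of 50.
template <typename T>
class SketchVector {
 public:
  SketchVector() = default;  // no initial allocation of 50 slots
  ~SketchVector() { delete[] storage_; }

  // copying is omitted in this sketch to keep it short
  SketchVector(const SketchVector&) = delete;
  SketchVector& operator=(const SketchVector&) = delete;

  void append(const T& elem) {
    if (size_ == capacity_) {
      grow();
    }
    storage_[size_] = elem;
    size_ += 1;
  }

  std::size_t size() const { return size_; }
  std::size_t capacity() const { return capacity_; }
  const T& at(std::size_t i) const { return storage_[i]; }

 private:
  void grow() {
    // double, starting at 4 to handle capacity_ == 0 without a "+50"
    const std::size_t newCapacity = capacity_ == 0 ? 4 : capacity_ * 2;
    T* newStorage = new T[newCapacity];
    for (std::size_t i = 0; i < size_; i += 1) {
      newStorage[i] = storage_[i];
    }
    delete[] storage_;
    storage_ = newStorage;
    capacity_ = newCapacity;
  }

  T* storage_ = nullptr;
  std::size_t size_ = 0;
  std::size_t capacity_ = 0;
};
```

With this policy, a default-constructed Vector embedded by value in another class costs nothing until the first append.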
Yeah, I am not claiming it makes a difference in practice, I am merely claiming that these are differences, and at least in my mind, not fully rule compliant.
Well, "official"... I mean, I am happy to have these types of discussions to make sure things "make sense". I aimed for C++17 as language target, also when choosing things like |
But keep in mind that C++ is the first language in your benchmark where objects can be either reference or value types. The issue with the 50 preallocated slots is strongly connected to using objects by value and to the fact that C++ generates a couple of methods for every class (e.g. a constructor) even if there isn't one in the code. I recommend that this be considered in your rules; otherwise you risk ending up with code no experienced C++ programmer would write. The "+50" likely has very little influence on performance; it's just to accommodate newLength==0; we could also write
Even your C++17 code didn't compile with my pretty recent Visual Studio compiler, which is why I switched to MinGW.
I now had a look at Json. The key difference here seems to be that the The whole
But it's really a major change of behavior. Yes, there's a goal to make sure the code is "idiomatic", but that can't come at the cost of behavioral changes. I am very happy to consider changing the Vector behavior in general, but I strongly believe that this would need to be applied consistently. The fortunate thing is that this should be a local change to Vector, so it shouldn't have too many unintended consequences in terms of code. Though, it will have consequences in terms of what the benchmark measures.
Hm, what does it complain about?
Hmm, well, the other question one might want to ask is whether Vector should really be treated as a value. I do stack allocations for it in a number of places, but I think I rarely use it as a true value. The few places where I do, it's fields in objects that always allocate it. In CD and Havlak, where it is used in a Vector itself, I only used pointers.
C++ doesn't even have a switch statement that can compare strings. It will be pretty difficult to convince the C++ community to use a string of length one instead of a char. I did comparisons (see https://github.com/rochus-keller/Are-we-fast-yet/blob/2f44088dfc0500a597cee3f1f9a165502c26b28b/Cpp/Json.cpp#L33) and decided on char, mostly because I was interested in comparing my different language implementations with a decently credible C++ implementation; so I left the design similar to Java as long as it was not too costly, but in this case using a string instead of a char would have made my performance claims implausible (I have even received very nasty comments for the parts I left like Java, even though the impact on performance was small).
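To illustrate the point about switch and char, a small sketch (the function names are invented for this example, not taken from the benchmark) contrasting dispatch on a plain char, which allows a switch and never allocates, with the chained comparisons a std::string would force:

```cpp
#include <cassert>
#include <string>

// Illustrative only; the function names are invented for this sketch.
// A plain char can be dispatched on with a switch and never allocates:
bool isWhitespaceChar(char current) {
  switch (current) {
    case ' ':
    case '\t':
    case '\n':
    case '\r':
      return true;
    default:
      return false;
  }
}

// A std::string "current" rules out switch; only chained comparisons
// (or a lookup structure) remain, on a potentially heap-backed value:
bool isWhitespaceString(const std::string& current) {
  return current == " " || current == "\t" ||
         current == "\n" || current == "\r";
}
```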
The original design assumes a language where objects are "by reference" and optimizes for such a language in that there is an initial length of 50. Allocating the 50 for those languages causes no additional cost. In C++, we could (and maybe should, if you insist on maximum similarity with the original design) also use all objects by reference. But if you allow for "objects by value", then the rules should also consider constructors and assignment operators. EDIT: I would rather consider changing the original design, so that every Vector implementation starts with length 0.
Didn't even look at the errors, just switched. I think my compiler is also version 17, i.e., MSVC version 19. But using MinGW instead is no problem from my perspective. EDIT: I will try again and check what the problem is.
That decision should then affect all classes. You could decide to fully do without the capability of C++ to use classes by value (either on the stack or embedded as fields in other objects) in favour of comparability. But that would be very difficult to sell to the C++ community.
I used classes by value wherever it makes sense, even when a Vector is a value of another collection. I did the same for my Oberon+ implementation, because from my point of view that's an essential advantage compared to languages like Java or Smalltalk. So if the benchmark should demonstrate the full potential of languages like C++ or Oberon compared to more dynamic languages, using classes by value should be allowed.
#90 proposes a change to
I appreciate that single-char strings might not be idiomatic to C++ programmers. I will adapt the list of abstractions of our imaginary Core Language accordingly, and explicitly exclude
Isn't this a similar issue to the "by value" vs. "by reference" topic? I understand the rule when it comes to core language vs. standard library features. But shouldn't the features of the core language be honored when available, especially when performance relevant? A language designer might decide, for valid reasons, to only offer dynamic strings and no char data type, or a char data type that is a dynamic object as well; but should we then "punish" (in that the possibilities of the language are artificially limited) all languages that have a native char datatype and can thus avoid allocations? Isn't that exactly the result we expect to see when comparing static vs. dynamic languages?
Yeah, so, that's a bit of a debate to be had. My reasoning was that by-value vs. by-reference is something a compiler/runtime system may optimize, at least to a degree. Of course, many caveats apply.
Well, only to a degree. The most important principle for me is comparability, and "idiomatic language use" comes for me only after that.
Why is this a punishment? There may be perfectly valid reasons to end up with idiomatic code that has such challenges, i.e., using
No? And to me that's also not the question. Instead, I want to see where compilers can learn from each other.
Perhaps to put it another way: One could ask the question of what can we squeeze out of a language when using all its specific tricks. I believe that's roughly the question behind the Computer Language Benchmarks Game. Though, I'd rather try to turn the question on its head and ask: what do we need to teach our compilers to get the best possible run time performance for code written in a "reasonable subset" that happens to be comparable across different languages. Unfortunately, an entirely different question is then "How relevant is this subset to application performance?". I am not sure how to really approach that question though. |
This inevitably leads to the question of what we expect from a cross-language benchmark. Since programming languages can differ not only in their syntax but also in their fundamental design concepts, for me personally it is very interesting what the cost of these specific design concepts is. E.g., treating everything as objects exchanging messages looks like a very attractive approach; but I would like to understand the trade-off. Awfy gives me a tool to quantify this trade-off; of course, this only works if each implementation of the benchmark actually makes use of the design concepts of the specific language in use. I could, e.g., see how the same application performs in Ada, which is interesting because Ada offers means to avoid dynamic allocation altogether, combined with strict typing. If the Ada implementation were forced to look like Java because of comparability rules, the comparison would not be helpful, because it wouldn't make use of the specific design concepts Ada offers. On the other hand, a fragile construct making use of all compiler-specific backdoors that no Ada developer in their right mind would ever write, just to be as fast as possible, would not lead to a helpful comparison either. That's why I like the approach of Awfy: to just use language features and implement collections as part of the benchmark, not using libraries whose design and optimization is neither controllable nor comparable. The goal is to understand the cost and benefit of language features, not to be as fast as possible (as in the CLBG, which lacks specific benchmark rules).

Therefore, from my point of view, the benchmark is most useful if a language can be used as conceived by its designer. Since it is not known what languages will exist in the future and be used to implement the benchmark, the rules must be adapted to accommodate the specific design concepts of each new language. Even different versions of a language can require different benchmark rules, as e.g. with C++; and there might even be different "schools" per language or version that have completely different views on how the language should be used (as again with C++). The benchmark should therefore also allow for specific implementations for different schools, insofar as they are clearly identifiable and specifiable. Only in this way could I, e.g., compare my "modern" C++98 approach with a "modern modern" policy-based, header-heavy C++17 design, or understand why at the end of the day a C++ implementation is actually faster than an equivalent Smalltalk implementation. If the benchmark forces C++ to look like Java or JavaScript instead, the result is not representative. Sorry for the long response. PS: concerning the JSON benchmark and "current" as char vs. std::string, the conclusion that the former is twice as fast as the latter is a valuable result from my point of view and should be achievable within the benchmark rules. PS2: most modern languages support some kind of conditional compilation, which could be used to maintain a "core language version" in parallel with as many versions using specific features of the language at hand as required.
The key point I take from all these considerations is that there can't be a single true set of rules that covers all interesting questions. And in many ways, I think that is natural, because all the different aspects can't be reduced to a single number in a useful way. (Though, it also raises the question of whether the benchmarks currently included cover enough ground. But let's put that aside for the moment.) Since you mention collections: there's a variant of AWFY for the dynamic languages. I think this is a very useful variant, with more "idiomatic" code. However, this comes at a cost: we cannot guarantee anymore that execution is deterministic, because relying on builtin hash maps, for instance, may introduce non-deterministic hash criteria, like object memory addresses. In a similar vein, the question of "fully heap allocated" vs. "value/stack allocation" makes perfect sense only in the context of C++, Ada, or other such languages. And indeed, because these languages may make such concepts explicit, it may not make sense to teach the compiler all possible tricks, and instead rely on the developer to make educated decisions. Of course, one could have both versions/styles and compare how far compilers can take it. With all this said, I think for the "standard" AWFY benchmarks, that is, the main branch, I would like to make a reasonable-enough decision, which walks an interesting-enough path between the extrema. Simply because I think we can't really cover all possible variants.
My perspective is still that it is more useful for my purposes to have a single well-enough defined and small core language.
I think there's indeed room for various versions, though, I personally do not want to volunteer to maintain those additional versions. There are too many perfectly reasonable goals one may have, but I think, for my own sanity, I need to stick to a very small set, that I realistically can handle... |
…s from the methods so that they can be null even when T is a value type - also change order in signature to match the other languages
Signed-off-by: Stefan Marr <[email protected]>
- make int types consistent to avoid random conversion
- add memory management; sanitized version works, but normal version segfaults (there's a missing initialization fix for dictionary.h in some later commit)
- added style target
…e language And some minor edits.
The PR is now merged. Things can always be changed at a later point.
This PR adds a C++ port, using C++17 language and library features.
Below, I am also adding the corresponding rules and guidance, which is still open for discussion.
The first version of the code in this PR was written by @Hjeanningros and can be found here.
@rochus-keller also provided valuable inspiration with his earlier port here.
The GitHub CI setup currently doesn't run style checks, because of some issue with the config format, probably using the wrong clang version.
The GHActions Ubuntu image is also updated to 22.04, except for SOMns, which requires Python 2 because of an old mx version.
C++ Guidelines
With C++, we add support for the first language that does not have garbage
collection, which comes with a new dimension of issues to be considered.
Explicit Memory Management Rules:
- benchmarks have to run without memory leaks
- stack allocation can be used where natural and where it does not change the
  nature of the benchmark, for instance, when an object/array is used only in
  the dynamic scope of a function
- existing data structures should be used where possible to manage
  dynamically created objects, for instance, iterate over an already existing
  list of objects at the end of the benchmark or a method to free them
- changes to code structure and APIs should be as minimal as possible;
  for instance, if a method returns an allocated object, leave it as such,
  and let the caller manage the memory
- if the useful lifetime of object fields is restricted to a method,
  the allocations referenced by these fields should be freed before
  the end of the method, but the field should remain a field
- for arbitrary object graphs, as in DeltaBlue, `memory/object_tracker.h` can
  be used to free the objects when not needed.
  The use of `shared_ptr` may also be appropriate, but did not work for DeltaBlue.
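As an illustration of the idea behind `memory/object_tracker.h` (this is a hypothetical sketch with invented names, not the actual header): objects register with a tracker at allocation time and are all freed in one sweep, which also handles the cyclic references that defeat `shared_ptr`:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical sketch of the object-tracker idea, not the actual
// memory/object_tracker.h: every allocation registers with the
// tracker, and the whole object graph is freed in one sweep.
class Tracked {
 public:
  virtual ~Tracked() = default;
};

class ObjectTracker {
 public:
  template <typename T, typename... Args>
  T* allocate(Args&&... args) {
    auto obj = std::make_unique<T>(std::forward<Args>(args)...);
    T* raw = obj.get();
    objects_.push_back(std::move(obj));  // tracker takes ownership
    return raw;
  }

  void releaseAll() { objects_.clear(); }  // frees everything at once

  std::size_t trackedCount() const { return objects_.size(); }

 private:
  std::vector<std::unique_ptr<Tracked>> objects_;
};

// cyclic references are unproblematic: the tracker frees both nodes
// regardless of how they point at each other
struct Node : Tracked {
  Node* other = nullptr;
};
```

Freeing is a single `releaseAll()` call once the object graph is no longer needed, which matches how DeltaBlue can free its full setup at once.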
Memory Management Strategies Per Benchmark:
- CD uses value objects for most data. Since it's a tree, the red/black tree
  is trivially managed by deleting the nodes from the root. Vectors are managed
  explicitly for the voxel map. Don't miss the empty vectors that are not
  passed on as results, though.
- DeltaBlue uses `object_tracker`, since there are cyclic dependencies, but we
  can free the full setup once it's not needed. A mix of `shared_ptr` and
  `weak_ptr` would probably also work.
- Havlak manages memory explicitly by assigning ownership to specific
  classes. Specifically, the ControlFlowGraph owns the basic blocks and
  block edges, the LoopStructureGraph owns the loops, and the HavlakLoopFinder
  owns its data, including the UnionFindNodes. Thus, the destructors can free
  the corresponding memory.
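The Havlak ownership scheme can be sketched as follows; the class and member names are simplified stand-ins, not the benchmark's actual code:

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Simplified stand-in for the ownership scheme described above:
// the graph owns its blocks, hands out non-owning raw pointers,
// and its destructor frees everything it owns.
class BasicBlock {
 public:
  explicit BasicBlock(int name) : name_(name) {}
  int name() const { return name_; }

 private:
  int name_;
};

class ControlFlowGraph {
 public:
  BasicBlock* createNode(int name) {
    blocks_.push_back(std::make_unique<BasicBlock>(name));
    return blocks_.back().get();  // caller gets a non-owning pointer
  }

  std::size_t numNodes() const { return blocks_.size(); }

 private:
  // owning container; ~ControlFlowGraph frees all blocks automatically
  std::vector<std::unique_ptr<BasicBlock>> blocks_;
};
```

Because ownership sits in exactly one class, no reference counting is needed, and the destructor chain frees the memory deterministically.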
- Json relies on JSON documents being trees, and uses the tree to free
  objects. The major tradeoff here is that we need to allocate `true`,
  `false`, and `null` literal objects to have a uniform memory representation.
  Otherwise, though, we do not require any management overhead.
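The tree-based freeing described for Json might look roughly like this sketch (the class names are illustrative; the shared `true`/`false`/`null` literal objects mentioned above would have to be excluded from deletion, which this sketch omits):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative class names, not the benchmark's actual Json code:
// composite values own their children, so deleting the document root
// frees the whole tree via the destructor chain.
class JsonValue {
 public:
  virtual ~JsonValue() = default;
};

class JsonArray : public JsonValue {
 public:
  ~JsonArray() override {
    for (JsonValue* child : children_) {
      delete child;  // recursively frees each subtree
    }
  }

  void add(JsonValue* child) { children_.push_back(child); }
  std::size_t size() const { return children_.size(); }

 private:
  std::vector<JsonValue*> children_;
};
```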
- Richards uses `object_tracker` for simplicity. It could use `shared_ptr`
  with accounting for cyclic references; that would work, too. Naively
  using the task list did not seem to work, but I might have missed something.
- Bounce allocates everything statically, i.e., on the stack.
- List trivially uses the list structure for freeing the list.
- Mandelbrot does not allocate any data structures.
- NBody allocates everything statically, i.e., on the stack.
- Permute allocates an array dynamically and frees it directly.
  Since the benchmark holds the reference in a field and allocates on
  each iteration, the new/delete dance is needed to comply with the rules.
- Queens allocates its arrays dynamically and frees them directly,
  same as Permute.
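A minimal sketch of the new/delete dance described for Permute and Queens; the class and member names are made up for this illustration:

```cpp
#include <cassert>

// Names are made up for this illustration. The benchmark keeps the
// array in a field and allocates it anew on every iteration, so each
// iteration first deletes the previous allocation.
class PermuteLike {
 public:
  ~PermuteLike() { delete[] v_; }

  int runIteration() {
    delete[] v_;        // free the previous iteration's array
    v_ = new int[6]();  // allocate fresh, zero-initialized storage
    v_[0] = 42;
    return v_[0];
  }

 private:
  int* v_ = nullptr;  // delete[] on nullptr in the first run is a no-op
};
```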
- Sieve allocates everything statically, i.e., on the stack.
- Storage allocates its tree dynamically and frees it from the root.
- Towers allocates the disks dynamically; they form a linked list,
  which is used to free them once they are not needed anymore.
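The Towers strategy can be sketched as an intrusive linked list of disks that is walked to free every node (the names are illustrative, not the benchmark's code):

```cpp
#include <cassert>

// Illustrative sketch: disks form an intrusive linked list, and the
// same list is walked to free them once they are no longer needed.
struct Disk {
  int size;
  Disk* next = nullptr;
};

// walks the list from the top disk and frees every node;
// returns how many disks were freed
int freeDisks(Disk* top) {
  int freed = 0;
  while (top != nullptr) {
    Disk* next = top->next;
    delete top;
    top = next;
    freed += 1;
  }
  return freed;
}
```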
General C++-isms:
- the benchmarks, where possible, can be header-only to match the code
  structure of the other languages
- we use clang-tidy and clang-format
- use `std::array` for fixed-sized arrays
- use `const` where it is appropriate, but it won't really work with
  containers and can be problematic for value classes
- use `auto` and `auto*` to make code more concise, as recommended by the
  linter, for instance for allocations
- use annotations like `[[nodiscard]]` where indicated by the linter
- use modern C++-isms, for instance, range loops and `.at` instead of `[]`
  on `std::array`
- use initializer syntax for default values and member initializer lists
  when depending on constructor parameters
- prefer `int32_t`, `size_t`, and similar to be more explicit about the
  semantics and size of a value; plain `int`/`long` shouldn't be used
- avoid changing signatures for the sake of the compiler; it should do an
  appropriate return-value optimization itself.
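A short sketch pulling several of these guidelines together (`std::array`, a range loop, `.at` instead of `[]`, sized integer types, and `[[nodiscard]]`); it is an illustration, not code from the port:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Illustration only, not code from the port: std::array for a fixed
// size, a range loop, .at() instead of [], a sized integer type, and
// [[nodiscard]] on the result.
[[nodiscard]] int32_t sumOfSquares() {
  const std::array<int32_t, 4> values = {1, 2, 3, 4};
  int32_t sum = 0;
  for (const int32_t v : values) {  // range loop instead of an index loop
    sum += v * v;
  }
  assert(values.at(0) == 1);  // .at() bounds-checks, unlike operator[]
  return sum;  // 1 + 4 + 9 + 16
}
```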