-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
experimental using openMP for encode #257
Conversation
7a295af
to
50ff6dc
Compare
Apparently cmake has issues, I don't know why, I just added -fopenmp |
On the following HW:
|
@@ -121,6 +121,7 @@ include(CheckCXXCompilerFlag) | |||
set(COMMON_CXX_FLAGS | |||
-pipe | |||
-Wall | |||
-fopenmp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To do it in a portable way, we should use use find_package
and use the populated variable.
An example is shown here :
cmake_minimum_required(VERSION 3.9)
project(solver LANGUAGES CXX)
find_package(OpenMP REQUIRED)
add_executable(solver solver.cc)
target_link_libraries(solver PRIVATE OpenMP::OpenMP_CXX)
Note that we may need to bump our minimum version for CMake, because OpenMP support got reworked in 3.9 (which is only one year old: I don't know if it's too recent to be easily used from most of Linux distribution).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An alternative way:
find_package(OpenMP)
if (OPENMP_FOUND)
set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
set (CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} ${OpenMP_EXE_LINKER_FLAGS}")
endif()
If OpenMP is not found, its acceleration #pragma will be disabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't always work (hence the revamp in 3.9), see https://cmake.org/Bug/view.php?id=15393
Yes, OpenMP is nice and easy to use, that's for sure and thus it can be a useful tool to quickly test which part we want to parallelize and check if it's worth it or not. Now, as you mentionned, I'm not sure if we do want to use it in "production" because:
So, if we have to implement our own parallelization solution as fallback for these cases, why not only use our own implementation? Using both OpenMP/homemade fallback will make us maintain and test these two codebase/codepath? We already have 3 differents build to test now (scalar, SSE and AVX (and we may get AVX512 and NEON in the future)), if we add another variation point (OpenMP/fallback), the number of combinations to test will grows Now, if OpenMP is well supported on all our targets then we should go for it instead of using an homemade solution, of course. |
@catid what do you think ? shall we use OpenMP or threads in this lib ? Is it interesting to provide the threading facility for the caller instead of providing a library operating on a single core ? |
My opinion on this part is that implementing multithreading into the lib (if it's worth it, and your little experiment with OpenMP seems to show that it's worth it!) then we give our users more choice/flexibility. If we support multithreading in the library, then our users have the choice between:
If we don't implement multithreading, well then only the approach 2 is possible for the user, no choice. |
Please continue the discussion on issue #249 |
FIX #249
I was playing with openMP and encode speed, and just by doing this, on my 2-core VirtualBox I was able to double the speed of the encode basically as shown in those 2 measures:
Without:
With:
I assume using it on 4 cores with quadruple the speed, etc
There is a little bit more work due to breaks that will require using shared conditions, but parallelizing the decode is feasible because there are a lot of for loops.
I think more and more this is the right approach for threading this lib (indeed using threads will involve using thread pools and can complexify the code, etc).
We nevertheless need to have a clean implem in case the pragma is not supported by the compiler.