Explicitly specify used NVTX3 C++ wrapper version #1688

bernhardmgruber · 2024-05-02T12:59:02Z

This is suggested in the corresponding documentation: https://github.com/NVIDIA/NVTX/blob/09e0d23a61ae86cc381eef85d012afc3a2e6eeea/c/include/nvtx3/nvtx3.hpp#L21-L39

gevtushenko · 2024-05-02T14:07:38Z

@bernhardmgruber how does this change interact with user code relying on nvtx API?

gevtushenko

I rushed approving the PR. Answering my own question, the change doesn't play well with user code:

#include <cub/device/device_reduce.cuh>
#include <nvtx3/nvtx3.hpp>

struct my_domain{ static constexpr char const* name{"my domain"}; };

int main() {
  nvtx3::scoped_range_in<my_domain> domain; // fails to compile
}

bernhardmgruber · 2024-05-02T15:34:40Z

#include <cub/device/device_reduce.cuh>
#include <nvtx3/nvtx3.hpp>

struct my_domain{ static constexpr char const* name{"my domain"}; };

int main() {
  nvtx3::scoped_range_in<my_domain> domain; // fails to compile
}

Where would the #include <nvtx3/nvtx3.hpp> resolve to? This header does not ship in any CTK and is also not present in this repository. Funnily though, if I compile this code, nvcc errors the same way as you described and does not get stuck at the second include (which does not exist).

bernhardmgruber · 2024-05-02T15:49:51Z

@bernhardmgruber how does this change interact with user code relying on nvtx API?

After looking through nvtx3.hpp, I understand the header to work as follows:

all symbols are exported in namespace nvtx3::v1, and the namespace v1 is marked as inline. Therefore, all symbols are available as nvtx3::* and nvtx3::v1::*.
when NVTX3_CPP_REQUIRE_EXPLICIT_VERSION is defined before including the header, v1 is not marked as inline. Symbols are only available as nvtx3::v1::*.
nvtx3.hpp does not have traditional header guards and remembers which API declarations it already provided instead. It defines NVTX3_CPP_DEFINITIONS_V1_0 for example. So if a different nvtx3.hpp would be included that defines a newer API, that header would not emit the V1 declarations, but only add declarations of later versions.
API declarations are only emitted once per version
Whether the version namespaces are inline or not can only be chosen at the time the declarations are emitted.

Therefore, when we define NVTX3_CPP_REQUIRE_EXPLICIT_VERSION and then include nvtx3.hpp, it will emit a non-inlined V1 API. Then we #undef NVTX3_CPP_REQUIRE_EXPLICIT_VERSION. When the user later includes an nvtx3.hpp containing the same API version (major and minor), the header does nothing. If the user includes an nvtx3.hpp containing a different API version, that version will be emitted with inline namespace in addition to the explicit version that the include from CUB emitted.

Bottom line: this PR breaks user-side use of NVTX3 if they happen to use the non-explicit API (which is the default) and use the same major and minor API version (also the default because only API 1.0 exists).

bernhardmgruber · 2024-05-02T16:18:55Z

#include <cub/device/device_reduce.cuh>
#include <nvtx3/nvtx3.hpp>

struct my_domain{ static constexpr char const* name{"my domain"}; };

int main() {
  nvtx3::scoped_range_in<my_domain> domain; // fails to compile
}
Where would the #include <nvtx3/nvtx3.hpp> resolve to? This header does not ship in any CTK and is also not present in this repository. Funnily though, if I compile this code, nvcc errors the same way as you described and does not get stuck at the second include (which does not exist).

After playing with this for a while and supplying nvtx3.hpp, either via an include path pointing to a cloned NVTX repository or our vendored nvtx3.hpp, I conclude that we cannot define NVTX3_CPP_REQUIRE_EXPLICIT_VERSION as recommended by the NVTX documentation, because it can break usercode.

copy-pr-bot · 2024-05-03T07:20:06Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

This is suggested in the corresponding documentation: https://github.com/NVIDIA/NVTX/blob/09e0d23a61ae86cc381eef85d012afc3a2e6eeea/c/include/nvtx3/nvtx3.hpp#L21-L39

bernhardmgruber · 2024-05-03T18:22:46Z

As expected, with the test from #1690, this PR now fails with:

cub/test/CMakeFiles/cub.cpp17.test.nvtx_in_usercode.dir/test_nvtx_in_usercode.cu.o
  /home/coder/cccl/cub/test/test_nvtx_in_usercode.cu(17): error: namespace "nvtx3" has no member "scoped_range"
      nvtx3::scoped_range range("user-range");
             ^
  
  1 error detected in the compilation of "/home/coder/cccl/cub/test/test_nvtx_in_usercode.cu".

We thus cannot use the explicitly versioned API from the NVTX3 C++ wrapper.

gevtushenko · 2024-05-06T16:37:38Z

@bernhardmgruber I believe this was closed prematurely. We discussed that we don't need NVTX3_CPP_REQUIRE_EXPLICIT_VERSION in CUB sources, but I think we need explicit ::v1 when referring to nvtx. Without ::v1 the following code breaks:

#define NVTX3_CPP_REQUIRE_EXPLICIT_VERSION
#include <nvtx3/nvtx3.hpp>
#include <cub/device/device_reduce.cuh>

int main() {}

bernhardmgruber · 2024-05-07T10:47:08Z

I think we need explicit ::v1 when referring to nvtx. Without ::v1 the following code breaks:
#define NVTX3_CPP_REQUIRE_EXPLICIT_VERSION
#include <nvtx3/nvtx3.hpp>
#include <cub/device/device_reduce.cuh>

int main() {}

This code does indeed break. However, CUB internally does this:

#  if __has_include(<nvtx3/nvtx3.hpp>)
#    include <nvtx3/nvtx3.hpp>
#  else // __has_include(<nvtx3/nvtx3.hpp>)
#    include "nvtx3.hpp"
#  endif // __has_include(<nvtx3/nvtx3.hpp>)

to choose whether to pick the vendored nvtx3.hpp (V1), or take the one available in the include directories (e.g. from the CTK, could be any version). So if CUB would use explicitely versioned symbols like:

::nvtx3::v1::scoped_range_in<...>

and the file <nvtx3/nvtx3.hpp> happens to exist in the include directories and provide a version different than V1. CUB would break, because it would not take the vendored nvtx3.hpp, but the user provided and version incompatible <nvtx3/nvtx3.hpp>.

bernhardmgruber · 2024-05-07T10:55:45Z

As discussed above, we can also not ship our own nvtx3.hpp that we use with an explicit version (define NVTX3_CPP_REQUIRE_EXPLICIT_VERSION, include, undef), becasue if the user happens to use the same version as we use internally but wants to have the implicit versions, the user is broken as well.

I see no way how to create a situation where neither CUB nor the user can be broken in some way when various or equal, explicit or implicit nvtx3 versions are in the game, without modifiying nvtx3.hpp to allow including the same header multiple times and letting it re-emit the API when NVTX3_CPP_REQUIRE_EXPLICIT_VERSION is defined or not. So this has to be fixed upstream in NVTX. Should we attempt that?

The root culprit is simply that whatever API version CUB uses, the user cannot have the same API version with the opposite explicitness.

bernhardmgruber · 2024-05-16T22:02:37Z

@gevtushenko I created an issue that covers the failing program you listed above: #1750

bernhardmgruber requested review from a team as code owners May 2, 2024 12:59

bernhardmgruber requested review from gevtushenko and alliepiper May 2, 2024 12:59

bernhardmgruber added the cub For all items related to CUB label May 2, 2024

gevtushenko approved these changes May 2, 2024

View reviewed changes

gevtushenko requested changes May 2, 2024

View reviewed changes

bernhardmgruber marked this pull request as draft May 2, 2024 15:53

bernhardmgruber mentioned this pull request May 2, 2024

NVTX3_CPP_REQUIRE_EXPLICIT_VERSION is problematic in header-only libraries NVIDIA/NVTX#93

Open

bernhardmgruber force-pushed the explicit_nvtx3_cpp branch from a4b30cb to caf3944 Compare May 3, 2024 07:20

Explicitly specify used NVTX3 C++ wrapper version

2c9e51d

This is suggested in the corresponding documentation: https://github.com/NVIDIA/NVTX/blob/09e0d23a61ae86cc381eef85d012afc3a2e6eeea/c/include/nvtx3/nvtx3.hpp#L21-L39

bernhardmgruber force-pushed the explicit_nvtx3_cpp branch from caf3944 to 2c9e51d Compare May 3, 2024 07:36

bernhardmgruber closed this May 3, 2024

bernhardmgruber mentioned this pull request May 16, 2024

CUB's NVTX ranges fail to compile when usercode uses explicitly versioned NVTX API #1750

Closed

bernhardmgruber mentioned this pull request May 16, 2024

Use only explicit NVTX3 V1 API in CUB #1751

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Explicitly specify used NVTX3 C++ wrapper version #1688

Explicitly specify used NVTX3 C++ wrapper version #1688

bernhardmgruber commented May 2, 2024

gevtushenko commented May 2, 2024

gevtushenko left a comment

bernhardmgruber commented May 2, 2024

bernhardmgruber commented May 2, 2024

bernhardmgruber commented May 2, 2024

copy-pr-bot bot commented May 3, 2024

bernhardmgruber commented May 3, 2024

gevtushenko commented May 6, 2024

bernhardmgruber commented May 7, 2024 •

edited

Loading

bernhardmgruber commented May 7, 2024 •

edited

Loading

bernhardmgruber commented May 16, 2024

Explicitly specify used NVTX3 C++ wrapper version #1688

Explicitly specify used NVTX3 C++ wrapper version #1688

Conversation

bernhardmgruber commented May 2, 2024

gevtushenko commented May 2, 2024

gevtushenko left a comment

Choose a reason for hiding this comment

bernhardmgruber commented May 2, 2024

bernhardmgruber commented May 2, 2024

bernhardmgruber commented May 2, 2024

copy-pr-bot bot commented May 3, 2024

bernhardmgruber commented May 3, 2024

gevtushenko commented May 6, 2024

bernhardmgruber commented May 7, 2024 • edited Loading

bernhardmgruber commented May 7, 2024 • edited Loading

bernhardmgruber commented May 16, 2024

bernhardmgruber commented May 7, 2024 •

edited

Loading

bernhardmgruber commented May 7, 2024 •

edited

Loading