
support for extended memops #59
Merged · 99 commits · May 30, 2018

Conversation

@drossetti (Contributor)

Support for the experimental APIs.
It must build without them if --enable-extended-memops is not passed to build.sh.
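
For illustration, a minimal sketch of how such a conditional build might look, assuming a configure-time macro: the macro name GDS_HAS_EXTENDED_MEMOPS and the helper function are invented here, only the --enable-extended-memops build flag comes from this PR.

```cpp
// Hypothetical sketch, not the PR's actual code: the macro and function names
// are assumptions. Paths that need the experimental CUDA memop APIs are
// compiled in only when the build passes --enable-extended-memops; otherwise
// the library still builds and the call fails gracefully at run time.
#include <cerrno>
#include <cstddef>
#include <cstdio>

int gds_post_inline_copy_example(void *dst, const void *src, size_t len)
{
#ifdef GDS_HAS_EXTENDED_MEMOPS
    // Experimental path: would enqueue an inline-copy descriptor through the
    // new CUDA memop API here.
    (void)dst; (void)src; (void)len;
    return 0;
#else
    // Built without --enable-extended-memops: report "unsupported" instead
    // of breaking the build.
    (void)dst; (void)src; (void)len;
    fprintf(stderr, "extended memops support not compiled in\n");
    return EINVAL;
#endif
}
```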

drossetti requested a review from e-ago on May 10, 2018 at 02:36

@e-ago (Collaborator) commented May 23, 2018

Tested on DGX-1V (lab12) with regular driver 410.02 (http://linuxqa/builds/release/display/x86_64/410.02) and both CUDA 9.0 and 9.2.
Script used for the tests: run_libgdsync.sh from the gdasync suite (https://github.com/gpudirect/gdasync/blob/master/Scripts/run_libgdsync.sh).

Test without extmemops, CUDA 9.2:

Test with extmemops, CUDA 9.2:

  • gds_sanity*: FAILED
[13689] GDS DBG  gds_map_mem() ptr=0x2cb7000 size=64 mem_type=00000002
[13689] GDS ERR   gds_fill_inlcpy() error, inline copy is unsupported
[13689] GDS ERR   gds_stream_post_descriptors() error 22 in gds_fill_poke
Assertion "gds_stream_post_descriptors(gpu_stream, k, descs, 0) != cudaSuccess" failed at ../tests/gds_sanity.c:241 error=22(Invalid argument)

Same results for CUDA 9.0.

Same results on brdw0 with a P100 GPU, driver 410.02 and a newly installed CUDA 9.2. gds_kernel_latency can't be tested because brdw1 is still down.

@drossetti (Contributor, Author) commented May 23, 2018

gds_kernel_latency works for me on ivy2/3 with:
EXTRA+=" --enable-test"
with and without
EXTRA+=" --enable-extended-memops"
EXTRA+=" --enable-nvtx"

my test script is ~drossetti/.../peersync/src/libgdsync/gds_kernel_latency.sh

for example:

# running NP=2, ./../scripts/wrapper.sh <>/peersync/local-ofed4.2/bin/gds_kernel_latency -E -G 0 -S 1048576 -s 4096 -B 20 -n 5000 -K 1 -U, stdout:1
...
pre-posting took 323.00 usec
[0] 40960000 bytes in 0.08 seconds = 3952.76 Mbit/sec
[0] 5000 iters in 0.08 seconds = 16.58 usec/iter
[0] dumping prof
[1] 40960000 bytes in 0.08 seconds = 3952.57 Mbit/sec
[1] 5000 iters in 0.08 seconds = 16.58 usec/iter
[1] dumping prof

@e-ago (Collaborator) commented May 24, 2018

On ivy2/3 there is a Kepler GPU (and MLNX_OFED_LINUX-4.2-1.0.0.0), while on the DGX there is a Volta (and MLNX_OFED_LINUX-4.3-1.0.1.0); the difference seems to be that the NOR op is enabled on Volta and used by libgdsync.

Disabling the NOR (peer->has_wait_nor = false; in gdsync.cpp) fixes both problems from issue #61; see the sketch after the list below.
That is:

  • gds_kernel_latency works properly with all the different run_libgdsync.sh parameters, without warnings like gpu_wait_tracking_event nothing to do (12) or [7] unexpected rx ev 13, batch len 20
  • hpgmg returns correct results everywhere
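
A minimal sketch of the workaround, assuming an invented capability struct and init-time hook; only the has_wait_nor field and the GDS_DISABLE_WAIT_NOR environment variable (which comes up later in this thread) are taken from the conversation:

```cpp
// Illustrative sketch only: gds_peer_example and maybe_disable_wait_nor are
// assumed names, not the real gdsync.cpp code. Grounded bits: the
// has_wait_nor flag and the GDS_DISABLE_WAIT_NOR environment variable.
#include <cstdlib>
#include <cstring>

struct gds_peer_example {
    bool has_wait_nor;   // whether the GPU/driver pair supports the NOR wait op
    // ... other capability flags ...
};

void maybe_disable_wait_nor(gds_peer_example *peer)
{
    // Force-disable the NOR-based wait on setups where it misbehaves
    // (e.g. Volta on the DGX-1V runs above), gated by an env variable.
    const char *env = std::getenv("GDS_DISABLE_WAIT_NOR");
    if (env && std::strcmp(env, "0") != 0)
        peer->has_wait_nor = false;
}
```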

@drossetti (Contributor, Author)

Note that the parameter names for both gds_kernel tests have changed, e.g. -K vs -k.

@drossetti (Contributor, Author)

NOR should be enabled on master already, so let's treat this as a separate problem. Filed #68 to track progress on that.

@e-ago (Collaborator) commented May 25, 2018

Tested on DGX-1V lab12, r410.04 without the new memops, latest MLNX firmware.
With GDS_DISABLE_WAIT_NOR=1 everything works fine.
With GDS_DISABLE_WAIT_NOR=0, gds_kernel_latency is not working with either OFED 4.3 or 4.2-1.2.0.0.

@e-ago (Collaborator) commented May 30, 2018

Tested on DGX-1V with CUDA 9.2 and r410.04 (no memops).
All tests in https://github.com/gpudirect/gdasync/blob/master/Scripts/run_libgdsync.sh passed with NOR disabled.
Merging this PR.

e-ago merged commit 205432d into master on May 30, 2018