Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

testing for portblas portfft #495

Merged
merged 10 commits into from
May 30, 2024
Merged

Conversation

rscohn2
Copy link
Member

@rscohn2 rscohn2 commented May 21, 2024

Description

Add Unit tests for portfft, portblas to CI

@rscohn2 rscohn2 marked this pull request as ready for review May 21, 2024 20:55
.github/workflows/pr.yml Outdated Show resolved Hide resolved
.github/workflows/pr.yml Show resolved Hide resolved
Copy link
Contributor

@hjabird hjabird left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With Romain's comments, LGTM.

@Rbiessy
Copy link
Contributor

Rbiessy commented May 22, 2024

We'll look into ways to reduce the time it takes to run the tests. We may need to cache the SYCL kernels across the tests.

@rscohn2
Copy link
Member Author

rscohn2 commented May 22, 2024

We'll look into ways to reduce the time it takes to run the tests. We may need to cache the SYCL kernels across the tests.

The build times are comparable to MKL backend. MKL tests run in < 5 minutes. CI failed with a timeout because both portblas and portfft tests take > 4 hours to run. Are you saying that driver compile time is the cause? I can add a regex to select a subset of the tests to run, with a target of running for 5 minutes. What do you recommend?

@Rbiessy
Copy link
Contributor

Rbiessy commented May 22, 2024

The build times are comparable to MKL backend. MKL tests run in < 5 minutes. CI failed with a timeout because both portblas and portfft tests take > 4 hours to run. Are you saying that driver compile time is the cause? I can add a regex to select a subset of the tests to run, with a target of running for 5 minutes. What do you recommend?

Yes the JIT compilation may be the issue, we'll need to confirm. I think in the current situation we would need to disable too many tests so I would suggest to wait until we know more. Running all the tests under 5 minutes seems very optimistic for portFFT or portBLAS though.

@rscohn2
Copy link
Member Author

rscohn2 commented May 22, 2024

I changed it so we build all the unit tests for portfft/portblas but only run a handful of tests. Everything passes and port* finishes before MKL. It is not what we want for the long term, but we can enable more as we get faster machines and hopefully find a way to reduce the JIT time. Are you OK with committing what is there now?

@dnhsieh-intel
Copy link
Contributor

An indirectly related question: I noticed that previous workflows were run on my forked repo. Do you think a condition like this can avoid the runs on forked repos?

@rscohn2
Copy link
Member Author

rscohn2 commented May 22, 2024

Do you think a condition like this can avoid the runs on forked repos?

That will work, but you have to put the condition in every job for every workflow. I don't think there is a file-level way to do it. An alternative is to disable github actions for your fork. It is in settings/actions/general, then look near the top.

@dnhsieh-intel
Copy link
Contributor

I see. I hope GitHub can provide a more maintainable way for this. At the same time, we may need to communicate this with contributors, but this is out of scope of this PR.

@Rbiessy
Copy link
Contributor

Rbiessy commented May 23, 2024

@rscohn2 from our local testing we have confirmed that using AOT compilation of the kernels speeds up the testing a lot. This is best to use if we know we will run on a specific device.
For the portBLAS backend it turns out this can be done by setting the cmake flag -DPORTBLAS_TUNING_TARGET=INTEL_CPU. This sets the SYCL target to spir64_x86_64. With this we were able to compile in 32 minutes and run all the tests in 8 minutes on a i9-12900K. I think it would take a bit longer with the current GitHub runner though.

For the portFFT backend you might as well directly set the target to spir64_x86_64 and enable more tests if that's ok.

We'll try to improve the documentation for this. We need to find a meaningful default behavior while still letting the user tune for more specific use-cases.

@rscohn2
Copy link
Member Author

rscohn2 commented May 23, 2024

The github runner has 2 cores/4 threads, but I think we will have more cores available soon. For the 8 minute test, were they parallel or serial? @dnhsieh-intel prefers serial tests because the logs are easier to read.

@Rbiessy
Copy link
Contributor

Rbiessy commented May 23, 2024

We just ran ctest so the tests should have been run serially when we measured 8 minutes.

@rscohn2
Copy link
Member Author

rscohn2 commented May 23, 2024

@Rbiessy
Copy link
Contributor

Rbiessy commented May 24, 2024

Right, that makes sense actually portFFT is using spec constants. Let's revert the target to spir64 for portFFT then.

@rscohn2
Copy link
Member Author

rscohn2 commented May 26, 2024

With AOT, portblas takes 1.5 hours to compile and 5 minutes to run. Runners only have 2 cores, but we are getting runners with more cores so parallel build will be faster. 2nd longest is mkl blas, which takes 1 hour with most time spent in compile.
portfft is back to not using aot and running a couple tests. Everything passes and runs in an acceptable amount of time.

Copy link
Contributor

@Rbiessy Rbiessy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a minor comment, looks good to me otherwise, thanks

.github/workflows/pr.yml Outdated Show resolved Hide resolved
Co-authored-by: Romain Biessy <[email protected]>
@rscohn2 rscohn2 merged commit 6d6a7b7 into uxlfoundation:develop May 30, 2024
6 checks passed
Rbiessy added a commit to Rbiessy/oneMath that referenced this pull request Jun 7, 2024
hjabird added a commit that referenced this pull request Jun 17, 2024
…get (#511)

* The portFFT backends targets Nvidia by default
* In some situations, this causes a failure whilst compiling (CUDA might not be installed)
* This checks the target is supported before use

This solves the issue in PR #495
normallytangent pushed a commit to normallytangent/oneMKL that referenced this pull request Aug 6, 2024
normallytangent pushed a commit to normallytangent/oneMKL that referenced this pull request Aug 6, 2024
…get (uxlfoundation#511)

* The portFFT backends targets Nvidia by default
* In some situations, this causes a failure whilst compiling (CUDA might not be installed)
* This checks the target is supported before use

This solves the issue in PR uxlfoundation#495
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants