Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DAOS-15596: Update to argobots 1.2 #26

Merged
merged 12 commits into from
Oct 7, 2024

Conversation

knard-intel
Copy link
Contributor

@knard-intel knard-intel commented Aug 28, 2024

Description

  • Update to argobots 1.2
  • Update packaging/
  • Add patch 411e5b3 fixing DAOS-14248
  • Add patch bb0c908 fixing libunwind support
  • remove deprecated patch 7fd1987

brianjmurrell and others added 3 commits June 5, 2024 10:22
Update packaging.

Cease building on CentOS 7.

Test with 'pr' tests.

Skip-PR-comments: true
Skip-list: test_daos_single_rdg_tx:DAOS-14982

Required-githooks: true

Signed-off-by: Brian J. Murrell <[email protected]>
Skip-PR-comments: true
Skip-list: test_daos_single_rdg_tx:DAOS-14982

Required-githooks: true

Signed-off-by: Brian J. Murrell <[email protected]>
Update packaging/
Add patch 411e5b3
remove deprecated patch 7fd1987

Signed-off-by: Cedric Koch-Hofer <[email protected]>
Fix source patching

Signed-off-by: Cedric Koch-Hofer <[email protected]>
@knard-intel
Copy link
Contributor Author

knard-intel commented Aug 30, 2024

The test failure with master is a known issue:

Issue of the release 2.6 are also related to the same tickets: timeout not respected for an not yet defined reason.

@knard-intel knard-intel marked this pull request as ready for review August 30, 2024 07:09
@knard-intel knard-intel requested a review from a team as a code owner August 30, 2024 07:09
@knard-intel knard-intel self-assigned this Aug 30, 2024
Makefile Outdated Show resolved Hide resolved
argobots.spec Show resolved Hide resolved
argobots.spec Show resolved Hide resolved
argobots.spec Outdated Show resolved Hide resolved
debian/changelog Outdated Show resolved Hide resolved
Integrate reviewers comments.

Signed-off-by: Cedric Koch-Hofer <[email protected]>
Remove useless local patch

Signed-off-by: Cedric Koch-Hofer <[email protected]>
Copy link
Contributor

@brianjmurrell brianjmurrell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one small change to revert the Makefile changes I think.

Makefile Outdated Show resolved Hide resolved
Remove useless local patch management.

Signed-off-by: Cedric Koch-Hofer <[email protected]>
argobots.spec Outdated Show resolved Hide resolved
Fix invalid patch path.

Signed-off-by: Cedric Koch-Hofer <[email protected]>
@brianjmurrell
Copy link
Contributor

@knard-intel Please add @daos-stack/release-engineering as reviewer when you are ready for this to land.

@knard-intel
Copy link
Contributor Author

knard-intel commented Sep 9, 2024

@brianjmurrell , just discovered this morning a potential regression of the configuration option with argobots v1.2.
Going to check it.
To be more accurate, it seems that the option --enable-stack-unwind is not well managed with this new version.
Indeed, from the build log, the following warning is output:

+ /usr/bin/sed -i.backup -e 's~compiler_flags=$~compiler_flags="-Wl,-z,now -specs=/usr/lib/rpm/redhat/redhat-hardened-ld"~' ./m4/ltmain.sh
+ ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-valgrind --enable-stack-unwind
configure: WARNING: unrecognized options: --enable-stack-unwind

I am able to reproduce it locally.
Note: I found them on working on a different task: integration of the asan lib.

@brianjmurrell
Copy link
Contributor

@brianjmurrell , just discovered this morning a potential regression of the configuration option with argobots v1.2. Going to check it.

Sounds good. If there is a regression, it would be prudent for us to understand how it was not found in testing and rectify that.

Copy link
Contributor

@brianjmurrell brianjmurrell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-1 pending @knard-intel's investigation of a potential regression.

@knard-intel knard-intel changed the title Ckochhof/dev/master/daos 15596 DAOS-XXXX: Update to argobots 1.2 Sep 11, 2024
@knard-intel
Copy link
Contributor Author

This PR is blocked until the PR pmodels/argobots#401 has been landed.

Add configure option to enable check of configure step

Signed-off-by: Cedric Koch-Hofer <[email protected]>
@knard-intel
Copy link
Contributor Author

@brianjmurrell, the regression is not easy do identify: it does not affect the general behavior of argbots. From my understanding the unwind lib is a debugging functionality allowing to help for the debugging of argobots crash. The only trace of the regression is a warning displayed during the config step of the argobots building process. At this time, I am still trying to land the fix (which has already been approved).

We should be using --enable-option-checking=fatal in our configure arguments to detect and fail on this kind of thing. Could you temporarily revert your Patch2: and enable this option in argobots.spec and confirm that it would have found this regression?

Yep, I am going to do it. However, I will not be able to look at the result as I have still not received my new laptop.

Done with commit dbcdfee and cddd5b2

@brianjmurrell
Copy link
Contributor

Done with commit dbcdfee and cddd5b2

It works!

[2024-09-17T07:02:38.206Z] + ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-valgrind --enable-stack-unwind --enable-option-checking=fatal
[2024-09-17T07:02:38.206Z] configure: error: unrecognized options: --enable-stack-unwind
[2024-09-17T07:02:38.206Z] error: Bad exit status from /var/tmp/rpm-tmp.o8heuC (%build)
[2024-09-17T07:02:38.206Z]     Bad exit status from /var/tmp/rpm-tmp.o8heuC (%build)
[2024-09-17T07:02:38.206Z] RPM build errors:
[2024-09-17T07:02:38.206Z] Child return code was: 1

Great job @knard-intel.

@knard-intel
Copy link
Contributor Author

Done with commit dbcdfee and cddd5b2

It works!

[2024-09-17T07:02:38.206Z] + ./configure --build=x86_64-redhat-linux-gnu --host=x86_64-redhat-linux-gnu --program-prefix= --disable-dependency-tracking --prefix=/usr --exec-prefix=/usr --bindir=/usr/bin --sbindir=/usr/sbin --sysconfdir=/etc --datadir=/usr/share --includedir=/usr/include --libdir=/usr/lib64 --libexecdir=/usr/libexec --localstatedir=/var --sharedstatedir=/var/lib --mandir=/usr/share/man --infodir=/usr/share/info --enable-valgrind --enable-stack-unwind --enable-option-checking=fatal
[2024-09-17T07:02:38.206Z] configure: error: unrecognized options: --enable-stack-unwind
[2024-09-17T07:02:38.206Z] error: Bad exit status from /var/tmp/rpm-tmp.o8heuC (%build)
[2024-09-17T07:02:38.206Z]     Bad exit status from /var/tmp/rpm-tmp.o8heuC (%build)
[2024-09-17T07:02:38.206Z] RPM build errors:
[2024-09-17T07:02:38.206Z] Child return code was: 1

Great job @knard-intel.

OK, so lets try it without the faulty commit.
🚀

@knard-intel
Copy link
Contributor Author

I am not able to investigate why the CI is failing.
I will do the investigation as soon as I have received my new company laptop.

@knard-intel
Copy link
Contributor Author

I am able again to access the CI.
Restart it as the logs seem to not be available any more.

@knard-intel knard-intel changed the title DAOS-XXXX: Update to argobots 1.2 DAOS-15596: Update to argobots 1.2 Sep 24, 2024
@knard-intel
Copy link
Contributor Author

The only failure is a known issue with argobots stack overflow and UCX:

@knard-intel knard-intel requested review from brianjmurrell and a team September 25, 2024 07:15
@knard-intel
Copy link
Contributor Author

Please @daos-stack/release-engineering could you lend this PR.
From my understanding, the PR daos-stack/daos#15181 should be landed asap after this one has been landed

Copy link
Contributor

@brianjmurrell brianjmurrell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this PR fixes more than one ticket, please also add Fixes: DAOS-14248 to your commit message so that the commit message makes it clear which additional issues are being fixed (i.e. in addition to the one mentioned in the commit subject).

argobots.spec Outdated
Comment on lines 118 to 120
* Wed Sep 04 2024 Brian J. Murrell <[email protected]> - 1.2-1
- Update to 1.2
- Add patch 411e5b3 fixing DAOS-14248
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please update the datestamp here and add a comment about the 2nd patch that has been added and it's ticket number.

Please also feel free to change the author here to yourself since you did the most significant amount of work.

Copy link
Contributor Author

@knard-intel knard-intel Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Add reference to the new patch bb0c908 fixing configuration issue
  • Update author and timestamp

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Add reference to the new patch bb0c908 fixing configuration issue
  • Update author and timestamp

Fixed with commit b740042

debian/changelog Outdated Show resolved Hide resolved
Integrate reviewers comments:
- Add reference to the new patch bb0c908 fixing configuration issue
- Update author and timestamp

Signed-off-by: Cedric Koch-Hofer <[email protected]>
@knard-intel
Copy link
Contributor Author

I have restarted the stage which was failing on getting access to external resources.
The second run have succeed :-)

Please @daos-stack/release-engineering could you please land this PR and its associated DAOS one daos-stack/daos#15181

argobots.spec Outdated Show resolved Hide resolved
Integrate reviewers comments:
- Harmonize documentation

Signed-off-by: Cedric Koch-Hofer <[email protected]>
@knard-intel
Copy link
Contributor Author

Please @daos-stack/release-engineering could you please land this PR and its associated DAOS one daos-stack/daos#15181

@brianjmurrell
Copy link
Contributor

Please @daos-stack/release-engineering could you please land this PR

I will go ahead and do that. In the future, you can signal you are ready for landing by simply adding @daos-stack/release-engineering as a reviewer, similar to how you add @daos-stack/daos-gatekeeper to daos requests.

and its associated DAOS one daos-stack/daos#15181

I cannot land that one. You will need to request @daos-stack/daos-gatekeeper for that one. But I have also requested changes on that one so it will need another push before it's ready.

@brianjmurrell brianjmurrell merged commit adc7a1c into master Oct 7, 2024
0 of 2 checks passed
@brianjmurrell brianjmurrell deleted the ckochhof/dev/master/daos-15596 branch October 7, 2024 15:17
@brianjmurrell brianjmurrell mentioned this pull request Oct 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants