Add missing dependencies to prevent add_custom_command race #457

Merged: 1 commit, Jan 5, 2024

Conversation

@g2flyer g2flyer commented Jan 3, 2024

Our use of add_custom_command led to a race, and to the same failure, due to parallel invocations of common/crypto/verify_ias_report/build_ias_certificates_cpp.sh. The issue is that files generated by add_custom_command have to be explicitly serialized in the dependency graph. We did this only partially: we defined a custom target but never used it, so the ucrypto and tcrypto definitions still created parallel dependencies via IAS_SOURCES. It is not clear why (a) I was seemingly the first to trigger this and (b) it also seemed to depend on how I invoked the build (e.g., output redirection seemed to play a role, but so did which target I built and whether that triggered an explicit docker build in our makefile or an implicit one via docker-compose ...).

@g2flyer g2flyer requested review from cmickeyb and bvavala January 3, 2024 22:54
@@ -74,6 +75,7 @@ ENDIF()
################################################################################
IF (BUILD_UNTRUSTED)
ADD_LIBRARY(${U_CRYPTO_LIB_NAME} STATIC ${PROJECT_HEADERS} ${PROJECT_SOURCES} ${IAS_HEADERS} ${IAS_SOURCES})
ADD_DEPENDENCIES(${U_CRYPTO_LIB_NAME} generate-ias-files)
Contributor

Why do we believe this fixes the problem? It's very timing dependent. Can you verify that only one instance is run?

Contributor Author

I could consistently produce concurrent, colliding invocations before the fix, and it consistently works now. More importantly, the conflict is described in the documentation, which also outlines the strategy of defining a separate add_custom_target and using it as the sole (explicit) dependency.
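
For readers who have not seen the pattern, here is a minimal sketch of the strategy described in this comment. It is an illustration under stated assumptions, not the project's actual build file: the names `generate-ias-files` and `IAS_SOURCES` and the script name come from this PR, while the generated file name, the `*_sketch` library names, and the way the script is invoked are hypothetical.

```cmake
cmake_minimum_required(VERSION 3.10)
project(ias_generation_sketch CXX)

# Assumed values for illustration; the real lists are defined elsewhere in the build.
set(IAS_SCRIPT "${CMAKE_CURRENT_SOURCE_DIR}/build_ias_certificates_cpp.sh")
set(IAS_SOURCES "${CMAKE_CURRENT_BINARY_DIR}/ias-certificates.cpp")  # hypothetical file name

# The rule that produces the generated source. How the real script is invoked
# (arguments, working directory) is not shown in this PR and is assumed here.
add_custom_command(
    OUTPUT ${IAS_SOURCES}
    COMMAND ${IAS_SCRIPT}
    WORKING_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}
    COMMENT "Generating IAS certificate sources")

# A single custom target drives the rule.
add_custom_target(generate-ias-files DEPENDS ${IAS_SOURCES})

# Every library that compiles the generated file depends on that target, so
# generation has to finish before either library starts building.
add_library(ucrypto_sketch STATIC ${IAS_SOURCES})
add_dependencies(ucrypto_sketch generate-ias-files)

add_library(tcrypto_sketch STATIC ${IAS_SOURCES})
add_dependencies(tcrypto_sketch generate-ias-files)
```

Without the two `add_dependencies` calls, both libraries list the same generated output independently, which is the situation in which parallel builds may run the rule twice.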

Contributor

Thanks for the update. Given the timing-dependent nature of the problem, "it fixes it on my machine" is weak evidence that this is a fix rather than simply a shift in timing. The documentation looks promising. Regardless, this looks like an improvement. Another critical issue that will continue to cause problems, however, is the way the file is created. Rather than using an atomic swap, we write directly to the file. The result is that one thread has a reasonable chance of corrupting the file. I will be pushing a separate PR that attempts to address this problem and remove the entire templating approach.

Contributor Author

I still maintain that an atomic swap is not necessary in "normal" builds. I'm convinced of this by what I learned debugging this problem: the reproducible failure of dual cmake tasks (as "threatened" by the documentation) and the complete disappearance of parallel identical tasks in repeated experiments after following the mitigation strategy. That said, there could of course be other conceivable cases where the script is somehow invoked twice at the same time, so making it more atomic doesn't hurt. (For SIM mode, which caused my problem, the old script was probably reasonably atomic with cp; the issue there was more that it would have required -f to deal with another script having already run?)

Contributor

@cmickeyb cmickeyb Jan 4, 2024

We've had this problem. Fixed it. Had it come back when some other changes happened. The fact that it was repeatable in ONE situation does not mean that fixing that ONE situation fixed all situations. And... mv is definitely atomic (it's a single syscall); cp is NOT atomic (it's multiple reads/writes), which is why the file corruption happens. Worse, if we're in HW mode we are piping stdout into the file, which is definitely open to corruption.

And to be clear... I'm fine with this change. It will certainly make the situation better and may actually ensure that only one instance is triggered (which, I think, we can only verify by searching the verbose cmake logs). Regardless, the scripts that generate the file from a template are not the best way to incorporate a cert in the code.
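
The write-then-rename idea raised in this thread could be sketched as a small CMake script run with `cmake -P`. This is only an illustration, not the follow-up PR mentioned above: the script name `atomic_generate.cmake` and the variables `GENERATOR` and `OUTPUT` are hypothetical, and the all-or-nothing behaviour of the rename is assumed to hold only when the temporary file and the final file are on the same filesystem.

```cmake
# atomic_generate.cmake (hypothetical name)
# Captures a generator's stdout in a temporary file and publishes it with a
# single rename, so readers never see a half-written output file.
if(NOT DEFINED GENERATOR OR NOT DEFINED OUTPUT)
    message(FATAL_ERROR
        "usage: cmake -DGENERATOR=<program> -DOUTPUT=<file> -P atomic_generate.cmake")
endif()

# A random suffix keeps two concurrent runs from sharing one temporary file.
# The temporary file sits next to OUTPUT so both are on the same filesystem.
string(RANDOM LENGTH 8 suffix)
set(tmp "${OUTPUT}.${suffix}.tmp")

# Redirect the generator's stdout into the temporary file, not into OUTPUT.
execute_process(
    COMMAND "${GENERATOR}"
    OUTPUT_FILE "${tmp}"
    RESULT_VARIABLE rc)

if(NOT rc EQUAL 0)
    file(REMOVE "${tmp}")
    message(FATAL_ERROR "generator failed: ${rc}")
endif()

# Publish the finished file in one step.
file(RENAME "${tmp}" "${OUTPUT}")
```

A possible invocation is `cmake -DGENERATOR=./some_generator.sh -DOUTPUT=generated.cpp -P atomic_generate.cmake`, where both values are placeholders. If two runs still overlap, the last rename wins, but each published file is complete.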

Contributor Author

> We've had this problem. Fixed it. Had it come back when some other changes happened. The fact that it was repeatable in ONE situation does not mean that fixing that ONE situation fixed all situations.

Indeed, if we added IAS_SOURCES to another library or the like while forgetting the explicit dependency on generate-ias-files, the issue would recur.

> And... mv is definitely atomic (it's a single syscall); cp is NOT atomic (it's multiple reads/writes), which is why the file corruption happens. Worse, if we're in HW mode we are piping stdout into the file, which is definitely open to corruption.

I was just referring to the fact that, in bash without -f, both cp and mv complained if the target already existed when I ran them manually ...

> And to be clear... I'm fine with this change. It will certainly make the situation better and may actually ensure that only one instance is triggered (which, I think, we can only verify by searching the verbose cmake logs). Regardless, the scripts that generate the file from a template are not the best way to incorporate a cert in the code.
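
One way to guard against the recurrence described in this reply (adding IAS_SOURCES to another library and forgetting the dependency) would be to bundle both steps in a helper. The sketch below is hypothetical and not part of this PR: the function name is invented, and it assumes `IAS_HEADERS`, `IAS_SOURCES`, and the `generate-ias-files` target are already defined in the same CMakeLists.txt that calls it.

```cmake
# Hypothetical helper: creates a static library that compiles the generated
# IAS sources and always attaches the dependency on generate-ias-files.
function(add_library_with_ias_sources lib_name)
    if(NOT TARGET generate-ias-files)
        message(FATAL_ERROR
            "generate-ias-files must be defined before ${lib_name}")
    endif()
    # Remaining arguments (ARGN) are the library's own headers and sources.
    add_library(${lib_name} STATIC ${ARGN} ${IAS_HEADERS} ${IAS_SOURCES})
    add_dependencies(${lib_name} generate-ias-files)
endfunction()

# Possible usage, mirroring the ADD_LIBRARY call in the diff:
# add_library_with_ias_sources(${U_CRYPTO_LIB_NAME} ${PROJECT_HEADERS} ${PROJECT_SOURCES})
```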

@cmickeyb cmickeyb merged commit 88257d7 into hyperledger-labs:main Jan 5, 2024
4 checks passed
@g2flyer g2flyer deleted the msteiner.cmake-unparallel branch January 29, 2024 19:46