-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Binutils 2.29 followup #288
Conversation
This is needed to test more possible combinations, as apparently, failed assertion already on the client side will yield no visibility into how "interlib" would react, for instance. Signed-off-by: Jan Pokorný <[email protected]>
Signed-off-by: Jan Pokorný <[email protected]>
This is to cut down some time when only particular item of the test matrix is of interest, such as in ./log_test_mock.sh -v -ncl -ncsc -tqb+_il+_c+ <srpm> command that can be used to highlight the issue with binutils 2.29+ compatibility fix making for a new issue when all of following are satisfied: - there are three link participants (reusing terminology establish in the patches constituting said fix) client program > intermediate library > libqb (client is not necessarily directly dependent on libqb) - client program is build-time linked using ld/binutils either < or >= 2.29 but with libqb 1.0.3 (containing the said fix, i.e., new linker script is in effect for building the client program) - client program does not use any logging at all - intermediate library triggers some checks when gets loaded, through usage of QB_LOG_INIT_DATA macro from qb/qblog.h, which are meant to check itself, but alas, the symbols to check on behalf of this library get run-time overridden with symbols of the client program, which only do exist because the libqb.so linker script (intended for other effect) made it happen despite the section is empty So far, log_test_mock.sh was only thoroughly run without any log-suppressing options (e.g. -ncl), which together with some extra linker/build flags (distro specific?) in case of compiling pacemaker (exercising the same scheme with some CLI utilities) masked this problem. [this all is still to be verified] This asks for a follow-up solution, and this commit is a good start as running whole matrix would keep increasing turnaround time needlessly (the new skipping harness is not ideal, but serves the purpose). Signed-off-by: Jan Pokorný <[email protected]>
6c58f2b
to
bc6a8c3
Compare
Also make the in-line progress show the actual output of the current test matrix item along with pre-existing syslog one for that very step. Signed-off-by: Jan Pokorný <[email protected]>
bc6a8c3
to
03497c7
Compare
Current patchset (ending with 03497c7) looks good to me so far, but Regarding testing, I realized that protected visibility of the section boundary |
Mainly to avoid duplicate runs because of those details getting lost as used to happen. Signed-off-by: Jan Pokorný <[email protected]>
... as they can be neglected without any loss of check preciseness, and ignoring them makes for easier (isomorphic) changes in the future. Signed-off-by: Jan Pokorný <[email protected]>
Consequently, turn original catch-all reliance on standard headers through "os_base.h" facade include into regular demand-driven ones, as the former had a side-effect of spoiling the included system headers with _GNU_SOURCE macro (through internal "config.h" unless HAVE_CONFIG_H is defined), which hence could not be undone by the means of command-line requested clearance of such macro, colliding hence with the objective of this change. Note that mere undefining of HAVE_CONFIG_H alone would not work, as the selected standard headers have to be included the alternative way, i.e. directly, so the compatibility of log_test_mock.sh with now-outdated SRPM would not be preserved for "POSIX only" test run, anyway (and in turn, "-UHAVE_CONFIG_H" is omitted here), hence the test to prevent this confusion is also added. Signed-off-by: Jan Pokorný <[email protected]>
Thanks for all the effort on this! Tried libqb-1.0.3+20171226.6d62b64 plus 03497c7 with binutils-2.26.1. pacemaker-1.1.18+20180112.bd7115b8c compiles now :)
--- No idea what "bad section" indicates. I see the similar with the things linked with binutils-2.29.1 too. Fencing regression tests (which reply on logging) all passed. Pacemaker seems to work well, but every time on shutting down cluster services, crmd encounters:
With the same sources but linked with binutils-2.29.1, crmd encounters the similar on shutting down and has an additional error from crm_glib_handler:
The relevant code in pacemaker/lib/common/logging.c: crm_glib_handler()
Not sure if it's necessarily related this change. |
No idea what's going on there. I'm particularly concerned that with 2.26.1 crm_glib_handler() appears not to be called for a glib error. I wonder if a similar linker issue is affecting glib itself. Most likely the particular glib error involves calling g_source_remove() on a timer. Enabling trace logs would probably narrow down which one and where. Since it's at shutdown, maybe it's the throttle timer. It might be useful to try with different glib versions and/or glib+linker combinations to see if some behavior changed there rather than in the linker, libqb, or pacemaker. But that's more work than I want to put into this :) |
Well, without 03497c7, crmd encounters the same. |
On 18/01/18 15:06 +0000, Gao,Yan wrote:
Thanks for all the effort on this!
Tried libqb-1.0.3+20171226.6d62b64 plus
03497c7
with binutils-2.26.1.
pacemaker-1.1.18+20180112.bd7115b8c compiles now :)
Perfect!
```
# readelf -s /usr/lib64/libqb.so.0.19.0 | grep verbose
162: 0000000000228058 0 NOTYPE GLOBAL DEFAULT 26 __stop___verbose
239: 0000000000226870 0 NOTYPE GLOBAL DEFAULT 26 __start___verbose
# readelf -s /usr/lib/pacemaker/stonithd | grep verbose
163: 0000000000a1ddd0 0 NOTYPE GLOBAL DEFAULT bad section index[ 36] __start___verbose
166: 0000000000a1f270 0 NOTYPE GLOBAL DEFAULT bad section index[ 36] __stop___verbose
```
--- No idea what it's supposed to mean ;) I think I've seen the
similar with the things linked with binutils-2.29.1 too.
Haven't seen this so far, honestly.
Fencing regression tests (which reply on logging) all passed.
Pacemaker seems to work well, but every time on shutting down cluster services, crmd encounters:
```
Jan 18 15:17:46 f171 crmd[30523]: crit: GLib: Source ID 22 was not found when attempting to remove it
```
With the same sources but linked with binutils-2.29.1, it looks
similar and has an additional error from crm_glib_handler:
```
Jan 18 15:41:13 f58.suse.de crmd[10048]: error: crm_glib_handler: Forked child 10080 to record non-fatal assert at logging.c:73 : Source ID 23 was not found when attempting to remove it
Jan 18 15:41:13 f58.suse.de crmd[10048]: crit: GLib: Source ID 23 was not found when attempting to remove it
```
This means mixing static section callsite handling with on-the-fly
on-heap tracking, which I haven't tested but I guess I should extend
the test for this.
The relevant code in pacemaker/lib/common/logging.c: crm_glib_handler()
```
63 glib_cs = qb_log_callsite_get(__FUNCTION__, __FILE__, "glib-handler", LOG_DEBUG, __LINE__, c
rm_trace_nonlog);g
64 }g
65g
66g
67 switch (msg_level) {g
68 case G_LOG_LEVEL_CRITICAL:g
69 log_level = LOG_CRIT;g
70g
71 if (crm_is_callsite_active(glib_cs, LOG_DEBUG, 0) == FALSE) {g
72 /* log and record how we got here */g
73 crm_abort(__FILE__, __FUNCTION__, __LINE__, message, TRUE, TRUE);g
74 }g
75 break;g
76g
```
Not sure if it's necessarily related this change.
Yeah, strange. Let me extend lock_test_mock.sh + surrounded files
first if it shows any new regressions wrt. on-the-fly callsites.
…--
Jan (Poki)
|
03497c7
to
3143753
Compare
In the meantime, current patchset up to 3143753 is just slightly refined version |
FYI, the issue itself is due to double-removal of glib source: |
It turned out that while log_test_mock.sh already provided ways for extra variations in the composite body of the program to inspect for a working logging, "omit logging at the target client side completely, leave it along with the respective self-check just at the intermediate library this program uses" one hadn't apparently been exercised with binutils < 2.29.1 (which is the only one OK from the get-go since it arranges boundary denoting symbols for orphan sections as protected). This made the pacemaker building fail half the way with libqb 1.0.3 and binutils < 2.29.1 because the built executables are instantly run as to extract their help screen[1], and they represent the said pain pattern: - program not using libqb log subsystem directly, but - linked to dynamic library that itself does + it also utilizes QB_LOG_INIT_DATA macro, which - upon loading the executable and the the linked dependencies prior to proper run invokes the checks, where - one in particular tries to assess whether the direct-access boundary symbols are not aliasing (equal), but - because with standard visibility, symbols from the program take a precedence, actually its symbols are prioritized (instead of the ones pertaining to the library layer as expected, remember that that check was intended as self-contained, targeting just it's own participating part in the link scheme), and - since previous fix to accommodate binutils 2.29+ fancy/new handling of the boundary symbols they are not subject to any sort of garbage collection (no symbols were emitted for a section that was not known because there were no instruction it should contain anything, until we stuck those symbols in place by force by the means of linker script to that effect), these boundary symbols indeed occur in this end program having no callsite data (see first point), meaning that start and stop address indications for the section equals, hence - the said assertion triggers (despite it should not = false positive), implying breach of self-containment of the check (which naturally cannot be responsible for any other linked part) This clearly means that "QB_ATTR_SECTION_START != QB_ATTR_SECTION_STOP" cannot be used unconditionally. However, it would left the software using logging along with QB_LOG_INIT_DATA macro (which was already highlighted as its vital part in order to avoid silent logging malfunction) that doesn't utilize _GNU_SOURCE macro (unlocking some GNU extensions on top of basic POSIX interface) short of the main checks (because they are conditionalized for requiring said extensions), and libqb never required the client side to define that macro nor it dared to define it behind user's back as it can have unexpected consequences. Luckily, the first paragraph carries the key: protected symbols behave exactly as required here[2]: > Protected visibility is like default visibility except that it > indicates that references within the defining module bind to the > definition in that module. That is, the declared entity cannot be > overridden by another module. So we just move said comparison to "!defined(_GNU_SOURCE)" conditional branch within qblog.h, and to cater binutils prior to 2.29.1 (which has it like that by default as mentioned), we also mark the declarations of the boundary symbols, likewise conditionally, with protected visilibity (there appears to be no way for that in the linker script). But that would be too easy as once again, 2.29 linker begs to differ and these two "!defined(_GNU_SOURCE)" measures will actually do more harm than good when in the mix. Hence, new QB_LD_2_29 macro is proclaimed the kill-switch for that case, and the user becomes responsible to either define it when building with this 2.29 troublemaker (as a recap, 2.29.1 is fine from this perspective), or to start using _GNU_SOURCE. One more question mark remains, though: are not the QB_LOG_INIT_DATA's checks in _GNU_SOURCE case weaker now than used to be? The anwer is: no, as long as looking at the symbols through dlopen'd particular shared object will contain no overrides from either already loaded dependencies (if caching is so aggressive to detect it's already in loaded in caller's context) or from recursive load. Currently, it doesn't seem to be the case, but it may depend on the implementation of dynamic linking library or the toolchain. If this observation is proved to be wrong, the solution may be as simple as dropping !defined(_GNU_C) condition from the guard of making the boundary symbols with protected visibility -- it is not done now also with respect to non-gcc compilers which may not recognize that. Last but not least, dl* calls in the QB_LOG_INIT_DATA's checks are themselves subject to scrutiny now: should they fail unexpectedly, the run is terminated just as well. This led to discovery of the issue masked so far, boiling down to unability to dlopen executable (as opposed to shared object)[3]. It did not matter before, because of rather best-effort, optimistic approach (perform the final check only if precondition steps succeeded), but now, we have to add an extra stipulation that this case won't lead to premature termination -- it just happens to sometimes be the case, and there's not much we can do to detect run down on the level of the executable proactively, at least not based on the brief experiments. [1] ClusterLabs#266 (comment) [2] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-visibility-function-attribute [3] https://sourceware.org/bugzilla/show_bug.cgi?id=11754 Signed-off-by: Jan Pokorný <[email protected]>
It's not expected that basic prerequisite for using callsite section, notably section attribute, would work without the basic "GNU toolchain" support declared during the build. Signed-off-by: Jan Pokorný <[email protected]>
3143753
to
310882c
Compare
Why does libqb go through all of these acrobatics at all? |
On 12/02/18 11:22 +0000, Kristoffer Grönlund wrote:
Why does libqb go through all of these acrobatics at all?
Original linker story:
20246f5
Why libqb involves self-checking:
cdb0269
I wish things hadn't gone south in binutils, to start with.
But it happened and the fallout needs to be dealt with.
…--
Jan (Poki)
|
I mean that the thing I'm trying to understand is why libqb tries to use callsite sections in order to track output filtering in the first place. To be honest, binutils doesn't look like the party at fault here, rather libqb is doing something it probably shouldn't have been doing in the first place. Was it just to avoid requiring matching push/pop calls for filter manipulation? |
If that's the case, then using the GCC cleanup attribute to pop the filter option is probably a better solution. |
I am not the original author but I can see that in part the intention With this arrangement, you effectively offload page-fault prone part I can only suggest reading more on the topic, the linked changes and And no, I cannot see any way how Note also that callsites being tracked systemically at query-able form |
Yeah, now that I've looked at it more I see that the purpose is not entry/exit management (which |
Closed as not needed any more |
see also #266 (comment)