Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature: sbd-inquisitor,sbd-md: execute commands with sub-processes for respective devices #148

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

gao-yan
Copy link
Member

@gao-yan gao-yan commented Aug 1, 2022

This is an universal solution to prevent fencing from hanging on
silently blocked devices as originally brought up from
#119.

@gao-yan
Copy link
Member Author

gao-yan commented Aug 1, 2022

These are some points I can think of so far:

  • For now it's only done for dump. I'm planning to do this also for list and message commands which are comonly use in the sbd fence agents.

  • If some of the "sub-process" code looks familiar to you, indeed it's mainly borrowed from pacemaker/lib/services/services_linux.c :-)

  • I didn't choose signalfd mechanism for the synchronous way, since it doesn't seem universal for all operating systems. OTOH the "self-pipe implementation" is universal and seems to work well. I wouldn't tend to bloat the implementation too much.

  • Handling of error codes including logging of errors in the "sub-process" code still keeps pacemaker's conventions to be as user-friendly/clear/consistent as possible. It doesn't require anything more than libcrmcommon anyway.

  • For now both synchronous and asynchronous solutions are included. I've been thinking there might be use cases where one of the solutions could be more suitable than the other. And synchronous way is the default for now. Of course that can be changed as well.

  • Async IO timeout (-I) is used for the timeout of list servant. I'm planning to use it also also for dump servant, but msgwait as the timeout for message servant. I wouldn't prefer to introduce separate timeouts for the purpose to confuse users even more :-)

  • I've only done some basic test. I haven't I tried it with a silently blocked device yet. It's supposed to work theoretically :) Hopefully certain regression tests for this can be somehow figured out and added.

  • There's a static mainloop variable in sbd-pacemaker.c. Not yet, but probably this extern one could be reused in there as well.

Any ideas/suggestions are appreciated. And no hurry.

@gao-yan gao-yan force-pushed the commands-sub-processes branch from 8d85762 to bbb3cc9 Compare August 1, 2022 10:38
@wenningerk
Copy link

Thanks! Give me a bit to have a closer look ...

…es for respective devices

This is an universal solution to prevent fencing from hanging on
silently blocked devices as originally brought up from
ClusterLabs#119.
@gao-yan gao-yan force-pushed the commands-sub-processes branch from bbb3cc9 to 1adc18d Compare August 4, 2022 05:18
@gao-yan
Copy link
Member Author

gao-yan commented Aug 4, 2022

Resolved an issue found by coverity.

BTW, I'll be AFK for 4 weeks. So take your time :-) Thanks.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants