Crashed with std::runtime_error - bad file descriptor #1157

Open
tronghung279 opened this issue Aug 3, 2023 · 11 comments

@tronghung279

tronghung279 commented Aug 3, 2023

My application uses Pistache::Http::Endpoint::serveThreaded and crashes with the message:
terminate called after throwing an instance of 'std::runtime_error' what(): Bad file descriptor

I use 2 threads for Pistache, as follows:

m_httpEndpoint = std::make_shared<Pistache::Http::Endpoint>(addr);
m_router = std::make_shared<Pistache::Rest::Router>();
m_router->addCustomHandler(Pistache::Rest::Routes::bind(&EventListenerServer::defaultRequestHandling, this));
auto opts = Pistache::Http::Endpoint::options().threads(2);
opts.flags(Pistache::Tcp::Options::ReuseAddr);
opts.maxPayload(32768);
m_httpEndpoint->init(opts);
Pistache::Rest::Routes::Post(*m_router, std::string("/" + m_path), Pistache::Rest::Routes::bind(&EventListenerServer::eventRequest, this));
m_httpEndpoint->setHandler(m_router->handler());
m_httpEndpoint->serveThreaded();

Core dump backtrace:

#0 0xa4adc3a0 in epoll_wait () from /home/docker/development/projects/coredump/libs/libc.so.6
#1 0xa3d65de4 in Pistache::Polling::Epoll::poll(std::vector<Pistache::Polling::Event, std::allocator<Pistache::Polling::Event> >&, std::chrono::duration<long long, std::ratio<1ll, 1000ll> >) const () from /home/docker/development/projects/coredump/libs/libpistache.so.0
#2 0xa3d82610 in Pistache::Tcp::Listener::run() () from /home/docker/development/projects/coredump/libs/libpistache.so.0
#3 0xa4c1a44c in execute_native_thread_routine () from /home/docker/development/projects/coredump/libs/libstdc++.so.6
#4 0xa52a6d94 in ?? () from /home/docker/development/projects/coredump/libs/libpthread.so.0
#5 0xa4adbf68 in ?? () from /home/docker/development/projects/coredump/libs/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Looking into Pistache::Tcp::Listener::run, I see it throws a ServerError when handling EBADF, which then prints the same crash message.
Is it possible not to throw the exception? Or could you guide me on how to catch and handle it effectively? The exception breaks the whole run loop, which I don't want.

Thank you
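
For context on the "terminate called after throwing" message: the C++ runtime calls std::terminate when an exception escapes a thread's entry function, so a try/catch around the call that starts the thread cannot intercept it. The sketch below shows the general pattern of catching inside the thread that runs the loop. It uses a hypothetical helper, runGuarded, and a stand-in callable rather than any Pistache API. It only applies to a loop running on a thread you own; whether Pistache's blocking serve loop can be used this way is an assumption, and exceptions thrown on library-internal worker threads would still terminate the process.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical helper: runs `loop` on its own thread and reports how it ended.
// Returns an empty string if the loop returned normally, otherwise the
// message of the exception that escaped it.
std::string runGuarded(const std::function<void()>& loop) {
    std::exception_ptr failure;
    std::thread worker([&] {
        try {
            loop();
        } catch (...) {
            // Store the exception instead of letting it leave the thread,
            // which would end the process through std::terminate.
            failure = std::current_exception();
        }
    });
    worker.join();

    if (!failure) return {};
    try {
        std::rethrow_exception(failure);
    } catch (const std::exception& e) {
        return e.what();
    } catch (...) {
        return "unknown exception";
    }
}
```

A caller could pass a lambda that runs its blocking loop and then decide, based on the returned message, whether to restart the loop or shut down cleanly.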

@kiplingw
Member

kiplingw commented Aug 3, 2023

Hey @tronghung279. What is your RLIMIT_NOFILE set to and how many logical cores does your machine have? Pistache scales file descriptors at around 6N + 5, where N is the number of threads.

@tronghung279
Author

tronghung279 commented Aug 4, 2023

> Hey @tronghung279. What is your RLIMIT_NOFILE set to and how many logical cores does your machine have? Pistache scales file descriptors at around 6N + 5, where N is the number of threads.

Thank you, @kiplingw, for the quick response.
My machine has 1 core and RLIMIT_NOFILE is 1024:

nproc
1
ulimit -n
1024

@kiplingw
Member

kiplingw commented Aug 4, 2023

Hmm, it's probably not file descriptor exhaustion then, with only two service threads.
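
As a rough check of the numbers in this thread: with the 6N + 5 estimate quoted earlier, two threads need about 17 descriptors, far below a limit of 1024. The sketch below does that arithmetic and compares it against the process's soft RLIMIT_NOFILE. The function names are made up for illustration, and the formula is only the approximation given above, not an exact count.

```cpp
#include <sys/resource.h>

// Approximate descriptor usage quoted in this thread: about 6N + 5 for N threads.
long estimatedDescriptors(long threads) {
    return 6 * threads + 5;
}

// True when the estimate for `threads` stays below `limit`.
bool fitsUnderLimit(long threads, long limit) {
    return estimatedDescriptors(threads) < limit;
}

// Same check against the soft RLIMIT_NOFILE of the current process.
// Returns false if the limit cannot be read.
bool fitsUnderCurrentLimit(long threads) {
    rlimit lim{};
    if (getrlimit(RLIMIT_NOFILE, &lim) != 0) return false;
    if (lim.rlim_cur == RLIM_INFINITY) return true;
    return static_cast<rlim_t>(estimatedDescriptors(threads)) < lim.rlim_cur;
}
```

For the setup reported here, estimatedDescriptors(2) is 17 and fitsUnderLimit(2, 1024) is true, which matches the conclusion that exhaustion is unlikely.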

@tronghung279
Author

Yes, I think so. I have no idea why it raises EBADF when accepting the connection. :( But shouldn't we handle the error instead of throwing?

@kiplingw
Member

kiplingw commented Aug 4, 2023

Come to think of it, I don't think EBADF is generated on file descriptor exhaustion. It sounds like a descriptor is being used in an invalid state (e.g. after it's already been closed).
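
The distinction can be reproduced in isolation on Linux. In the minimal sketch below, reading from a descriptor that was already closed fails with EBADF, while hitting the per-process descriptor limit fails with EMFILE. Both function names are invented for this illustration, the limit of 32 is arbitrary, and the code assumes a single-threaded program so that the closed descriptor number is not reused in between.

```cpp
#include <algorithm>
#include <cerrno>
#include <vector>
#include <sys/resource.h>
#include <unistd.h>

// Returns the errno seen when reading from an already-closed descriptor.
int errnoAfterUseAfterClose() {
    int fds[2];
    if (pipe(fds) != 0) return -1;
    close(fds[0]);
    char byte = 0;
    ssize_t n = read(fds[0], &byte, 1);  // fds[0] is no longer valid
    int err = (n == -1) ? errno : 0;
    close(fds[1]);
    return err;
}

// Returns the errno seen when the process runs out of descriptors.
int errnoOnExhaustion() {
    int fds[2];
    if (pipe(fds) != 0) return -1;

    rlimit saved{};
    if (getrlimit(RLIMIT_NOFILE, &saved) != 0) return -1;
    rlimit lowered = saved;
    lowered.rlim_cur = std::min<rlim_t>(saved.rlim_cur, 32);  // arbitrary small limit
    if (setrlimit(RLIMIT_NOFILE, &lowered) != 0) return -1;

    std::vector<int> dups;
    int err = 0;
    for (;;) {
        int d = dup(fds[1]);
        if (d == -1) { err = errno; break; }  // limit reached
        dups.push_back(d);
    }

    for (int d : dups) close(d);
    setrlimit(RLIMIT_NOFILE, &saved);  // restore the original limit
    close(fds[0]);
    close(fds[1]);
    return err;
}
```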

Are you using the binary package from the PPA? Can you show us your endpoint handler?

@tronghung279
Author

tronghung279 commented Aug 6, 2023

Hi @kiplingw
I'm using it in an embedded system, so I built it from source at commit f5b780f.
Unfortunately, I'm not able to share the source code.
But I see another potential case inside _reactor.run(): it may call void Transport::handleIncoming(const std::shared_ptr<Peer>& peer), and there, if recv fails with EBADF, it throws std::runtime_error(strerror(errno)). The error is never handled. I think it should be handled as a disconnection in this case.
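
To make the suggestion concrete, here is a minimal sketch of classifying a recv() result so that a dead descriptor is reported as a disconnection instead of raising an exception. This is not Pistache's code: RecvOutcome, classifyRecv and recvOnce are hypothetical names, and treating EBADF like a disconnect is an assumption that could also hide a real bug such as a double close.

```cpp
#include <cerrno>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical outcome of one recv() attempt; not a Pistache type.
enum class RecvOutcome { Data, TryAgain, Disconnected, Fatal };

// Maps recv()'s return value and the errno captured right after the call.
RecvOutcome classifyRecv(ssize_t bytes, int err) {
    if (bytes > 0) return RecvOutcome::Data;
    if (bytes == 0) return RecvOutcome::Disconnected;  // orderly shutdown by the peer
    if (err == EAGAIN || err == EWOULDBLOCK || err == EINTR) return RecvOutcome::TryAgain;
    // Assumption: a closed or reset descriptor is handled like a disconnect.
    if (err == EBADF || err == ECONNRESET || err == ENOTCONN) return RecvOutcome::Disconnected;
    return RecvOutcome::Fatal;
}

// Performs one non-blocking recv() on fd and classifies the result.
RecvOutcome recvOnce(int fd) {
    char buffer[256];
    ssize_t bytes = recv(fd, buffer, sizeof(buffer), MSG_DONTWAIT);
    int err = (bytes == -1) ? errno : 0;
    return classifyRecv(bytes, err);
}
```

A caller would drop the peer on Disconnected, keep polling on TryAgain, and reserve exceptions or logging for Fatal.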

@kiplingw
Member

kiplingw commented Aug 6, 2023

What distro are you running on the embedded system?

@tronghung279
Author

Hi @kiplingw
We use Yocto (with some customizations).

@kiplingw
Member

kiplingw commented Aug 7, 2023

Hmm, in that case you will have to build from source, as you did. My guess is that you might be doing something else that's corrupting memory.

@tronghung279
Author

I haven't figured out the reason yet, but when I use a single thread, Pistache::Http::Endpoint::options().threads(1);, the issue doesn't happen. :D Do you have any idea why?

@kiplingw
Member

kiplingw commented Aug 9, 2023

I'm not an strace(1) expert, but I suspect @dennisjenkins75 might have some suggestions on using it to figure out which system call is producing the EBADF.
