-
Notifications
You must be signed in to change notification settings - Fork 76
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reader listener hangs in GUI application when samples are present #456
Comments
I'm getting what I think is the same issue. In my case, I'm not using a GUI, the issue seems to occur when calling The issue seems to be due to a To resolve this, perhaps the original lock needs to be unlocked before pending events are processed? Stack trace:
I've included a modified version of the helloWorldPublisher application which reproduces the problem. Note: I've only tested this on windows. #include <cstdlib>
#include <iostream>
#include <chrono>
#include <thread>
/* Include the C++ DDS API. */
#include "dds/dds.hpp"
/* Include data type and specific traits to be used with the C++ DDS API. */
#include "HelloWorldData.hpp"
using namespace org::eclipse::cyclonedds;
class TakeListener : public dds::sub::NoOpDataReaderListener<HelloWorldData::Msg>
{
void on_data_available(dds::sub::DataReader<HelloWorldData::Msg>& reader) override
{
reader.take();
std::cout << "=== Took data." << std::endl;
}
};
int main() {
std::cout << "=== [Publisher] Create writer." << std::endl;
/* First, a domain participant is needed.
* Create one on the default domain. */
dds::domain::DomainParticipant participant(domain::default_id());
/* To publish something, a topic is needed. */
dds::topic::Topic<HelloWorldData::Msg> topic(participant, "HelloWorldData_Msg");
/* A writer also needs a publisher. */
dds::pub::Publisher publisher(participant);
auto data_writer_qos = publisher.default_datawriter_qos();
data_writer_qos << dds::core::policy::Reliability::Reliable();
data_writer_qos << dds::core::policy::Durability::TransientLocal();
data_writer_qos << dds::core::policy::History::KeepLast(1);
/* Now, the writer can be created to publish a HelloWorld message. */
dds::pub::DataWriter<HelloWorldData::Msg> writer(publisher, topic, data_writer_qos);
/* Create a message to write. */
HelloWorldData::Msg msg(1, "Hello World");
/* Write the message. */
std::cout << "=== [Publisher] Write sample." << std::endl;
writer.write(msg);
std::cout << "=== [Publisher] Done." << std::endl;
std::cout << "=== [Subscriber] Create reader." << std::endl;
/* A reader also needs a subscriber. */
dds::sub::Subscriber subscriber(participant);
auto data_reader_qos = subscriber.default_datareader_qos();
data_reader_qos << dds::core::policy::Reliability::Reliable();
data_reader_qos << dds::core::policy::Durability::TransientLocal();
data_reader_qos << dds::core::policy::History::KeepLast(1);
/* Now, the reader can be created to subscribe to a HelloWorld message. */
dds::sub::DataReader<HelloWorldData::Msg> reader(subscriber, topic, data_reader_qos);
TakeListener listener;
reader.listener(&listener, dds::core::status::StatusMask::data_available());
std::cout << "=== Added listener." << std::endl;
while (true)
{
// Wait for data.
}
return EXIT_SUCCESS;
} |
Those two do indeed look very much the same. When I looked at the first stack trace @upstreammuse kindly provided, I had an inkling of what might be going on, and then I read @TimMooreAu excellent and spot-on analysis. So thank you both for reporting it in such detail. Bummer. I am usually pretty good at avoiding deadlocks, but I find it harder to spot them in the C++ binding than in the core, because things are so spread out (it really just follows the pattern set by the specification) and things are tied to Anyway, it is not at all obvious to me that the lock is needed either for setting a listener and for I suspect that if one were to simply delete the That said, in reviewing I overlooked a deadlock, so who's to say my first impressions are right this time? @e-hndrks you were kind of enough to figure out how to invoke the listeners for events that occurred prior to installing them and implementing it. Would you be kind enough to have a look at this and tell me what you make of it? Thanks! |
It seems this deadlock is caused by the different event models used by OpenSplice DDS (on which the original implementation of this C++ API was initially based) and Cyclone DDS, (for which the implementation of this C++ API was later modified). In OpenSplice DDS every event is by definition handled by a thread that runs asynchronously to the thread that sets the listener. However, in Cyclone DDS the thread that sets the listener might also be the thread that handles any pending events. The C++ API should have been modified for this purpose, but apparently the different event models were 'lost in translation'. So how to solve this issue? It makes sense that the 'enabled' flag of the Entity needs to be evaluated within the context of the Entity's mutex, since a disabled Entity might result in in the corresponding Listener not being set on the C-API. This would mean that if somebody would enable a disabled Entity right after checking its enabled flag, no Listener would be set. However, this rationale doesn't seem to be followed in the implementation of the enable() method:
I will take some more time to look into this particular problem and hope to propose a fix before the end of this week. |
I thought about it a bit after I wrote the comment as well and I think it makes the most sense to share my thinking here: with a bit of luck it is useful to @e-hndrks and can save him some time 🙂 The implementation, of, e.g., cyclonedds-cxx/src/ddscxx/src/org/eclipse/cyclonedds/sub/AnyDataReaderDelegate.cpp Lines 357 to 374 in 0ddd39b
dds_take_with_collector will immediately error out if the handle has already been closed.
It looks to me like cyclonedds-cxx/src/ddscxx/src/org/eclipse/cyclonedds/core/ObjectDelegate.cpp Lines 26 to 32 in 0ddd39b
closed flag. On the face of it then, it seems therefore that the call to check() could be removed without doing much harm.
The cyclonedds-cxx/src/ddscxx/include/dds/sub/detail/TDataReaderImpl.hpp Lines 704 to 726 in 0ddd39b
take , it only matters if some state in the C++ binding that take depends on becomes invalid between that close and calling dds_delete to actually delete the data reader.
As far as I can see now, the only other thing at play is the cyclonedds-cxx/src/ddscxx/include/org/eclipse/cyclonedds/sub/AnyDataReaderDelegate.hpp Lines 62 to 72 in 0ddd39b
and that doesn't seem to have any dependency on the contents of the DataReader object.
So my second impression is that removing the lock and the check in |
I think I agree with the analysis of @eboasson . Again, the locking mechanisms used by this C++ API originally came from OpenSplice, and were needed to guard the integrity of the User Layer pointer contained in the C++ object, which otherwise might be dangling when another thread closed the Entity. However, in Cyclone DDS the C++ object doesn't wrap around a pointer but around a handle, whose validity can be separately verified in the handle server. That means we could get rid of the entire 'closed' flag on the C++ object level with respect to the validity of the wrapped C-entity. What is left is the validity checking of the 'enable' flag, which typically should be done inside the underlying C entity. However, the enable functionality hasn't been implemented in the underlying C-API, so it seems a bit messy to have it implemented in the C++ layer on top. I would propose to take the whole 'enable' semantics out of the C++ API so that it behaves identical to C. When it is time to introduce the 'enable' semantics in C, then C++ would automatically benefit from it as well since it is built on top of the C API. Last but not least, the C++ Entity wrapper keeps track of the underlying C-Listener object, which might also need to be protected using a lock on the C++ level. In that particular case I wonder if that is really needed: couldn't we just register the C listener on the C entity, and ask the underlying C entity for its listener when we need it? That would remove the extra duplication we now have on the C++ API, and would remove the need for an Entity lock in C++ altogether. Would you agree with this analysis @eboasson? If you do, I can start working on removing these items from the C++ API. |
Thanks @e-hndrks ! Yep, I agree 🙂 The Cyclone core copies all the listener details into its own memory during entity creation and
should work. There is a small caveat, I just realised it when I read your comment: That's an oversight in the API and/or a bug. There are multiple ways to look at this, and I am not sure yet where I stand. One view is that So, first question: which is the more reasonable view? Second question: is this is an issue for eliminating the C++ listener object? If it is, let's fix the core; if it isn't let's just create an issue for this detail in the core library and deal with it later. |
Hi @TimMooreAu, I am working on a fix along the lines that @eboasson described, but that is quite a big change to make. If you just need a temporary fix that bypasses the problem for now, you could use the following code-change: diff --git a/src/ddscxx/src/org/eclipse/cyclonedds/core/EntityDelegate.cpp b/src/ddscxx/src/org/eclipse/cyclonedds/core/EntityDelegate.cpp
index 284d4f5..f683d93 100644
--- a/src/ddscxx/src/org/eclipse/cyclonedds/core/EntityDelegate.cpp
+++ b/src/ddscxx/src/org/eclipse/cyclonedds/core/EntityDelegate.cpp
@@ -222,8 +222,9 @@ org::eclipse::cyclonedds::core::EntityDelegate::listener_set(
// If entity enabled: set listener on ddsc entity
if (this->enabled_)
{
- dds_return_t ret;
- ret = dds_set_listener(this->ddsc_entity, callbacks);
+ this->unlock();
+ dds_return_t ret = dds_set_listener(this->ddsc_entity, callbacks);\
+ this->lock();
ISOCPP_DDSC_RESULT_CHECK_AND_THROW(ret, "Setting listener failed.");
} I have tried this fix myself, and it seems to solve your problem provisionally. |
I've just into exactly this issue myself, on Ubuntu Noble. Thanks @TimMooreAu for your summary and stack trace - mine at the point of hang is identical. Is a fix still in progress? |
Greetings, Cyclone team. I've hit an issue trying to use CycloneDDS within a GUI application.
I have a basic "hello world" reader and writer working between two processes, and I've been trying to add a basic GUI to the reader as a proof of concept. In the eventual application, I want late-joining clients to be able to pick up all the existing instances and pass the live ones to a GUI. I think that means that I have to create a Listener, connect it up to my GUI objects, and then register the Listener with a DataReader, so that the GUI doesn't miss anything from that first callback. I do realize that I need to handle threading boundaries and memory ownership between the callback and the GUI event loop. But I haven't gotten that far yet...
As soon as I add even a basic GUI object AND have samples present (using Durable and TransientLocal QoS, with the writer process running), the call to register the Listener hangs. If there are no samples, the application starts up fine. Likewise, if I remove the GUI object and have samples present, the application also starts up fine.
At first I assumed it was something funny with my environment, but I've now tried the basic concept with multiple OSes and GUI toolkits, and I keep getting the same results. I've tried Qt6 on MacOS using both Homebrew and Qt's binary installer. I've tried Qt6 and Qt5 on Ubuntu using the system packages. I started thinking it was a Qt issue, but then I was able to also reproduce the problem with wxWidgets on Linux.
In all these cases, the reader process gets hung on a mutex lock, and the call to listener(...) never returns. The callback does get called, however.
I have some minimal examples that I can commit, if that's easier than posting entire sample applications here.
The CycloneDDS and CycloneDDS-CXX unit tests are passing on my builds.
The text was updated successfully, but these errors were encountered: