velodyne_hw_ros_wrapper_node dies sometimes when launching sensors #181

Open
NilaySener opened this issue Jul 31, 2024 · 8 comments

@NilaySener

NilaySener commented Jul 31, 2024

Description

While running the Nebula driver on Leo Drive's Autonomous Test Vehicle, which is equipped with 4 Velodyne VLP-16 and 1 Velodyne VLS-128 sensors, the velodyne_hw_ros_wrapper_node dies randomly.

Expected Behavior

All LIDAR sensors (4 x Velodyne VLP-16 and 1 x Velodyne VLS-128) should publish ROS2 messages consistently and reliably during the operation of the Nebula driver on the Autonomous Test Vehicle.

Actual Behavior

After the lidars are launched, either all of them come up without problems or some of the lidar component containers die at random. Output from two failing runs is shown below:

  • Test 1: The Velodyne VLP-16 node is dying
    1722353283.5760190 [component_container-7] VelodyneHwInterface::StringCallback: {"volt_temp":{"bot":{"i_out":2099,"pwr_1_2v":986,"lm20_temp":1110,"pwr_5v":2065,"pwr_2_5v":2048,"pwr_3_3v":2706,"pwr_v_in":936,"pwr_1_25v":0},"top":{"hv":2685,"ad_temp":614,"lm20_temp":1099,"pwr_5v":2070,"pwr_2_5v":2047,"pwr_3_3v":2690,"pwr_5v_raw":2182,"pwr_vccint":974}},"vhv":353,"adc_nf":[14],"adc_stats":[{"mean":14.3,"stddev":0.578}],"ixe":1}
    1722353283.5872929 [component_container-7] VelodyneHwInterface::StringCallback: {"gps":{"pps_state":"Locked","position":"41 01.28703566N 028 53.20751111"},"motor":{"state":"On","rpm":602,"lock":"On","phase":26967},"laser":{"state":"On"}}
    1722353283.5899334 [component_container-7] [INFO 1722353283.589781410] [sensing.lidar.middle_right.velodyne_hw_interface_ros_wrapper_node]: UDP Driver Started (VelodyneHwInterfaceRosWrapper() at /home/golf/projects/autoware.golf.ups/src/sensor_component/external/nebula/nebula_ros/src/velodyne/velodyne_hw_interface_ros_wrapper.cpp:51)
    1722353284.6931584 [ERROR] [component_container-7]: process has died [pid 147464, exit code -11, cmd '/opt/ros/humble/lib/rclcpp_components/component_container --ros-args -r __node:=pointcloud_container -r __ns:=/sensing/lidar/middle_right/pointcloud_preprocessor -p use_sim_time:=False -p wheel_radius:=0.315 -p wheel_width:=0.1 -p wheel_base:=2.64 -p wheel_tread:=1.75 -p front_overhang:=0.99 -p rear_overhang:=0.81 -p left_overhang:=0.14 -p right_overhang:=0.14 -p vehicle_height:=1.86 -p max_steer_angle:=0.6105'].
    
    
    
  • Test 2: The Velodyne VLS-128 node is dying
    1722351592.2855313 [component_container_mt-3] VelodyneHwInterface::StringCallback: {"volt_temp":{"bot":{"pwr_1_0v":1646,"pwr_1_1v":1774,"pwr_1_2v":1966,"pwr_2_5v":4080,"lm20_temp":1047,"valid":true},"top":{"hv":2078,"ad_temp":603,"lm20_temp":1064,"pwr_5v":2051,"pwr_2_5v":2065,"pwr_3_3v":2739,"pwr_raw":1558,"pwr_vccint":607}},"ixe":1}
    1722351592.7853172 [component_container_mt-3] expired...
    1722351592.7855875 [component_container_mt-3] asyncOnConnect: Operation canceled
    1722351592.7884018 [component_container_mt-3] [INFO 1722351592.788190431]
    1722351592.7895777 [component_container_mt-3] *** Aborted at 1722351592 (unix time) try "date -d @1722351592" if you are using GNU date ***
    1722351592.7930715 [component_container_mt-3] PC: @                0x0 (unknown)
    1722351592.7945938 [component_container_mt-3] *** SIGSEGV (@0x0) received by PID 58105 (TID 0x7ffba97da640) from PID 0; stack trace: ***
    1722351592.7977173 [component_container_mt-3]     @     0x7ffbe006e006 google::(anonymous namespace)::FailureSignalHandler()
    1722351592.7981749 [component_container_mt-3]     @     0x7ffbe4c42520 (unknown)
    1722351677.3886361 [component_container_mt-3]     @                0x0 (unknown)
    1722351593.8865998 [ERROR] [component_container_mt-3]: process has died [pid 58105, exit code -11, cmd '/opt/ros/humble/lib/rclcpp_components/component_container_mt --ros-args -r __node:=pointcloud_container -r __ns:=/sensing/lidar/top/pointcloud_preprocessor -p use_sim_time:=False -p wheel_radius:=0.315 -p wheel_width:=0.1 -p wheel_base:=2.64 -p wheel_tread:=1.75 -p front_overhang:=0.99 -p rear_overhang:=0.81 -p left_overhang:=0.14 -p right_overhang:=0.14 -p vehicle_height:=1.86 -p max_steer_angle:=0.6105'].
    
    

Complete Log Files

If you would like to examine the given scenarios in more detail, you can access the launch logs of the scenarios from the links below:

Test1

Test2

Additional Information

Please let me know if additional information is required or if there are any specific tests that should be performed to help identify the root cause of this issue.

@knzo25
Collaborator

knzo25 commented Aug 1, 2024

@NilaySener
Thanks for raising this issue. We run 1x VLS128 + 2-3 VLP16 and currently do not face this issue.

Some things that could give us insight into this issue:

  • Does this happen if the driver itself is not on the containers? (containers are intrinsically more difficult to debug)
  • Can you compile just the driver in debug or with debug symbols? This will give us more info on where it actually dies
  • If this does not happen all the time, would it happen when you replay a rosbag or pcap? (this way we could reproduce it locally)

As a note: I see you are using the GPS's PPS, right? @drwnz, we do not currently use it, right?

@NilaySener
Author

Hi @knzo25, Thank you for the quick response. Here are the answers to the questions you raised:


Does the issue occur when the driver itself is not in the containers?

Can you compile just the driver with debug symbols?

  • The driver is currently compiled with debug symbols, using the following command:
colcon build --symlink-install --cmake-args -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_EXPORT_COMPILE_COMMANDS=1

Does the issue occur when replaying a ROS bag or pcap file?

  • Here you can download the pcap file I recorded with 4 x VLP-16. I did not record the VLS-128 because it was not on the same interface as these lidars. The launch problem also occurs in this scenario (4 x VLP-16).

Regarding the GPS PPS signal usage

  • Yes, all sensors are fed with GPRMC and PPS signals.

If there is anything I need to provide additional information about, please let me know.

@drwnz
Collaborator

drwnz commented Aug 5, 2024

We do use PPS signals to synchronize the LiDAR, but they are generated from an ECU GPIO rather than from GNSS. However, we don't use GPRMC, and timestamping is done from UDP packet header timestamps.
Do you still get the same issue if you remove the HW monitor in the launch?

@knzo25
Collaborator

knzo25 commented Aug 5, 2024

@NilaySener
I just tried to reproduce the error with the data and launcher provided, but it works without issues on my end.

My setup:

The logs only tell us that the hw interface dies, but not where.
Since the errors can be reproduced with isolated examples (without Autoware, for example), I think you could try https://github.com/pal-robotics/backward_ros to get more information about the current problem.
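
As a rough illustration of what a crash-trace tool does underneath, here is a minimal sketch that prints the raw call stack on SIGSEGV or SIGABRT using glibc's `backtrace` facilities. This is not backward_ros's API, which resolves symbols and source lines far more thoroughly; `CrashHandler` and `InstallCrashHandler` are hypothetical names, and the sketch assumes Linux with glibc.

```cpp
#include <execinfo.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical handler: dumps the raw stack of the crashing thread to stderr,
// then restores the default action and re-raises so the process still dies
// with the original signal.
extern "C" void CrashHandler(int sig)
{
  void * frames[64];
  const int count = backtrace(frames, 64);
  // backtrace_symbols_fd writes directly to the descriptor without malloc.
  backtrace_symbols_fd(frames, count, STDERR_FILENO);
  signal(sig, SIG_DFL);
  raise(sig);
}

// Hypothetical helper: call once at the start of main().
void InstallCrashHandler()
{
  signal(SIGSEGV, CrashHandler);
  signal(SIGABRT, CrashHandler);
}
```

Function names only appear in the output when the binary exports its symbols (for example, when linked with `-rdynamic`); otherwise the trace contains addresses that have to be resolved separately.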

@NilaySener
Author

Hi, thank you very much for your answers and suggestions.

I will remove the HW monitor from the launch file and share the results.

I also noticed that when the node dies, it only enters the following callback once.

void VelodyneHwInterfaceRosWrapper::ReceiveScanDataCallback(
  std::unique_ptr<velodyne_msgs::msg::VelodyneScan> scan_buffer)
{
  // Publish
  scan_buffer->header.frame_id = sensor_configuration_.frame_id;
  scan_buffer->header.stamp = scan_buffer->packets.front().stamp;
  velodyne_scan_pub_->publish(*scan_buffer);
}
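
One thing this callback does not guard against is an empty `packets` vector: calling `front()` on an empty `std::vector` is undefined behavior. Whether that is what happens here is unconfirmed, so the following is only a sketch of what a defensive check could look like. The `Stamp`, `Packet`, `Header`, and `Scan` structs are hypothetical stand-ins for the ROS message types, and `PrepareScanForPublish` is an illustrative name, not a Nebula function.

```cpp
#include <cstdint>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for the ROS message types so the sketch compiles alone.
struct Stamp { std::int32_t sec = 0; std::uint32_t nanosec = 0; };
struct Packet { Stamp stamp; };
struct Header { std::string frame_id; Stamp stamp; };
struct Scan { Header header; std::vector<Packet> packets; };

// Fills in the header and reports whether the scan is safe to publish.
// Returns false for a null buffer or a scan that contains no packets.
bool PrepareScanForPublish(
  const std::unique_ptr<Scan> & scan_buffer, const std::string & frame_id)
{
  if (!scan_buffer || scan_buffer->packets.empty()) {
    return false;  // front() on an empty vector would be undefined behavior
  }
  scan_buffer->header.frame_id = frame_id;
  scan_buffer->header.stamp = scan_buffer->packets.front().stamp;
  return true;
}
```

In the real callback, the publish call would run only when such a check passes, and a skipped scan could be logged to see whether the empty case ever occurs around the time of the crash.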

As for the pcap file, thank you for testing it @knzo25, but I have two questions:

  1. I had to launch it twenty times or more in a row to reproduce the problem on the vehicle. Have you had the opportunity to repeat the launch that many times?
  2. I wanted to use this repo to match your testing method, but I don't have permission. When I fed the .pcap file to Nebula using tcpreplay, I encountered a problem, so I cannot feed Nebula with the .pcap file right now. How can I use the repo you used for replay?

@NilaySener
Author

Hi, is there an update on the issue? I also tested the driver with the .pcap file I shared, and the velodyne_hw_ros_wrapper_node still dies after launching repeatedly (about twenty times).

@mojomex
Collaborator

mojomex commented Sep 25, 2024

Hi @NilaySener, sorry to keep you waiting.

We merged a big set of new fixes and improvements yesterday. Could you try them and see if they help (either the newest version of the main branch or release v0.2.0)?

The launch/node structure changed a bit, and at this moment I can only refer you to this PR for the changes necessary: autowarefoundation/autoware#5275
The changes necessary for golf_sensor_kit_launch should be very similar to the sample_sensor_kit_launch changes linked in that PR.

We're happy to help if the issue still occurs even with these changes!

@drwnz
Collaborator

drwnz commented Dec 3, 2024

@NilaySener was your issue resolved by the above updates?
