Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Aborting Trajectory. Failed to send point (#0): Invalid message (3) : Trajectory start position doesn't match current robot position (3011) #445

Closed
akashjinandra opened this issue Dec 21, 2021 · 59 comments
Labels
more-info-needed Waiting on additional information from poster multi-group Issues related to multi-group support

Comments

@akashjinandra
Copy link

Hello,
My system: ROS Melodic, Ubuntu 18.04, YRC1000 running MotoRos 1.99
I'm running a multi group system with a robot on a track, meaning a group for robot and a group for the track. I am running into this error: Aborting Trajectory. Failed to send point (#0): Invalid message (3) : Trajectory start position doesn't match current robot position (3011)

I am currently on this branch:
#259

The error doesn't seem to happen very frequently. I've read into: #251
and #111
however it doesn't seem like there is a solution to this issue. But I might be getting lost in reading the issues and possible solutions people have tried. Is there any solution/test code I can use to help debug this?

@gavanderhoorn
Copy link
Member

Figuring out what is going wrong is not going to be possible without more information.

I'd suggest making available (or pointing to):

  1. your support package
  2. your MoveIt configuration
  3. example code which shows the problem (ie: generates a JointTrajectory but fails to execute)
  4. a Wireshark capture tracing an unsuccessful execution

PS: this could have been a comment on one of the existing issues. As-is, it seems like it's the same / a similar problem. Opening a new issue increases fragmentation.

@gavanderhoorn gavanderhoorn added the multi-group Issues related to multi-group support label Dec 21, 2021
@akashjinandra
Copy link
Author

Sorry about that, didn't mean to cause fragmentation, as mentioned in #111 if I have a big enough delay between my movements, then the problem doesn't occur, I'm currently trying to implement the checking of the robot status in motion bit.

@gavanderhoorn
Copy link
Member

gavanderhoorn commented Dec 21, 2021

Then I'll close this for now assuming the problem is as described in #111 (sending new trajectories before the current one has finished executing).

Note that providing a description of the control flow would have helped to diagnose this (which is what prompted the request for an MWE).

@akashjinandra
Copy link
Author

akashjinandra commented Jan 24, 2022

Hello @gavanderhoorn I am still experiencing this issue. Here are some logs with a python script to reproduce:
motoman_driver_issue.zip

I'm working on getting the moveit config to a MWE for you as well. It is basically just a robot on a track with an external axis.

@gavanderhoorn
Copy link
Member

gavanderhoorn commented Jan 26, 2022

We're going to need at least items 1 and 2 before I can help.


Edit: and if you're not using .launch files in your support or MoveIt package to start the driver, we're going to need those as well.

And:

I'm currently trying to implement the checking of the robot status in motion bit.

this is now working?

How are you checking exactly? The code you provided appears to be a copy-paste of a MoveIt tutorial. I don't see any lines which check in_motion in there.

@gavanderhoorn
Copy link
Member

Ping @akashjinandra ?

@gavanderhoorn gavanderhoorn reopened this Jan 28, 2022
@akashjinandra
Copy link
Author

akashjinandra commented Jan 31, 2022

I'm sorry for the delay. Here it is: src.zip after downloading and creating a catkin ws, you should be able to build and run these commands.

Please run roslaunch universal_moveit_config planning_execution.launch mode:=stub for testing in simulation
and
roslaunch universal_moveit_config planning_execution.launch mode:=real for running on real system.

@gavanderhoorn
Copy link
Member

gavanderhoorn commented Jan 31, 2022

I just checked the archive you've shared.

It references a robot_hardware_interfaces package, which is not included.

For a multi-group setup, there should be a .yaml file which configures the topic_list parameter. That's present (gp200r_motion_interface.yaml), but that file doesn't appear to be loaded by any of the files in the .zip.

Could you show how the driver is started exactly?


Edit: there is a good chance this is related to what #259 attempts to fix (ie: #111), but it depends on the exact configuration and how things are started.

@gavanderhoorn
Copy link
Member

And could you please also share the console logging output of the driver? I cannot run your real configuration, as I don't have your hw, and the test variant uses ros_control and fake_joint, which doesn't use any of the infrastructure of the driver.

Please make sure to start the driver using the --screen argument to roslaunch.

@gavanderhoorn
Copy link
Member

I've just checked the wireshark traces you've shared in #445 (comment).

While we could still be seeing an instance of #111, at least in real_failure.pcapng I believe MotoROS is doing what it's supposed to do.

From packet 40016 I copied the "last" state of the robot right before a new trajectory starts execution.

From packet 40026 I copied the first Motoman Joint Trajectory Point Full Extended in the new trajectory. The state in this message should correspond to what the current state of the robot is.

Summarising the messages as follows:

current:                  commanded:
  gp0:                      gp0:
    J0:   -1.885846972        J0:   -1.885846972
    J1:   -1.872935772        J1:   -1.872935772
    J2:   -1.037052751        J2:   -1.037052751
    J3:    1.785913348        J3:    1.785913348
    J4:    1.803463221        J4:    1.803463221
    J5:   -0.810656488        J5:   -0.810656488
  gp1:                      gp1:
    J0:    4.357182503        J0:    4.357182503
  gp2:                      gp2:
    J0:    3.141485214        J0:    3.140779495
  gp3:                      gp3:
    J0:    0.179838866        J0:    0.179838866

for all joints current == traj_pt0, except for gp2.J0.

The difference is 3.141485214 - 3.140779495 ~= 0.000705719 rad (approx 0.0404 degrees).

This is likely larger than the max allowed difference which is 30 pulses.

@akashjinandra
Copy link
Author

Hello and thanks for your response. I didn't realize that you would be looking at it during this time(I appreciate it.)
image

It was referencing this launch file, which loads the yaml file which has the topic list parameters
src_fixed_launch.zip

@akashjinandra
Copy link
Author

In our production code we do it in c++ with something like this:
image

I will capture a wireshark log on this, and add the waiting for robot in motion bit to the python script.

@akashjinandra
Copy link
Author

Latest screenshot of the log of the motoman driver:
image

And the log of what happened when we looked at the robot in motion bit. The problem is definitely intermittent, we went through multiple trajectories back to back without issue, and then in the final one we hit an issue.

@akashjinandra
Copy link
Author

Here is a set of logs that are from the past run:
logs_part_1.zip

@akashjinandra
Copy link
Author

failure_2.zip

breaking up cause github has file limit size

@EricMarcil
Copy link
Contributor

If I look at the failure_2.pcapng at the last failure:
Packet 146379 is the failure 3011
It is a response to:
Packet 146350 which is the motion start (Time=0)
Group 1 Position J0: 3.771580935
But current joint feedback:
Packet 146342 (before)
Packet 146372 (after)
Both report the current position as:
Group 1 Position J0: 3.770539284

I have to go back to packet 145667 to find the report of position Group 1 Position J0: 3.771580935
That about 1.2 sec earlier. At that time, I can see that the Group 1 is still moving.
Packet 145668: Status reports that the InMotion flag is True (1).
You need to wait for the InMotion to be False before capturing the position for the start of next motion.

@gavanderhoorn
Copy link
Member

@akashjinandra: could you verify the results of the analysis I did in #445 (comment) and @EricMarcil reports?

I realise it may be inconvenient, but the way MotoROS works, there is a good reason for why it checks current_state == traj_pt0. If there is a difference, you'll get the error you report.

@gavanderhoorn
Copy link
Member

Could I also ask you to please not post screenshots of what is essentially text?

Please just copy-paste it into comments inside a code block.

@gavanderhoorn
Copy link
Member

Latest screenshot of the log of the motoman driver: image

Could you please update your ROSCONSOLE_FORMAT env var to something like this:

export ROSCONSOLE_FORMAT='[${severity}] [${time}] [${node}] [${logger}]: ${message}'

you currently appear to have it at its default, which makes it rather difficult to see which messages are emitted by which nodes / loggers.

@gavanderhoorn
Copy link
Member

gavanderhoorn commented Feb 2, 2022

In failure_1.pcapng we see this:

current (pkt 26361):     commanded (pkt 26365):
  gp0:                     gp0:
    J0:   -1.894196153       J0:   -1.894188881    <<--  0.000007272 ( 0.00041 deg)
    J1:   -1.787848830       J1:   -1.787848830
    J2:   -1.366592765       J2:   -1.366592765
    J3:   -0.001671510       J3:   -0.001678563    <<-- -0.000007053 (-0.00040 deg)
    J4:    1.150621653       J4:    1.150621653
    J5:    0.003048580       J5:    0.003048580
  gp1:                     gp1:
    J0:    3.769543171       J0:    3.769543171
  gp2:                     gp2:
    J0:    1.213041306       J0:    1.210648298    <<--  0.002393008 ( 0.137109 deg)
  gp3:                     gp3:
    J0:   -0.179916993       J0:   -0.179916993

Packet 26361 is the last Motoman Joint Feedback Extended before packet 26365, which contains the Motoman Joint Trajectory Point Full Extended (ie: start of the trajectory).

I've marked the joints which show a difference between the current state and the values used in the first trajectory point of the new trajectory.

It's likely gp2.J0 which is again the cause (as in #445 (comment)). The other differences are very small, likely below the threshold.

@gavanderhoorn
Copy link
Member

@akashjinandra: could you perhaps also share a rosbag with the JointState topics, the RobotStatus topics, rosout, rosout_agg and the various FollowJointTrajectoryAction topics?

That might help determine whether there is a problem with the action server reporting motion completion too soon.

@akashjinandra
Copy link
Author

Alright I will work on getting those. Thanks for the help!

@gavanderhoorn
Copy link
Member

Note that so far it looks like the trajectories you are sending are just not starting at current_state. That would not be a problem on the MotoROS side, but with whatever is generating those trajectories.

The reason I ask for additional information / .bags is to make sure motoman_driver and MotoROS do what they are supposed to.

@gavanderhoorn
Copy link
Member

Any update here @akashjinandra ?

@gavanderhoorn gavanderhoorn added the more-info-needed Waiting on additional information from poster label Feb 9, 2022
@akashjinandra
Copy link
Author

Sorry our robot was down for one day, I think it is resolved but I am going to double check with a couple more cycles tomorrow.

@gavanderhoorn gavanderhoorn reopened this Mar 3, 2022
@gavanderhoorn
Copy link
Member

I've taken a look at the two .bags. There are a few things which puzzle me.

According to your earlier descriptions and the files you provided, this system is a multi-group system. The .bag however only contains a single joint_states topic, and a single /arm_controller/follow_joint_trajectory namespace. I would have expected 4 of each of those, 1 per group.

Were the .bags recorded with the same configuration of motoman_driver as was shared previously (in #445 (comment) fi)?

The JointState messages on /joint_states do seem to contain information on all groups.

subset.bag contains 9 trajectory goals. The last one did not execute, as the driver rejected it ("start state does not .."). It's a bit hard to see what's going on due to the missing (and unexpectedly named) topics, but there is something odd.

These are the timestamps of the JointState messages (on /joint_states) just before reception of that trajectory:

track     : 1645720295.662517591
robot     : 1645720296.403789562
gripper j1: 1645720295.579493134
gripper j3: 1645720296.403771242

note: these are the JointState messages most recently published before the trajectory was submitted.

There is some uncertainty, but note how the messages for the track (robot_mount_joint) and the gripper j1 joints (gripper_joint_1) are almost 1 second old.

I can't determine why that is from these .bags, and obviously, if the robot was not in motion during that second, their age should not matter.

If you could supply both a .bag and a Wireshark trace, we can rule some further things out.

I'd also be grateful if you could clear up my confusion about the action server(s) and the configuration of the topics.

@akashjinandra
Copy link
Author

Hello, I'm sorry for the delay in response. We will gather both the logs and ros bags. There are still 4 JointTrajectory Topics for each of the groups, but we only captured one in the rosbag, we will gather all of them in the next one. And I believe we are still using the motoman driver as before. Thanks again for your help.

@akashjinandra
Copy link
Author

Hello @gavanderhoorn sorry for the delay here is the rosbag with all the joint trajectory topics and a wireshark log
failure.zip
.

@gavanderhoorn
Copy link
Member

gavanderhoorn commented Mar 17, 2022

I've taken a look, and it seems to be the same issue as earlier. MotoROS seems to be doing what it's supposed to do (ie: refuse to execute motions which don't start at the current state). It's the ROS side which doesn't seem to be doing things correctly.

In the .bag you included, MoveIt also complains about it (at least once):

[1647365689.1] [ERROR] [/move_group] [TrajectoryExecutionManager::validate]: 
Invalid Trajectory: start point deviates from current robot state more than 0.01
joint 'gripper_joint_1': expected: 1.10899, current: 1.123

this is in essence a similar check to what MotoROS does, but more forgiving (0.01 radians instead of a number of encoder pulses). It's probably also related to the problem reported in #451).

After this error, there seems to be one new trajectory submitted, which then passes the MoveIt check, but fails the MotoROS check.

The wireshark trace seems to suggest MotoROS is correct:

Pkt 29204                     Pkt 29185
  Group 0                         Group 0
      Positions                       Positions
          J0:   -2,066520691              J0:   -2,066520691
          J1:   -0,952798724              J1:   -0,952798724
          J2:   -0,334816784              J2:   -0,334816784
          J3:    0,082355373              J3:    0,082355373
          J4:    0,953410983              J4:    0,953410983
          J5:   -0,047742434              J5:   -0,047742434
  Group 1                         Group 1
      Positions                       Positions
          J0:    4,649924278              J0:    4,649930954
  Group 2                         Group 2
      Positions                       Positions
          J0:    1,122996569              J0:    1,108991385    <---
  Group 3                         Group 3
      Positions                       Positions
          J0:   -0,200996101              J0:   -0,200996101

The left packet is the first trajectory point, the right packet is the last joint state packet before the trajectory execution request is submitted.

As we can see, grp2.j0 is different between the two. The SimpleMessage messages show the exact same values as the MoveIt error.

When processing the .bag, for some reason the JointState messages (on the global /joint_states topic) for gripper_joint_1 and robot_mount_joint are really old:

[1647365689.51]: header: 
  seq: 14679
  stamp: 
    secs: 1647365689
    nsecs: 510989038
  frame_id: ''
name: [joint_1_s, joint_2_l, joint_3_u, joint_4_r, joint_5_b, joint_6_t]
position: [-2.0665206909179688, -0.9527987241744995, -0.3348167836666107, 0.08235537260770798, 0.9534109830856323, -0.04774243384599686]
velocity: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
effort: []
---
[1647365689.51]: header: 
  seq: 14676
  stamp: 
    secs: 1647365689
    nsecs: 510952833
  frame_id: ''
name: [gripper_joint_3]
position: [-0.20099610090255737]
velocity: [0.0]
effort: []
---
[1647365688.09]: header: 
  seq: 14469
  stamp: 
    secs: 1647365688
    nsecs:  88279632
  frame_id: ''
name: [gripper_joint_1]
position: [1.1229965686798096]
velocity: [0.030971070751547813]
effort: []
---
[1647365687.76]: header: 
  seq: 14422
  stamp: 
    secs: 1647365687
    nsecs: 756798469
  frame_id: ''
name: [robot_mount_joint]
position: [4.649924278259277]
velocity: [0.0014973023207858205]
effort: []

note the timestamps on the messages for robot_mount_joint and gripper_joint_1: 1647365687.756798469 and 1647365688.88279632 respectively.

That's over 2 seconds difference with the messages for gripper_joint_3 and the main robot JointStates.


Right now, I don't have any hypotheses for why this would happen.

Those messages are all published at the same time, and the Simple Message traffic in the Wireshark capture does not show any such delays.

I feel solving this (ie: figuring out the cause of the delay) would probably make everything work.

@gavanderhoorn
Copy link
Member

rosbag info also shows this:

/gp200r/gp200r_b1_controller/joint_states                     2147 msgs    : sensor_msgs/JointState
/gp200r/gp200r_r1_controller/joint_states                     2146 msgs    : sensor_msgs/JointState
/gp200r/gp200r_s1_controller/joint_states                     2146 msgs    : sensor_msgs/JointState
/gp200r/gp200r_s2_controller/joint_states                     2146 msgs    : sensor_msgs/JointState
/joint_states                                                 4632 msgs    : sensor_msgs/JointState

The total number of messages on /joint_states should be equal to the sum of the messages published on the /gp200r/*/joint_states topics (1 msg per group, per update cycle).

@gavanderhoorn
Copy link
Member

gavanderhoorn commented Mar 21, 2022

@akashjinandra: could you please check kinetic-devel...gavanderhoorn:coalesced_joint_feedback_ex_relay and see whether it resolves your issue?

I've changed two things:

  1. increased the queue sizes of all JointState and related publishers. They were set to 1, which in an ideal system would be a good setting, as we avoid consumers seeing "old" messages. But in situations where there is high system load or slow consumers, and multi-group systems with all groups configured, this can result in lost messages, as there is very little time for those consumers to process the messages (they're essentially published at 4x the base rate of the individual JointState publishers)
  2. the JointState messages published to the global /joint_states topic now contain the coalesced state for all configured joints, across all configured motion groups. The old behaviour was to publish one JointState message per group, per simple message. While JointState consumers are supposed to be able to handle this, in combination with the very low queue size (see previous change), this probably didn't work reliably

Note that since you are using #259, you'll have to rebase that on-top of kinetic-devel...gavanderhoorn:coalesced_joint_feedback_ex_relay. I expect that to not be too difficult, as I believe the changes are orthogonal.


Edit: note that this is all rather ugly. But I want to see whether this helps resolve your issue. I'll work on a nicer way of solving this later.

@akashjinandra
Copy link
Author

Hello @gavanderhoorn, thanks for your response, we will try this new code this coming Wednesday.

@gavanderhoorn
Copy link
Member

gavanderhoorn commented Mar 23, 2022

@akashjinandra: could you perhaps test the combination of kinetic-devel...gavanderhoorn:coalesced_joint_feedback_ex_relay and kinetic-devel...gavanderhoorn:rebased_two_arms_on_a_rail?

That would be instead of #259 in your case.

NOTE: this is not a complete solution. There are additional issues to address. I just want to test on more than my own hw config.

And please always keep your -- and your hw's -- safety in mind.

@akashjinandra
Copy link
Author

Hello @gavanderhoorn we will try this today. Thanks for that!

@gavanderhoorn
Copy link
Member

@akashjinandra: did you have a chance to test?

@gavanderhoorn
Copy link
Member

Friendly ping @akashjinandra

@akashjinandra
Copy link
Author

Hello @gavanderhoorn when we merged and ran the system the joint trajectory action server crashed, we are in the process of debugging it.

@gavanderhoorn
Copy link
Member

gavanderhoorn commented Mar 29, 2022

Could you tell me what you merged exactly?

Just to reiterate: #259 is not involved at all any more. Creating a new branch (from kinetic-devel) and merging the two branches I mentioned in #445 (comment) into that should work.

(or at least: if I follow 'normal' usage patterns with motoman_driver)

If it doesn't, something is not right.

Could you build motoman_driver with -DCMAKE_BUILD_TYPE=Debug and copy-paste the stack trace here?

@gavanderhoorn
Copy link
Member

Does #487 mean you've got things to work (ie: not crash)?

@akashjinandra
Copy link
Author

Not yet, I'm going to get it tonight, should be within the hour.

@akashjinandra
Copy link
Author

akashjinandra commented Mar 29, 2022

This is our branch: https://github.com/BastianSolutionsRandD/motoman/tree/test_fixes_for_vanderlande

Here is the trace:

SUMMARY
========

PARAMETERS
 * /robot_ip_address: 10.177.82.131
 * /rosdistro: melodic
 * /rosversion: 1.14.12
 * /topic_list: [{'joints': ['joi...

NODES
  /
    joint_state (motoman_driver/robot_state)
    joint_trajectory_action (motoman_driver/motoman_driver_joint_trajectory_action)
    motion_streaming_interface (motoman_driver/motion_streaming_interface)

ROS_MASTER_URI=http://localhost:11311

process[joint_state-1]: started with pid [32936]
process[motion_streaming_interface-2]: started with pid [32937]
[ INFO] [1648580238.505593620]: Added message handler for message type: 0
[ INFO] [1648580238.510165525]: Robot state connecting to IP address: '10.177.82.131:50241'
[ INFO] [1648580238.511373125]: Loading topic list
[ INFO] [1648580238.511540203]: Found 4 topics
[ INFO] [1648580238.511691539]: Topic(state_value): [group:0,joints:{joint_1_s,joint_2_l,joint_3_u,joint_4_r,joint_5_b,joint_6_t},name:gp200r_r1_controller,ns:gp200r]
[ INFO] [1648580238.511827632]: Topic(state_value): [group:1,joints:{robot_mount_joint},name:gp200r_b1_controller,ns:gp200r]
[ INFO] [1648580238.511955905]: Topic(state_value): [group:2,joints:{gripper_joint_1},name:gp200r_s1_controller,ns:gp200r]
[ INFO] [1648580238.512085996]: Topic(state_value): [group:3,joints:{gripper_joint_3},name:gp200r_s2_controller,ns:gp200r]
[ INFO] [1648580238.512218520]: Loading group: [group:0,joints:{joint_1_s,joint_2_l,joint_3_u,joint_4_r,joint_5_b,joint_6_t},name:gp200r_r1_controller,ns:gp200r]
[ INFO] [1648580238.512525670]: Loading group: [group:1,joints:{robot_mount_joint},name:gp200r_b1_controller,ns:gp200r]
[ INFO] [1648580238.512712496]: Loading group: [group:2,joints:{gripper_joint_1},name:gp200r_s1_controller,ns:gp200r]
[ INFO] [1648580238.512860932]: Loading group: [group:3,joints:{gripper_joint_3},name:gp200r_s2_controller,ns:gp200r]
[ INFO] [1648580238.513001498]: Loaded 4 groups
[ INFO] [1648580238.513187367]:  Initializing robot state 4 groups
[ INFO] [1648580238.513338664]: Initializing message manager with default comms fault handler
[ INFO] [1648580238.513469502]: Default communications fault handler successfully initialized
[ INFO] [1648580238.513599025]: Initializing message manager
[ INFO] [1648580238.513689304]: Added message handler for message type: 1
process[joint_trajectory_action-3]: started with pid [32943]
[ INFO] [1648580238.535568754]: Joint Trajectory Interface connecting to IP address: '10.177.82.131:50240'
[ INFO] [1648580238.539775878]: Added message handler for message type: 10
[ INFO] [1648580238.541214043]: Added message handler for message type: 15
[ INFO] [1648580238.541411959]: Creating joint_feedback_ex_relay_handler with 4 groups
[ INFO] [1648580238.551154146]: Loading topic list
[ INFO] [1648580238.551304988]: Found 4 topics
[ INFO] [1648580238.551468156]: Topic(state_value): [group:0,joints:{joint_1_s,joint_2_l,joint_3_u,joint_4_r,joint_5_b,joint_6_t},name:gp200r_r1_controller,ns:gp200r]
[ INFO] [1648580238.551595770]: Topic(state_value): [group:1,joints:{robot_mount_joint},name:gp200r_b1_controller,ns:gp200r]
[ INFO] [1648580238.551770093]: Topic(state_value): [group:2,joints:{gripper_joint_1},name:gp200r_s1_controller,ns:gp200r]
[ INFO] [1648580238.551951259]: Topic(state_value): [group:3,joints:{gripper_joint_3},name:gp200r_s2_controller,ns:gp200r]
[ INFO] [1648580238.552120622]: Loading group: [group:0,joints:{joint_1_s,joint_2_l,joint_3_u,joint_4_r,joint_5_b,joint_6_t},name:gp200r_r1_controller,ns:gp200r]
[ INFO] [1648580238.552348460]: Loading group: [group:1,joints:{robot_mount_joint},name:gp200r_b1_controller,ns:gp200r]
[ INFO] [1648580238.552424471]: Loading topic list
[ INFO] [1648580238.552512079]: Loading group: [group:2,joints:{gripper_joint_1},name:gp200r_s1_controller,ns:gp200r]
[ INFO] [1648580238.552673081]: Loading group: [group:3,joints:{gripper_joint_3},name:gp200r_s2_controller,ns:gp200r]
[ INFO] [1648580238.552843497]: Loaded 4 groups
[ INFO] [1648580238.552991583]: Added message handler for message type: 2017
[ INFO] [1648580238.553025999]: MotomanJointTrajectoryStreamer: init
[ INFO] [1648580238.553238088]: JointTrajectoryStreamer: init
[ INFO] [1648580238.553945624]: Found 4 topics
[ INFO] [1648580238.554053056]: Connected to server
[ INFO] [1648580238.554060368]: Topic(state_value): [group:0,joints:{joint_1_s,joint_2_l,joint_3_u,joint_4_r,joint_5_b,joint_6_t},name:gp200r_r1_controller,ns:gp200r]
[ INFO] [1648580238.554190716]: Topic(state_value): [group:1,joints:{robot_mount_joint},name:gp200r_b1_controller,ns:gp200r]
[ INFO] [1648580238.554247766]: Topic(state_value): [group:2,joints:{gripper_joint_1},name:gp200r_s1_controller,ns:gp200r]
[ INFO] [1648580238.554281675]: Topic(state_value): [group:3,joints:{gripper_joint_3},name:gp200r_s2_controller,ns:gp200r]
[ INFO] [1648580238.554353695]: Loading group: [group:0,joints:{joint_1_s,joint_2_l,joint_3_u,joint_4_r,joint_5_b,joint_6_t},name:gp200r_r1_controller,ns:gp200r]
[ INFO] [1648580238.554550611]: Loading group: [group:1,joints:{robot_mount_joint},name:gp200r_b1_controller,ns:gp200r]
[ INFO] [1648580238.554671246]: Loading group: [group:2,joints:{gripper_joint_1},name:gp200r_s1_controller,ns:gp200r]
[ INFO] [1648580238.554719923]: Loading group: [group:3,joints:{gripper_joint_3},name:gp200r_s2_controller,ns:gp200r]
[ INFO] [1648580238.554799479]: Loaded 4 groups
[ INFO] [1648580238.556755187]: Connected to server
[ INFO] [1648580238.556907205]: Successfully initialized robot state interface
[ INFO] [1648580238.557368877]: Entering message manager spin loop
[ INFO] [1648580238.648853377]: Unlocking mutex
[ INFO] [1648580238.648945051]: Starting Motoman joint trajectory streamer thread
[ INFO] [1648580238.654203456]: Connecting to robot motion server
[ WARN] [1648580238.654248799]: Tried to connect when socket already in connected state
[ WARN] [1648580366.228324804]: Motoman robot is now enabled and will accept motion commands.
[ INFO] [1648580369.364395635]: Receiving joint trajectory message Dynamic
[ INFO] [1648580369.379349712]: Loading trajectory, setting state to streaming
[joint_trajectory_action-3] process has died [pid 32943, exit code -11, cmd /home/bastian/stephen_pick_n_place_ws/devel/lib/motoman_driver/motoman_driver_joint_trajectory_action /joint_trajectory_action:=/arm_controller/follow_joint_trajectory __name:=joint_trajectory_action __log:=/home/bastian/.ros/log/c556229c-aeb0-11ec-a2eb-000c29df6e54/joint_trajectory_action-3.log].
log file: /home/bastian/.ros/log/c556229c-aeb0-11ec-a2eb-000c29df6e54/joint_trajectory_action-3*.log
[ INFO] [1648580369.614160301]: Executing trajectory of size: 32
[ERROR] [1648580369.690632579]: Aborting Trajectory.  Failed to send point (#0): Invalid message (3) : Trajectory start position doesn't match current robot position (3011)

@gavanderhoorn
Copy link
Member

This is our branch: https://github.com/BastianSolutionsRandD/motoman/tree/test_fixes_for_vanderlande

I'm not sure what to make of the commit history.

There appears to be a huge single commit, touching 660 files.

I can't tell whether that branch contains the changes I pointed to, or what other changes it contains.

@akashjinandra
Copy link
Author

That's because we are on a branch of a branch, but I will post one with my personal github tomorrow to confirm it's just the changes you recommended, is there anything else you want to see from the terminal when I run my test tommorow?

@gavanderhoorn
Copy link
Member

gavanderhoorn commented Mar 29, 2022

That's because we are on a branch of a branch

?

That would/should not change the commit history to only show a single commit.

is there anything [..] you want to see from the terminal when I run my test tommorow?

if things keep crashing, make sure to build motoman_driver with -DCMAKE_BUILD_TYPE=Debug (after having cleaned your workspace) and run motoman_driver_joint_trajectory_action in gdb. After it SEGFAULTs, ask gdb to print the backtrace. Then copy-paste that here. Without the backtrace, it's impossible to say where things go wrong.

And I'd also suggest updating your ROSCONSOLE_FORMAT env var to '[${severity}] [${time}] [${node}] [${logger}]: ${message}', so it's clearer which node printed which lines.

Reproducing will be difficult without an MWE, so if you could add a .bag with the relevant topics, that would be appreciated.

@gavanderhoorn
Copy link
Member

gavanderhoorn commented Mar 30, 2022

@akashjinandra: I've been able to reproduce the crash. I don't believe I would need you to test anything right now.

Having some representative JointTrajectorys would be helpful though.


Edit: could you confirm you see the crash when you're trying to execute a trajectory for a (MoveIt) group which doesn't include all joints (ie: only the robot's joints, or the robot's + one of the gripper joints)?


Edit 2: after adding a check (to make sure trajectory.joint_names == all_joint_names_) the JTA no longer crashes when sent partial goals.

I assume however that you'd like to be able to submit goals for specific groups individually @akashjinandra, correct? Technically, the global JTA was not supposed to be used for that I believe (but I was not involved in the design, nor the implementation). But I also don't believe you can send goals to the per-group action servers at the same time (which could perhaps offer a work-around, as MoveIt can deal with multiple controllers).

The changes in kinetic-devel...gavanderhoorn:rebased_two_arms_on_a_rail do not add partial-goal capability. #259 does. That's likely why you see the crashes.

@gavanderhoorn
Copy link
Member

gavanderhoorn commented Mar 30, 2022

@akashjinandra: please ignore my previous request, and please test kinetic-devel...gavanderhoorn:coalesced_joint_feedback_ex_relay merged into #259. kinetic-devel...gavanderhoorn:pr259_rebased_with_coalesced_feedback.

My assumption is the changes in that branch will help with the outdated JointState msgs MoveIt is trying to use.


Edit: I've gone ahead and rebased #259 on-top of current kinetic-devel and merged gavanderhoorn:coalesced_joint_feedback_ex_relay into it.

I cannot stress-test it in the same way you are @akashjinandra, so let me know whether this solves your current issue.

@gavanderhoorn
Copy link
Member

Friendly ping @akashjinandra.

@akashjinandra
Copy link
Author

Hello @gavanderhoorn we tested this driver: #488 and it seems to be working but are doing a couple more tests today to say for sure. I will also try out the one that you have created for us.

@gavanderhoorn
Copy link
Member

The changes proposed in #488 are similar to the ones in #259, with some additions. It's also incomplete, just as #259 is/was.

If it works for you, it works of course.

The branch I created only tries to address the problem of out-of-date JointState messages. It's not a fix for all the known problems.

I would still like to know whether the changes to the multi-group publishers help avoid the out-of-date JointState messages, so if you could test that, that would be great.

@akashjinandra
Copy link
Author

akashjinandra commented Apr 4, 2022

Hey @gavanderhoorn I'm trying to look at the joint state topics it seems like there aren't any separate ones for each joint group they are all in one message now, how would you like me to verify that this problem is fixed?

@gavanderhoorn
Copy link
Member

gavanderhoorn commented Apr 4, 2022

There should still be separate topics. That hasn't changed.

What did change is that the global topic (ie: /joint_states) now carries JointState messages which are the coalesced version of the messages on the individual topics.

Could you show me a rostopic list | grep joint_states and a rostopic echo -n1 /joint_states (provided /joint_states is the topic published to by motoman_driver).

how would you like me to verify that this problem is fixed?

whatever you were doing earlier, and which caused the problem to occur.

I suspect the problem with the outdated JointState messages was caused by essentially starvation. The publishers all had queue sizes of 1, including the global topic, and it's possible MoveIt / your program(s) just wasn't able to receive all the messages it needed. This would cause outdated state to be used when setting up new FollowJointTrajectoryAction goals, leading to motoman_driver reporting the error that started this thread.

In all cases you reported, MotoROS was correct in that the start state of the trajectory did not match the current state. Incomplete multi-group support was not the cause there. Outdated JointStates being used is a much more likely cause.

@gavanderhoorn
Copy link
Member

Friendly ping @akashjinandra.

If you can't find an opportunity to test this any more, also ok, but please let me know.

@gavanderhoorn
Copy link
Member

I'm going to assume you got the immediate issue fixed and don't/can't spend time testing kinetic-devel...gavanderhoorn:pr259_rebased_with_coalesced_feedback any more.

Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
more-info-needed Waiting on additional information from poster multi-group Issues related to multi-group support
Development

No branches or pull requests

4 participants