-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Survey manager execute goals script #103
Survey manager execute goals script #103
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, Marina- I bet this PR is pretty out of date by now but just wanted to provide some feedback. Happy to talk more about next steps!
send_command("python gds_helper_batch.py -i cmd -- bagger -start stereo_" + bay + "_" + run) | ||
send_command("python gds_helper_batch.py -i cmd -- plan -load plans/ISAAC/" + bay + "_stereo_mapping.fplan") | ||
send_command("python gds_helper_batch.py -i cmd -- plan -run") | ||
send_command("python gds_helper_batch.py -i cmd -- bagger -stop") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is a work in progress, just wanted to re-raise some issues we've discussed that need resolution;
- Propagating errors up from these subcommands to represent action failure (may be a good place to use Python exceptions)
- But probably good to have a default interactive retry loop around most commands (continues on success, waits for operator feedback about retry on failure, and if you choose retry the plansys2 executor is never told about the failure). For the multi-subcommand actions, it's an interesting question whether the retry should wrap around the whole sequence or the individual subcommands.
- Some subcommands (
bagger -stop
) sort of want to be in a Pythonfinally:
clause so they always get run even if an earlier command fails. - Multi-robot will affect this somehow. Probably either wrapping other-robot commands in
ssh
or somehow namespacing the ROS topics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, Marina- This is a nice start! I think it has a ways to go, and I'm a bit concerned about getting it to be really robust (like, can we test it enough in the relevant edge cases).
try: | ||
while True: | ||
# Get output from process | ||
output = process.stdout.readline() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a busy loop. You can avoid that using select.select()
.
# Check if the user wants to exit | ||
if user_input.lower().strip() == 'exit': | ||
break | ||
sock_input.send(user_input.encode("utf-8")[:1024]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not great to silently discard user input past 1024 characters. Possibly replace with an assertion that user_input
is not too big, or at least print a warning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
user input is always just two characters, anything over that is not valid. So I actually added way more than needed (not sure if that's something to avoid)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not saying that 1024 is too short. I'm saying that in general it's bad to accept an invalid input and silently try to sort-of-but-not-really fix it and keep going.
So the recommendation is to switch from silently truncating the string to doing something more sane. It could be reporting a warning and dropping the invalid input, or it could be raising an AssertionError when an assert fails, whatever behavior works out best for the user.
# Check if the user wants to exit | ||
if user_input.lower().strip() == 'exit': | ||
break | ||
sock_input.send(user_input.encode("utf-8")[:1024]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you are trying to write line-oriented messages to a byte-oriented stream but without boundaries between lines (like \n
). This is not robust. There is no guarantee that the whole line will be sent in one send()
call, nor that the receiver will receive it with one recv()
call.
You should check the docs for socket.send()
. Unfortunately, you need to pay attention to the number of bytes sent and keep trying (with a truncated buffer) if not all were sent.
And if you want to make sure you're sending complete lines, it would be way better to use a message protocol where finding message boundaries doesn't rely on UTF-8 decoding. Like each message could start with a binary representation of the number of encoded bytes in the line, then the rest of the message is exactly that number of encoded bytes. The reason to do it this way is the mismatch between UTF-8 symbols and bytes. If you try to decode the first part of a byte string that encodes a UTF-8 string, the byte string could be truncated in the middle of a multi-character UTF-8 symbol, so the decoding will fail. But if you make sure you get all the bytes corresponding to a complete UTF-8 string, you won't have that problem.
(Note that the decoding problem I'm describing is one of those things that will probably work most of the time but occasionally fail for obscure reasons, so a nasty bug that would be hard to replicate.)
while not self._stop_event.is_set(): | ||
print("waiting to receive") | ||
try: | ||
request = client_socket.recv(1024).decode("utf-8") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a robust way to receive a line. There is no guarantee that your recv()
call will return the complete message sent by a send()
call. Your decode()
call could fail if the byte sequence is truncated in the middle of a UTF-8 symbol. In another comment I describe a protocol you could use to ensure you get a complete message and avoid the decode()
problem.
* Add `survey_manager` subdirectory without layout for survey nodes Includes placeholder directories with hyperlinked `readme.md` files in the documentation hierarchy for the manger, planner, and bridge. Per discussion with @marinagmoreira `survey_manager` lives under `astrobee` to signify the code is intended for cross-compilation. * Improve linking to survey manager docs * Add files via upload Adding planning domain, problem, and xml of the atomic actions of the robotic agents * added files to the correct directory * adding soft constraints problem files * Attempt at a complete MVP domain model * Oops, remove redundant effect * Simplified collision checking predicate * In a working state * Add sample_output_plan.txt * Oops, remove obsolete commented-out code * Fixed stereo survey part of the model based on analysis of old stereo surveys by Marina - unfortunately causes additional planner flakiness * Update sample_output_plan.txt to reflect latest domain/problem * Now generate PDDL problem from higher-level problem specification * Tune panorama estimated duration * Add plan_interpreter.py. Minor cleanup in problem_generator.py * Simplify dynamic config just a bit * Add `survey_manager` subdirectory without layout for survey nodes Includes placeholder directories with hyperlinked `readme.md` files in the documentation hierarchy for the manger, planner, and bridge. Per discussion with @marinagmoreira `survey_manager` lives under `astrobee` to signify the code is intended for cross-compilation. * Improve linking to survey manager docs * Remove `survey_bridge` capability will be added to `astrobee` * Add traclabs plansys2 backport via submodule, thanks @ana-GT * Move sub-modules to `survey_manager` path * Upgrade behaviortree to V4 NOTE: If already installed, remove V3 before installing V4 * Cleanup unused, misplaced sub-modules, again. * Remove `survey_manager` package and organize `survey_planner` * Deprecate Ubuntu-16.04 (xenial) builds of Isaac to support Plansys2 - Remove Ubuntu-16.04 (xenial) CI builds - Update dockerfile ubuntu version defaults to Ubuntu-20.04 (focal) - Update apk build environment to Ubuntu-20.04 (focal) * Fix python formatting * Update to forks of traclabs backports * Add readline development files to setup for plansys2 terminal interface If you have already built your VM, run `sudo apt install libreadline-dev` * Removed outdated PDDL files per @trey0 nasa#107 (comment) * Revert CI upgrades for APK builds --------- Co-authored-by: Abiola Akanni <[email protected]> Co-authored-by: Trey Smith <[email protected]>
make survey manager PR's run CI
…nager_execute_goals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main thing is I would elaborate in the documentation on command_astrobee, it's unclear what this is for if you aren't already aware.
astrobee/survey_manager/survey_planner/src/dock_action_node.cpp
Outdated
Show resolved
Hide resolved
astrobee/survey_manager/survey_planner/src/panorama_action_node.cpp
Outdated
Show resolved
Hide resolved
|
||
# This class starts a new process and lets you monitor the input and output | ||
# Mostly used for actions where user inteference might be required | ||
class ProcessExecutor: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's very unclear to me from the name and comment what this class does--- can you be more specific?
# If broken pipe connect again | ||
if not request: | ||
break | ||
print("got: " + request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
were these to debug, should they stay here?
# Get the return code of the process | ||
process.kill() | ||
finally: | ||
# Forcefully stop the thread (not recommended) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this not recommended?
) | ||
args = parser.parse_args() | ||
|
||
exit_code = survey_manager_executor(args.robot_name, args.command_name, args.to_bay, args.from_bay, args.run, args.config_static) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sys.exit(...)?
while True: | ||
# Get user input dynamically | ||
user_input = input() | ||
print("user input: " + user_input) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still want this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So much progress here! Thank you!
I have lots of comments, but I would like to go ahead and merge this quickly and then work to iterate.
berth: | ||
berth1: "1" | ||
berth2: "2" | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would note this whole section seems intended for managing inter-module moves using the survey manager, which is outside our agreed scope. Let's not get too wrapped up in working on this yet. Good to note it's isolated into fields of the YAML file that are not used elsewhere yet, so it shouldn't break anything.
usl_bay3: "-pos '2.6 0 4.8'" | ||
usl_bay4: " -pos '1.55 0 4.8'" | ||
usl_bay5: "-pos '0.5 0 4.8'" | ||
usl_bay6: "-pos '-0.5 0 4.8'" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This whole section is implementing features that are out of scope, so I don't want to encourage more work on this right now.
But here are some thoughts for improvements if we are able to add inter-module motion as a stretch goal:
It's too bad many of the positions declared here are copy/paste of position data declared in bays
above. Probably a cleaner way overall would be to:
- Separate the complete list of known location pose declarations from the (subset) list of locations that should be included in the PDDL problem instance. This is potentially important because including irrelevant locations can hurt planner performance.
- The location pose declarations can be split into
position
and (optional)attitude
fields. - The correct move arguments for a location can be generated from the pose declaration.
- The current problem generator has a dumb way to infer which locations are connected, relying on everything being sorted by bay number, that would need to be revisited if there are multiple modules involved.
- The current PDDL domain model can't readily represent different attitudes required for moving in different directions. But that's ok. The action executor receives both
from
andto
position for each move action and could use this info to implement an attitude constraint without exposing this level of detail to the planner.
@@ -2,7 +2,7 @@ | |||
;; Command was: ./tools/problem_generator.py | |||
;; Working directory was: /home/vagrant/isaac/astrobee/survey_manager/survey_planner | |||
;; Problem template: pddl/jem_survey_template.pddl | |||
;; Config 1: data/jem_survey_static.yaml | |||
;; Config 1: data/survey_static.yaml |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's remember it's not currently in scope to manage inter-module moves with the survey manager. It would be viable to use different static config files for the JEM and USL surveys, and might be simpler. At least we should keep that in mind as an open option.
astrobee/survey_manager/survey_planner/tools/survey_planner/command_astrobee
Outdated
Show resolved
Hide resolved
self.ack_msg = msg | ||
self.ack_needed = False | ||
|
||
def publish_and_wait_response(self, cmd): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Intended return value semantics are not documented. If the only possible return values are currently 0 or 1, maybe the return type should be changed to bool
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
per exit codes standards, anything other than zero is failure, so I'm passing a non-zero code back if it fails. I want it to be a number instead of a bool because we're adding the exit_codes to evaluate overall success
Creating a proof of concept executor that would fill the gap between the survey manager goals and the robot commands.
This is still very archaic, I just tested it in simulation that I was able to interact with the inspection_tool normally and that the recursive
-resume
worked.There are still many things to add like the
-ns
flags to the commands, the/env_wrapper
for robot commands, proper error handling, etc, but I figured the sooner I share this and get feedback the better such that we're all working towards a common goal. (The interface will also be changed as theplan_interpreter.py
starts taking shape.. we could even join both scripts)@trey0 @Bckempa