feat: look at person (#166)
* refactor: use correct naming conventions.

* feat/refactor: introduce generic object detection skills, modularise from this.

* refactor: let the generic perception skills acquire their own sensor input (image/pcl).

* refactor/chore: imports and add LookForSeats stub.

* feat: always publish markers for debugging, when doing 3D point estimation.

* WIP: look for place to seat people.

* chore: default sample rate.

* chore: setup receptionist demo points.

* fix: sleep.

* feat: LookToPoint skill.

* chore: reorganise receptionist Rasa data and pipeline.

* feat: additional motions.

* chore: update receptionist demo.

* feat: look for chair and seat guest.

* WIP: get guest name.

* feat: be more informative.

* fix: set unknown.

* feat: announce name as well as drink.

* refactor: ignore point.

* feat: be more explicit; look at the chair and semantically describe its location.

* WIP.

* feat: add option to specify device to use for whisper.

* feat: simple transcribe microphone service.

* refactor: make transcribe and parse service use the new 'simple' transcribe audio service.

* feat: terminate audio interface.

* chore: update CMakeLists.

* refactor: acquire microphone when initialising service.

* chore: update launch file.

* fix: remove call to redundant method.

* fix: error handling.

* feat/fix: pre-load whisper model by prompting it, adjust for ambient noise at the beginning.

* chore: configuration required for deploying docs

* docs: update container documentation

* ci: first draft of documentation build through actions

* ci: remove submodules

* ci: use ubuntu 20.04

* ci: try using ros:noetic container image

* ci: catkin tools isn't available in container

* ci: remove extraneous source line

* ci: try the one-liner ROS installer

* ci: use the official ROS noetic installation guide

* ci: install catkin tools

* ci: try sourcing before commands

* ci: deploy to github pages (no stage check yet)

* ci: use deploy key

* ci: only deploy if on main

* chore: update receptionist data and pipeline.

* fix: explicitly set spacy NLP model.

* fix: drop age for now, since spacy is broken(?).

* feat: point and raise_torso motions.

* feat: point to chair.

* feat: additional motions.

* fix: reset motions.

* fix: handle empty transcriptions.

* fix: enable start state.

* fix: properly reset motions

* fix(ci): use github_token for gh pages deploy.

* revert: fix(ci): use github_token for gh pages deploy.

* feat: save microphone output to audio files

* chore: update gitignore with python setup files

* fix: filename typo and add to cmake

* testtesttest

* Test

* Test

* Message Change

* fixed message writing and reading

* fixed image crop

* Updated the feature extraction node and changed the messages.

* fixed that the torso and head frames were inverted.

* Changed colour format from BGR to RGB within the detection process.

* keep the saved model

* keep the model file

* keep the saved model

* Cancel saving the images (but still cannot use cv2.imshow)

* Runnable demo

* added the hair colour distribution matching method

* retrained model is very robust so changed the threshold

* Moving the head to meet the person.

* xyz axis readable.

* (Hopefully) Runnable with 3d input.

* Speak normally.

* Try to move the head.

* ah

* At least the head moves, not looking at me though.

* Cleaned the file to have only one model appear.

* Replace the old model with the new one.

* correct the lost module.

* info update

* fixed issues in the service

* fixed a stupid typo

* runnable version for full demo

* Recover the state machine for demo.

* Added a simple loop to refresh the frame taken, should work fine.

* adding learn_face

* adding detect faces

* Add detect faces state

* editing detect phases

* Implement state for learning faces

* adding in extra states

* any changes

* adjustments when testing

* push lol

* finished merging?

* new state machine

* merging from rexy laptop

* fixed detect faces

* getting rid of lasr_speech

* swapping from base controllers to skills

* fixing state machine after merge with main

* making changes

* TEST: .idea diff

* feat: get name and drink skill & prior data yaml

* fix: missing keys for get name & drink constructor

* feat: stubs for get attribute state

* feat: wrap head manager and lowering base into GoToLocation.

* feat: state machine setup.

* test: Github tracking fixes

* feat: allow AskAndListen to take static text, a format string, or read from userdata.

* refactor: being picky about spaces.

* fix: import states in dunder init.

* feat: get name and drink, and stub for attributes.

* fix: typehint for userdata

* feat: setup host.

* fix: typehint.

* fix: refactor ParseNameAndDrink state to convert possible names and drinks to lowercase

* refactor: receptionist -> sm_merge (conflicts with python package...)

* refactor: remove redundant __init__.py

* fix: container consistency.

* fix: initialise data correctly.

* fix: lower.

* fix: uncomment.

* feat: take seating area pose and polygon as input to sm.

* feat: guide guest to seating area.

* feat: add introduce state

* feat: add introduction to state machine

* feat(WIP): find empty seat.

* fix: sneaky parenthesis stops boolifying.

* fix: correctly access guest attribute data

* fix: get introduce working.

* feat(WIP, still): find an empty seat.

* feat/refactor: SeatGuest is working and rename.

* feat: update init.py

* refactor: static motions.

* feat: seat guest in state machine.

* feat: send head back to default.

* refactor: center -> centre.

* feat: logic for second guest and pass guest id through constructors.

* fix: transition.

* refactor: drop unused method.

* Remove print statement and add TODO.

* fix: namespace.

* fix: transition and assume PlayMotion succeeded.

* test: yaml setup for testing.

* fix: transitions.

* refactor: remove redundant comment.

* fix: dumb intersection bug.

* feat: clip skill for VQA

* feat: attempt of clip for attributes

* fix: missing imports

* feat: tidied clip attributes

* refactor: cleanup.

* refactor: cleanup again.

* refactor: massive cleanup.

* refactor: remove hair colour images.

* refactor: cleanup cv2_img.

* refactor: move Vqa service message into lasr_vision_msgs.

* fix: import.

* cleanup: remove vision changes.

* refactor: cleanup clip vqa skill.

* refactor: cleanup describe people skill.

* refactor: cleanup vision msgs.

* refactor: remove shebang and unused import.

* refactor: remove unnecessary message.

* refactor: make QueryImage a ServiceState.

* refactor: make QueryImage a ServiceState.

* build: CMake and requirements.

* feat: fix the msgs for bodypix

* feat: test initial segment person

* feat: filter the poses like the masks

* feat: filter the poses like the masks

* chore: add pcl to img msg and keep the same timestamp

* chore: change the look-to-point skill to use PointStamped instead

* chore: add the 3d image skill

* feat: working look at person

* chore: cleanup for PR

* fix: address PR comments + add head manager disable/enable

* feat: wip

* feat: working wip

* feat: look at multiple people working

* chore: prep for PR

* chore: reformat agaaaain

---------

Co-authored-by: Jared Swift <[email protected]>
Co-authored-by: Paul Makles <[email protected]>
Co-authored-by: m-barker <[email protected]>
Co-authored-by: Benteng Ma <[email protected]>
Co-authored-by: MBT <[email protected]>
Co-authored-by: tiago <[email protected]>
Co-authored-by: Zoe <[email protected]>
Co-authored-by: Aaliyah Merchant <[email protected]>
Co-authored-by: Haiwei Luo <[email protected]>
10 people authored May 24, 2024
1 parent 953d35c commit d7c3ca9
Showing 7 changed files with 309 additions and 13 deletions.
11 changes: 11 additions & 0 deletions common/helpers/cv2_pcl/src/cv2_pcl/__init__.py
@@ -2,12 +2,23 @@
 from sensor_msgs.msg import PointCloud2
 import ros_numpy as rnp
 import cv2
+from cv2_img import cv2_img_to_msg

 from typing import Tuple, Union

 Mat = np.ndarray


+def pcl_to_img_msg(pcl: PointCloud2) -> Mat:
+    """
+    Convert a given PointCloud2 message to img_msg
+    """
+    # keep the same timestamp
+    cv2 = pcl_to_cv2(pcl)
+
+    return cv2_img_to_msg(cv2, pcl.header.stamp)
+
+
 def pcl_to_cv2(
     pcl: PointCloud2, height: Union[int, None] = None, width: Union[int, None] = None
 ) -> Mat:
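For orientation, a minimal usage sketch of the new helper (assumes a live ROS node and an organised point cloud; the topic name here is illustrative):

import rospy
from sensor_msgs.msg import PointCloud2
from cv2_pcl import pcl_to_img_msg

rospy.init_node("pcl_to_img_msg_example")
pcl = rospy.wait_for_message("/xtion/depth_registered/points", PointCloud2)
img_msg = pcl_to_img_msg(pcl)  # an Image message stamped with the cloud's timestamp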
@@ -114,7 +114,6 @@ def detect(
        keypoints_msg = []

        for i, keypoint in pose.keypoints.items():
-
            if camel_to_snake(keypoint.part) in request.masks[0].parts:
                keypoint_msg = BodyPixKeypoint()
                keypoint_msg.xy = [int(keypoint.position.x), int(keypoint.position.y)]
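The camel_to_snake helper used in this check is assumed to behave like the sketch below (a hypothetical reimplementation for illustration; the package's own version may differ), mapping BodyPix part names such as "leftEye" onto the snake_case names used in mask requests:

import re

def camel_to_snake(name: str) -> str:
    # insert an underscore before each interior capital letter, then lowercase:
    # "leftEye" -> "left_eye", "rightEye" -> "right_eye"
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()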
1 change: 0 additions & 1 deletion common/vision/lasr_vision_msgs/msg/BodyPixPose.msg
@@ -1,2 +1 @@
-# keypoints
 BodyPixKeypoint[] keypoints
19 changes: 14 additions & 5 deletions skills/src/lasr_skills/detect_faces.py
@@ -3,21 +3,30 @@

 from lasr_vision_msgs.srv import DetectFaces as DetectFacesSrv
 from sensor_msgs.msg import Image
+from cv2_pcl import pcl_to_img_msg


 class DetectFaces(smach.State):

-    def __init__(self, image_topic: str = "/xtion/rgb/image_raw"):
+    def __init__(
+        self,
+        image_topic: str = "/xtion/rgb/image_raw",
+    ):
         smach.State.__init__(
-            self, outcomes=["succeeded", "failed"], output_keys=["detections"]
+            self,
+            outcomes=["succeeded", "failed"],
+            input_keys=["pcl_msg"],
+            output_keys=["detections"],
         )
+        self.img_msg = None
         self._image_topic = image_topic
         self._detect_faces = rospy.ServiceProxy("/detect_faces", DetectFacesSrv)

     def execute(self, userdata):
-        img_msg = rospy.wait_for_message(self._image_topic, Image)
+        self.img_msg = pcl_to_img_msg(userdata.pcl_msg)
+        if not self.img_msg:
+            self.img_msg = rospy.wait_for_message(self._image_topic, Image)
         try:
-            result = self._detect_faces(img_msg)
+            result = self._detect_faces(self.img_msg)
             userdata.detections = result
             return "succeeded"
         except rospy.ServiceException as e:
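A rough sketch of driving the reworked state from a container (illustrative only; it assumes the /detect_faces service is running and a cloud is available on the depth topic):

import rospy
import smach
from sensor_msgs.msg import PointCloud2
from lasr_skills import DetectFaces

rospy.init_node("detect_faces_example")
sm = smach.StateMachine(outcomes=["succeeded", "failed"])
# supply the point cloud the state now expects via userdata
sm.userdata.pcl_msg = rospy.wait_for_message(
    "/xtion/depth_registered/points", PointCloud2
)
with sm:
    smach.StateMachine.add(
        "DETECT_FACES",
        DetectFaces(),
        transitions={"succeeded": "succeeded", "failed": "failed"},
    )
sm.execute()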
282 changes: 282 additions & 0 deletions skills/src/lasr_skills/look_at_person.py
@@ -0,0 +1,282 @@
import smach_ros
from geometry_msgs.msg import PointStamped
import smach
from vision import GetPointCloud
from lasr_vision_msgs.srv import BodyPixDetection, BodyPixDetectionRequest
from lasr_vision_msgs.msg import BodyPixMaskRequest
from lasr_skills import LookToPoint, DetectFaces
import cv2
import rospy
from sensor_msgs.msg import Image
from cv2_img import cv2_img_to_msg, msg_to_cv2_img
from markers import create_and_publish_marker
from visualization_msgs.msg import Marker
from cv2_pcl import pcl_to_img_msg
import ros_numpy as rnp
import rosservice
from smach import CBState
from std_msgs.msg import String
import actionlib
from control_msgs.msg import PointHeadAction, PointHeadGoal
from geometry_msgs.msg import Point

PUBLIC_CONTAINER = False

try:
from pal_startup_msgs.srv import (
StartupStart,
StartupStop,
StartupStartRequest,
StartupStopRequest,
)
except ModuleNotFoundError:
PUBLIC_CONTAINER = True


class LookAtPerson(smach.StateMachine):
class CheckEyes(smach.State):
def __init__(self, debug=True):
smach.State.__init__(
self,
outcomes=["succeeded", "failed", "no_detection"],
input_keys=["bbox_eyes", "pcl_msg", "masks", "detections"],
output_keys=["pointstamped"],
)
self.DEBUG = debug
self.img_pub = rospy.Publisher("/debug/image", Image, queue_size=1)
self.marker_pub = rospy.Publisher("eyes", Marker, queue_size=1)
self.look_at_pub = actionlib.SimpleActionClient(
"/head_controller/point_head_action", PointHeadAction
)

def execute(self, userdata):
rospy.sleep(1)
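# No eye boxes left to process: if faces were detected at all, we are done;
# if there were no detections either, report that nothing was found.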
if len(userdata.bbox_eyes) < 1 and len(userdata.detections.detections) > 0:
return "succeeded"
elif (
len(userdata.bbox_eyes) < 1 and len(userdata.detections.detections) < 1
):
return "no_detection"

for det in userdata.bbox_eyes:
left_eye = det["left_eye"]
right_eye = det["right_eye"]
eye_point = (left_eye[0] + right_eye[0]) / 2, (
left_eye[1] + right_eye[1]
) / 2

if self.DEBUG:
img = msg_to_cv2_img(pcl_to_img_msg(userdata.pcl_msg))
cv2.circle(img, (left_eye[0], left_eye[1]), 5, (0, 255, 0), -1)
cv2.circle(img, (right_eye[0], right_eye[1]), 5, (0, 255, 0), -1)
img_msg1 = cv2_img_to_msg(img)
self.img_pub.publish(img_msg1)

pcl_xyz = rnp.point_cloud2.pointcloud2_to_xyz_array(
userdata.pcl_msg, remove_nans=False
)
eye_point_pcl = pcl_xyz[int(eye_point[1]), int(eye_point[0])]
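# i != i is True only for NaN, so if the eye-midpoint lookup gave an
# invalid reading, fall back to the right eye's point instead.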
if any([True for i in eye_point_pcl if i != i]):
eye_point_pcl = pcl_xyz[int(right_eye[1]), int(right_eye[0])]

look_at = PointStamped()
look_at.header = userdata.pcl_msg.header
look_at.point.x = eye_point_pcl[0]
look_at.point.y = eye_point_pcl[1]
look_at.point.z = eye_point_pcl[2]

if self.DEBUG:
create_and_publish_marker(self.marker_pub, look_at, r=0, g=1, b=0)

userdata.bbox_eyes.remove(det)
userdata.pointstamped = look_at

self.look_at_pub.wait_for_server()
goal = PointHeadGoal()
goal.pointing_frame = "head_2_link"
goal.pointing_axis = Point(1.0, 0.0, 0.0)
goal.max_velocity = 1.0
goal.target = look_at
self.look_at_pub.send_goal(goal)

return "succeeded"

return "failed"

@smach.cb_interface(input_keys=["poses", "detections"], output_keys=["bbox_eyes"])
def match_poses_and_detections(ud):
bbox_eyes = []
for pose in ud.poses:
for detection in ud.detections.detections:
temp = {
"bbox": detection.xywh,
}
for keypoint in pose.keypoints:
if (
keypoint.part == "leftEye"
and detection.xywh[0]
< keypoint.xy[0]
< detection.xywh[0] + detection.xywh[2]
and detection.xywh[1]
< keypoint.xy[1]
< detection.xywh[1] + detection.xywh[3]
):
temp["left_eye"] = keypoint.xy
if (
keypoint.part == "rightEye"
and detection.xywh[0]
< keypoint.xy[0]
< detection.xywh[0] + detection.xywh[2]
and detection.xywh[1]
< keypoint.xy[1]
< detection.xywh[1] + detection.xywh[3]
):
temp["right_eye"] = keypoint.xy

if "left_eye" in temp and "right_eye" in temp:
bbox_eyes.append(temp)

ud.bbox_eyes = bbox_eyes

return "succeeded"

def __init__(self):
super(LookAtPerson, self).__init__(
outcomes=["succeeded", "failed"],
input_keys=[],
output_keys=["masks", "poses", "pointstamped"],
)
self.DEBUG = rospy.get_param("/debug", True)
IS_SIMULATION = (
"/pal_startup_control/start" not in rosservice.get_service_list()
)

with self:
if not IS_SIMULATION:
if PUBLIC_CONTAINER:
rospy.logwarn(
"You are using a public container. The head manager will not be stopped during navigation."
)
else:
smach.StateMachine.add(
"DISABLE_HEAD_MANAGER",
smach_ros.ServiceState(
"/pal_startup_control/stop",
StartupStop,
request=StartupStopRequest("head_manager"),
),
transitions={
"succeeded": "GET_IMAGE",
"aborted": "failed",
"preempted": "failed",
},
)
smach.StateMachine.add(
"GET_IMAGE",
GetPointCloud("/xtion/depth_registered/points"),
transitions={
"succeeded": "SEGMENT_PERSON",
},
remapping={"pcl_msg": "pcl_msg"},
)

eyes = BodyPixMaskRequest()
eyes.parts = ["left_eye", "right_eye"]
masks = [eyes]

smach.StateMachine.add(
"SEGMENT_PERSON",
smach_ros.ServiceState(
"/bodypix/detect",
BodyPixDetection,
request_cb=lambda ud, _: BodyPixDetectionRequest(
pcl_to_img_msg(ud.pcl_msg), "resnet50", 0.7, masks
),
response_slots=["masks", "poses"],
input_keys=["pcl_msg"],
output_keys=["masks", "poses"],
),
transitions={
"succeeded": "DETECT_FACES",
"aborted": "failed",
"preempted": "failed",
},
)
smach.StateMachine.add(
"DETECT_FACES",
DetectFaces(),
transitions={
"succeeded": "MATCH_POSES_AND_DETECTIONS",
"failed": "failed",
},
remapping={"pcl_msg": "pcl_msg", "detections": "detections"},
)

smach.StateMachine.add(
"MATCH_POSES_AND_DETECTIONS",
CBState(
self.match_poses_and_detections,
input_keys=["poses", "detections"],
output_keys=["poses"],
outcomes=["succeeded", "failed"],
),
transitions={"succeeded": "CHECK_EYES", "failed": "failed"},
remapping={"bbox_eyes": "bbox_eyes"},
)
smach.StateMachine.add(
"CHECK_EYES",
self.CheckEyes(self.DEBUG),
transitions={
"succeeded": "LOOK_TO_POINT",
"failed": "failed",
"no_detection": "succeeded",
},
remapping={
"pcl_msg": "pcl_msg",
"bbox_eyes": "bbox_eyes",
"pointstamped": "pointstamped",
},
)

smach.StateMachine.add(
"LOOK_TO_POINT",
LookToPoint(),
transitions={
"succeeded": "LOOP",
"aborted": "failed",
"preempted": "failed",
},
remapping={"pointstamped": "pointstamped"},
)
smach.StateMachine.add(
"LOOP",
smach.CBState(
lambda ud: "succeeded" if len(ud.bbox_eyes) > 0 else "finish",
input_keys=["bbox_eyes"],
output_keys=["bbox_eyes"],
outcomes=["succeeded", "finish"],
),
transitions={
"succeeded": "CHECK_EYES",
"finish": "ENABLE_HEAD_MANAGER",
},
)
if not IS_SIMULATION:
if PUBLIC_CONTAINER:
rospy.logwarn(
"You are using a public container. The head manager will not be start following navigation."
)
else:
smach.StateMachine.add(
"ENABLE_HEAD_MANAGER",
smach_ros.ServiceState(
"/pal_startup_control/start",
StartupStart,
request=StartupStartRequest("head_manager", ""),
),
transitions={
"succeeded": "succeeded",
"preempted": "failed",
"aborted": "failed",
},
)
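A minimal sketch of running the new skill (assumes a robot or simulation publishing the xtion topics; on the real robot the pal_startup_control services are also expected):

import rospy
from lasr_skills.look_at_person import LookAtPerson

rospy.init_node("look_at_person_example")
sm = LookAtPerson()
outcome = sm.execute()  # "succeeded" after looking at each matched pair of eyes
rospy.loginfo(outcome)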
7 changes: 2 additions & 5 deletions skills/src/lasr_skills/look_to_point.py
@@ -13,10 +13,7 @@ def __init__(self):
                 pointing_frame="head_2_link",
                 pointing_axis=Point(1.0, 0.0, 0.0),
                 max_velocity=1.0,
-                target=PointStamped(
-                    header=Header(frame_id="map"),
-                    point=ud.point,
-                ),
+                target=ud.pointstamped,
             ),
-            input_keys=["point"],
+            input_keys=["pointstamped"],
         )
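Since the target is now read from userdata as a full PointStamped, callers supply the frame together with the point, e.g. (illustrative):

import rospy
import smach
from geometry_msgs.msg import PointStamped
from lasr_skills import LookToPoint

rospy.init_node("look_to_point_example")
sm = smach.StateMachine(outcomes=["succeeded", "aborted", "preempted"])
# the frame travels with the point, so no "map" default is assumed any more
sm.userdata.pointstamped = PointStamped()
sm.userdata.pointstamped.header.frame_id = "map"
sm.userdata.pointstamped.point.x = 1.0
with sm:
    smach.StateMachine.add(
        "LOOK_TO_POINT",
        LookToPoint(),
        transitions={
            "succeeded": "succeeded",
            "aborted": "aborted",
            "preempted": "preempted",
        },
    )
sm.execute()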
1 change: 0 additions & 1 deletion tasks/receptionist/launch/setup.launch
@@ -19,5 +19,4 @@
   <node pkg="lasr_vision_feature_extraction" type="service" name="torch_service" output="screen"/>
   <node pkg="lasr_vision_bodypix" type="service" name="bodypix_service" output="screen"/>

-
 </launch>
