diff --git a/docs/design/autoware-architecture/perception/image/high-level-perception-diagram.drawio.svg b/docs/design/autoware-architecture/perception/image/high-level-perception-diagram.drawio.svg
new file mode 100644
index 00000000000..c9601c78670
--- /dev/null
+++ b/docs/design/autoware-architecture/perception/image/high-level-perception-diagram.drawio.svg
@@ -0,0 +1,4 @@
+
+
+
+
Localization
Component
Localization...
API Layer
API Layer
Planning Component
Planning Component
Perception Component
Perception Component
Sensing
Component
Sensing...
Traffic Light Recognition
Traffic Light Recognition
Occupancy Grid Map
Occupancy Grid Map
Obstacle Segmentation
Obstacle Segmentation
Map
Component
Map...
Object Recognition
Object Recognition
Text is not SVG - cannot display
\ No newline at end of file
diff --git a/docs/design/autoware-architecture/perception/image/reference-implementaion-perception-diagram.drawio.svg b/docs/design/autoware-architecture/perception/image/reference-implementaion-perception-diagram.drawio.svg
new file mode 100644
index 00000000000..8abfd017a07
--- /dev/null
+++ b/docs/design/autoware-architecture/perception/image/reference-implementaion-perception-diagram.drawio.svg
@@ -0,0 +1,4 @@
+
+
+
+
To Planning
To Planning
Occupancy Grid Map
Occupancy Grid Map
Obstacle Segmentation
Obstacle Segmentation
Traffic Light Recognition
Traffic Light Recognition
Traffic Light Detector
Traffic Light Detector
Traffic Light Classifier
Traffic Light Classifier
traffic_light_multi_camera_fusion performs traffic light signal fusion, which can be summarized as two tasks. Multi-Camera-Fusion: performed on a single traffic light signal detected by different cameras. Group-Fusion: performed on traffic light signals within the same group, i.e., traffic lights sharing the same regulatory element id defined in the lanelet2 map.
Multi Camera Fusion
Multi Camera Fusion
crosswalk_traffic_light_estimator is a module that estimates pedestrian traffic signals from the HDMap and detected vehicle traffic signals.
Crosswalk Traffic Light Estimator
Crosswalk Traffic Light Estimator
Traffic Light States
Traffic Light States
This package receives traffic signals from perception and external (e.g., V2X) components and combines them using either a confidence-based or an external-preference-based approach.
V2X Fusion node
V2X Fusion node
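The V2X fusion note above names two ways of combining perception and external signals. A minimal Python sketch of that choice follows, assuming a hypothetical `Signal` record keyed by a traffic light id. The names `Signal`, `fuse_signals`, and `prefer_external` are illustrative only and are not taken from the Autoware package.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Signal:
    """Hypothetical, simplified traffic signal record (not an Autoware message)."""

    light_id: int      # assumed identifier shared by perception and external sources
    color: str         # e.g. "red", "green"
    confidence: float  # assumed to lie in [0, 1]


def fuse_signals(
    perception: Iterable[Signal],
    external: Iterable[Signal],
    prefer_external: bool = False,
) -> Dict[int, Signal]:
    """Combine two signal sources, one result per traffic light id.

    prefer_external=False: confidence-based, the more confident signal wins.
    prefer_external=True: external-preference, an external signal always wins
    when one exists for that light.
    """
    fused = {s.light_id: s for s in perception}
    for ext in external:
        current = fused.get(ext.light_id)
        if current is None or prefer_external or ext.confidence > current.confidence:
            fused[ext.light_id] = ext
    return fused
```

With `prefer_external=False`, a perception result is only replaced when the external signal for the same light reports a strictly higher confidence, so ties keep the perception result.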
Camera Image
Camera Image
Point Cloud
Point Cloud
Sensing
Sensing
Point Cloud, Camera Image, Radar Object
Point Cloud, Camera Image, Radar Object
Object Recognition
Object Recognition
Occupancy Grid Map
Occupancy Grid Map
Obstacle Points
Obstacle Points
Point Cloud
Point Cloud
Vehicle Odometry
Vehicle Odometry
Localization
Localization
Filters unknown objects using vector map information. Only unknown objects inside lanes are kept.
Map based Filter
Map based Filter
Detected Objects
Detected Objects
Computes the assignment between detections and adopts the one with the higher confidence. Overlapping unknown objects are merged.
Object Association
 Merger
Object Association...
Object Merger
Object Merger
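The merger note above describes associating detections from two sources and keeping the higher-confidence one. The sketch below illustrates that idea with a greedy nearest-neighbour association on 2D centers. The greedy matching, the `Detection` record, and the `max_dist` threshold are assumptions made for illustration, not the association method used by the Autoware node, and the merging of overlapping unknown objects is left out.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """Hypothetical, simplified detection (2D center, label, confidence)."""

    x: float
    y: float
    label: str
    confidence: float


def merge_detections(
    first: List[Detection],
    second: List[Detection],
    max_dist: float = 1.0,
) -> List[Detection]:
    """Pair each detection in `first` with its nearest unmatched one in `second`.

    A pair closer than `max_dist` is treated as the same object and only the
    higher-confidence detection is kept. Unpaired detections pass through.
    """
    merged: List[Detection] = []
    unmatched = list(second)
    for det in first:
        nearest = min(
            unmatched,
            key=lambda other: math.hypot(other.x - det.x, other.y - det.y),
            default=None,
        )
        if nearest is not None and math.hypot(nearest.x - det.x, nearest.y - det.y) <= max_dist:
            unmatched.remove(nearest)
            merged.append(det if det.confidence >= nearest.confidence else nearest)
        else:
            merged.append(det)
    merged.extend(unmatched)
    return merged
```

A greedy match is order-dependent; an optimal assignment over a full cost matrix would avoid that at higher cost, which is why this is only a sketch.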
Interpolator
Interpolator
Merges the clusters inside the tracker and outputs a shape-fitted bounding box.
Detection by
Tracker
Detection by...
Detected Objects
Detected Objects
Removes false positives using the number of points that remain inside the bounding box after obstacle_segmentation.
Map based validator
Map based validator
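The validator note above rejects false positives by counting the obstacle points that fall inside each bounding box. A minimal sketch of that check follows, assuming axis-aligned 2D boxes and a fixed `min_points` threshold. Both are simplifications chosen for illustration, and `Box`, `count_points_inside`, and `validate_boxes` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned 2D bounding box."""

    x_min: float
    x_max: float
    y_min: float
    y_max: float


def count_points_inside(box: Box, points: Iterable[Point]) -> int:
    """Count points lying inside the box (boundaries included)."""
    return sum(
        1
        for x, y in points
        if box.x_min <= x <= box.x_max and box.y_min <= y <= box.y_max
    )


def validate_boxes(boxes: Iterable[Box], points: List[Point], min_points: int = 3) -> List[Box]:
    """Keep only boxes supported by at least `min_points` obstacle points."""
    return [box for box in boxes if count_points_inside(box, points) >= min_points]
```

A real detector would use oriented 3D boxes and a threshold that depends on range and object class; the fixed threshold here only shows the shape of the check.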
Assigns object class information to the LiDAR point cloud using a DNN.
DNN based 3D detector
DNN based 3D detector
LiDAR pipeline
LiDAR pipeline
Detection
Detection
LiDAR clustering
LiDAR clustering
Projects the image detection results onto the clustering results to assign labels.
Projection based fusion node
Projection based fusion node
Assigns object class information to the image using a DNN.
Camera DNN based 2D detector
Camera DNN based 2D detector
Camera-LiDAR pipeline
Camera-LiDAR pipeline
Radar pipeline
Radar pipeline
This package contains a radar noise filter module for autoware_auto_perception_msgs/msg/DetectedObject. It can filter out noise objects that cross the ego vehicle.
Radar Filter
Radar Filter
This package creates clustered objects from radar DetectedObjects, i.e., objects that are converted from RadarTracks by radar_tracks_msgs_converter and processed by the noise filter. In other words, it can combine multiple radar detections of one object into one and adjust its class and size.
Radar Object Clustering
Radar Object Clustering
This package tries to merge two tracked objects from different sensors.
Tracking Merger
Tracking Merger
This package provides a radar object tracking node that processes sequences of detected objects to assign consistent identities to them and estimate their velocities.
Radar Object Tracker
Radar Object Tracker
Performs tracking on class, position, and shape information. (Upstream has reportedly become able to output velocity information recently.)
Multi Object Tracker
Multi Object Tracker
Tracking
Tracking
Dynamic Objects
Dynamic Objects
Predicts the future paths of tracked dynamic objects using high-definition map information.
Map based Prediction
Map based Prediction
Prediction
Prediction
Perception
Perception
map
map
map
map
Vector Map,
Point Cloud Map
Vector Map,...
Vector Map
Vector Map
V2X interface
V2X interface
Text is not SVG - cannot display
\ No newline at end of file
diff --git a/docs/design/autoware-architecture/perception/index.md b/docs/design/autoware-architecture/perception/index.md
index 867511f13ff..101d082d15d 100644
--- a/docs/design/autoware-architecture/perception/index.md
+++ b/docs/design/autoware-architecture/perception/index.md
@@ -1,5 +1,96 @@
-# Perception component design
+# Perception Component Design

-!!! warning
+## Purpose of this document

-    Under Construction
+This document outlines the high-level design strategies, goals, and related rationale for the development of the Perception Component. Through this document, all OSS developers are expected to understand the design philosophy, goals, and constraints under which the Perception Component is designed, and to participate seamlessly in its development.
+
+## Overview
+
+The Perception Component receives inputs from the Sensing, Localization, and Map components, and adds semantic information (e.g., Object Recognition, Obstacle Segmentation, Traffic Light Recognition, Occupancy Grid Map), which is then passed on to the Planning Component. This component design follows the overarching philosophy of Autoware, defined as the [microautonomy concept](https://autowarefoundation.github.io/autoware-documentation/main/design/autoware-concepts/).
+
+## Goals and non-goals
+
+The role of the Perception Component is to recognize the surrounding environment based on the data obtained through Sensing and to acquire sufficient information (such as the presence of dynamic objects, stationary obstacles, blind spots, and traffic signal information) to enable autonomous driving.
+
+In our overall design, we emphasize the concept of [microautonomy architecture](https://autowarefoundation.github.io/autoware-documentation/main/design/autoware-concepts). This term refers to a design approach that focuses on the proper modularization of functions, clear definition of interfaces between these modules, and, as a result, high expandability of the system. Given this context, the goal of the Perception Component is not to solve every conceivable complex use case (although we do aim to support basic ones), but rather to provide a platform that can be customized to the user's needs and can facilitate the development of additional features.
+
+To clarify the design concepts, the following points are listed as goals and non-goals.
+
+**Goals:**
+
+- To provide the basic functions so that a simple ODD can be defined.
+- To achieve a design that can provide perception functionality to every autonomous vehicle.
+- To be extensible with third-party components.
+- To provide a platform that enables Autoware users to develop the complete functionality and capability.
+- To provide a platform that enables Autoware users to develop an autonomous driving system that always outperforms human drivers.
+- To provide a platform that enables Autoware users to develop an autonomous driving system that achieves "100% accuracy" or "error-free recognition".
+
+**Non-goals:**
+
+- To develop a perception component architecture specialized for specific / limited ODDs.
+- To achieve the complete functionality and capability.
+- To outperform the recognition capability of human drivers.
+- To achieve "100% accuracy" or "error-free recognition".
+
+## High-level architecture
+
+This diagram describes the high-level architecture of the Perception Component.
+
+![overall-perception-architecture](image/high-level-perception-diagram.drawio.svg)
+
+The Perception Component consists of the following sub-components:
+
+- **Object Recognition**: Recognizes dynamic objects surrounding the ego vehicle in the current frame and predicts their future trajectories.
+- **Obstacle Segmentation**: Identifies point clouds originating from obstacles (both dynamic objects and static obstacles) that the ego vehicle should avoid.
+- **Occupancy Grid Map**: Detects blind spots (areas where no information is available and where dynamic objects may jump out).
+- **Traffic Light Recognition**: Recognizes the colors of traffic lights and the directions of arrow signals.
+
+## Component interface
+
+The following describes the input/output concept between the Perception Component and other components. See [the Perception Component Interface](../../autoware-interfaces/components/perception.md) page for the current implementation.
+
+### Input to the Perception Component
+
+- **From Sensing**: This input should provide real-time information about the environment.
+  - Camera Image: Image data obtained from the camera.
+  - Point Cloud: Point Cloud data obtained from LiDAR.
+  - Radar Object: Object data obtained from radar.
+- **From Localization**: This input should provide real-time information about the ego vehicle.
+  - Vehicle motion information: Includes the ego vehicle's position.
+- **From Map**: This input should provide static information about the environment.
+  - Vector Map: Contains all static information about the environment, including lane area information.
+  - Point Cloud Map: Contains static point cloud maps, which should not include information about dynamic objects.
+- **From API**:
+  - V2X information: The information from V2X modules. For example, the information from traffic signals.
+
+### Output from the Perception Component
+
+- **To Planning**
+  - Dynamic Objects: Provides real-time information about objects that cannot be known in advance, such as pedestrians and other vehicles.
+  - Obstacle Segmentation: Supplies real-time information about the location of obstacles, which is more primitive than Detected Objects.
+  - Occupancy Grid Map: Offers real-time information about the presence of occluded areas.
+  - Traffic Light Recognition result: Provides the current state of each traffic light in real time.
+
+## How to add new modules (WIP)
+
+As mentioned in the goals section, this perception module is designed to be extensible by third-party components. For specific instructions on how to add new modules and expand its functionality, please refer to the provided documentation or guidelines (WIP).
+
+## Supported Functions
+
+| Feature | Description | Requirements |
+| --- | --- | --- |
+| LiDAR DNN based 3D detector | This module takes point clouds as input and detects objects such as vehicles, trucks, buses, pedestrians, and bicycles. | - Point Clouds |
+| Camera DNN based 2D detector | This module takes camera images as input and detects objects such as vehicles, trucks, buses, pedestrians, and bicycles in the two-dimensional image space. It detects objects in image coordinates; providing 3D coordinate information is not mandatory. | - Camera Images |
+| LiDAR Clustering | This module performs clustering of point clouds and shape estimation to achieve object detection without labels. | - Point Clouds |
+| Semi-rule based detector | This module detects objects using information from both images and point clouds, and it consists of two components: LiDAR Clustering and Camera DNN based 2D detector. | - Output from Camera DNN based 2D detector and LiDAR Clustering |
+| Object Merger | This module integrates results from various detectors. | - Detected Objects |
+| Interpolator | This module stabilizes the object detection results by maintaining long-term detection results using Tracking results. | - Detected Objects<br/>- Tracked Objects |
+| Tracking | This module assigns IDs to the detection results and estimates their velocities. | - Detected Objects |
+| Prediction | This module predicts the future paths (and their probabilities) of dynamic objects according to the shape of the map and the surrounding environment. | - Tracked Objects<br/>- Vector Map |
+| Obstacle Segmentation | This module identifies point clouds originating from obstacles that the ego vehicle should avoid. | - Point Clouds<br/>- Point Cloud Map |
+| Occupancy Grid Map | This module detects blind spots (areas where no information is available and where dynamic objects may jump out). | - Point Clouds<br/>- Point Cloud Map |
+| Traffic Light Recognition | This module detects the position and state of traffic signals. | - Camera Images<br/>- Vector Map |
+
+## Reference Implementation
+
+When Autoware is launched, the default parameters are loaded, and the Reference Implementation is started. For more details, please refer to [the Reference Implementation](./reference_implementation.md).
diff --git a/docs/design/autoware-architecture/perception/reference_implementation.md b/docs/design/autoware-architecture/perception/reference_implementation.md
new file mode 100644
index 00000000000..313471ae162
--- /dev/null
+++ b/docs/design/autoware-architecture/perception/reference_implementation.md
@@ -0,0 +1,32 @@
+# Perception Component Reference Implementation Design
+
+## Purpose of this document
+
+This document outlines the detailed design of the reference implementation. This allows developers and users to understand what is currently available with the Perception Component, and how to utilize, expand, or add to its features.
+
+## Architecture
+
+This diagram describes the architecture of the reference implementation.
+
+![overall-perception-architecture](image/reference-implementaion-perception-diagram.drawio.svg)
+
+The Perception component consists of the following sub-components:
+
+- **Obstacle Segmentation**: Identifies point clouds originating from obstacles (both dynamic objects and static obstacles) that the ego vehicle should avoid. For example, construction cones are recognized using this module.
+- **Occupancy Grid Map**: Detects blind spots (areas where no information is available and where dynamic objects may jump out).
+- **Object Recognition**: Recognizes dynamic objects surrounding the ego vehicle in the current frame and predicts their future trajectories.
+  - **Detection**: Detects the pose and velocity of dynamic objects such as vehicles and pedestrians.
+    - **Detector**: Triggers object detection processing frame by frame.
+    - **Interpolator**: Maintains stable object detection. Even if the output from the Detector suddenly becomes unavailable, the Interpolator uses the output from the Tracking module to maintain the detection results without missing any objects.
+  - **Tracking**: Associates detected results across multiple frames.
+  - **Prediction**: Predicts trajectories of dynamic objects.
+- **Traffic Light Recognition**: Recognizes the colors of traffic lights and the directions of arrow signals.
+
+### Internal interface in the perception component
+
+- **Obstacle Segmentation to Object Recognition**
+  - Point Cloud: A Point Cloud observed in the current frame, where the ground and outliers are removed.
+- **Obstacle Segmentation to Occupancy Grid Map**
+  - Ground filtered Point Cloud: A Point Cloud observed in the current frame, where the ground is removed.
+- **Occupancy Grid Map to Obstacle Segmentation**
+  - Occupancy Grid Map: Used for filtering outliers.
diff --git a/docs/design/autoware-interfaces/components/perception.md b/docs/design/autoware-interfaces/components/perception.md
new file mode 100644
index 00000000000..a90ccd1142a
--- /dev/null
+++ b/docs/design/autoware-interfaces/components/perception.md
@@ -0,0 +1,49 @@
+# Perception
+
+This page provides the interface specifications of the Perception Component. Please refer to [the perception architecture reference implementation design document](../../autoware-architecture/perception/reference_implementation.md) for concepts and data flow.
+
+## Input
+
+### From Map Component
+
+| Name | Topic / Service | Type | Description |
+| --- | --- | --- | --- |
+| Vector Map | `/map/vector_map` | [autoware_auto_mapping_msgs/msg/HADMapBin](https://github.com/tier4/autoware_auto_msgs/blob/tier4/main/autoware_auto_mapping_msgs/msg/HADMapBin.idl) | HD Map including the information about lanes |
+| Point Cloud Map | `/service/get_differential_pcd_map` | [autoware_map_msgs/srv/GetDifferentialPointCloudMap](https://github.com/autowarefoundation/autoware_msgs/blob/main/autoware_map_msgs/srv/GetDifferentialPointCloudMap.srv) | Point Cloud Map |
+
+Notes:
+
+- Point Cloud Map
+  - The input can be either a topic or a service, but we highly recommend using the service, since this interface enables processing without being constrained by map file size limits.
+
+### From Sensing Component
+
+| Name | Topic | Type | Description |
+| --- | --- | --- | --- |
+| Camera Image | `/sensing/camera/camera*/image_rect_color` | [sensor_msgs/Image](https://github.com/ros2/common_interfaces/blob/humble/sensor_msgs/msg/Image.msg) | Camera image data, processed with Lens Distortion Correction (LDC) |
+| Camera Image | `/sensing/camera/camera*/image_raw` | [sensor_msgs/Image](https://github.com/ros2/common_interfaces/blob/humble/sensor_msgs/msg/Image.msg) | Camera image data, not processed with Lens Distortion Correction (LDC) |
+| Point Cloud | `/sensing/lidar/concatenated/pointcloud` | [sensor_msgs/PointCloud2](https://github.com/ros2/common_interfaces/blob/humble/sensor_msgs/msg/PointCloud2.msg) | Concatenated point cloud from multiple LiDAR sources |
+| Radar Object | `/sensing/radar/detected_objects` | [autoware_auto_perception_msgs/msg/DetectedObject](https://gitlab.com/autowarefoundation/autoware.auto/autoware_auto_msgs/-/blob/master/autoware_auto_perception_msgs/msg/DetectedObject.idl) | Radar objects |
+
+### From Localization Component
+
+| Name | Topic | Type | Description |
+| --- | --- | --- | --- |
+| Vehicle Odometry | `/localization/kinematic_state` | [nav_msgs/msg/Odometry](https://github.com/ros2/common_interfaces/blob/humble/nav_msgs/msg/Odometry.msg) | Ego vehicle odometry topic |
+
+### From API
+
+| Name | Topic | Type | Description |
+| --- | --- | --- | --- |
+| External Traffic Signals | `/external/traffic_signals` | [autoware_perception_msgs::msg::TrafficSignalArray](https://github.com/autowarefoundation/autoware_msgs/blob/main/autoware_perception_msgs/msg/TrafficSignalArray.msg) | The traffic signals from an external system |
+
+## Output
+
+### To Planning
+
+| Name | Topic | Type | Description |
+| --- | --- | --- | --- |
+| Dynamic Objects | `/perception/object_recognition/objects` | [autoware_auto_perception_msgs/msg/PredictedObjects](https://github.com/tier4/autoware_auto_msgs/blob/tier4/main/autoware_auto_perception_msgs/msg/PredictedObjects.idl) | Set of dynamic objects with information such as the object class and shape |
+| Obstacles | `/perception/obstacle_segmentation/pointcloud` | [sensor_msgs/PointCloud2](https://github.com/ros2/common_interfaces/blob/humble/sensor_msgs/msg/PointCloud2.msg) | Obstacles, including both dynamic and static objects |
+| Occupancy Grid Map | `/perception/occupancy_grid_map/map` | [nav_msgs/msg/OccupancyGrid](https://docs.ros.org/en/latest/api/nav_msgs/html/msg/OccupancyGrid.html) | Map with information about the presence of obstacles and blind spots |
+| Traffic Signal | `/perception/traffic_light_recognition/traffic_signals` | [autoware_perception_msgs::msg::TrafficSignalArray](https://github.com/autowarefoundation/autoware_msgs/blob/main/autoware_perception_msgs/msg/TrafficSignalArray.msg) | The traffic signal information such as a color (green, yellow, red) and an arrow (right, left, straight) |