Estimates the relative motion between camera poses using a depth map and matched keypoints. The kernel uses RANSAC-EPnP (RANdom SAmple Consensus, Efficient Perspective-n-Point) and iterative PnP (Direct Linear Transform and Levenberg–Marquardt).
motion-estimation
├── c-src
├── hls-src
│ ├── underbaseline
│ ├── baseline
│ └── optimized
├── program
│ ├── host
│ ├── kernel
│ └── testdata
└── doc
- cd to the program directory.
cd ./motion-estimation/program
- make the host executable program and xclbin files using the Makefile.
make all TARGET=hw
- Once the host program and kernel are compiled, you can test the functionality of the kernel using the test data.
Run the program using
make run TARGET=hw
- (Optional) You may modify the arguments in args.mk.
# | Arguments | Type | Size (number of items) | input/output |
---|---|---|---|---|
0 | matches | MATCH | MAX_KEYPOINT_NUM | input |
1 | match_num | int | 1 | input |
2 | kp0 | IPOINT | MAX_KEYPOINT_NUM | input |
3 | kp1 | IPOINT | MAX_KEYPOINT_NUM | input |
4 | fx | float | 1 | input |
5 | fy | float | 1 | input |
6 | cx | float | 1 | input |
7 | cy | float | 1 | input |
8 | depth | float | IMG_WIDTH * IMG_HEIGHT | input |
9 | threshold | int | 1 | input |
10 | confidence | float | 1 | input |
11 | maxiter | int | 1 | input |
12 | rmat | float | 9 | input/output |
13 | tvec | float | 3 | input/output |
14 | take_last | bool | 1 | input |
- matches
  An array of matched index pairs into kp0 and kp1, produced by Feature Matching. The maximum buffer size is MAX_KEYPOINT_NUM.
- match_num
  Number of matched keypoint pairs.
- kp0, kp1
  Keypoint sets 0 and 1 from Feature Extraction.
- fx, fy, cx, cy
  Focal lengths and optical center from the intrinsic matrix: fx = K_left[0][0], fy = K_left[1][1], cx = K_left[0][2], cy = K_left[1][2].
- depth
  Depth map from Stereo Matching.
- threshold
  Parameter for RANSAC. Distance (in pixels) used to decide whether a projected 2D point is an outlier.
- confidence
  Parameter for RANSAC. Used to decide whether the number of inliers is sufficient.
- maxiter
  Parameter for RANSAC. The maximum number of RANSAC iterations.
- rmat, tvec
  The resulting rotation matrix and translation vector.
- take_last
  Determines whether rmat and tvec are also taken as inputs and used as the initial values for gradient descent.
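To make the three RANSAC parameters more concrete, here is a minimal sketch of how they are typically used. It is not taken from the kernel source: `is_inlier`, `update_num_iters`, and the `sample_size` parameter are hypothetical names, `rmat` is assumed to be stored row-major, and the iteration bound is the textbook RANSAC formula, which this kernel may or may not apply in exactly this form.

```cpp
#include <cmath>

// Hypothetical helper: true when the 3D point X, transformed by (rmat, tvec)
// and projected with the pinhole intrinsics, lands within `threshold` pixels
// of the observed 2D point (u, v). rmat is assumed row-major (3x3).
bool is_inlier(const float rmat[9], const float tvec[3], const float X[3],
               float u, float v,
               float fx, float fy, float cx, float cy, float threshold) {
    float xc = rmat[0] * X[0] + rmat[1] * X[1] + rmat[2] * X[2] + tvec[0];
    float yc = rmat[3] * X[0] + rmat[4] * X[1] + rmat[5] * X[2] + tvec[1];
    float zc = rmat[6] * X[0] + rmat[7] * X[1] + rmat[8] * X[2] + tvec[2];
    if (zc <= 0.0f) return false;  // point is behind the camera
    float du = fx * xc / zc + cx - u;
    float dv = fy * yc / zc + cy - v;
    return du * du + dv * dv <= threshold * threshold;
}

// Hypothetical helper: textbook RANSAC bound on the number of iterations
// needed to draw at least one all-inlier sample with probability
// `confidence`, capped at `maxiter`.
int update_num_iters(float confidence, int inliers, int total,
                     int sample_size, int maxiter) {
    double w = static_cast<double>(inliers) / total;  // inlier ratio
    double p_good = std::pow(w, sample_size);         // P(sample is all inliers)
    if (p_good <= 0.0) return maxiter;                // no inliers yet: keep the cap
    if (p_good >= 1.0) return 1;                      // every sample is good
    double n = std::log(1.0 - confidence) / std::log(1.0 - p_good);
    return n < maxiter ? static_cast<int>(std::ceil(n)) : maxiter;
}
```

With this formulation, a higher `confidence` or a lower inlier ratio increases the number of iterations, while `maxiter` keeps the loop bounded, which matters for a hardware kernel with a fixed latency budget.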
- MAX_KEYPOINT_NUM
    #define MAX_KEYPOINT_NUM 500
- MATCH
    struct MATCH {
        int a; // index of kp0
        int b; // index of kp1
    };
- IPOINT
    struct IPOINT {
        float x; // x coordinate of 2D point
        float y; // y coordinate of 2D point
    };
The brief introduction can be found in the slide. You may also check out the video.
- depth map from Stereo Matching
- matched 2D keypoints of the 1st and 2nd left images
- camera intrinsic matrix
We are looking for the rotation matrix rmat and translation vector tvec that describe the relative motion between the 1st and 2nd images.
void estimate(Matrix &match, Matrix &kp0, Matrix &kp1, Matrix &k, Matrix &depth, Matrix &rmat, Matrix &tvec);
- Before doing the math, we need to prepare the inputs. PnP solves the relative-motion problem from 2D-3D point correspondences, so we back-project the keypoints of the 1st left image to 3D points using the depth map and camera matrix. We also align the two sets of keypoints using the matched indices.
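The input preparation described above can be sketched as follows. This is an illustration rather than the kernel's actual code: `POINT3` and `build_correspondences` are hypothetical names, the depth map is assumed to be row-major, the keypoint coordinate is truncated to pick a depth pixel, and matches with non-positive depth are skipped.

```cpp
#include <cstddef>
#include <vector>

struct MATCH  { int a; int b; };               // indices into kp0 and kp1
struct IPOINT { float x; float y; };           // 2D pixel coordinates
struct POINT3 { float x; float y; float z; };  // hypothetical 3D point type

// Back-project each matched kp0 pixel to 3D with the pinhole model and pair
// it with the matched kp1 pixel. Returns the number of 2D-3D pairs produced.
// Assumptions: `depth` is row-major with `img_width` columns, and matches
// that fall outside the image or have non-positive depth are dropped.
std::size_t build_correspondences(const std::vector<MATCH>& matches,
                                  const std::vector<IPOINT>& kp0,
                                  const std::vector<IPOINT>& kp1,
                                  const std::vector<float>& depth,
                                  int img_width, int img_height,
                                  float fx, float fy, float cx, float cy,
                                  std::vector<POINT3>& obj_pts,
                                  std::vector<IPOINT>& img_pts) {
    obj_pts.clear();
    img_pts.clear();
    for (const MATCH& m : matches) {
        const IPOINT& p0 = kp0[m.a];  // keypoint in the 1st left image
        const IPOINT& p1 = kp1[m.b];  // matched keypoint in the 2nd left image
        int u = static_cast<int>(p0.x);
        int v = static_cast<int>(p0.y);
        if (u < 0 || u >= img_width || v < 0 || v >= img_height) continue;
        float z = depth[static_cast<std::size_t>(v) * img_width + u];
        if (z <= 0.0f) continue;      // no valid depth for this pixel
        // Inverse pinhole projection: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
        obj_pts.push_back({(p0.x - cx) * z / fx, (p0.y - cy) * z / fy, z});
        img_pts.push_back(p1);
    }
    return obj_pts.size();
}
```

The two output vectors are index-aligned, so `obj_pts[i]` and `img_pts[i]` form one 2D-3D correspondence that a PnP solver can consume. The sketch uses `std::vector` for readability on a CPU; a synthesizable version would need fixed-size buffers instead.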
Extracted and modified from the OpenCV source code, the pure C/C++ code implements most of the function solvePnPRansac. It runs on a CPU, but may not be synthesizable because of dynamically allocated memory and other reasons.
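As an illustration of the dynamic-memory issue, one common way to make such code synthesizable is to replace heap-backed containers with buffers whose capacity is fixed at compile time. The sketch below is hypothetical (`PointBuffer` is not a type from this repository) and only shows the general idea, bounded here by MAX_KEYPOINT_NUM.

```cpp
#define MAX_KEYPOINT_NUM 500

struct IPOINT { float x; float y; };

// Hypothetical fixed-capacity buffer: storage is sized at compile time,
// so no heap allocation is needed.
struct PointBuffer {
    IPOINT data[MAX_KEYPOINT_NUM];
    int size = 0;

    // Appends a point; returns false instead of growing when the buffer is full.
    bool push(const IPOINT& p) {
        if (size >= MAX_KEYPOINT_NUM) return false;
        data[size++] = p;
        return true;
    }
};
```

The trade-off is that the capacity must cover the worst case, which costs on-chip memory even when few keypoints are matched.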
- underbaseline
The code is synthesizable and can run co-simulation, but no optimization has been applied. This version is the result of prioritizing co-simulation. Its resource utilization may be too high to implement on the board.