CudaImage allocate error in my project (segmentation fault) #84

Open
richard-elvira opened this issue Sep 1, 2022 · 1 comment

richard-elvira commented Sep 1, 2022

Hi, I am working with this library and can already run the examples without problems.

My main problem is integrating CudaSift into my project. I have added the CudaSift code and it runs well, but the problem comes when I try to allocate an image in the CudaImage container: the program crashes with a segmentation fault inside the Allocate function.

Here is the code, in case there is something I am missing:

InitCuda(0); // Initialize with the device 0 (from all the devices with CUDA)

if(img.empty())
{
    std::cerr << "Empty image in CudaSiftHandler::extractCudaSift" << std::endl;
}

cv::Mat img_32f;
img.convertTo(img_32f, CV_32FC1, 1/255.0);

unsigned int w = img_32f.cols;
unsigned int h = img_32f.rows;
std::cout << "img_32f cols: " << w << "; rows: " << h << "; channels: " << img_32f.channels() << std::endl;
std::cout << "img_32f type: " << img_32f.type() << std::endl;

mCudaImg.Allocate(w, h, w, false, static_cast<float*>(NULL), (float*)img_32f.data);
mCudaImg.Download();

InitSiftData(mSiftDataImgExt, mnMaxFeatures, true, true);

ExtractSift(mSiftDataImgExt, mCudaImg, mnNumOctaves, mfInitBlur, mfThreshold, mnMinScale, false);
std::cout << "There are " << mSiftDataImgExt.numPts << " sift points detected by GPU" << std::endl;

The output of my code before the segmentation fault shows that InitCuda initialized the graphics card correctly, and that the image is not empty and has the correct format (float):

Device Number: 0
Device name: NVIDIA TITAN Xp
Memory Clock Rate (MHz): 5705
Memory Bus Width (bits): 384
Peak Memory Bandwidth (GB/s): 547.7

img_32f cols: 1440; rows: 1080; channels: 1
img_32f type: 5

Disassembly (beginning of the Allocate code; the fault is at line 5):

0x5555555644f0                  f3 0f 1e fa                       endbr64
0x5555555644f4  <+    4>        53                                push   %rbx
0x5555555644f5  <+    5>        48 89 fb                          mov    %rdi,%rbx
0x5555555644f8  <+    8>        48 83 ec 10                       sub    $0x10,%rsp
0x5555555644fc  <+   12>        89 37                             mov    %esi,(%rdi)   <---- This raises the error.
0x5555555644fe  <+   14>        89 57 04                          mov    %edx,0x4(%rdi)
0x555555564501  <+   17>        c5 fa 7e 4c 24 20                 vmovq  0x20(%rsp),%xmm1
0x555555564507  <+   23>        c4 c3 f1 22 c1 01                 vpinsrq $0x1,%r9,%xmm1,%xmm0
0x55555556450d  <+   29>        89 4f 08                          mov    %ecx,0x8(%rdi)
0x555555564510  <+   32>        48 c7 47 20 00 00 00 00           movq   $0x0,0x20(%rdi)
0x555555564518  <+   40>        c5 f8 11 47 10                    vmovups %xmm0,0x10(%rdi)
0x55555556451d  <+   45>        4d 85 c9                          test   %r9,%r9
0x555555564520  <+   48>        74 4e                             je     0x555555564570 <_ZN9CudaImage8AllocateEiiibPfS0_+128>
0x555555564522  <+   50>        48 83 7c 24 20 00                 cmpq   $0x0,0x20(%rsp)
0x555555564528  <+   56>        75 05                             jne    0x55555556452f <_ZN9CudaImage8AllocateEiiibPfS0_+63>
0x55555556452a  <+   58>        45 84 c0                          test   %r8b,%r8b
0x55555556452d  <+   61>        75 11                             jne    0x555555564540 <_ZN9CudaImage8AllocateEiiibPfS0_+80>
0x55555556452f  <+   63>        48 83 c4 10                       add    $0x10,%rsp
0x555555564533  <+   67>        5b                                pop    %rbx
0x555555564534  <+   68>        c3                                retq
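
For context on the marked instruction: in the x86-64 System V calling convention, `%rdi` carries the first argument, which for a member function is the `this` pointer, and `%esi` carries the first `int` parameter. `mov %esi,(%rdi)` is therefore plausibly the first member store in `Allocate` (the width), and a fault there suggests that the `CudaImage` object itself, or the object that contains it, is not valid at the time of the call, rather than anything about the pixel data. The sketch below is a library-free illustration of that failure mode and of a guard against it. `FakeImage`, `Handler`, and `safeAllocate` are hypothetical names, not CudaSift API.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an image container; NOT the CudaSift class.
struct FakeImage {
    int width = 0, height = 0, pitch = 0;
    void Allocate(int w, int h, int p) {
        width = w;   // first store through `this`: faults if `this` is invalid
        height = h;
        pitch = p;
    }
};

// Hypothetical handler that owns the image as a member, like mCudaImg.
struct Handler {
    FakeImage mImg;
};

// Returns false instead of dereferencing a handler that was never constructed.
bool safeAllocate(const std::unique_ptr<Handler>& handler, int w, int h) {
    if (!handler) {
        // Calling handler->mImg.Allocate(...) here would be undefined behavior.
        return false;
    }
    handler->mImg.Allocate(w, h, w);
    return true;
}
```

If the handler in the real project is reached through a raw pointer, checking in a debugger that the pointer is non-null and still alive at the call site would be a reasonable first step.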

Thanks for the help


richard-elvira commented Sep 2, 2022

I found a problem with the image values: my float image has values between 0 and 1, but CudaImage expects a float image with values between 0 and 255. If I change:
img.convertTo(img_32f, CV_32FC1, 1/255.0);
to
img.convertTo(img_32f, CV_32FC1);

the image can be allocated, but a new error arises in the Download function. The weird thing is that if I put the same code, with the same image, in the main function of the launcher, it works well, but in the handler that I prepared for it, it doesn't work.
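
The scaling difference described above can be sanity-checked before the image is handed to the GPU code. The third argument of `cv::Mat::convertTo` is a multiplicative scale factor, so `1/255.0` maps 8-bit input to [0, 1], while omitting it leaves the values in [0, 255]. The helper below is a hypothetical sketch that uses a plain `std::vector<float>` in place of `cv::Mat` and reports the value range; the names and the threshold of 1.0 are assumptions for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct ValueRange {
    float min;
    float max;
};

// Smallest and largest pixel value of a non-empty buffer.
ValueRange valueRange(const std::vector<float>& pixels) {
    assert(!pixels.empty());
    auto mm = std::minmax_element(pixels.begin(), pixels.end());
    return {*mm.first, *mm.second};
}

// Heuristic (assumption): a float image whose maximum is <= 1
// was probably normalized to [0, 1].
bool looksNormalized(const std::vector<float>& pixels) {
    return valueRange(pixels).max <= 1.0f;
}

// Mimics the scale factor of convertTo on a plain buffer.
std::vector<float> scaled(const std::vector<unsigned char>& src, float alpha) {
    std::vector<float> dst(src.size());
    std::transform(src.begin(), src.end(), dst.begin(),
                   [alpha](unsigned char v) { return static_cast<float>(v) * alpha; });
    return dst;
}
```

With real OpenCV data, `cv::minMaxLoc` gives the same minimum and maximum directly.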

In the main function, it prints these results:

img_32f cols: 720; rows: 540; channels: 1
img_32f type: 5 // CV_32F
Frame: min val: 16 // Minimum value in the image
Frame: max val: 254 // Maximum value in the image
mCudaImg width: 720; height: 540; pitch: 768
SIFT extraction time =        0.60 ms 3770
Incl prefiltering & memcpy =  2.38 ms 3770

There are 3770 sift points detected by GPU
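
The reported pitch of 768 for a width of 720 is consistent with each row being padded up to a multiple of 128 floats (equivalently, 512 bytes). The exact alignment chosen by the library or by the CUDA allocator on this GPU is an assumption here; the helper name `alignUp` is hypothetical and the sketch only shows the arithmetic.

```cpp
#include <cassert>

// Rounds `value` up to the next multiple of `alignment` (hypothetical helper).
int alignUp(int value, int alignment) {
    assert(value >= 0 && alignment > 0);
    return ((value + alignment - 1) / alignment) * alignment;
}

// Index of pixel (row, col) in a padded buffer where each row
// occupies `pitch` floats, of which only the first `width` are used.
int pitchedIndex(int row, int col, int pitch) {
    return row * pitch + col;
}
```

Under the same assumption, the 1440-pixel-wide image from the first comment would get a pitch of 1536, which is why passing the plain width as the pitch only matches when the width is already a multiple of the alignment.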

However, if the same code is in the class that handles CudaSift, it raises a segmentation fault inside the Download function. This is the output before the error:

img_32f cols: 720; rows: 540; channels: 1
img_32f type: 5
min val: 16
max val: 254
mCudaImg width: 720; height: 540; pitch: 768

And this is the disassembly:

0x7ffff7e698eb  <+ 1051>        4c 89 f7                          mov    %r14,%rdi
0x7ffff7e698ee  <+ 1054>        e8 3d a6 de ff                    callq  0x7ffff7c53f30 <_ZNSolsEi@plt>
0x7ffff7e698f3  <+ 1059>        ba 09 00 00 00                    mov    $0x9,%edx
0x7ffff7e698f8  <+ 1064>        48 8d 35 ef 09 07 00              lea    0x709ef(%rip),%rsi        # 0x7ffff7eda2ee
0x7ffff7e698ff  <+ 1071>        48 89 c7                          mov    %rax,%rdi
0x7ffff7e69902  <+ 1074>        49 89 c6                          mov    %rax,%r14
0x7ffff7e69905  <+ 1077>        e8 66 ea de ff                    callq  0x7ffff7c58370 <_ZSt16__ostream_insertIcSt11char_traitsIcEERSt13basic_ostreamIT_T0_ES6_PKS3_l@plt>
0x7ffff7e6990a  <+ 1082>        8b b5 38 ff ff ff                 mov    -0xc8(%rbp),%esi
0x7ffff7e69910  <+ 1088>        4c 89 f7                          mov    %r14,%rdi
0x7ffff7e69913  <+ 1091>        e8 18 a6 de ff                    callq  0x7ffff7c53f30 <_ZNSolsEi@plt>
0x7ffff7e69918  <+ 1096>        49 89 c6                          mov    %rax,%r14
0x7ffff7e6991b  <+ 1099>        48 8b 00                          mov    (%rax),%rax
0x7ffff7e6991e  <+ 1102>        48 8b 40 e8                       mov    -0x18(%rax),%rax
0x7ffff7e69922  <+ 1106>        4d 8b bc 06 f0 00 00 00           mov    0xf0(%r14,%rax,1),%r15
0x7ffff7e6992a  <+ 1114>        4d 85 ff                          test   %r15,%r15
0x7ffff7e6992d  <+ 1117>        0f 84 b4 02 00 00                 je     0x7ffff7e69be7 <_ZN9ORB_SLAM315CudaSiftHandler15extractCudaSIFTERN2cv3MatE+1815>
0x7ffff7e69933  <+ 1123>        41 80 7f 38 00                    cmpb   $0x0,0x38(%r15)
0x7ffff7e69938  <+ 1128>        0f 84 b2 01 00 00                 je     0x7ffff7e69af0 <_ZN9ORB_SLAM315CudaSiftHandler15extractCudaSIFTERN2cv3MatE+1568>
0x7ffff7e6993e  <+ 1134>        41 0f be 77 43                    movsbl 0x43(%r15),%esi
0x7ffff7e69943  <+ 1139>        4c 89 f7                          mov    %r14,%rdi
0x7ffff7e69946  <+ 1142>        e8 25 b9 de ff                    callq  0x7ffff7c55270 <_ZNSo3putEc@plt>
0x7ffff7e6994b  <+ 1147>        48 89 c7                          mov    %rax,%rdi
0x7ffff7e6994e  <+ 1150>        e8 1d b4 de ff                    callq  0x7ffff7c54d70 <_ZNSo5flushEv@plt>
0x7ffff7e69953  <+ 1155>        4c 89 e7                          mov    %r12,%rdi
0x7ffff7e69956  <+ 1158>        e8 75 02 df ff                    callq  0x7ffff7c59bd0 <_ZN9CudaImage8DownloadEv@plt>
0x7ffff7e6995b  <+ 1163>        c5 fa 10 4b 08                    vmovss 0x8(%rbx),%xmm1    <------- It breaks at this instruction.
0x7ffff7e69960  <+ 1168>        c5 e8 57 d2                       vxorps %xmm2,%xmm2,%xmm2
0x7ffff7e69964  <+ 1172>        8b 53 04                          mov    0x4(%rbx),%edx
0x7ffff7e69967  <+ 1175>        48 8d 7b 18                       lea    0x18(%rbx),%rdi
0x7ffff7e6996b  <+ 1179>        c5 ea 5a 03                       vcvtss2sd (%rbx),%xmm2,%xmm0
0x7ffff7e6996f  <+ 1183>        45 31 c0                          xor    %r8d,%r8d
0x7ffff7e69972  <+ 1186>        31 c9                             xor    %ecx,%ecx
0x7ffff7e69974  <+ 1188>        4c 89 e6                          mov    %r12,%rsi
0x7ffff7e69977  <+ 1191>        c5 ea 2a 53 10                    vcvtsi2ssl 0x10(%rbx),%xmm2,%xmm2
0x7ffff7e6997c  <+ 1196>        e8 0f e9 de ff                    callq  0x7ffff7c58290 <_Z11ExtractSiftR8SiftDataR9CudaImageidffbPf@plt>
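
A note on reading this call site: the mangled name `_Z11ExtractSiftR8SiftDataR9CudaImageidffbPf` demangles to `ExtractSift(SiftData&, CudaImage&, int, double, float, float, bool, float*)`, and the loads that follow the `Download` call set up those arguments from offsets of `%rbx`, which presumably holds the handler's `this`. Debuggers usually show the return address for a caller frame, so the marked instruction may simply be where execution would resume after `Download`, with the fault itself inside `Download`. The layout below is a hypothetical reconstruction from those offsets only: the member names are matched to the posted source by inference, `unknown0c` and `FakeSiftData` are placeholders, and the real class may differ.

```cpp
#include <cassert>
#include <cstddef>

// Placeholder for SiftData; only its 8-byte alignment matters here (assumption).
struct alignas(8) FakeSiftData {
    unsigned char opaque[32];
};

// Hypothetical layout inferred from the offsets used in the disassembly.
struct HandlerLayout {
    float mfInitBlur;              // 0x00: vcvtss2sd (%rbx)       -> double argument
    int mnNumOctaves;              // 0x04: mov 0x4(%rbx),%edx     -> int argument
    float mfThreshold;             // 0x08: vmovss 0x8(%rbx)       -> first float argument
    int unknown0c;                 // 0x0c: not touched in this excerpt
    int mnMinScale;                // 0x10: vcvtsi2ssl 0x10(%rbx)  -> second float argument
    FakeSiftData mSiftDataImgExt;  // 0x18: lea 0x18(%rbx),%rdi    -> SiftData& argument
};

static_assert(offsetof(HandlerLayout, mfThreshold) == 0x08, "threshold offset");
static_assert(offsetof(HandlerLayout, mnMinScale) == 0x10, "min scale offset");
static_assert(offsetof(HandlerLayout, mSiftDataImgExt) == 0x18, "SiftData offset");
```

If this reading is right, printing the handler's `this` pointer and the address of `mCudaImg` just before the `Download` call, in both the working `main` and the failing handler, would show whether the handler object is the part that differs.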
