Segfaults during simulation #4

Open
hiverniaa opened this issue Aug 25, 2024 · 11 comments

@hiverniaa

Hi!

I am trying to run the examples bundled with your project. I am only running the non-double-buffer versions of the scripts to keep the number of variables as low as possible.
While the demo, MiniLM and SentenceBERT experiments worked great, I'm encountering segfaults while running CamemBERT and VIT. More precisely, these segfaults seem to occur during the simulation step, judging by when they appear in the terminal. Here is an example of a crash I got running the CamemBERT experiment in the 4x4 configuration (I ran it under time hoping to get more insight into the cause of the crash; the behavior is the same without it):

[...]
CGRA configurations -- size: 4x4, support double buffer: false, evaluate baseline: false.
Segmentation fault (core dumped)
Command exited with non-zero status 139
        Command being timed: "sh script4x4.sh"
        User time (seconds): 5304.44
        System time (seconds): 232.21
        Percent of CPU this job got: 99%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 1:32:29
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 39042732
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 38
        Minor (reclaiming a frame) page faults: 190616510
        Voluntary context switches: 18683
        Involuntary context switches: 14252
        Swaps: 0
        File system inputs: 1339056
        File system outputs: 14989160
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 139
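
For reference, the exit status and peak memory figure in the time output above can be decoded with a few lines of Python. This is only an illustrative sketch: it assumes the usual shell convention that a status above 128 means the process was killed by signal (status - 128), and the helper name decode_exit_status is made up for this note.

```python
import signal


def decode_exit_status(status: int) -> str:
    """Describe a shell exit status, assuming the 128 + signal-number convention."""
    if status > 128:
        signum = status - 128
        try:
            return f"killed by {signal.Signals(signum).name}"
        except ValueError:
            return f"killed by unknown signal {signum}"
    return f"exited normally with code {status}"


# Values copied from the `time` output above.
exit_status = 139
max_rss_kbytes = 39042732

print(decode_exit_status(exit_status))  # killed by SIGSEGV
print(f"peak resident set size: {max_rss_kbytes / 1024**2:.1f} GiB")
```

Status 139 corresponds to signal 11 (SIGSEGV), which matches the "Segmentation fault" message printed by the shell.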

For context, I am running mlir-cgra inside the Docker image you published, after executing a git pull from inside the container's copy of the project. I encountered the same minor issues as #3 and followed the same steps as described there.

Here are a few things I already tried that did not solve the issue:

  • trying the 8x8 versions of the scripts (the same experiments crash in both the 4x4 and 8x8 versions)
  • running on a beefier machine (in case my own hardware wasn't enough, but the behavior was the same)
  • running each step manually, one after the other (in case some memory was leaking and not freed properly between the steps)
  • re-running manually only the simulation step at the end of the flow

Things I haven't tried yet:

  • running the project locally instead of inside a container

Any input on this would be much appreciated!

Thanks in advance,
Leo.

@tancheng
Owner

Sure, let me take a look.

@tancheng
Owner

tancheng commented Sep 3, 2024

Hi, related to #3, I think # export LD_LIBRARY_PATH=../../../../llvm-project/build/lib/ might help.

  • For your crash, after your step-by-step run, where does it crash? Does the compilation succeed with all the intermediate files generated? Does the simulation then crash immediately after it launches? If so, you can add some dump/print statements in the sim-/runtime-/func-related functions to see which model operator triggers the crash.
  • If the simulation runs successfully for a while and then crashes, the issue may be related to a limited memory budget or to Docker, so you can try running it on your native machine.

@hiverniaa
Author

Hi!

Answering your questions:

  • It crashes while the executable simulate4 is running. It does not crash right away; the simulation can run for a few tens of seconds before being killed. See details below.
  • Adding prints lets me monitor some variables, but I still found it hard to pinpoint the origin of the error this way, so I turned to lldb.
  • I can confirm this problem is not memory budget-related: I finally got around to installing mlir-cgra locally on a Debian 12 machine, and the crash still occurs there.

Here is what I found after testing mlir-cgra on my local install:

  • the crash is caused by a segmentation violation, after trying to access an invalid address (for example 0x1b during the simulation of CamemBERT 4x4, 0x1080 for VIT 4x4)
  • it always happens during a call to forward in 09-host-tailored-4.mlir around line 50k, see below.

I still have a few questions:

  • What kind of environment are you currently testing on? What are your machine's specs?
  • Could you please confirm that the failing experiments (CamemBERT and VIT) work on your side?
  • When printing from the MemRef class, the allocated attribute takes the value 0xdeadbeef on multiple occasions, in both working and crashing experiments. Is this intended behavior?
  • Are there specific variables you would like me to print?

Traces of the CamemBERT 4x4 simulation with lldb and the added prints in the MemRef constructor:

~/mlir-cgra/experiments/CamemBERT/cgra$ clang++-12 main.cpp 11-model-4.o 12-accel-4.o /home/lpajot/llvm-project-mlir/build/tools/mlir/lib/ExecutionEngine/CMakeFiles/mlir_c_runner_utils.dir/CRunnerUtils.cpp.o -L/home/lpajot/llvm-project-mlir/build/lib/libmlir_c_runner_utils.so CustomizedRuntime-4.cpp -I../../../sim/ ../../../sim/*.cpp -o simulate4 -g
~/mlir-cgra/experiments/CamemBERT/cgra$ lldb ./simulate4 4 false false
[...]
New MemRef :
        Allocated : 0x1492edb0
        Aligned   : 0x1492ee00
        Offset    : 3764
        Dim       : 3
New MemRef :
        Allocated : 0x14934950
        Aligned   : 0x14934980
        Offset    : 24
        Dim       : 3
New MemRef :
        Allocated : 0x14934950
        Aligned   : 0x14934980
        Offset    : 0
        Dim       : 3
New MemRef :
        Allocated : 0xdeadbeef
        Aligned   : 0x4f9110
        Offset    : 0
        Dim       : 1
New MemRef :
        Allocated : 0x14934830
        Aligned   : 0x14934880
        Offset    : 0
        Dim       : 3
Process 912422 stopped
* thread #1, name = 'simulate4', stop reason = signal SIGSEGV: invalid address (fault address: 0x1b)
    frame #0: 0x00007ffff7b8c5c3 libc.so.6`__memmove_avx512_unaligned_erms + 195
libc.so.6`__memmove_avx512_unaligned_erms:
->  0x7ffff7b8c5c3 <+195>: vmovdqu64 %zmm16, (%rdi)
    0x7ffff7b8c5c9 <+201>: vmovdqu64 %zmm17, 0x40(%rdi)
    0x7ffff7b8c5d0 <+208>: vmovdqu64 %zmm18, -0x40(%rdi,%rdx)
    0x7ffff7b8c5d8 <+216>: vmovdqu64 %zmm19, -0x80(%rdi,%rdx)
(lldb) thread backtrace all
* thread #1, name = 'simulate4', stop reason = signal SIGSEGV: invalid address (fault address: 0x1b)
  * frame #0: 0x00007ffff7b8c5c3 libc.so.6`__memmove_avx512_unaligned_erms + 195
    frame #1: 0x000000000041d506 simulate4`forward at 09-host-tailored-4.mlir:55059:5
    frame #2: 0x0000000000402aa7 simulate4`main(argc=4, argv=0x00007fffffffdea8) at main.cpp:59:3
    frame #3: 0x00007ffff7a4624a libc.so.6`__libc_start_call_main + 122
    frame #4: 0x00007ffff7a46305 libc.so.6`__libc_start_main@@GLIBC_2.34 + 133
    frame #5: 0x0000000000402551 simulate4`_start + 33
(lldb) frame select 1
frame #1: 0x000000000041d506 simulate4`forward at 09-host-tailored-4.mlir:55059:5
   55056            %48303 = llvm.extractvalue %19[2] : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<3 x i64>, array<3 x i64>)> 
   55057            %48304 = llvm.getelementptr %48302[%48303] : (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>
   55058            %48305 = llvm.mlir.constant(false) : i1
-> 55059            "llvm.intr.memcpy"(%48304, %48301, %48300, %48305) : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i1) -> ()
   55060            llvm.return
   55061          }
   55062        }
(lldb) frame select 2
frame #2: 0x0000000000402aa7 simulate4`main(argc=4, argv=0x00007fffffffdea8) at main.cpp:59:3
   56       cout<<"false."<<endl;
   57     }
   58  
-> 59     forward(a, a, 1, 7, 768, 768, 1, b, b, 1, 7, 5, 5, 1);
   60  
   61     cout<<"Check result: "<<endl;
   62     for (int i=0; i<7*5; ++i) {
(lldb) frame variable
(int) argc = 4
(char **) argv = 0x00007fffffffdea8
(int64_t *) a = 0x0000000014923eb0
(float *) b = 0x000000001492e6c0
(int) dim = 4
(std::string) isDoubleBuffered = error: summary string parsing error
(std::string) runAsBaseline = error: summary string parsing error

Traces of the VIT 4x4 simulation with lldb and the added prints in the MemRef constructor:

~/mlir-cgra/experiments/VIT/cgra$ clang++-12 main.cpp 11-model-4.o 12-accel-4.o /home/lpajot/llvm-project-mlir/build/tools/mlir/lib/ExecutionEngine/CMakeFiles/mlir_c_runner_utils.dir/CRunnerUtils.cpp.o -L/home/lpajot/llvm-project-mlir/build/lib/libmlir_c_runner_utils.so CustomizedRuntime-4.cpp -I../../../sim/ ../../../sim/*.cpp -o simulate4 -g
~/mlir-cgra/experiments/VIT/cgra$ lldb ./simulate4 4 false false
[...]
New MemRef :
        Allocated : 0x14ca8270
        Aligned   : 0x14ca8280
        Offset    : 151280
        Dim       : 3
New MemRef :
        Allocated : 0x14f96640
        Aligned   : 0x14f96680
        Offset    : 752996
        Dim       : 3
New MemRef :
        Allocated : 0x14dfc5b0
        Aligned   : 0x14dfc600
        Offset    : 196996
        Dim       : 3
New MemRef :
        Allocated : 0x14dfc5b0
        Aligned   : 0x14dfc600
        Offset    : 0
        Dim       : 3
New MemRef :
        Allocated : 0xdeadbeef
        Aligned   : 0x4fc900
        Offset    : 0
        Dim       : 1
New MemRef :
        Allocated : 0x14ebcc60
        Aligned   : 0x14ebcc80
        Offset    : 0
        Dim       : 3
Process 913281 stopped
* thread #1, name = 'simulate4', stop reason = signal SIGSEGV: invalid address (fault address: 0x1080)
    frame #0: 0x00007ffff7b8c875 libc.so.6`__memmove_avx512_unaligned_erms + 885
libc.so.6`__memmove_avx512_unaligned_erms:
->  0x7ffff7b8c875 <+885>: rep    movsb (%rsi), %es:(%rdi)
    0x7ffff7b8c877 <+887>: vmovdqu64 %zmm16, (%r8)
    0x7ffff7b8c87d <+893>: retq   
    0x7ffff7b8c87e <+894>: nop    
(lldb) thread backtrace all
* thread #1, name = 'simulate4', stop reason = signal SIGSEGV: invalid address (fault address: 0x1080)
  * frame #0: 0x00007ffff7b8c875 libc.so.6`__memmove_avx512_unaligned_erms + 885
    frame #1: 0x000000000041e095 simulate4`forward at 09-host-tailored-4.mlir:54729:5
    frame #2: 0x0000000000402aa7 simulate4`main(argc=4, argv=0x00007fffffffe0e8) at main.cpp:51:3
    frame #3: 0x00007ffff7a4624a libc.so.6`__libc_start_call_main + 122
    frame #4: 0x00007ffff7a46305 libc.so.6`__libc_start_main@@GLIBC_2.34 + 133
    frame #5: 0x0000000000402551 simulate4`_start + 33
(lldb) frame select 1
frame #1: 0x000000000041e095 simulate4`forward at 09-host-tailored-4.mlir:54729:5
   54726            %47984 = llvm.extractvalue %19[2] : !llvm.struct<(ptr<f32>, ptr<f32>, i64, array<3 x i64>, array<3 x i64>)> 
   54727            %47985 = llvm.getelementptr %47983[%47984] : (!llvm.ptr<f32>, i64) -> !llvm.ptr<f32>
   54728            %47986 = llvm.mlir.constant(false) : i1
-> 54729            "llvm.intr.memcpy"(%47985, %47982, %47981, %47986) : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i1) -> ()
   54730            llvm.return
   54731          }
   54732        }
(lldb) frame select 2
frame #2: 0x0000000000402aa7 simulate4`main(argc=4, argv=0x00007fffffffe0e8) at main.cpp:51:3
   48       cout<<"false."<<endl;
   49     }
   50  
-> 51     forward(a, a, 1, 197, 768, 768, 1, b, b, 1, 197, 1000, 1000, 1);
   52  
   53     cout<<"Check result: "<<endl;
   54     for (int i=0; i<197*1000; ++i) {
(lldb) frame variable
(int) argc = 4
(char **) argv = 0x00007fffffffe0e8
(int64_t *) a = 0x00007ffff78f7010
(float *) b = 0x00007ffff7836010
(int) dim = 4
(std::string) isDoubleBuffered = error: summary string parsing error
(std::string) runAsBaseline = error: summary string parsing error

@tancheng
Owner

Hi @Yiran-ASU, can you please help with this issue? This looks like a real bug; please try to reproduce it on your side.

@Yiran-ASU
Collaborator

Yiran-ASU commented Sep 28, 2024 via email

@Yiran-ASU
Collaborator

Hello Dr. Tan @tancheng, I got the same segmentation fault when running sh script4x4.sh in CamemBERT:

CGRA configurations -- size: 4x4, support double buffer: false, evaluate baseline: false.

Segmentation fault (core dumped)

It also happens when running ./simulate4 4 false false

@tancheng
Owner

Cool, can you help triage a little bit?

@Yiran-ASU
Collaborator

Yes, I am working on it.

@Yiran-ASU
Collaborator

Yiran-ASU commented Sep 28, 2024

Hello Dr. Tan @tancheng, I used gdb as the debug tool to locate the segmentation fault, and got this information:

Starting program: /mlir-cgra/experiments/CamemBERT/cgra/simulate4 4 false false

warning: Error disabling address space randomization: Operation not permitted

CGRA configurations -- size: 4x4, support double buffer: false, evaluate baseline: false.

Program received signal SIGSEGV, Segmentation fault.

__memmove_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:369

Then I used the bt command in gdb and got a more detailed location:

#0 __memmove_avx512_unaligned_erms () at ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:369

#1 0x000000000041d586 in forward () at /mlir-cgra/experiments/CamemBERT/cgra/09-host-tailored-4.mlir:55059

#2 0x0000000000402b27 in main ()

It seems the segmentation fault occurs in forward in 09-host-tailored-4.mlir. I am going to look into it, and into the script that generates it, removeRedundantDeclares.py.

Ah, line 55059 in 09-host-tailored-4.mlir is: "llvm.intr.memcpy"(%48304, %48301, %48300, %48305) : (!llvm.ptr<f32>, !llvm.ptr<f32>, i64, i1) -> (), which is a memory copy.


@Yiran-ASU
Collaborator

Yiran-ASU commented Oct 9, 2024

Hello, I found that the segmentation fault may be caused by the shape of the input a passed to the forward function in main.cpp under the cgra folder. In model/CamemBERT.py, the input of the CamemBERT model is a 3D tensor with shape [1, 7, 768], whereas the input passed to forward in main.cpp is a pointer to a 1D array, with shape [1, 7x768].
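
To make the rank mismatch concrete: the struct type visible in the lldb trace earlier in this thread, (ptr<f32>, ptr<f32>, i64, array<3 x i64>, array<3 x i64>), is a rank-3 memref descriptor. Assuming forward uses MLIR's default calling convention, in which each ranked memref argument is expanded into an allocated pointer, an aligned pointer, an offset, and one size and one stride per dimension, the number of scalars per tensor can be counted as below. The helper names are hypothetical, and this is only a sketch of the arithmetic, not code from the project.

```python
def descriptor_scalar_count(rank: int) -> int:
    """Scalars in an expanded ranked-memref descriptor:
    allocated ptr + aligned ptr + offset + `rank` sizes + `rank` strides."""
    return 3 + 2 * rank


def row_major_strides(shape):
    """Element strides of a contiguous row-major tensor with the given shape."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


model_input_shape = [1, 7, 768]  # shape stated for model/CamemBERT.py
print(descriptor_scalar_count(len(model_input_shape)))  # 9
print(row_major_strides(model_input_shape))             # [5376, 768, 1]

# The call in main.cpp passes 14 scalars for two tensors, i.e. 7 per tensor,
# which is the size of a rank-2 descriptor rather than a rank-3 one.
call_args_per_tensor = 14 // 2
print(call_args_per_tensor == descriptor_scalar_count(2))  # True
```

If that reading is right, a rank-3 input would need 9 scalars, so a call that supplies only 7 would leave the descriptor fields shifted, which may explain the invalid addresses seen in the memcpy.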

An easier solution is to have the CamemBERT model under the model folder accept its input in shape [1, 7x768] and reshape it inside the forward function in the same file:
Reshape the input into [1, 7x768]:
[image]
Reshape back to [1, 7, 768] in the forward of CamemBERT.py:
[image]

This way, after compilation, the simulation works.
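
The reshape workaround relies on a flat [1, 7x768] buffer and a contiguous row-major [1, 7, 768] tensor holding the same elements in the same order. The following rough sketch checks that with plain Python lists; it is not the project's model code, and the function names are hypothetical.

```python
def flat_index(b: int, t: int, h: int, tokens: int = 7, hidden: int = 768) -> int:
    """Offset of element [b, t, h] in a contiguous row-major [1, tokens, hidden] tensor."""
    return (b * tokens + t) * hidden + h


def reshape_to_3d(flat, tokens: int = 7, hidden: int = 768):
    """View a flat list of tokens*hidden values as a nested [1][tokens][hidden] list."""
    assert len(flat) == tokens * hidden
    return [[flat[t * hidden:(t + 1) * hidden] for t in range(tokens)]]


flat = list(range(7 * 768))  # stand-in for the [1, 7x768] input buffer
x = reshape_to_3d(flat)      # stand-in for the [1, 7, 768] tensor the model expects

# Every 3D element is the value found at the corresponding flat offset.
print(all(x[0][t][h] == flat[flat_index(0, t, h)]
          for t in range(7) for h in range(768)))  # True
```

In PyTorch the same round trip would typically be written with torch.reshape or Tensor.view; whether the project's fix uses one of those is not visible in this thread.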

@tancheng
Owner

tancheng commented Oct 9, 2024

Thanks Yiran, please open a PR for that.
