Pathfinder crashing my game #141

Open
masonasons opened this issue Nov 17, 2024 · 18 comments

Comments

@masonasons
Contributor

The pathfinder object is crashing my game, pretty reliably now. This started when the commit that fixes a pathfinder memory leak was committed, as far as I can see. I have no further info than this, because it crashes with no error.

@samtupy
Owner

samtupy commented Nov 18, 2024

Hi,

Could you please post a minimal reproducible example so I can take a look?

Thanks.

@masonasons
Contributor Author

OK, I've genuinely been trying to get this to reproduce in a standalone test for the past half an hour and it just won't crash. Yet it does crash in Shooter, and if I comment out the pathfinding code in Shooter (it has fallback if-style pathfinding code it can use) then it doesn't crash. So ... I'm genuinely at a loss.

@ivansoto0
Contributor

Could the pathfinder hang during your callback and then result in a hard crash? Perhaps that's why it doesn't crash in a test program, but does with Shooter.

@samtupy
Owner

samtupy commented Nov 18, 2024

One thing that sometimes works is to copy the game that can crash it into a new folder, then just start ripping the copy apart. Tear out system after system until you either have a minimal example or you've identified the gimmick that crashes it. Not sure how viable it is for your project, but it's an idea if you hadn't yet tried such a thing.

@masonasons
Contributor Author

Yep, that's what I've been doing. Finally got it to crash. Definitely takes a lot longer in the test to crash, but after waiting about 5-10 minutes, it does crash. Here it is.
pathfinder.zip

@ethindp
Contributor

ethindp commented Nov 18, 2024

I can't seem to reproduce this issue... I've replaced the wait call with refresh_window to allow the code to run as quickly as possible and I'm at almost 6.5 million finds and still nothing. But someone else may be able to.

@masonasons
Contributor Author

Dammit, yeah, I tried to crash it again and it wouldn't reproduce. Urg.

@samtupy
Owner

samtupy commented Nov 18, 2024

If you are able to build NVGT from source and want to try capturing the traceback yourself from your game that you said crashes much more frequently, the steps are the following:

  1. Run scons -s debug=1 install; this will ensure a good .pdb gets generated.
  2. Find the x64 version of cdb.exe somewhere; this might have already been installed with your Visual Studio build tools. If you can't find it, this should help
  3. Run cdb.exe C:\nvgt\nvgt.exe -- c:\path\to\game.nvgt. When the debugger window pops up, type g and press enter. This will start the game.
  4. When the crash happens, you might see NVDA lock up as the window handling halts. Navigate to the cdb terminal window. First type .lines to make sure line numbers are displayed, then type k to get a traceback. After that, you can copy the terminal output to your clipboard and type q to quit cdb.

@ethindp
Contributor

ethindp commented Nov 18, 2024

Yeah, I just let it run until it nearly got to 17 million finds and almost 100 million vectors and nothing. So it could very well be one of those bugs. Ugh.

@masonasons
Contributor Author

Sure! Now that it's the next day, I shall do this.

@masonasons
Contributor Author

Oh, well, I guess that's not happening, because the driver kit won't install the debug tools and the Windows SDK fails to install them. Why is literally everything conspiring against me here! Lmao

@samtupy
Owner

samtupy commented Nov 18, 2024

It's probably a bit old, but here is a standalone folder of them. https://www.dropbox.com/scl/fi/0e6xziql4to6opsfgheuq/Debuggers.7z?rlkey=wskdsorn1u03zihj0t6qcgfwa&dl=1

@masonasons
Contributor Author

Alrighty, here it is!

(1d74.1f8c): C++ EH exception - code e06d7363 (first chance)
(1d74.1f8c): Access violation - code c0000005 (first chance)
First chance exceptions are reported before any exception handling.
This exception may be expected and handled.
*** WARNING: Unable to verify checksum for nvgt.exe
nvgt!CScriptArray::AddRef:
00007ff750e77fe0 c6410c00 mov byte ptr [rcx+0Ch],0 ds:000000000000000c=??
0:000> .lines
Line number information will be loaded
0:000> k
Child-SP RetAddr Call Site
00000024464fe9a8 00007ff750bd3a34 nvgt!CScriptArray::AddRef [C:\git\nvgt\ASAddon\src\scriptarray.cpp @ 1924]
00000024464fe9b0 00007ff750ce8615 nvgt!pathfinder::find+0x1b4 [C:\git\nvgt\src\pathfinder.cpp @ 166]
00000024464fea50 00007ff750c89f9b nvgt!CallX64+0x95 [G:\windev\work\angelscript\source\as_callfunc_x64_msvc_asm.asm @ 139]
00000024464feb50 00007ff750c633d3 nvgt!CallSystemFunctionNative+0x30b
00000024464fee00 00007ff750c3d249 nvgt!CallSystemFunction+0x1c3
00000024464feed0 00007ff750c3c578 nvgt!asCContext::ExecuteNext+0xb89
00000024464ff000 00007ff750ea9e14 nvgt!asCContext::Execute+0x1d8
00000024464ff070 00007ff750bc7c5c nvgt!CContextMgr::ExecuteScripts+0xc4 [C:\git\nvgt\ASAddon\src\contextmgr.cpp @ 180]
00000024464ff0e0 00007ff750bc3060 nvgt!ExecuteScript+0x2cc [C:\git\nvgt\src\nvgt_angelscript.cpp @ 780]
00000024464ff1d0 00007ff750f3206c nvgt!nvgt_application::main+0x11b0 [C:\git\nvgt\src\nvgt.cpp @ 294]
00000024464ff540 00007ff750bc36e2 nvgt!Poco::Util::Application::run+0x3c
00000024464ff640 00007ff750f42244 nvgt!wmain+0x92 [C:\git\nvgt\src\nvgt.cpp @ 346]
(Inline Function) ---------------- nvgt!invoke_main+0x22 [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 90]
00000024464ff740 00007ffb65037374 nvgt!__scrt_common_main_seh+0x10c [D:\a\_work\1\s\src\vctools\crt\vcstartup\src\startup\exe_common.inl @ 288]
00000024464ff780 00007ffb66f1cc91 KERNEL32!BaseThreadInitThunk+0x14
00000024464ff7b0 00000000`00000000 ntdll!RtlUserThreadStart+0x21

@ethindp
Contributor

ethindp commented Nov 18, 2024

I... Am very confused by this bug. Unless the pathfinder data callback is nullptr? The AddRef function does nothing more than clear the GC flag and atomically increase the counter -- there are no memory accesses. So I suspect this has everything to do with the callback pointer; perhaps it is being freed unexpectedly? Or the data array is being freed prematurely?
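
One way to read the faulting instruction in the trace above, offered as a sketch rather than a diagnosis: the store targets [rcx+0Ch] and the resolved address is 000000000000000c, so rcx was 0. On x64 Windows, rcx carries the this pointer for a member function call, which would mean AddRef was invoked on a null object and the store that clears the flag is what faulted. The struct below is a hypothetical stand-in whose field order is an assumption (a pointer-sized slot, then the counter, then the flag); it is not copied from scriptarray.h, and it only illustrates why such a flag would sit at a small nonzero offset from a null base.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a ref-counted script object. The field order is an
// assumption made for illustration, not the real CScriptArray declaration.
struct RefCountedSketch {
    void* vtable_placeholder; // stands in for a pointer-sized header slot
    int   ref_count;          // reference counter
    bool  gc_flag;            // flag cleared on every add_ref()

    void add_ref() {
        gc_flag = false;      // a store to (this + offsetof(gc_flag))
        ++ref_count;          // the real code is said to increment atomically
    }
};

// Address that the gc_flag store would touch for a given object base address.
// With base == 0 this is simply the field offset.
constexpr std::size_t gc_flag_store_address(std::size_t base) {
    return base + offsetof(RefCountedSketch, gc_flag);
}

// Usage on a live object, where the same store is perfectly valid.
inline void demo_live_object() {
    RefCountedSketch obj{nullptr, 1, true};
    obj.add_ref();
    assert(obj.ref_count == 2 && !obj.gc_flag);
}
```

Under this assumed layout, gc_flag_store_address(0) equals sizeof(void*) + sizeof(int), which is 0xC on a typical 64-bit build. If that reading holds, the open question is which pointer at pathfinder.cpp line 166 was null, not what AddRef itself does.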

@ethindp
Contributor

ethindp commented Nov 18, 2024

I would honestly recommend running the code under VLD or Valgrind at this point, or enabling AddressSanitizer and LeakSanitizer. Those tools may provide more accurate information. To confirm, @masonasons, are you running the latest NVGT from main, or the latest release?

@samtupy
Owner

samtupy commented Nov 18, 2024

It is confusing. In this case the data pointer is of type CScriptAny*, yet CScriptArray::AddRef is getting called, not CScriptAny::AddRef, and I currently can't imagine how those two pointers are getting crossed. Nevertheless I'll be looking into it; that's very strange. The one thing to maybe test more is finding failing paths. For example, call the .find function over and over with the end coordinates somewhere impossible, such as in a wall or in the air, and see if that triggers the crash faster.

@ethindp
Contributor

ethindp commented Nov 18, 2024

Yeah, I'm very confused. At first I thought that CScriptArray's AddRef function was being called because CScriptAny was forwarding calls to it, but AddRef doesn't do that on CScriptAny. Like I said, building with ASan or LSan or using VLD might help track this kind of bug down (as well as potentially other bugs).

@masonasons
Contributor Author

masonasons commented Nov 18, 2024

@samtupy, I did try that in my test, and it doesn't crash at all. I'm calling the pathfinder in my test the same way as in the game. So, I'm very, very confused.
