About VM and YYC
As you may know, many GameMaker: Studio decompilers / modding tools fail when a game is compiled using YYC.
Since the early days of YYC, it's been regarded as this scary thing that makes modding tools not work.
Even now, the best-in-class modding tool that currently exists (UndertaleModTool) brings up a warning whenever you open a YYC-compiled game.
To understand why this is, and what separates YYTK from other tools, let's see what options game developers have to build their games.
- VM (Virtual Machine)
- YYC (YoYo Compiler)
Virtual Machine builds are what most GMS games use. It's how Undertale's built, it's how DELTARUNE's built.
VM builds are very easily modifiable, as GML is only compiled to bytecode, which is then interpreted by a virtual machine inside the runner.
This bytecode is stored inside the data.win file, inside the CODE chunk, which is essentially just a pointer list.
For the curious out there, the format (for bytecode 15 and newer) is as follows:
struct CODEChunk
{
    char szChunkName[4]; // RIFF-style, it's not null-terminated.
    uint32_t nChunkLength; // The total length of the chunk
    struct PointerList
    {
        int nEntryCount;
        uint32_t nPointers[nEntryCount];
        struct Element
        {
            UndertaleString sEntryName;
            uint16_t nLocalsCount;
            uint16_t nArgumentsCount;
            int32_t nBytecodeOffset;
            uint32_t nOffset;
        } nEntries[nEntryCount];
    } pList;
};
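To make the layout above a bit more concrete, here is a minimal sketch of reading a chunk header and its pointer list out of a byte buffer. The names `ParsedPointerList`, `ReadU32` and `ParseChunk` are made up for this example, and it assumes the fields are little-endian. It stops after the pointer list and doesn't touch the individual entries.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory view of a chunk header plus its pointer list.
struct ParsedPointerList
{
    std::string name;               // 4-character chunk name, e.g. "CODE"
    uint32_t length = 0;            // nChunkLength from the header
    std::vector<uint32_t> pointers; // nPointers[nEntryCount]
};

// Reads a little-endian uint32_t at `offset` (assumption: little-endian file).
// Returns false instead of reading past the end of the buffer.
static bool ReadU32(const std::vector<uint8_t>& data, size_t offset, uint32_t& value)
{
    if (offset > data.size() || data.size() - offset < 4)
        return false;
    value = static_cast<uint32_t>(data[offset])
          | static_cast<uint32_t>(data[offset + 1]) << 8
          | static_cast<uint32_t>(data[offset + 2]) << 16
          | static_cast<uint32_t>(data[offset + 3]) << 24;
    return true;
}

// Parses the chunk that starts at `offset`: name, length, then the pointer list.
bool ParseChunk(const std::vector<uint8_t>& data, size_t offset, ParsedPointerList& out)
{
    if (offset > data.size() || data.size() - offset < 4)
        return false;

    // The name is exactly 4 bytes and NOT null-terminated, so copy by length.
    out.name.assign(reinterpret_cast<const char*>(data.data() + offset), 4);

    uint32_t count = 0;
    if (!ReadU32(data, offset + 4, out.length) || !ReadU32(data, offset + 8, count))
        return false;

    out.pointers.clear();
    for (uint32_t i = 0; i < count; ++i)
    {
        uint32_t pointer = 0;
        if (!ReadU32(data, offset + 12 + static_cast<size_t>(i) * 4, pointer))
            return false; // entry count claims more pointers than the buffer holds
        out.pointers.push_back(pointer);
    }
    return true;
}
```

Note that the chunk name is copied by length rather than treated as a C string, precisely because it isn't null-terminated.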
Thanks to this, you can use data.win files from a Linux build on Windows, provided you have a runner that works on Windows.
Of course the side effect of this is that code runs slower, as it has to:
- Be fetched from the data.win file / some location it's mapped to in memory
  - C++ function: Code_Execute()
- Sent to the VM to execute
  - C++ function: ExecuteIt()
- VM decodes the instruction
  - C++ function: VM::ExecRelease() / VM::ExecDebug()
- VM calls a function pointer from g_Instructions[index]
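The fetch / decode / dispatch steps above can be sketched with a toy interpreter. This is not the runner's code: the opcodes, `ToyVM`, `ToyInstruction`, `g_ToyInstructions` and `RunToyBytecode` are all invented for the example. It only shows the general idea of indexing a table of function pointers by opcode, which is where the per-instruction overhead comes from.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical VM state: just a value stack.
struct ToyVM
{
    std::vector<int32_t> stack;
};

// Every handler receives the VM and the instruction's operand.
using InstructionHandler = void (*)(ToyVM&, int32_t);

static void OpPush(ToyVM& vm, int32_t operand)
{
    vm.stack.push_back(operand);
}

static void OpAdd(ToyVM& vm, int32_t)
{
    if (vm.stack.size() < 2) return; // ignore malformed code in this toy
    int32_t rhs = vm.stack.back();
    vm.stack.pop_back();
    vm.stack.back() += rhs;
}

static void OpMul(ToyVM& vm, int32_t)
{
    if (vm.stack.size() < 2) return;
    int32_t rhs = vm.stack.back();
    vm.stack.pop_back();
    vm.stack.back() *= rhs;
}

// Handler table indexed by opcode, in the spirit of g_Instructions[index].
static const InstructionHandler g_ToyInstructions[] = { OpPush, OpAdd, OpMul };
static const size_t g_ToyInstructionCount =
    sizeof(g_ToyInstructions) / sizeof(g_ToyInstructions[0]);

struct ToyInstruction
{
    uint8_t opcode;  // index into g_ToyInstructions
    int32_t operand; // only used by OpPush
};

// Fetches each instruction, decodes its opcode, and dispatches through the table.
// Returns the top of the stack, or 0 if the stack ends up empty.
int32_t RunToyBytecode(const std::vector<ToyInstruction>& code)
{
    ToyVM vm;
    for (const ToyInstruction& instruction : code)
    {
        if (instruction.opcode >= g_ToyInstructionCount)
            break; // unknown opcode: stop executing
        g_ToyInstructions[instruction.opcode](vm, instruction.operand);
    }
    return vm.stack.empty() ? 0 : vm.stack.back();
}
```

Every single instruction pays for a table lookup and an indirect call, which natively compiled code doesn't have to do.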
So why is VM even used? It's quite simple. It has been around since the very beginning of GM 8.1, where it interpreted plain-text code.
Now here is where it gets interesting: YYC. Few games use this configuration, but when they do, modders are the first to know about it.
This is because unlike VM compilations, there is no bytecode saved in the data.win. Instead, code is embedded directly into the exe file.
Again, for you curious, here's the list of differences
"Why can't we just extract it out the EXE?" you ask? Well, it turns out it's no longer bytecode at all.
Instead, GML is converted into native C++ code and then compiled into x86 machine code, which is then embedded directly into the runner.
This gains precious time as C++ can be optimized very well using modern compilers (like Clang or GCC). However with C++ comes worse decompilation output.
There are two big reasons why C++ is such a headache to decompile:
- Optimization
- Various things are inlined, division and multiplication may be turned into bitshifts...
- Virtually no debug information
- Except maybe RTTI or some DWARF debug info (hi, Nik!)
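As a small illustration of the optimization point, here are pairs of functions that compute the same thing for unsigned values. An optimizing compiler will typically emit the shift form even if you wrote the multiplication or division, so a decompiler never sees the original arithmetic. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// What the programmer wrote...
uint32_t MulBy8(uint32_t x) { return x * 8; }
uint32_t DivBy4(uint32_t x) { return x / 4; }

// ...and what the machine code usually looks like after optimization.
uint32_t MulBy8Shift(uint32_t x) { return x << 3; } // 8 == 2^3
uint32_t DivBy4Shift(uint32_t x) { return x >> 2; } // 4 == 2^2
```

For signed values the division case needs an extra fix-up for negative numbers, which makes the generated code look even less like the source.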
Let's make an example with YYToolkit's IsGameYYC function:
As you can see, there's tons of mangled, even nonsensical output, together with no variable names or function names (besides the exported GetFunctionByName symbol).
Apps like UndertaleModTool, which dig into the data.win file, are therefore at a loss when it comes to modifying code, as they assume the code's in there when it's not.
Here's where YYToolkit comes in - it takes a different approach, and that's runtime injection.
It turns out the game passes function pointers to ExecuteIt (yeah, remember the 2nd step of VM execution?), which allows us to detour stuff.
I won't be getting into how detouring works, although that may be a topic in the future, as I quite enjoy writing these posts.
Anyway, if we can detour functions, that gives us the ability to monitor when they're called, or even override them completely. However, this doesn't allow us to modify the code inside!
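To show what "monitor or override" means in practice, here's a heavily simplified sketch. A real detour patches the target function's machine code; this example just swaps a function pointer that the caller goes through, which is enough to show the idea. Every name here (`OriginalScript`, `HookedScript`, `InstallHook` and so on) is hypothetical.

```cpp
#include <cassert>
#include <vector>

using ScriptFn = int (*)(int);

// Stand-in for compiled game code we can call but not edit.
static int OriginalScript(int x) { return x + 1; }

static ScriptFn g_ScriptPointer = OriginalScript; // pointer the "runner" calls through
static ScriptFn g_SavedOriginal = nullptr;        // set while the hook is installed
static std::vector<int> g_CallLog;                // arguments seen by the hook

// The hook: records the call, then decides what to return.
static int HookedScript(int x)
{
    g_CallLog.push_back(x);         // monitoring
    return g_SavedOriginal(x) * 10; // overriding the result
}

void InstallHook()
{
    if (g_SavedOriginal != nullptr) return; // already installed
    g_SavedOriginal = g_ScriptPointer;
    g_ScriptPointer = HookedScript;
}

void RemoveHook()
{
    if (g_SavedOriginal == nullptr) return; // nothing to remove
    g_ScriptPointer = g_SavedOriginal;
    g_SavedOriginal = nullptr;
}

// Stand-in for the runner invoking whatever the pointer currently refers to.
int RunnerCallsScript(int x) { return g_ScriptPointer(x); }
```

The hook sees every call and can change the result, but the body of `OriginalScript` itself stays exactly as it was compiled.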
There's still not much info available on how YYC works after compilation, but one thing I know is that YYC-compiled games still DON'T call built-in functions directly. Instead, they go through a proxy function of sorts, to which the index of the function is passed.
But what's a function index?
Well, it's quite simple really - in both VM and YYC runners, there exists an array of type RFunction called the_functions (creative name huh?).
You can see the RFunction struct here:
struct RFunction
{
    char szName[64];
    TRoutine pRoutine;
    int nArgCount;
    unsigned int nUsageCount; // Always seems to be -1
};
Every built-in function is added into this array through loads of Function_Add() calls.
You can then retrieve the members either by directly accessing the array, which is bad because this array changes in GMS 1.4.x, or by calling Code_Function_GET_the_function().
This function requires an index to retrieve a function from the array, and this index is what gets passed to the YYC-only variant of this function.
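Here's a rough model of the register-then-look-up-by-index idea. It is a sketch only: the real `TRoutine` signature and the real signatures of Function_Add() and Code_Function_GET_the_function() aren't given here, so `ToyRoutine`, `ToyFunctionAdd`, `ToyGetFunction`, `ToyFindIndex` and `ToyCallByIndex` are all stand-ins invented for the example.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Assumption: simplified routine type, NOT the runner's real TRoutine.
using ToyRoutine = double (*)(const double* args, int argCount);

struct ToyRFunction
{
    char szName[64];
    ToyRoutine pRoutine;
    int nArgCount;
    unsigned int nUsageCount;
};

// Stand-in for the_functions.
static std::vector<ToyRFunction> g_ToyFunctions;

// Appends a built-in and returns its index.
int ToyFunctionAdd(const char* name, ToyRoutine routine, int argCount)
{
    ToyRFunction entry{}; // zero-initialized, so szName stays null-terminated
    std::strncpy(entry.szName, name, sizeof(entry.szName) - 1);
    entry.pRoutine = routine;
    entry.nArgCount = argCount;
    entry.nUsageCount = 0xFFFFFFFFu; // -1 when viewed as a signed value
    g_ToyFunctions.push_back(entry);
    return static_cast<int>(g_ToyFunctions.size()) - 1;
}

// Bounds-checked lookup by index; returns nullptr for a bad index.
const ToyRFunction* ToyGetFunction(int index)
{
    if (index < 0 || index >= static_cast<int>(g_ToyFunctions.size()))
        return nullptr;
    return &g_ToyFunctions[index];
}

// Linear search by name; returns -1 if the name is unknown.
int ToyFindIndex(const char* name)
{
    for (int i = 0; i < static_cast<int>(g_ToyFunctions.size()); ++i)
        if (std::strcmp(g_ToyFunctions[i].szName, name) == 0)
            return i;
    return -1;
}

// The "proxy": the caller passes an index instead of calling the routine directly.
double ToyCallByIndex(int index, const double* args, int argCount)
{
    const ToyRFunction* fn = ToyGetFunction(index);
    return (fn != nullptr && fn->pRoutine != nullptr) ? fn->pRoutine(args, argCount) : 0.0;
}

static double ToyAbs(const double* args, int) { return args[0] < 0 ? -args[0] : args[0]; }
static double ToyMax(const double* args, int) { return args[0] > args[1] ? args[0] : args[1]; }
```

Going through an accessor instead of touching the array directly means the caller doesn't depend on how the array is laid out in a particular runner version.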
Full YYC support is still a long way off, but YYToolkit is right now at the bleeding edge of runner exploitation.
The very beginnings of this journey may be found in this document, which was made back in April 2021. Please keep in mind some information in that document is outdated or just plain wrong, but it was the best we had at the time 😉
Special thanks go to Colinator27, BenjaminUrquhart, and you, the reader.
Thanks for reading!