Native arm64 dynamic core support for Apple Silicon #2
-
Cross-linked to Vogons: https://www.vogons.org/viewtopic.php?p=958225#p958225
-
Based on jmarsh's comment in the Vogons thread, I moved the write protect toggle and cache invalidation up so that they wrap just the CreateCacheBlock call, which resulted in a significant performance improvement.

[Image: JITing Quake, the old approach]
[Image: JITing Quake, the new approach]

I also added a naive mprotect implementation for SELinux, tested under Fedora 33 ARM64 in a VM, and it seems to work fine. The experiment is at https://github.com/kklobe/dosbox-staging/tree/kklobe/arm64_dynamic.
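To illustrate why toggling once per block is cheaper than toggling on every write, here is a minimal sketch. It is not the code from the branch: `JitCache`, `jit_cache_open`, `set_write_protect`, `flush_icache`, and `create_block_sketch` are hypothetical names, the `toggles` counter exists only for illustration, and the non-Apple branch assumes `mprotect` as the stand-in for `pthread_jit_write_protect_np`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <sys/mman.h>
#if defined(__APPLE__) && defined(__aarch64__)
#include <libkern/OSCacheControl.h>
#include <pthread.h>
#endif

// Hypothetical stand-in for the code cache; not the DOSBox Staging types.
struct JitCache {
    uint8_t* base = nullptr; // start of the single mmap'd region
    size_t size = 0;         // region size in bytes
    size_t pos = 0;          // offset of the next byte to emit
    int toggles = 0;         // protection changes, counted for illustration
};

inline bool jit_cache_open(JitCache& c, size_t size)
{
#if defined(__APPLE__) && defined(__aarch64__)
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANON | MAP_JIT, -1, 0);
#else
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
#endif
    if (p == MAP_FAILED)
        return false;
    c.base = static_cast<uint8_t*>(p);
    c.size = size;
    return true;
}

inline void set_write_protect(JitCache& c, bool enabled)
{
    ++c.toggles;
#if defined(__APPLE__) && defined(__aarch64__)
    pthread_jit_write_protect_np(enabled ? 1 : 0); // per-thread toggle
#else
    const int prot = enabled ? (PROT_READ | PROT_EXEC) : (PROT_READ | PROT_WRITE);
    if (mprotect(c.base, c.size, prot) != 0)
        std::abort();
#endif
}

inline void flush_icache(uint8_t* start, size_t len)
{
#if defined(__APPLE__) && defined(__aarch64__)
    sys_icache_invalidate(start, len);
#else
    char* begin = reinterpret_cast<char*>(start);
    __builtin___clear_cache(begin, begin + len);
#endif
}

// Toggle once, emit the whole block, toggle back, then flush once.
// Returns the offset at which the block starts.
inline size_t create_block_sketch(JitCache& c, const uint8_t* code, size_t len)
{
    assert(c.pos + len <= c.size);
    const size_t start = c.pos;
    set_write_protect(c, false);
    std::memcpy(c.base + start, code, len);
    set_write_protect(c, true);
    flush_icache(c.base + start, len);
    c.pos += len;
    return start;
}
```

With this shape, a block of N emitted bytes costs two protection changes and one flush, whereas toggling around each individual write would cost 2N changes and N flushes.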
-
My reply, reposted from Vogons:

Hi all, author of the code here. I just wanted to respond to some good points jmarsh has raised.

TL;DR: I'm in favor of a "1.5" approach that combines a single mmap region with write protect toggling.

I started this experiment because I wanted DOSBox Staging dynrec running on the M1 MacBook Pro that I've had since December. I like Approach 1 because of its simplicity, along with the relative portability of a single mmap call with MAP_ANON | MAP_PRIVATE (| MAP_JIT on Apple).

With regard to the 2016 presentation mentioned, a few things are new in the Apple ecosystem in the last several years:
Apple's recommended solution for security in 2021 seems to include the above-mentioned components: enable the Hardened Runtime, enable the JIT Entitlement, and use per-thread write protect toggling to help reduce the attack surface. And, I reasoned, if it's going to toggle anyway, it might as well use a single mapping and avoid the fiddly issues pointed out for dual mappings.

After looking around at some other projects that do codegen, I found that the toggling approach is common:
So it looks like there's a decent precedent for the toggling approach. It also worked in a quick test on Fedora 33 arm64 with SELinux enabled, using mprotect in place of pthread_jit_write_protect_np().

In summary, I came to the same conclusion as the author of the OpenJDK PR: "It's implemented with pthread_jit_write_protect_np provided by Apple... This approach of managing W^X mode turned out to be simple and efficient enough."

Thanks for taking the time to discuss.
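As a rough illustration of the "1.5" approach discussed in this reply (one mapping plus write protect toggling), the sketch below allocates a single region and makes it writable only for the lifetime of a scoped guard. The names `alloc_jit_region` and `ScopedJitWrite` are hypothetical and not from DOSBox Staging. The non-Apple branch assumes `mprotect` as the substitute for `pthread_jit_write_protect_np`, and the guard flushes the whole region on exit purely for simplicity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <sys/mman.h>
#if defined(__APPLE__) && defined(__aarch64__)
#include <libkern/OSCacheControl.h>
#include <pthread.h>
#endif

// Allocate one anonymous private mapping; MAP_JIT is added on Apple Silicon.
// Returns nullptr on failure.
inline uint8_t* alloc_jit_region(size_t size)
{
#if defined(__APPLE__) && defined(__aarch64__)
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANON | MAP_JIT, -1, 0);
#else
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
#endif
    return p == MAP_FAILED ? nullptr : static_cast<uint8_t*>(p);
}

// While an instance is alive the region is writable; when it goes out of
// scope the region becomes executable again and the caches are flushed.
class ScopedJitWrite {
public:
    ScopedJitWrite(uint8_t* base, size_t size) : base_(base), size_(size)
    {
        set_write_protect(false);
    }
    ~ScopedJitWrite()
    {
        set_write_protect(true);
#if defined(__APPLE__) && defined(__aarch64__)
        sys_icache_invalidate(base_, size_);
#else
        char* begin = reinterpret_cast<char*>(base_);
        __builtin___clear_cache(begin, begin + size_);
#endif
    }
    ScopedJitWrite(const ScopedJitWrite&) = delete;
    ScopedJitWrite& operator=(const ScopedJitWrite&) = delete;

private:
    void set_write_protect(bool enabled)
    {
#if defined(__APPLE__) && defined(__aarch64__)
        pthread_jit_write_protect_np(enabled ? 1 : 0); // affects this thread only
#else
        const int prot = enabled ? (PROT_READ | PROT_EXEC)
                                 : (PROT_READ | PROT_WRITE);
        if (mprotect(base_, size_, prot) != 0)
            std::abort();
#endif
    }

    uint8_t* base_;
    size_t size_;
};
```

A caller would create the guard in a nested scope, emit code into the region, and let the guard's destructor restore protection before jumping into the generated code.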
-
Adding a link showing that this code allows the dynamic core to run on an x86-64 installation of Fedora 34 with SELinux in its default "enforcing" state. See: dosbox-staging/dosbox-staging#1010

With these tested and working, the minimal measured performance impact, the lean implementation (i.e., the dynamic core is left essentially as-is), and an approach that follows best practices per the list above, I can't see anything holding this back from landing in Staging!
-
Discussion archived in PR: dosbox-staging/dosbox-staging#1031
-
Archived discussion to https://github.com/dosbox-staging/archived-discussions-for-dosbox-staging
-
Last night I threw together a minimum working hack for native arm64 dynamic core on my MacBook Pro M1, with some encouraging results.
Using Future Crew's Unreal (`unreal p8` with GUS 44kHz, surely one of the better moments in DOS history), I was able to raise the audio stuttering threshold from ~100k cycles on normal core to ~400k on the dynamic core.

The short list:
`cache_addX` write with write protect disable and enable, then flush the written memory

In practice, this is all in `dyn_cache.h`, and looks like: [code snippet not preserved] and [code snippet not preserved]
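As a rough illustration of the per-write pattern in the short list above (and explicitly not the author's actual `dyn_cache.h` code), here is a minimal sketch. `CodeCache`, `cache_open`, `set_write_protect`, `flush_icache`, and `cache_add_sketch` are hypothetical names, and the non-Apple branch assumes `mprotect` in place of `pthread_jit_write_protect_np`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <sys/mman.h>
#if defined(__APPLE__) && defined(__aarch64__)
#include <libkern/OSCacheControl.h>
#include <pthread.h>
#endif

// Hypothetical stand-in for the dynamic core's code cache.
struct CodeCache {
    uint8_t* base = nullptr; // start of the mmap'd region
    size_t size = 0;         // region size in bytes
    size_t pos = 0;          // next write offset
};

inline bool cache_open(CodeCache& cache, size_t size)
{
#if defined(__APPLE__) && defined(__aarch64__)
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE | PROT_EXEC,
                   MAP_PRIVATE | MAP_ANON | MAP_JIT, -1, 0);
#else
    void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANON, -1, 0);
#endif
    if (p == MAP_FAILED)
        return false;
    cache.base = static_cast<uint8_t*>(p);
    cache.size = size;
    return true;
}

inline void set_write_protect(CodeCache& cache, bool enabled)
{
#if defined(__APPLE__) && defined(__aarch64__)
    (void)cache;
    pthread_jit_write_protect_np(enabled ? 1 : 0);
#else
    const int prot = enabled ? (PROT_READ | PROT_EXEC) : (PROT_READ | PROT_WRITE);
    if (mprotect(cache.base, cache.size, prot) != 0)
        std::abort();
#endif
}

inline void flush_icache(uint8_t* start, size_t len)
{
#if defined(__APPLE__) && defined(__aarch64__)
    sys_icache_invalidate(start, len);
#else
    char* begin = reinterpret_cast<char*>(start);
    __builtin___clear_cache(begin, begin + len);
#endif
}

// Write one value of any trivially copyable width: disable write protection,
// store, re-enable protection, then flush just the bytes that were written.
template <typename T>
void cache_add_sketch(CodeCache& cache, T value)
{
    assert(cache.pos + sizeof(T) <= cache.size);
    uint8_t* dst = cache.base + cache.pos;
    set_write_protect(cache, false);
    std::memcpy(dst, &value, sizeof(T));
    set_write_protect(cache, true);
    flush_icache(dst, sizeof(T));
    cache.pos += sizeof(T);
}
```

This is simple to retrofit because only the write helpers change, but every emitted value pays for two protection changes and a flush, which is the cost the later per-block variant avoids.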
I understand from Discord chats that there's a plan for this work and I am very happy to help out with coding, testing, benchmarking (the fun part), and anything else.
A few references:
Porting Just-In-Time Compilers to Apple Silicon
Attempts to mprotect() with MAP_JIT failing on Apple Silicon as of macOS 11.2
Apple M1 Support for MacOS