Releases: CharlieTap/chasm
0.9.53
APIs for reading and writing strings from memory
You'll now find three new API calls in chasm's embedding API:
- readUtf8String()
- writeUtf8String()
- readNullTerminatedUtf8String()
The last of the three is useful when dealing with modules created by languages like C, which use a null terminating byte to signal the end of a string. I've done what I can to optimise the algorithm which finds the null byte on native, which is powered by a wide SIMD search. The JVM implementation is more primitive but will be improved in the future when I start to specialise the JVM memory implementation.
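As a rough illustration of what a null-terminated read involves, here is a minimal Kotlin sketch over a plain ByteArray. The function name and signature are invented for this example and are not chasm's API, and the scan is a simple byte-at-a-time loop rather than the SIMD search described above.

```kotlin
// Hypothetical helper, NOT chasm's API: find the first zero byte at or after
// `pointer` and decode everything before it as UTF-8.
fun readNullTerminated(memory: ByteArray, pointer: Int): String {
    var end = pointer
    // Byte-at-a-time scan for the terminator.
    while (end < memory.size && memory[end] != 0.toByte()) {
        end++
    }
    // Running off the end of memory means the string was never terminated.
    require(end < memory.size) { "no null terminator found before the end of memory" }
    // endIndex is exclusive, so the terminator itself is not decoded.
    return memory.decodeToString(pointer, end)
}
```

For example, calling readNullTerminated(byteArrayOf(0x68, 0x69, 0x00, 0x7A), 0) returns "hi" and ignores the byte after the terminator.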
0.9.52
Performance
- When exiting blocks, functions and exception handlers, chasm will no longer attempt to null out the respective stack elements. Instead it simply moves the stack pointer. This gives around a 7% boost to control flow dispatch, with the tradeoff that stack frames and labels are not released. These objects don't actually hold much state, so this has no meaningful memory impact.
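The difference between the two exit strategies can be sketched with a small array-backed stack. FrameStack and its methods are hypothetical names for illustration only and do not mirror chasm's internal classes.

```kotlin
// Illustrative only: a fixed-capacity stack of frames or labels.
class FrameStack(capacity: Int) {
    private val slots = arrayOfNulls<Any>(capacity)
    var pointer = 0
        private set

    fun push(frame: Any) {
        slots[pointer++] = frame
    }

    // Older style: clear each slot on the way out so the object can be collected.
    fun popAndClear(count: Int) {
        repeat(count) { slots[--pointer] = null }
    }

    // Newer style: only move the pointer. Stale references stay in the array
    // until a later push overwrites them.
    fun shrinkTo(depth: Int) {
        pointer = depth
    }

    // Exposed so the difference between the two styles can be observed.
    fun rawSlot(index: Int): Any? = slots[index]
}
```

After shrinkTo the old entries are still reachable through the backing array, which is the tradeoff described above: less work on exit, at the cost of keeping a few small objects alive for longer.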
Bug fixes
- Fixed a complex issue around type pointers mismatching during instantiation
- Fixed an issue where breaking from blocks would not clean up the relevant exception handlers. This could cause future exceptions to be caught by handlers that should no longer exist and that hold incorrect stack depths.
0.9.51
Bugfix
This release includes a small bug fix for programs that use array_new_default on an array whose field type is a non-null reference type. This would leak a concrete defined type heap type onto the stack, which chasm cannot serialise.
0.9.5
More Performance
- The core dispatch loop is 60% faster
- GC instructions are roughly an order of magnitude faster, with a 10x speedup in local tests
- Most instructions have been dialled in towards their theoretical minimum execution time given the current architecture
Removal of garbage collection during invocation
Allocations from GC instructions are now deallocated with the store rather than at runtime when they fall off the stack. This was done to allow values to be pushed to the stack without boxing, which was slowing the runtime considerably. I may introduce a lightweight GC or reference counting solution in the future, but the majority of workloads run on wasm at present are small and bursty, and in these scenarios it's fine to sweep the memory after invocation.
0.9.42
Performance
Performance has improved by roughly 50% for all types of workloads
- The dispatch loop has been shrunk to just a jump, a compare, a load and an invocation of a function reference
- Branches have been removed from all stack operations; all operations are now optimistic
- Host functions are no longer type checked when they return results, as this checking was introducing several branches per value in the result. A future release will add functions to chasm's API that allow authors of HostFunction libraries to type check them whilst they are developed.
- Locals now have their defaults predecoded, saving several branches on call entry
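To make the shape of such a loop concrete, here is a minimal Kotlin sketch of a dispatch loop over function references with optimistic (unchecked) stack operations. Machine, Op, constant, add and execute are all hypothetical names chosen for this example; chasm's real instruction representation is not shown in these notes.

```kotlin
// Hypothetical machine state for illustration only.
class Machine(val stack: LongArray) {
    var sp = 0 // stack pointer: index of the next free slot
    var ip = 0 // instruction pointer
}

// Each instruction is a function reference that mutates the machine.
typealias Op = (Machine) -> Unit

fun constant(value: Long): Op = { m -> m.stack[m.sp++] = value }

// Optimistic stack operation: no emptiness or type checks, on the assumption
// that validation has already proven the stack shape.
val add: Op = { m ->
    val rhs = m.stack[--m.sp]
    m.stack[m.sp - 1] += rhs
}

fun execute(program: Array<Op>, machine: Machine) {
    // Per iteration: compare ip against the end, load the op, invoke it, loop.
    while (machine.ip < program.size) {
        val op = program[machine.ip++]
        op(machine)
    }
}
```

Running execute(arrayOf(constant(2L), constant(3L), add), Machine(LongArray(8))) leaves a single value, 5, at the bottom of the stack.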
0.9.41
Bugfix
- Modules with a large number of recursive types should now have a small memory footprint
Full Changelog: 0.9.4...0.9.41
0.9.4
Breaking API Changes
Chasm now exposes a small part of its internal runtime to the public API. All Values and ValueTypes given to chasm are now the same types chasm operates on internally.
The reasoning behind this change is largely performance related. Boxing and unboxing between the internal types and the public API types was prohibitively expensive, particularly when dealing with host functions, and caused a ton of temporary allocations and GC pressure.
Whilst exposing these internals isn't ideal, I don't plan for chasm to be used directly by the majority of consumers. In the coming months I will add a plugin that generates a Kotlin interface from wasm modules. This plugin will generate a pure Kotlin API and hide almost all of chasm's API, allowing the core library to make breaking changes whilst the plugin stays stable.
More performance
The last release focused on the interpreter dispatch loop. This release focuses on optimising some core instructions and also the entry and exit routines of chasm.
Calling chasm on a function with a single noop instruction is a great way to understand the cost of entry and exit, which I have created a benchmark around here.
Running this on a Pixel 8:
0.9.3: 20 microseconds
0.9.4: 3 microseconds
I believe it's possible to get this below 1 microsecond by shifting more work into the instantiation phase. However, there's lots of other work elsewhere which would have a more meaningful impact for now.
No threads execution phase
For a bunch of different reasons I've had to switch focus and really work on performance. Threads will land before 1.0, but right now there's a lot of work to do in order to make chasm go fast with all types of workloads.
API calls for releasing resources
- DropStore (Use this 99% of the time)
- DropInstance
These calls manually drop all of the associated allocated state and close any resources. This is particularly relevant on native platforms, as the memory implementation calls Rust over FFI and thus is unable to track when Kotlin objects are GC'd. Failing to call them will leave mmap'd regions of memory active and reduce the available address space until your process exits.
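One way to make sure the release step is never skipped is to wrap usage in try/finally. The sketch below uses a stand-in class, since the signatures of DropStore and DropInstance are not given in these notes; SketchStore, drop and withStore are hypothetical names.

```kotlin
// Stand-in for a store that owns native memory; NOT chasm's Store type.
class SketchStore {
    var released = false
        private set

    // Plays the role of the real release call in this sketch.
    fun drop() {
        released = true
    }
}

// Runs the block and always releases the store afterwards, even if the block throws.
fun <T> withStore(store: SketchStore, block: (SketchStore) -> T): T =
    try {
        block(store)
    } finally {
        store.drop()
    }
```

The same shape applies to whatever the real release call looks like: acquire, use inside try, release inside finally.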
Known Issues
There is a known issue of excessive memory consumption for modules with a large number of types, and in some cases this may OOM. The issue is caused by the algorithm that unrolls the recursive types introduced in the Wasm GC proposal. 0.9.3 exacerbated this problem by calling this process more times than was necessary. 0.9.4 removes that issue, but the underlying algorithm remains troublesome and needs to be optimised, which leads on to the next release.
Next release
- Bytecode fusion
- GC instruction predecoding
- Recursive Type algorithm optimisation
0.9.3
What's Changed
Performance
Chasm's performance changes drastically with this release as I've begun to optimise the interpreter. As a result, the time taken for chasm to complete the official wasm testsuite has dropped from 8 minutes to 6 seconds on the JVM, with similar speedups on other platforms.
Threads proposal decode and validation
Chasm can now decode and validate wasm modules with bytecode from the threads proposal. Execution support will come in the next release.
Host Function Exceptions
Host functions can now throw a HostFunctionException. This will be caught internally by the VM and a chasm error will be returned.
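The general pattern of catching an exception at the VM boundary and turning it into an error value can be sketched as follows. The types here are local stand-ins: SketchHostException, SketchResult and invokeHost are hypothetical and do not reproduce chasm's HostFunctionException or its result types.

```kotlin
// Local stand-in for the exception a host function may throw.
class SketchHostException(message: String) : RuntimeException(message)

// Local stand-in for a result that is either a success or an error value.
sealed interface SketchResult {
    data class Success(val values: List<Long>) : SketchResult
    data class Error(val reason: String) : SketchResult
}

fun invokeHost(hostFunction: (List<Long>) -> List<Long>, args: List<Long>): SketchResult =
    try {
        SketchResult.Success(hostFunction(args))
    } catch (e: SketchHostException) {
        // The exception stops at the boundary and becomes an error value for the caller.
        SketchResult.Error(e.message ?: "host function failed")
    }
```

In this sketch the caller never sees the exception itself, only a Success or an Error. Other exception types are deliberately not caught here and would still propagate.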
Windows support dropped for now
This will be re-added before chasm's 1.0, but its exclusion for now allows development to progress faster.
Full Changelog: 0.9.2...0.9.3
0.9.2
What's Changed
- Automation of the GC proposal testsuite
- Simplified GC approach
- 100% passing of all testsuite tests for all semantic phases of every proposal
- Module Information API
- Tags API
- Tons of bugfixes
Full Changelog: 0.9.1...0.9.2
0.9.1
What's Changed
- Update testsuite to latest by @CharlieTap in #26
- Wabt 1.0.36 by @CharlieTap in #27
- Migration from wabt to wasm-tools by @CharlieTap in #28
- Exception Handling Proposal by @CharlieTap in #29
Full Changelog: 0.9.0...0.9.1