Version v0.2.0
savannstm committed Aug 21, 2024
1 parent 4ebebdd commit 58c8fa2
Showing 7 changed files with 540 additions and 317 deletions.
8 changes: 4 additions & 4 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "marshal-rs"
version = "0.1.0"
version = "0.2.0"
authors = ["savannstm <[email protected]>"]
edition = "2021"
rust-version = "1.63.0"
@@ -18,8 +18,8 @@ default = ["dep:serde_json"]
cfg-if = "1.0.0"
encoding_rs = "0.8.34"
num-bigint = "0.4.6"
serde_json = { version = "1.0.121", optional = true, features = ["preserve_order"] }
sonic-rs = { version = "0.3.9", optional = true }
serde_json = { version = "1.0.125", optional = true, features = ["preserve_order"] }
sonic-rs = { version = "0.3.11", optional = true }

[dev-dependencies]
rayon = "1.10.0"
rayon = "1.10.0"
29 changes: 21 additions & 8 deletions README.md
@@ -5,13 +5,17 @@
This project is essentially just [@savannstm/marshal](https://github.com/savannstm/marshal), rewritten using Rust.
It is capable of :fire: **_BLAZINGLY FAST_** loading of data from Ruby Marshal files, as well as :fire: **_BLAZINGLY FAST_** dumping of it back to Marshal format.

## Installation

`cargo add marshal-rs`

## Quick overview

This crate has two main functions: `load()` and `dump()`.

`load()` takes a `&[u8]`, consisting of Marshal data bytes (that can be read using std::fs::read()) as its only argument, and outputs serde_json::Value (sonic_rs::Value, if "sonic" feature is enabled).
`load()` takes a `&[u8]`, consisting of Marshal data bytes (that can be read using `std::fs::read()`) as its only argument, and outputs `serde_json::Value` (`sonic_rs::Value`, if `sonic` feature is enabled).

`dump()` takes a `Value` as its only argument, and outputs Vec\<u8\>, consisting of Marshal bytes.
`dump()`, in turn, takes a `Value` as its only argument and serializes it back to a Marshal byte stream (`Vec<u8>`). It does not preserve the initial encoding of strings: all strings are written as UTF-8.

It serializes Ruby data to JSON using the table:

@@ -32,7 +36,7 @@ If serializes Ruby data to JSON using the table:

By default, Ruby strings that include an encoding instance variable are serialized to JSON strings, and those that don't are serialized to `{ __type: "bytes", data: [...] }` objects.

This behavior can be controlled with `string_mode` argument of load() function.
This behavior can be controlled with the `string_mode` argument of the `load()` function.

`StringMode::UTF8` tries to convert byte arrays without an encoding instance variable to strings: it produces a string if the array is valid UTF-8, and a bytes object otherwise.
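
The fallback described above can be pictured with a minimal, standard-library-only sketch. It is not the crate's implementation: `DecodedString` and `decode_utf8_mode` are hypothetical names used only for illustration, and the `Bytes` variant stands in for the `{ __type: "bytes", data: [...] }` object.

```rust
/// Result of decoding a Ruby string that carries no encoding instance variable.
/// Hypothetical type, used only for this illustration.
#[derive(Debug, PartialEq)]
enum DecodedString {
    /// The bytes were valid UTF-8, so a JSON string can be produced.
    Text(String),
    /// Fallback that mirrors the `{ __type: "bytes", data: [...] }` object.
    Bytes(Vec<u8>),
}

/// Hypothetical helper: tries UTF-8 first and falls back to the raw bytes.
fn decode_utf8_mode(raw: &[u8]) -> DecodedString {
    match std::str::from_utf8(raw) {
        Ok(text) => DecodedString::Text(text.to_owned()),
        Err(_) => DecodedString::Bytes(raw.to_vec()),
    }
}
```

Under these assumptions, `decode_utf8_mode(b"hello")` yields the `Text` variant, while a byte sequence such as `[0xff, 0xfe]` is not valid UTF-8 and yields `Bytes`.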

@@ -44,14 +48,17 @@ For objects, that cannot be serialized in JSON (such as Objects and Symbols), `m

### Hash keys

For Hash keys, that in Ruby may be represented using Integer, Float, Object etc, `marshal-rs` tries to preserve key type with prefixing stringifiyed key with it type. For example, Ruby `{1 => nil}` Hash will be converted to `{"__integer__1": null}` object.

load(), in turn, takes serialized JSON object and serializes it back to Ruby Marshal format. It does not preserve strings' initial encoding, writing all strings as UTF-8 encoded, as well as does not writes links, which effectively means that output Marshal data might be larger in size than initial.
For Hash keys, which in Ruby may be an `Integer`, `Float`, `Object` etc., `marshal-rs` tries to preserve the key type by prefixing the stringified key with its type. For example, the Ruby Hash `{1 => nil}` will be converted to the `{"__integer__1": null}` object.
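
As a rough illustration of the key-prefixing idea, here is a standard-library-only sketch for the `Integer` case. Only the `__integer__` prefix comes from the example above; the helper names are hypothetical and this is not how the crate is necessarily written.

```rust
/// Prefix taken from the README example: Ruby `{1 => nil}` becomes `{"__integer__1": null}`.
const INTEGER_PREFIX: &str = "__integer__";

/// Hypothetical helper: turns an Integer Hash key into a JSON object key.
fn integer_key_to_json(key: i64) -> String {
    format!("{}{}", INTEGER_PREFIX, key)
}

/// Hypothetical helper: recovers the Integer key, or returns `None`
/// when the JSON key does not carry the integer prefix.
fn json_key_to_integer(key: &str) -> Option<i64> {
    key.strip_prefix(INTEGER_PREFIX)?.parse().ok()
}
```

Because the type lives in the key string itself, a round trip through JSON can tell an Integer key apart from a plain string key such as `"name"`, for which `json_key_to_integer` returns `None`.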

### Instance variables

Instance variables always decoded as strings with "\_\_symbol\_\_" prefix.
You can manage the prefix of instance variables using `instance_var_prefix` argument in load() and dump(). Passed string replaces "@" instance variables' prefixes.
Instance variables are always decoded as strings with the `__symbol__` prefix.
You can manage the prefix of instance variables using the `instance_var_prefix` argument of `load()` and `dump()`. The passed string replaces the "@" prefix of instance variables.
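
One possible reading of this renaming, sketched with the standard library only: the leading "@" is swapped for the passed prefix, and the result is then marked with `__symbol__`. The helper name and the exact shape of the output are assumptions made for illustration, not the crate's confirmed behavior.

```rust
/// Prefix that, per the README, marks decoded instance variable names.
const SYMBOL_PREFIX: &str = "__symbol__";

/// Hypothetical helper: builds the JSON-side name of a Ruby instance variable.
/// Assumes that a custom prefix, when given, replaces the leading "@".
fn decode_ivar_name(ruby_name: &str, instance_var_prefix: Option<&str>) -> String {
    let renamed = match (instance_var_prefix, ruby_name.strip_prefix('@')) {
        // A custom prefix was passed and the name starts with "@": swap it.
        (Some(prefix), Some(rest)) => format!("{}{}", prefix, rest),
        // Otherwise keep the Ruby name untouched.
        _ => ruby_name.to_owned(),
    };
    format!("{}{}", SYMBOL_PREFIX, renamed)
}
```

Under these assumptions, `decode_ivar_name("@name", Some("ivar_"))` produces `__symbol__ivar_name`, and passing `None` keeps the "@".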

### Unsafe code

This code uses `UnsafeCell` along with unsafe blocks multiple times in the `load()` function.
However, in the current implementation, this unsafe code will NOT ever cause any data races or instabilities.

## Quick example

@@ -82,6 +89,12 @@ fn main() {

Minimum supported Rust version is 1.63.0.

## References

- [Official documentation for Marshal format](https://docs.ruby-lang.org/en/master/marshal_rdoc.html)
- [TypeScript implementation of Marshal](https://github.com/hyrious/marshal)
- [marshal.c](https://github.com/ruby/ruby/blob/master/marshal.c)

## License

Project is licensed under WTFPL.