Scripting & Asset Loading Brain Dump #828

zicklag · 2023-07-03T00:50:06Z

zicklag
Jul 3, 2023
Maintainer

This is a brain dump/primer for stuff relevant to the scripting/asset stuff I'm going to be working on next, to help anybody who wants to participate in the design or learn more about it.

Note: I'm going to do some extra explanation on layout concepts for people who aren't already acquainted with them. I may not be 100% accurate with some of my explanations, but my understanding has been close enough to get me along so far, so it shouldn't be too bad. :)

Layout

When it comes to scripting, one of the challenges we have to solve is memory layout: how we're going to get both Rust and other languages reading and writing the same data without causing problems.

Layout Basics

The layout of a type describes its size, its alignment, and the offsets of its fields.

Size is simple: it's how many bytes the type takes up in memory. For example, the size of an i32 is 4 bytes, the size of a u8 is 1 byte, and the size of a bool is 1 byte.

Field offsets are also simple. Take the struct below, for example:

#[repr(C)] // More about #[repr(C)] in a second
struct MyData {
    x: u32, // Field offset 0
    y: u32, // Field offset 4
    z: u32, // Field offset 8
}

The field offsets describe how far from the "front" of the type the data for a specific field is.

If we have a raw pointer to a MyData struct, for instance, we know that if we read a 4-byte chunk starting from the position of the pointer as a u32, we will get the field x. And if we take the pointer, add an offset of 8 bytes to it, and then read a 4-byte chunk as a u32, we will be reading the field z of the struct.
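To make that concrete, here's a rough sketch of reading fields through a raw pointer plus a byte offset. The `read_u32_at` helper is a name I made up for illustration, it isn't from any crate.

```rust
#[repr(C)]
struct MyData {
    x: u32, // Field offset 0
    y: u32, // Field offset 4
    z: u32, // Field offset 8
}

/// Hypothetical helper: reads a `u32` located `offset` bytes past `base`.
///
/// # Safety
/// `base + offset` must point at an initialized, properly aligned `u32`
/// inside a live allocation.
unsafe fn read_u32_at(base: *const u8, offset: usize) -> u32 {
    unsafe { base.add(offset).cast::<u32>().read() }
}

fn main() {
    let data = MyData { x: 10, y: 20, z: 30 };

    // Forget the type and keep only a pointer to the first byte.
    let base = (&data as *const MyData).cast::<u8>();

    // SAFETY: offsets 0, 4, and 8 are the `u32` fields of a `#[repr(C)]` struct.
    let x = unsafe { read_u32_at(base, 0) };
    let y = unsafe { read_u32_at(base, 4) };
    let z = unsafe { read_u32_at(base, 8) };

    println!("x = {x}, y = {y}, z = {z}");
}
```

This only works because `#[repr(C)]` pins down the field order. Without it, the offsets 0, 4, and 8 would just be guesses.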

Alignment

Alignment was something that confused me for a bit. The alignment of a type is a number of bytes that the address of any value of that type must be evenly divisible by.

Let's start by discussing the alignment of primitive types, such as i32 and u8.

For primitive types, the alignment is usually equal to its size. So an i32 has a size and alignment of 4 bytes. This means that we're only allowed to read or write an i32 at a memory position that is divisible by 4.

This alignment restriction has to do with the way the CPU often works under the hood. If you want to read an i32, the CPU often only lets you do it if the address is aligned. Otherwise, you must make an intentionally unaligned read, which can be much slower than normal.

In Rust, making a normal read or write through a pointer that isn't properly aligned is undefined behavior.
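For the "intentional unaligned read" case, the standard library has `std::ptr::read_unaligned`. Here's a small sketch of it. The `read_u32_unaligned` wrapper is my own name for illustration.

```rust
use std::ptr;

/// Hypothetical wrapper: reads a `u32` from `bytes` starting at `offset`,
/// with no alignment requirement on the address.
fn read_u32_unaligned(bytes: &[u8], offset: usize) -> u32 {
    assert!(offset + 4 <= bytes.len(), "read would run past the buffer");
    // SAFETY: the bounds were checked above, and `read_unaligned` does not
    // require the pointer to be aligned for `u32`.
    unsafe { ptr::read_unaligned(bytes.as_ptr().add(offset).cast::<u32>()) }
}

fn main() {
    let bytes = [0u8, 1, 2, 3, 4, 5, 6, 7];

    // Offset 1 is not a multiple of 4, so a plain `.read()` through a
    // `*const u32` here could be undefined behavior.
    let value = read_u32_unaligned(&bytes, 1);

    // The result depends on the machine's endianness, so compare against
    // the same four bytes interpreted in native order.
    assert_eq!(value, u32::from_ne_bytes([1, 2, 3, 4]));
    println!("value = {value:#010x}");
}
```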

Now consider a struct. Every struct must have all of its fields aligned at all times, otherwise reads of those fields would be undefined behavior. So the alignment of a struct is equal to the largest alignment of any of its fields.

So if you have a struct with an i32 and an f64 in it, then the alignment of the struct is 8 bytes on most platforms, because of the 8-byte alignment of the f64.
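You can ask the compiler for these numbers with `std::mem::align_of` and `std::mem::size_of`. A quick sketch, where `Mixed` is just an example struct I made up:

```rust
use std::mem::{align_of, size_of};

// Example struct with an i32 and an f64 field.
#[repr(C)]
struct Mixed {
    a: i32,
    b: f64,
}

fn main() {
    println!("i32:   size {}, align {}", size_of::<i32>(), align_of::<i32>());
    println!("f64:   size {}, align {}", size_of::<f64>(), align_of::<f64>());
    println!("Mixed: size {}, align {}", size_of::<Mixed>(), align_of::<Mixed>());

    // The struct's alignment is the largest alignment among its fields.
    assert_eq!(
        align_of::<Mixed>(),
        align_of::<i32>().max(align_of::<f64>())
    );

    // A type's size is always a multiple of its alignment, so that every
    // element of an array of them stays aligned.
    assert_eq!(size_of::<Mixed>() % align_of::<Mixed>(), 0);
}
```

I'm not hard-coding the printed numbers here because the alignment of f64 can differ between targets.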

Another nitty gritty detail is the fact that you can make types that have gaps in them, where the bytes are not taken up by any field. For instance:

// The struct has an alignment of 4, because of the u32
#[repr(C)]
struct MyData {
    // This will be aligned to 4, because it's at the top of the struct.
    x: u8,
    // This needs to be aligned to 4, but since it comes after x, which
    // only takes up 1 byte, there are 3 empty padding bytes before the
    // u32 can land on an offset that is a multiple of 4.
    y: u32,
}

These empty gap bytes are called padding. They are uninitialized, which means we don't really know what is in them, and as far as I understand it's undefined behavior to read them as a normal value.
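Here's a small sketch that checks where the padding ends up for the struct above. The `offset_of_y` helper is something I wrote for illustration. It assumes u32 has an alignment of 4, which is the case on the common targets I know of.

```rust
use std::mem::{align_of, size_of};
use std::ptr::addr_of;

#[repr(C)]
struct MyData {
    x: u8,
    y: u32,
}

/// Hypothetical helper: byte offset of the `y` field from the start of a `MyData`.
fn offset_of_y() -> usize {
    let d = MyData { x: 0, y: 0 };
    // Subtract the address of the struct from the address of the field.
    addr_of!(d.y) as usize - addr_of!(d) as usize
}

fn main() {
    // 1 byte for x + 3 padding bytes + 4 bytes for y.
    println!("size   = {}", size_of::<MyData>());
    println!("align  = {}", align_of::<MyData>());
    println!("y is at offset {}", offset_of_y());

    assert_eq!(size_of::<MyData>(), 8);
    assert_eq!(align_of::<MyData>(), 4);
    assert_eq!(offset_of_y(), 4);
}
```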

`repr(C)` and Type Layout

The thing about Rust structs, by default, is that they don't have a defined layout. The Rust compiler may re-order fields or do any other kind of dark magic to try and get more performance out of your code. This means if we have a pointer to a struct, we don't really know what is going to happen if we try to make a read at that pointer, because we don't know where the fields are, or how they are represented in memory.

That's where #[repr(C)] comes in. Rust lets us add a repr() annotation to our types to let us influence how Rust represents them in memory. In the case of #[repr(C)], we are telling Rust that we want our struct to be laid out the way a C struct would be. This layout will be reliable, so that we can confidently do manual messing with the bytes of our type in memory, knowing that things are going where we think they are going.

The layout is described in more detail in the Rust reference. The Layout::extend() method in the Rust standard library also has a nifty example of how to calculate the field offsets of a struct, from their layouts.
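For reference, this is roughly what that `Layout::extend()` approach looks like. The `repr_c` function follows the example in the standard library docs. The `main` around it is just my own usage.

```rust
use std::alloc::{Layout, LayoutError};

/// Computes the overall layout and the field offsets of a `#[repr(C)]`
/// struct from the layouts of its fields, in declaration order.
fn repr_c(fields: &[Layout]) -> Result<(Layout, Vec<usize>), LayoutError> {
    let mut offsets = Vec::new();
    // Start from an empty layout: size 0, alignment 1.
    let mut layout = Layout::from_size_align(0, 1)?;
    for &field in fields {
        // `extend` inserts any padding needed before `field` and reports
        // the offset at which the field starts.
        let (new_layout, offset) = layout.extend(field)?;
        layout = new_layout;
        offsets.push(offset);
    }
    // Add trailing padding so the size is a multiple of the alignment.
    Ok((layout.pad_to_align(), offsets))
}

fn main() {
    // Same shape as `struct MyData { x: u8, y: u32 }`.
    let fields = [Layout::new::<u8>(), Layout::new::<u32>()];
    let (layout, offsets) = repr_c(&fields).expect("layout overflow");

    println!("size = {}, align = {}", layout.size(), layout.align());
    println!("offsets = {offsets:?}");
}
```

The nice thing is that this only needs a size and alignment for each field, not a Rust type. That makes it the kind of calculation a script or schema loader could do at runtime.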

Thinking About Scripting

Being able to specify our type layout can be very important for scripting, because we want our scripts to be able to read and write to fields of structs that we define in Rust. Given a pointer to a Rust struct, if the script knows the layout of the struct and its fields, then it can effectively access the data in the struct.

bones_ecs also allows you to create new component types at runtime, if you provide a size and alignment for that component. This means that scripts could create their own component types, if they wanted, and store them in the ECS. Other scripts, if they knew the layout of the component, could then read data from a component created in another script.

One thing we'll want to think about designing is a way for scripts and Rust to share that layout information with each other, so that they can all read and write the different components' data throughout the game.
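To show why a size and alignment are enough to store something, here's a minimal sketch of a single untyped slot. To be clear, `RuntimeComponent` and its methods are made up for this example and are not the bones_ecs API. It only uses `std::alloc`.

```rust
use std::alloc::{alloc_zeroed, dealloc, handle_alloc_error, Layout};
use std::mem::{align_of, size_of};

/// Hypothetical stand-in for one component whose type is only known at runtime.
struct RuntimeComponent {
    layout: Layout,
    ptr: *mut u8,
}

impl RuntimeComponent {
    /// Allocates zeroed storage for a component with the given size and alignment.
    fn new(size: usize, align: usize) -> Self {
        let layout = Layout::from_size_align(size, align).expect("invalid size or alignment");
        assert!(layout.size() > 0, "this sketch does not handle zero-sized components");
        // SAFETY: the layout has a non-zero size.
        let ptr = unsafe { alloc_zeroed(layout) };
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        Self { layout, ptr }
    }

    /// Panics unless a `u32` at `offset` would be in bounds and aligned.
    fn check_u32(&self, offset: usize) {
        assert!(offset + size_of::<u32>() <= self.layout.size(), "out of bounds");
        assert_eq!((self.ptr as usize + offset) % align_of::<u32>(), 0, "misaligned");
    }

    fn write_u32(&mut self, offset: usize, value: u32) {
        self.check_u32(offset);
        // SAFETY: bounds and alignment were checked above.
        unsafe { self.ptr.add(offset).cast::<u32>().write(value) }
    }

    fn read_u32(&self, offset: usize) -> u32 {
        self.check_u32(offset);
        // SAFETY: bounds and alignment were checked, and the memory was zeroed.
        unsafe { self.ptr.add(offset).cast::<u32>().read() }
    }
}

impl Drop for RuntimeComponent {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `alloc_zeroed` with this same layout.
        unsafe { dealloc(self.ptr, self.layout) }
    }
}

fn main() {
    // A script-defined component with two u32 fields: size 8, alignment 4.
    let mut health = RuntimeComponent::new(8, 4);
    health.write_u32(4, 100);
    println!("current = {}, max = {}", health.read_u32(0), health.read_u32(4));
}
```

Notice that the storage knows nothing about field names or types. That's the missing piece that shared layout information would have to fill in.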

This will probably mean adding a #[repr(C)] annotation to many of our Rust structs, so that they have a predictable layout.

Assets

As I was drafting the new scripting-friendly, bevy-independent asset system, I was realizing that we are in a very similar situation with metadata assets as we are with components. Even though they are loaded from files, we essentially want to load them into pre-defined structures with a specific layout.

Before I realized this, my first attempt was to create essentially an equivalent to serde_json::Value called Metadata, that was an enum with variants like Metadata::String, Metadata::Number, Metadata::Map, etc. Metadata could be deserialized and had the ability to specially handle referencing other assets. The issue was that it is a big enum that must be matched on every time you need to get a value out of it. There's no structure validation or static typing at all, which was causing trouble.
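To give an idea of what that looked like, here's a cut-down sketch. The variant payloads and the `get_number` helper are my reconstruction for illustration, not the actual code from that draft.

```rust
use std::collections::HashMap;

/// Sketch of a dynamically typed metadata value (payload types are assumed).
#[derive(Debug, Clone, PartialEq)]
enum Metadata {
    String(String),
    Number(f64),
    Map(HashMap<String, Metadata>),
}

/// Hypothetical accessor: pulls a number out of a map-shaped value.
/// Every level of nesting needs another match, and a wrong shape is only
/// discovered here, at the point of use.
fn get_number(meta: &Metadata, key: &str) -> Option<f64> {
    match meta {
        Metadata::Map(map) => match map.get(key)? {
            Metadata::Number(n) => Some(*n),
            _ => None,
        },
        _ => None,
    }
}

fn main() {
    let mut map = HashMap::new();
    map.insert("name".to_string(), Metadata::String("sword".to_string()));
    map.insert("damage".to_string(), Metadata::Number(12.5));
    let item = Metadata::Map(map);

    println!("damage = {:?}", get_number(&item, "damage"));
    // Asking for a field with the wrong type just gives `None`.
    println!("name as number = {:?}", get_number(&item, "name"));
}
```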

My new idea is to treat assets very similarly to components. We create a new Schema type that can be de/serialized from YAML, or possibly a simple custom format. I've already got a parser for a Rust-inspired one from all the way back when I was looking into scripting for Amethyst. Either way, we use these schemas to define the layout of all of our asset types.

The asset system will parse these schemas and use them to validate loaded YAML assets. For each asset type, the scripts will be able to check the schema to see what the type layout is. On the Rust side, we can use a proc macro to generate a repr(C) Rust struct that matches the schema. We can probably use techniques similar to those already used for components in bones_ecs to cast the asset pointers safely to the Rust struct type generated by the proc macros.
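As a very rough sketch of the validation step, assuming the YAML has already been parsed into a map of values: everything here (`FieldKind`, `Value`, `Schema`, `validate`) is a placeholder name, since the real Schema type doesn't exist yet.

```rust
use std::collections::HashMap;

/// Placeholder for the kinds of fields a schema can describe.
#[derive(Clone, Copy, Debug, PartialEq)]
enum FieldKind {
    Number,
    String,
}

/// Placeholder for an already-parsed asset value.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Number(f64),
    String(String),
}

impl Value {
    fn kind(&self) -> FieldKind {
        match self {
            Value::Number(_) => FieldKind::Number,
            Value::String(_) => FieldKind::String,
        }
    }
}

/// Placeholder schema: an ordered list of named, typed fields.
struct Schema {
    fields: Vec<(String, FieldKind)>,
}

/// Checks that `asset` has exactly the fields the schema asks for,
/// each with the expected kind.
fn validate(schema: &Schema, asset: &HashMap<String, Value>) -> Result<(), String> {
    for (name, expected) in &schema.fields {
        match asset.get(name) {
            None => return Err(format!("missing field `{name}`")),
            Some(value) if value.kind() != *expected => {
                return Err(format!(
                    "field `{name}`: expected {expected:?}, found {:?}",
                    value.kind()
                ));
            }
            Some(_) => {}
        }
    }
    // Reject fields the schema doesn't know about.
    let known = |key: &String| schema.fields.iter().any(|(name, _)| name == key);
    if let Some(extra) = asset.keys().find(|key| !known(key)) {
        return Err(format!("unknown field `{extra}`"));
    }
    Ok(())
}

fn main() {
    let schema = Schema {
        fields: vec![
            ("name".to_string(), FieldKind::String),
            ("damage".to_string(), FieldKind::Number),
        ],
    };

    let mut asset = HashMap::new();
    asset.insert("name".to_string(), Value::String("sword".to_string()));
    asset.insert("damage".to_string(), Value::Number(12.5));

    println!("{:?}", validate(&schema, &asset));
}
```

Once an asset passes a check like this, it could be written into a repr(C)-style buffer at the offsets the schema implies, instead of living on as a dynamic value.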

Next Steps

That means the next step is to work on the schema setup. We'll need to define a new Schema struct, and see whether or not we can make/derive a decently nice serde deserializer so that we can write our schemas in YAML. YAML is probably better than a custom format, both for simple editor highlighting and so that people don't need to learn a new syntax.

Since we'll want to use schemas for component definitions, too, we should add it in the bones_ecs crate directly. I'm thinking that, once we get to it, we'll want to add an Option<Schema> to the UntypedComponentStore in bones_ecs.

The idea is that if the component store for a component has a Schema, then that means scripts will be able to know how to read and write data from it. We'll have to look around the component API and figure out the easiest way to deal with registering components.

We should be able to easily create components in Rust, the same way we do today, without having to bother with schemas, but we should also be able to create explicitly schema'd components, that will be accessible with scripts. Again, probably with a proc_macro to create the components from a schema file.

As I'm saying that, I'm realizing maybe for Rust-native components/assets, we go the other way around. You write a normal Rust struct and derive a HasSchema trait, or something like that, and we have a way to make sure that the schema gets registered in the UntypedComponentStore when using that component.

That way the scripts can just access the schemas programmatically without needing a separate schema file. Then the only situation we'd want a schema file is when scripts need to define their own schemas for their own components or asset types.
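Here's a sketch of how that could fit together. None of this exists yet: `Primitive`, `Schema`, `HasSchema`, and `read_f32` are all placeholder names, and the trait impl is written by hand where a derive macro would eventually generate it.

```rust
use std::alloc::Layout;

/// Placeholder for the primitive kinds a schema field can have.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Primitive {
    U8,
    U32,
    F32,
}

impl Primitive {
    fn layout(self) -> Layout {
        match self {
            Primitive::U8 => Layout::new::<u8>(),
            Primitive::U32 => Layout::new::<u32>(),
            Primitive::F32 => Layout::new::<f32>(),
        }
    }
}

/// Placeholder schema: ordered, named fields laid out like a `#[repr(C)]` struct.
#[derive(Clone, Debug)]
struct Schema {
    fields: Vec<(&'static str, Primitive)>,
}

impl Schema {
    /// Returns the kind and byte offset of the field called `name`, if any.
    fn field(&self, name: &str) -> Option<(Primitive, usize)> {
        let mut layout = Layout::from_size_align(0, 1).ok()?;
        for &(field_name, kind) in &self.fields {
            let (next, offset) = layout.extend(kind.layout()).ok()?;
            if field_name == name {
                return Some((kind, offset));
            }
            layout = next;
        }
        None
    }
}

/// Placeholder trait that a derive macro could implement for Rust-native types.
trait HasSchema {
    fn schema() -> Schema;
}

#[repr(C)]
struct Position {
    layer: u8,
    x: f32,
    y: f32,
}

// Written by hand here; must list the fields in declaration order.
impl HasSchema for Position {
    fn schema() -> Schema {
        Schema {
            fields: vec![
                ("layer", Primitive::U8),
                ("x", Primitive::F32),
                ("y", Primitive::F32),
            ],
        }
    }
}

/// What a script binding might do: read an `f32` field by name from an
/// untyped pointer.
///
/// # Safety
/// `ptr` must point at a live value whose layout matches `schema`.
unsafe fn read_f32(ptr: *const u8, schema: &Schema, name: &str) -> Option<f32> {
    match schema.field(name)? {
        (Primitive::F32, offset) => Some(unsafe { ptr.add(offset).cast::<f32>().read() }),
        _ => None,
    }
}

fn main() {
    let pos = Position { layer: 1, x: 2.5, y: -4.0 };
    let ptr = (&pos as *const Position).cast::<u8>();
    let schema = Position::schema();

    // SAFETY: `ptr` points at a `Position`, which is what the schema describes.
    let y = unsafe { read_f32(ptr, &schema, "y") };
    println!("layer = {}, x = {}, y = {y:?}", pos.layer, pos.x);
}
```

The Rust side keeps using `Position` as a normal struct, while anything holding just the pointer and the schema can still find `y` by name.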

Finally, we can use the loaded schema data and something like tealr_doc_gen to generate HTML documentation for the asset and component types accessible to scripts!

I think that sums up my thoughts so far. If you have any questions or thoughts, you can ask below or on chat!
