Skip to content

Commit

Permalink
added README.md, made the ast2str code generate code for the rare cas…
Browse files Browse the repository at this point in the history
…e of a struct member of an ast case, and added code to generate instruction types and map every ast tag to its corresponding type(s)
  • Loading branch information
moste00 committed Oct 19, 2024
1 parent 4395646 commit c973000
Show file tree
Hide file tree
Showing 12 changed files with 1,958 additions and 20 deletions.
127 changes: 127 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,127 @@
This is the repo of the generator tool riscv_disasm_from_sail, a tool that ingests the https://github.com/riscv/sail-riscv executable model of the RISCV instruction set and generates most of a disassembler for the architecture.

# What is Sail ?

Sail is a Domain Specific Language, a specialized programming language made for a narrow, singular purpose and not designed for general development of arbitrary programs (although it might end up, nonetheless, suitable for general development anyway). SQL and regular expressions are often-cited examples of DSLs.

Sail’s specialized domain is describing Instruction Set Architectures in an executable form.

# Why Sail ?

Describing ISAs like RISCV and x86 (among others) in machine-readable executable formats has many rationales:

1- ISAs are increasingly too complex for natural language. Even just describing the sequential semantics of an instruction - how it intreprets its addressing mode(s), how it fetches operands, how it computes a result, what registers/memory and/or any other architectural state it modifies or clobbers or saves or restores - can get unmanagebly tedious and confusing in natural language.

Add to that the fact that ISAs often specify addtional semantics on top of simple sequential execution of instruction streams: traps, faults, interrupts, privilege levels, ABI conventions, memory models, etc…, and it becomes obvious that describing an ISA is itself a computationally complex task that requires a suitable formal language.

2- Formal models enable automatic generation of ISA artifacts from a single source of truth. It’s clear, for instance, that assemblers and disassemblers are related, they “contain the same information”, in a certain sense. It follows that it’s wasteful and error-prone to write them by hand: it constitutes a duplication of information.

A well-defined executable model of the ISA could in principle contain the core information needed to generate both. An assembler-generator would ingest this model and generate an assembler, and a disassembler-generator would ingest the same model and generate a disassembler. This is commonly called the [Model-Driven Engineering](https://en.wikipedia.org/wiki/Model-driven_engineering) development methodology. Sail enables the application of MDE to the development of ISA artifacts.

# Why is Sail a new language?

The fact that Sail is an entirely new language and not a library, a framework or an internal DSL in an existing one deserves an explanation. Being a new language enables Sail to optimize its syntax and semantics for its domain, at the cost of larger development effort for the Sail compiler team.

For instance, the following are 2 features that Sail provides specifically to syntax-sugar ISA development:

1- Dependent typing: Sail can track values on the type level, such that it's possible to define the number 2 as a type, and derive a new type (e.g. 4) from it using type-level arithmetics. Dependent typing is an advanced feature not present in the vast majority of mainstream languages, even functional ones. Sail uses this advanced type system to ensure additional type safety and consistency checking for architectural parameters (e.g. register file width) at compile time.

2- Bidirectional mappings: Sail mappings are functions that can be run in both direction. Their type signature is a hint of their nature: while functions are typed A→ B, meaning they derive values of type B from values of type A, mappings are typed A↔B, meaning they can go “back and forth” from values of any type to values of the other. Mappings can be a very concise way of specifying bidirectional computations, such as describing how to obtain a structured object representing an instruction from the binary encoding of the instruction, how to parse the binary form of the instruction, and at the same time also how to serialize the structured object back to its binary encoding.

# How to run

The `end2end-smoke-test` workflow file in the CI directory (.github) setups the generator’s environment from scratch on a new machine. It should be viewed as the authoritative guide on how to run the generator on a new machine, even if it gets out of sync of this README. (In such case, please update the README to reflect the file.)

The steps are reproduced below with explanations:

## 1- Download and install Ocaml toolchain

Ocaml has an installation guide [here](https://ocaml.org/install#linux_mac_bsd), but in a nutshell, the following commands are enough on most *nix systems (any Linux, MacOS, any BSD):

```bash
sudo apt-get install bubblewrap
bash -c "sh <(curl -fsSL [https://opam.ocaml.org/install.sh](https://opam.ocaml.org/install.sh))"
opam init
opam install dune
```

And this is indeed what the workflow file does, with the exception that it pipes values

to serve as interactive inputs to the first 2 commands because can’t require user inputs.

- The `bubblewrap` package is a low-level interface for creating and managing sandboxed environments, `opam` requires it by default on Linux systems to run arbitrary code safely
- The second command downloads Ocaml and installs it,
- The third command initializes `opam`, Ocaml’s package downloader/installer and toolchain manager (manages the installation of different Ocaml compilers on the same machine), and
- The fourth downloads and installs `dune`, Ocaml’s package manager.

> P.S. It would be correct to say that exactly one out of {`opam`, `dune`} is actually needed, but alas, that’s a problem for the Ocaml language community to sort out, not this tool.
>
The following are optional development dependencies that aren’t strictly needed for running the generator, but might be useful if you want to modify the generator:

```bash
opam install ocaml-lsp-server odoc ocamlformat utop
```

## 2- Install generator dependencies

The generator tool itself has almost no dependencies except Libsail, the Sail compiler exposed as a library. It also requires installing Sail itself as a shell command because the Sail standard library must be downloaded, and requires the system-level command line tool `z3`, the popular SMT solver from Microsoft Research, exposed as a cmd tool.

In commands:

```bash
git clone https://github.com/moste00/riscv_disasm_from_sail.git
cd riscv_disasm_from_sail && source ~/.bash_profile
export OPAMCONFIRMLEVEL=yes
opam update

opam install sail=0.18
sudo apt-get install z3
opam install . --deps-only
```

It’s very important to fix the exact Sail version, as above, because Sail is in heavy development and every version breaks its predecessor’s APIs. The version is determined by what the maintainers of the `sail-riscv` model choose.

## 3- Fetch the sail-riscv model

Because Sail is a large language and this tool is *not* a general purpose Sail→C compiler, only a subset of Sail’s features is actually supported. More generally, the tool also makes some specific assumptions about its input code, for example that the logic to stringify instructions is a mapping (and not, e.g., a function). Taken together, those 2 facts imply that the input sail-riscv model could break the tool if it violates those assumptions or uses more Sail features than what the tool supports.

Therefore, `riscv_disasm_from_sail` versions its input. In `sail.hash.txt`, there is a git sha hash that states the latest sail-riscv version that the tool ran successfully on. It’s generally preferable for this version to not lag the sail-riscv model latest commit on master by too much (~1 month worth of commits are the limits), and it’s a bug in `riscv_disasm_from_sail` if it crashes on the latest master commit.

To get the input model, run the following in the parent directory of the tool:

```bash
git clone https://github.com/riscv/sail-riscv.git
cd sail-riscv
git reset --hard $(cat sail.hash.txt)
```

## 4- Run the generator on the input

Finally, run the following command from the generator’s directory to obtain the generated C code:

```bash
source ~/.bash_profile && dune exec --profile release -- riscv_disasm_from_sail -f sail.filepaths.txt
```

## 5- Or just copy riscv_disasm

The outputs of the generator for the model version specified by sail.hash.txt is also kept in this repo, this is a quality-of-life feature for 2 reasons:

1- If a commit to `riscv_disasm_from_sail` changed the generated code, the diff to the generated code is accompanied by the diff to the generator’s logic as an explanation

2- Users can simply copy the generated code instead of going through steps 1..4 above

As a downside, if the generated C code is used elsewhere, such as in a RISCV module in https://github.com/capstone-engine/capstone disassembler framework (the primary purpose and raison d'être of this tool), this means that the handwritten code in riscv_disasm (all the files not having “.gen” in their names) MUST NOT be modified in that site of use, only here in this repo.

# Updating the input model version

To update the version of sail-riscv, you have to update sail.filepaths.txt and sail.hash.txt. To update the first, run :

```bash
tools/riscv-ls.sh <path-to-riscv-model>
```

This will (ab)use the makefile in the sail-riscv repo to print an ordered list of all files that comprise the model. This is also the procedure to follow if the sail-riscv directory is not the direct filetree-sibling of the riscv_disasm_from_sail directory, as the paths in the committed sail.filepaths.txt assume this relationship, but you can override those paths by running the above command and giving it any valid path to the sail-riscv model directory.

Updating the hash must be done manually as of yet, query the sail-riscv repo version (e.g. using git log) and paste the version in sail.hash.txt.
34 changes: 17 additions & 17 deletions bin/main.ml
Original file line number Diff line number Diff line change
Expand Up @@ -46,23 +46,7 @@ let anon_arg_handler a =

let () = Arg.parse arg_spec anon_arg_handler usage_msg

let read_filepaths name =
let names = ref [] in
let file_chnl = open_in name in
try
while true do
names := input_line file_chnl :: !names
done;
[""] (*UNREACHABLE, just to guide type checker*)
with
| End_of_file ->
close_in file_chnl;
List.rev !names
| e ->
close_in_noerr file_chnl;
raise e

let filepaths = read_filepaths !paths_filename
let filepaths = Utils.read_file !paths_filename

let initial_typeenv = Type_check.initial_env

Expand Down Expand Up @@ -94,6 +78,14 @@ let asm = Gen_assembler.gen_asm ast analysis

let asm_str, tables_str = Stringify.stringify_assembler asm typdefwalker

let gen_instr_types_conf =
Gen_instr_types.read_config "sail.instr_types_excluded_enums.txt"

let instr_types = Gen_instr_types.gen_instr_types analysis gen_instr_types_conf

let instr_types_str, instr_types_mapping_str =
Stringify.stringify_instr_types instr_types typdefwalker

let () = write_c_file Constants.ast_type_filename ctypedefs_str
let () =
write_c_file Constants.decode_logic_filename proc_dec_str
Expand All @@ -111,3 +103,11 @@ let () =
let () =
write_c_file Constants.ast2str_tables_filename tables_str
~additional_includes:["\"" ^ Constants.ast_type_filename ^ "\""]

let () =
write_c_file Constants.instr_types_filename instr_types_str
~additional_includes:["\"" ^ Constants.ast_type_filename ^ "\""]

let () =
write_c_file Constants.instr_types_mapping_filename instr_types_mapping_str
~additional_includes:["\"" ^ Constants.instr_types_filename ^ "\""]
4 changes: 4 additions & 0 deletions lib/constants.ml
Original file line number Diff line number Diff line change
Expand Up @@ -25,3 +25,7 @@ let ast_assembly_mapping = "assembly"
let assembler_filename = "riscv_ast2str.gen.inc"

let ast2str_tables_filename = "riscv_ast2str_tbls.gen.inc"

let instr_types_filename = "riscv_insn.gen.inc"

let instr_types_mapping_filename = "riscv_insn_mapping.gen.inc"
36 changes: 36 additions & 0 deletions lib/gen_instr_types.ml
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
open Utils
open Sail_analysis
open Hashset

type config = { exclude_enums : string set }

type instr_types = (string, int * string list) Hashtbl.t

let read_config path =
let c = { exclude_enums = Hashtbl.create 10 } in
List.iter (fun e -> set_add c.exclude_enums e) (Utils.read_file path);
c

let add_instr_types conf case_names_to_instr_types analysis case_name
enum_typenames =
let do_add name types = Hashtbl.add case_names_to_instr_types name types in
let applicable_enums =
List.filter
(fun (_, name) -> not (set_contains conf.exclude_enums name))
enum_typenames
in
match applicable_enums with
| [] -> do_add case_name (-1, [case_name])
| [(i, enum_typename)] ->
do_add case_name (i, get_all_members_of_enum analysis enum_typename)
| _ -> failwith "FAIL"

let gen_instr_types analysis conf =
let case_names_to_enum_typenames = get_all_cases_with_enum_members analysis in
let instr_types =
Hashtbl.create (Hashtbl.length case_names_to_enum_typenames)
in
Hashtbl.iter
(add_instr_types conf instr_types analysis)
case_names_to_enum_typenames;
instr_types
30 changes: 30 additions & 0 deletions lib/sail_analysis.ml
Original file line number Diff line number Diff line change
Expand Up @@ -376,6 +376,12 @@ let get_case_arg_size ana case_name arg_idx =
let is_member_of_enum ana maybe_member =
Type_check.is_enum_member maybe_member ana.sail_env.e

let get_all_members_of_enum ana enum_typename =
let member_ids = Bindings.find (mk_id enum_typename) ana.sail_env.enums in
let members = ref [] in
IdSet.iter (fun i -> members := id_to_str i :: !members) member_ids;
!members

let get_bv2str_mapping ana map_name =
let bv2str_mappings =
ana.mapping_ctx.to_string_mappings_registery.bv2string_mappings
Expand All @@ -399,3 +405,27 @@ let get_struct2str_mapping ana map_name =
ana.mapping_ctx.to_string_mappings_registery.struct2string_mappings
in
Option.map snd (Hashtbl.find_opt struct2str_mappings map_name)

let get_all_cases_with_enum_members ana =
let cases_type_sigs = ana.type_ctx.union_cases_type_signatures in
let cases_with_enum_members =
Hashtbl.create (Hashtbl.length cases_type_sigs)
in
Hashtbl.iter
(fun case_name typesig ->
let enum_types =
List.mapi
(fun i typ ->
match typ with
| Named_type name when Bindings.mem (mk_id name) ana.sail_env.enums
->
Some (i, name)
| _ -> None
)
typesig
in
let enum_types = List.filter_map (fun a -> a) enum_types in
Hashtbl.add cases_with_enum_members case_name enum_types
)
cases_type_sigs;
cases_with_enum_members
5 changes: 5 additions & 0 deletions lib/sail_analysis.mli
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,8 @@ val get_case_arg_size : sail_analysis_result -> string -> int -> int option

val is_member_of_enum : sail_analysis_result -> id -> bool

val get_all_members_of_enum : sail_analysis_result -> string -> string list

val get_bv2str_mapping : sail_analysis_result -> string -> bv2str_table option

val get_enum2str_mapping :
Expand All @@ -30,3 +32,6 @@ val get_bool2str_mapping :

val get_struct2str_mapping :
sail_analysis_result -> string -> struct2str_table option

val get_all_cases_with_enum_members :
sail_analysis_result -> (string, (int * string) list) Hashtbl.t
Loading

0 comments on commit c973000

Please sign in to comment.