This is a purely academic project to explore context free grammars and compilers based on the theoretical MIMA-Machine.
We use a recursive decent parser to parse a subset of the C programming language. The parsed output can then either be interpreted or compiled to MIMA-Assembly*.
You can try it yourself here: https://bertilbraun.github.io/mima-c/
- while, for loop
- conditionals (if else)
- tenary operator
- IO printf function
- return
- break
- continue
- boolean operators
- arithmetic operators
- variables, arrays
- stringliterals
- charliterals
- pointers
- typedefs
The Interpreter can execute anything that can be parsed.
- while, for loops
- conditionals (if else)
- tenary operator
- some arithmetic expressions
- (+), (+=)
- printf (somewhat)
- functions (recursion)
- return
- variables
This grammar is (mostly) a LL(2) grammar.
With the exception that we evaluate types and scope during parsing to differentiate between identifier and type tokens (to avoid lookahead).
expr -> assignment | opassignment ;; this is ok to do recursively because it is right associative opassignment -> ternary (STAREQ | DIVIDEEQ | MODEQ | MINUSEQ | PLUSEQ) assignment assignment -> ternary (EQUALS assigmnet)* ternary -> p9 (QUESTIONMARK expr COLON expr)? p9 -> (p8 AND)* p8 p8 -> (p7 OR)* p7 p7 -> (p6 (EQUAL | NEQ))* p6 p6 -> (p4 (LT | GT | GEQ | LEQ)* p4) p4 -> (p3 (PLUS | MINUS))* p3 p3 -> (unary (STAR | DIVIDE | MOD))* unary unary -> (cast | STAR | AMPERSAND | NOT | LNOT) unary | (PLUSPLUS | MINUSMINUS | MINUS | PLUS)? postfix cast -> LPAREN type RPAREN postfix -> value (PLUSPLUS | MINUSMINUS | ((DOT | ARROW) value))? value -> INTLITERAL | LPAREN expr RPAREN | functioncall | IDENTIFIER | arraylit | arrayaccess functioncall -> IDENTIFIER LPAREN (RPAREN | expr (COMMA expr)* RPAREN) arrayaccess -> IDENTIFIER LBRACKET expr RBRACKET arraylit -> LBRACE (RBRACE | expr (, expr)* RBRACE)
;; declaration -> type IDENTIFIER (LBRACKET expr RBRACKET)? | funcptr ;; funcptr -> type LPAREN STAR IDENTIFIER RPAREN LPAREN (type (LBRACKET RBRACKET)? (COMMA type (LBRACKET RBRACKET)?)*)? RPAREN type -> (UNSIGNED | STRUCT | CONST)* (One of the defined Types :P )
blockstatement -> ((vardecl | expr | return | break | continue | instrinsic | typedef | structdecl)? SEMICOLON | block | for | while | if) for -> FOR LPAREN (vardecl | expr) SEMICOLON expr SEMICOLON expr RPAREN blockstatement while -> WHILE LPAREN expr RPAREN blockstatement if -> IF LPAREN expr RPAREN blockstatement (ELSE blockstatement)? block -> LBRACE (blockstatment)* RBRACE return -> RETURN (expr)? SEMICOLON break -> BREAK SEMICOLON continue -> CONTINUE SEMICOLON intrinsic -> INTRINSIC LPAREN (RPAREN | expr (COMMA expr)* RPAREN)
program -> (statement)* statement -> ((vardecl | funcdecl | structdecl | typedef)? SEMICOLON) typedef -> TYPEDEF (type | structdecl) IDENTIFIER structdecl -> STRUCT IDENTIFIER? LBRACE (vardecl SEMICOLON)* RBRACE vardecl -> type vardecl' (, vardecl')* vardecl' -> STAR* IDENTIFIER (LBRACKET (expr)? RBRACKET)? (= expr)? funcdecl -> type IDENTIFIER LPAREN (RPAREN | funcdecl' (COMMA funcdecl')* RPAREN) (SEMICOLON | block) funcdecl' -> type IDENTIFIER (LBRACKET RBRACKET)?
The stack starts at 1 and grows upwards.
Some memory addresses are reserved to be used as registers by the compiler.
StackPointerPosition | 1048504 |
FramePointerPosition | 1048508 |
LastAddrPointerPosition | 1048512 |
PushPopPointerPosition | 1048516 |
---|---|
General Purpose registers | 1048520 |
1048524 | |
1048528 | |
1048532 | |
1048536 | |
1048540 | |
1048544 | |
1048548 | |
Return Register Positions | 1048552 |
1048556 | |
InstructionPointer | 1048500 |
We don’t assemble to bytecode. The Mima assembly Interpreter we use will use the InstructionPointer memory location as an instruction pointer thus allowing for direct reading of that value.
This makes it trivial to push the current instruction pointer to the stack even without resolving that address in the assembler.
In the future we might assemble to proper MIMA bytecode and then this step will not be necessary anymore.