Skip to content

BertilBraun/mima-c

Repository files navigation

Readme

Table of Contents

Overview

This is a purely academic project to explore context free grammars and compilers based on the theoretical MIMA-Machine.

We use a recursive decent parser to parse a subset of the C programming language. The parsed output can then either be interpreted or compiled to MIMA-Assembly*.

You can try it yourself here: https://bertilbraun.github.io/mima-c/

Supported Featureset:

Parser

  • while, for loop
  • conditionals (if else)
  • tenary operator
  • IO printf function
  • return
  • break
  • continue
  • boolean operators
  • arithmetic operators
  • variables, arrays
  • stringliterals
  • charliterals
  • pointers
  • typedefs

Interpreter

The Interpreter can execute anything that can be parsed.

Compiler

  • while, for loops
  • conditionals (if else)
  • tenary operator
  • some arithmetic expressions
    • (+), (+=)
  • printf (somewhat)
  • functions (recursion)
  • return
  • variables

Grammar

This grammar is (mostly) a LL(2) grammar.

With the exception that we evaluate types and scope during parsing to differentiate between identifier and type tokens (to avoid lookahead).

expr           -> assignment | opassignment
;; this is ok to do recursively because it is right associative
opassignment   -> ternary (STAREQ | DIVIDEEQ | MODEQ | MINUSEQ | PLUSEQ) assignment
assignment     -> ternary (EQUALS assigmnet)*
ternary        -> p9 (QUESTIONMARK expr COLON expr)?
p9             -> (p8 AND)* p8
p8             -> (p7 OR)* p7
p7             -> (p6 (EQUAL | NEQ))* p6
p6             -> (p4 (LT | GT | GEQ | LEQ)* p4)
p4             -> (p3 (PLUS | MINUS))* p3
p3             -> (unary (STAR | DIVIDE | MOD))* unary
unary          -> (cast | STAR | AMPERSAND | NOT | LNOT) unary | (PLUSPLUS | MINUSMINUS | MINUS | PLUS)? postfix
cast           -> LPAREN type RPAREN
postfix        -> value (PLUSPLUS | MINUSMINUS | ((DOT | ARROW) value))?
value          -> INTLITERAL | LPAREN expr RPAREN | functioncall | IDENTIFIER | arraylit | arrayaccess
functioncall   -> IDENTIFIER LPAREN (RPAREN | expr (COMMA expr)* RPAREN)
arrayaccess    -> IDENTIFIER LBRACKET expr RBRACKET
arraylit       -> LBRACE (RBRACE | expr (, expr)* RBRACE)
;; declaration    -> type IDENTIFIER (LBRACKET expr RBRACKET)? | funcptr
;; funcptr        -> type LPAREN STAR IDENTIFIER RPAREN LPAREN (type (LBRACKET RBRACKET)? (COMMA type (LBRACKET RBRACKET)?)*)? RPAREN
type           -> (UNSIGNED | STRUCT | CONST)* (One of the defined Types :P )
blockstatement -> ((vardecl | expr | return | break | continue | instrinsic | typedef | structdecl)? SEMICOLON | block | for | while | if)
for            -> FOR LPAREN (vardecl | expr) SEMICOLON expr SEMICOLON expr RPAREN blockstatement
while          -> WHILE LPAREN expr RPAREN blockstatement
if             -> IF LPAREN expr RPAREN blockstatement (ELSE blockstatement)?
block          -> LBRACE (blockstatment)* RBRACE
return         -> RETURN (expr)? SEMICOLON
break          -> BREAK SEMICOLON
continue       -> CONTINUE SEMICOLON
intrinsic      -> INTRINSIC LPAREN (RPAREN | expr (COMMA expr)* RPAREN)
program        -> (statement)*
statement      -> ((vardecl | funcdecl | structdecl | typedef)? SEMICOLON)
typedef        -> TYPEDEF (type | structdecl) IDENTIFIER
structdecl     -> STRUCT IDENTIFIER? LBRACE (vardecl SEMICOLON)* RBRACE
vardecl        -> type vardecl' (, vardecl')*
vardecl'       -> STAR* IDENTIFIER (LBRACKET (expr)? RBRACKET)? (= expr)?
funcdecl       -> type IDENTIFIER LPAREN (RPAREN | funcdecl' (COMMA funcdecl')* RPAREN) (SEMICOLON | block)
funcdecl'      -> type IDENTIFIER (LBRACKET RBRACKET)?

Compiler

Execution environment

Stack

The stack starts at 1 and grows upwards.

Simulated Registers

Some memory addresses are reserved to be used as registers by the compiler.

StackPointerPosition1048504
FramePointerPosition1048508
LastAddrPointerPosition1048512
PushPopPointerPosition1048516
General Purpose registers1048520
1048524
1048528
1048532
1048536
1048540
1048544
1048548
Return Register Positions1048552
1048556
InstructionPointer1048500

Our Instructionpointer

We don’t assemble to bytecode. The Mima assembly Interpreter we use will use the InstructionPointer memory location as an instruction pointer thus allowing for direct reading of that value.

This makes it trivial to push the current instruction pointer to the stack even without resolving that address in the assembler.

In the future we might assemble to proper MIMA bytecode and then this step will not be necessary anymore.

About

C Compiler to MiMa Assembly Code

Resources

Stars

Watchers

Forks