Readme

Overview

This is a purely academic project to explore context free grammars and compilers based on the theoretical MIMA-Machine.

We use a recursive decent parser to parse a subset of the C programming language. The parsed output can then either be interpreted or compiled to MIMA-Assembly*.

You can try it yourself here: https://bertilbraun.github.io/mima-c/

Supported Featureset:

Parser

while, for loop
conditionals (if else)
tenary operator
IO printf function
return
break
continue
boolean operators
arithmetic operators
variables, arrays
stringliterals
charliterals
pointers
typedefs

Interpreter

The Interpreter can execute anything that can be parsed.

Compiler

while, for loops
conditionals (if else)
tenary operator
some arithmetic expressions
- (+), (+=)
printf (somewhat)
functions (recursion)
return
variables

Grammar

This grammar is (mostly) a LL(2) grammar.

With the exception that we evaluate types and scope during parsing to differentiate between identifier and type tokens (to avoid lookahead).

expr           -> assignment | opassignment
;; this is ok to do recursively because it is right associative
opassignment   -> ternary (STAREQ | DIVIDEEQ | MODEQ | MINUSEQ | PLUSEQ) assignment
assignment     -> ternary (EQUALS assigmnet)*
ternary        -> p9 (QUESTIONMARK expr COLON expr)?
p9             -> (p8 AND)* p8
p8             -> (p7 OR)* p7
p7             -> (p6 (EQUAL | NEQ))* p6
p6             -> (p4 (LT | GT | GEQ | LEQ)* p4)
p4             -> (p3 (PLUS | MINUS))* p3
p3             -> (unary (STAR | DIVIDE | MOD))* unary
unary          -> (cast | STAR | AMPERSAND | NOT | LNOT) unary | (PLUSPLUS | MINUSMINUS | MINUS | PLUS)? postfix
cast           -> LPAREN type RPAREN
postfix        -> value (PLUSPLUS | MINUSMINUS | ((DOT | ARROW) value))?
value          -> INTLITERAL | LPAREN expr RPAREN | functioncall | IDENTIFIER | arraylit | arrayaccess
functioncall   -> IDENTIFIER LPAREN (RPAREN | expr (COMMA expr)* RPAREN)
arrayaccess    -> IDENTIFIER LBRACKET expr RBRACKET
arraylit       -> LBRACE (RBRACE | expr (, expr)* RBRACE)

;; declaration    -> type IDENTIFIER (LBRACKET expr RBRACKET)? | funcptr
;; funcptr        -> type LPAREN STAR IDENTIFIER RPAREN LPAREN (type (LBRACKET RBRACKET)? (COMMA type (LBRACKET RBRACKET)?)*)? RPAREN
type           -> (UNSIGNED | STRUCT | CONST)* (One of the defined Types :P )

blockstatement -> ((vardecl | expr | return | break | continue | instrinsic | typedef | structdecl)? SEMICOLON | block | for | while | if)
for            -> FOR LPAREN (vardecl | expr) SEMICOLON expr SEMICOLON expr RPAREN blockstatement
while          -> WHILE LPAREN expr RPAREN blockstatement
if             -> IF LPAREN expr RPAREN blockstatement (ELSE blockstatement)?
block          -> LBRACE (blockstatment)* RBRACE
return         -> RETURN (expr)? SEMICOLON
break          -> BREAK SEMICOLON
continue       -> CONTINUE SEMICOLON
intrinsic      -> INTRINSIC LPAREN (RPAREN | expr (COMMA expr)* RPAREN)

program        -> (statement)*
statement      -> ((vardecl | funcdecl | structdecl | typedef)? SEMICOLON)
typedef        -> TYPEDEF (type | structdecl) IDENTIFIER
structdecl     -> STRUCT IDENTIFIER? LBRACE (vardecl SEMICOLON)* RBRACE
vardecl        -> type vardecl' (, vardecl')*
vardecl'       -> STAR* IDENTIFIER (LBRACKET (expr)? RBRACKET)? (= expr)?
funcdecl       -> type IDENTIFIER LPAREN (RPAREN | funcdecl' (COMMA funcdecl')* RPAREN) (SEMICOLON | block)
funcdecl'      -> type IDENTIFIER (LBRACKET RBRACKET)?

Compiler

Execution environment

Stack

The stack starts at 1 and grows upwards.

Simulated Registers

Some memory addresses are reserved to be used as registers by the compiler.

PushPopPointerPosition	1048516
StackPointerPosition	1048504
FramePointerPosition	1048508
LastAddrPointerPosition	1048512
General Purpose registers	1048520
	1048524
	1048528
	1048532
	1048536
	1048540
	1048544
	1048548
Return Register Positions	1048552
	1048556
InstructionPointer	1048500

Our Instructionpointer

We don’t assemble to bytecode. The Mima assembly Interpreter we use will use the InstructionPointer memory location as an instruction pointer thus allowing for direct reading of that value.

This makes it trivial to push the current instruction pointer to the stack even without resolving that address in the assembler.

In the future we might assemble to proper MIMA bytecode and then this step will not be necessary anymore.

Name		Name	Last commit message	Last commit date
Latest commit History 75 Commits
compiler		compiler
interpreter		interpreter
src		src
.gitignore		.gitignore
AST.cs		AST.cs
CParser.cs		CParser.cs
Lexer.cs		Lexer.cs
MimaC.cs		MimaC.cs
Preprozessor.cs		Preprozessor.cs
README.org		README.org
Token.cs		Token.cs
Util.cs		Util.cs
mima-c.csproj		mima-c.csproj
mima-c.csproj.user		mima-c.csproj.user
mima-c.sln		mima-c.sln

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Readme

Table of Contents

Overview

Supported Featureset:

Parser

Interpreter

Compiler

Grammar

Compiler

Execution environment

Stack

Simulated Registers

Our Instructionpointer

About

Contributors 2

Languages

BertilBraun/mima-c

Folders and files

Latest commit

History

Repository files navigation

Readme

Table of Contents

Overview

Supported Featureset:

Parser

Interpreter

Compiler

Grammar

Compiler

Execution environment

Stack

Simulated Registers

Our Instructionpointer

About

Resources

Stars

Watchers

Forks

Contributors 2

Languages