Commit

Merge branch 'main' into main
johnwickerson authored Jan 29, 2024
2 parents 88bfd1e + 5e9c533 commit bd0bb77
Showing 30 changed files with 894 additions and 126 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -1 +1,7 @@
bin/
.vscode
*.o
*.tab.hpp
*.tab.cpp
*.yy.cpp
*.output
29 changes: 25 additions & 4 deletions Makefile
@@ -1,12 +1,28 @@
CPPFLAGS += -std=c++20 -W -Wall -g -I include
CPPFLAGS += -std=c++20 -W -Wall -g -Wno-unused-parameter -Wno-unused-variable -Wno-unused-function -I include

.PHONY: default
CPPFILES := $(wildcard src/*.cpp)
DEPENDENCIES := $(patsubst %.cpp,%.d,$(CPPFILES))
-include $(DEPENDENCIES)
OBJFILES := $(patsubst %.cpp,%.o,$(CPPFILES))
OBJFILES += src/lexer.yy.o src/parser.tab.o


.PHONY: default clean with_coverage coverage

default: bin/c_compiler

bin/c_compiler : src/cli.cpp src/compiler.cpp
bin/c_compiler : $(OBJFILES)
@mkdir -p bin
g++ $(CPPFLAGS) -o bin/c_compiler $^
g++ $(CPPFLAGS) -o $@ $^

%.o: %.cpp Makefile
g++ $(CPPFLAGS) -MMD -MP -c $< -o $@

src/parser.tab.cpp src/parser.tab.hpp: src/parser.y
bison -v -d src/parser.y -o src/parser.tab.cpp

src/lexer.yy.cpp : src/lexer.flex src/parser.tab.hpp
flex -o src/lexer.yy.cpp src/lexer.flex

with_coverage : CPPFLAGS += --coverage
with_coverage : bin/c_compiler
@@ -25,3 +41,8 @@ clean :
@rm -rf coverage
@find . -name "*.o" -delete
@rm -rf bin/*
@rm -f src/*.tab.hpp
@rm -f src/*.tab.cpp
@rm -f src/*.yy.cpp
@rm -f src/*.output

3 changes: 1 addition & 2 deletions README.md
@@ -35,10 +35,9 @@ Changelog
* Directly linked to ANSI C parser and lexer.
* Added a "Getting started" guide and incorporated last year's feedback from Ed.
* Changed the 10% of the grade (previously only for time management) to also account for code design to reward thoughtful planning.
* Improved the skeleton compiler to be more advanced by providing lexer and parser to hopefully jump-start progress and avoid unnecessary debugging. [WIP]
* Improved the skeleton compiler to be more advanced by integrating lexer and parser to hopefully jump-start progress and avoid unnecessary debugging.
* Covered assembler directives in more detail by showcasing the meaning behind an example assembly program, because that topic had always caused confusion in past years. [WIP]


* New for 2022/2023:

* Target architecture is now RISC-V rather than MIPS, in order to align with the modernised Instruction Architectures half of the module.
2 changes: 1 addition & 1 deletion compiler_tests/_example/example.c
@@ -1,4 +1,4 @@
int f()
{
return 5;
return;
}
11 changes: 11 additions & 0 deletions docs/assembler_directives.md
@@ -0,0 +1,11 @@
Assembler directives
====================
"The assembler implements a number of directives that control the assembly of instructions into an object file. These directives give the ability to include arbitrary data in the object file, control exporting of symbols, selection of sections, alignment of data, assembly options for compression, position dependent and position independent code" - quote from [RISC-V Assembler Reference](https://michaeljclark.github.io/asm.html).

The linked guide explains all available directives in detail, but fortunately you only need a very small subset to start with, and even the more advanced features only require a few additional directives. While [Godbolt](https://godbolt.org/z/vMMnWbsff) emits some directives, to see all of them (more than you actually need) you are advised to run:

```riscv64-unknown-elf-gcc -std=c90 -pedantic -ansi -O0 -march=rv32imfd -mabi=ilp32d -S [source-file.c] -o [dest-file.s]```.

The picture below offers a quick walk-through of a very simple program, with detailed annotations describing the meaning behind the included directives. Some of them are crucial (e.g. section specifiers, labels, data emitting) while others are less so (e.g. file attributes, compiler identifier, symbol types); you will get a feel for them during the development of the compiler. Most importantly, as long as you only deal with local variables, you only need to set the correct section and provide function directives. **In other words, you can postpone studying this document in detail until you decide to deal with global variables.**

![Assembler directives](./assembler_directives.png)
Binary file added docs/assembler_directives.png
22 changes: 22 additions & 0 deletions docs/basic_compiler.md
@@ -0,0 +1,22 @@
Basic compiler
==============

For the first time ever, you are provided with a basic compiler that can lex, parse and generate (incorrect) code for the following program:
```
int f() {
return;
}
```

The output assembly is hardcoded, so that the basic compiler passes one of the provided test cases. However, having a functioning compiler should allow you to hopefully jump-start the development of the actually interesting parts of this coursework while avoiding the common early pitfalls that students have faced in previous years. It should also allow you to better understand the underlying C90 grammar and have an easier time when adding new features.

The provided basic compiler is able to traverse the following AST related to the above program. In order to expand its capabilities, you should develop the parser and the corresponding code generation at the same time - do not try to fully implement one before the other.

![int_main_return_tree](./int_main_return_tree.png)


The lexer and parser are loosely based on the "official" grammar covered [here](https://www.lysator.liu.se/c/ANSI-C-grammar-l.html) and [here](https://www.lysator.liu.se/c/ANSI-C-grammar-y.html) respectively. While they should suffice for a significant portion of the features, you might need to improve them to implement the more advanced ones. If you find the grammar too complicated to understand, it is also perfectly fine to create your own simple grammar and build upon it as you add more features.

You can follow the patterns introduced for the code generation part of the basic compiler, but you might find that adjusting them to your needs is better in the long run. You are recommended to follow the coding style that best suits you, while hopefully picking up strong design skills throughout the development of your compiler.


32 changes: 16 additions & 16 deletions docs/c_compiler.md
@@ -1,7 +1,7 @@
Main coursework: A compiler for the C language
==============================================

Your program should read C source code from a file, and write RISC-V assembly to another file.
Your program should read C source code from a file, and write corresponding RISC-V assembly to another file.

Environment
-----------
@@ -10,26 +10,25 @@ Environment
Developing your compiler
------------------------

If you wish to use C++, then a basic framework for building your compiler has been provided.
If you wish to use C++, then a basic framework for building your compiler has been provided. You are strongly recommended to check out its structure [here](./basic_compiler.md).

Source files can be found in the [./src](../src) folder and header files can be found in the [./include](../include) folder.

You can test your compiler against the provided test-suite by running `./test.sh` from the top of this repo; the output should look as follows:
You can test your compiler against the provided test-suite by running [`./test.sh`](../test.sh) from the top of this repo; the output should look as follows:

```console
root@host:/workspaces/langproc-env# ./test.sh

g++ -std=c++20 -W -Wall -g -I include -o bin/c_compiler src/cli.cpp src/compiler.cpp

> ./test.sh
>
compiler_tests/_example/example.c
> Pass
compiler_tests/array/declare_global.c
> Fail: simulation did not exit with exit-code 0
...
```

By default, the first `_example/example.c` test should be passing.
By default, the first [`_example/example.c`](../compiler_tests/_example/example.c) test should be passing.

This basic framework ignores the source input file and always produces the same assembly, which loads the value `5` into `a0`.
This basic framework is only able to compile a very simple program, as described [here](./basic_compiler.md).

Program build and execution
---------------------------
@@ -47,16 +46,14 @@ You can assume that the command-line arguments will always be in this order, and
Input
-----

The input file will be pre-processed [ANSI C](https://en.wikipedia.org/wiki/ANSI_C), also called C90 or C89. It's what's generally thought of as "classic" or "normal" C, but not the _really_ old one without function prototypes (you may never have come across that). C90 is still often used in embedded systems, and pretty much the entire Linux kernel is in C90.
The input file will be pre-processed [ANSI C](https://en.wikipedia.org/wiki/ANSI_C), also called C90 or C89. It is what is generally thought of as "classic" or "normal" C, but not the _really_ old one without function prototypes (you may never have come across that). C90 is still often used in embedded systems, and pretty much the entire Linux kernel is in C90.

You've mainly been taught C++, but you're probably aware of C as a subset of C++ without classes, which is a good mental model. Your programs (lexer, parser and compiler) will never be given code that has different parsing or execution semantics under C and C++ (so, for example, I won't give you code that uses `class` as an identifier).
You have mainly been taught C++, but you are probably aware of C as a subset of C++ without classes, which is a good mental model. Your programs (lexer, parser and compiler) will never be given code that has different parsing or execution semantics under C and C++ (so, for example, I will not give you code that uses `class` as an identifier).

The source code will not contain any compiler-specific or platform-specific extensions. If you pre-process a typical program (see later), you'll see many things such as `__attribute__` or `__declspec` coming from the system headers. You will not need to deal with any of these.
The source code will not contain any compiler-specific or platform-specific extensions. If you pre-process a typical program (see later), you will see many things such as `__attribute__` or `__declspec` coming from the system headers. You will not need to deal with any of these.

The test inputs will be a set of files of increasing complexity and variety. The test inputs will not have syntax errors or other programming errors, so your code does not need to handle these gracefully.

[This is the "official" C90 grammar](https://www.lysator.liu.se/c/ANSI-C-grammar-y.html), presented in the form of a Yacc parser file without any specific actions linked to each rule. There is also a [corresponding Lex lexer file](https://www.lysator.liu.se/c/ANSI-C-grammar-l.html) attached. You do not need to use everything that is in there, but it can help to give you an idea of the AST constructs that you need. If you find the grammar too complicated to understand, it is also perfectly fine to create your own simple grammar and build upon it as you add more features.

Features
-------

@@ -162,10 +159,13 @@ I then use spike to simulate the executable on RISC-V, like so:

This command should produce the exit code `0`.

Assembler directives
---------------
[You will need to consider assembler directives in your output](./assembler_directives.md)

Useful links
------------
* [Godbolt](https://godbolt.org/z/vMMnWbsff) - Great tool for viewing what a real (`gcc` in this case) RISC-V compiler would produce for a given snippet of C code. This link is pre-configured for the correct architecture (`RV32IMFD`) and ABI (`ILP32D`) that the coursework targets. Code optimisation is also disabled to best mimic what you might want your compiler to output. You can replicate Godbolt locally by running `riscv64-unknown-elf-gcc -std=c90 -pedantic -ansi -O0 -march=rv32imfd -mabi=ilp32d -S [source-file.c] -o [dest-file.s]`, which might make debugging easier for some.
* [Godbolt](https://godbolt.org/z/vMMnWbsff) - Great tool for viewing what a real (`gcc` in this case) RISC-V compiler would produce for a given snippet of C code. This link is pre-configured for the correct architecture (`RV32IMFD`) and ABI (`ILP32D`) that the coursework targets. Code optimisation is also disabled to best mimic what you might want your compiler to output. You can replicate Godbolt locally by running `riscv64-unknown-elf-gcc -std=c90 -pedantic -ansi -O0 -march=rv32imfd -mabi=ilp32d -S [source-file.c] -o [dest-file.s]`, which might make debugging and directives analysis easier for some.

* [Interactive RISC-V simulator](https://creatorsim.github.io/creator) - Might be helpful when trying to work out the behaviour of certain instructions that Godbolt emits.

@@ -175,7 +175,7 @@ Useful links

* [RISC-V Assembler Reference](https://michaeljclark.github.io/asm.html) - Very useful resource containing information about structuring your output assembly files and most importantly the assembler directives - if you don't know the meaning behind `.data`, `.text`, or `.word` then definitely check this out as well as experiment with Godbolt to see how it actually emits them.

Getting started
Getting started
---------------
[How to get started? (previous students' perspectives)](./starting_guide.md)

11 changes: 6 additions & 5 deletions docs/environment_guide.md
@@ -10,10 +10,10 @@ Many students develop their compiler in VS Code, as this has good support for co
### VS Code + Docker (the most popular option)

1) Install [Docker Desktop](https://www.docker.com/products/docker-desktop/). If you are on Apple M1/M2, make sure to choose the Apple Silicon download.
2) Open VS Code and install the [Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) extension
3) Open the folder containing this file, in VS Code
2) Open VS Code and install the [Dev Containers](https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.remote-containers) extension.
3) Open the top folder of your repo (`/langproc-cw-.../`), in VS Code.
4) Open the Command Palette in VS Code. You can do this by the shortcut `Ctrl + Shift + P` on Windows or `Cmd + Shift + P` on Mac. Alternatively, you can access this from `View -> Command Palette`.
5) Enter `>Dev Containers: Reopen in Container` into the Command Palette
5) Enter `>Dev Containers: Reopen in Container` into the Command Palette.
6) After a delay -- depending on how fast your Internet connection can download ~1GB -- you will now be in the container environment. For those interested, VS Code reads the container configuration from the [.devcontainer/devcontainer.json](.devcontainer/devcontainer.json) file.
7) Test that your tools are correctly set up by running `./scripts/toolchain_test.sh` in the VS Code terminal, accessible via `Terminal -> New Terminal`. Your output should look as follows:

@@ -39,11 +39,12 @@ Many students develop their compiler in VS Code, as this has good support for co
> Warning for Windows users: if you are running Windows and use this method, you may experience errors related to the line endings of your files. Windows uses the special characters CRLF (`\r\n`) to represent the end of a line, whereas Linux uses just LF (`\n`). As such, if you edit these files on Windows they are most likely to be saved using CRLF. See if you can change your editor to use LF file endings or, even better, see if your editor supports [EditorConfig](https://editorconfig.org/), which standardises formatting across all files based on the [.editorconfig](.editorconfig) file in the same folder as this file.

1) Install [Docker](https://www.docker.com/products/docker-desktop/). If you are on Apple M1/M2, make sure to choose the Apple Silicon download.
2) Open a terminal (Powershell on Windows; Terminal on Mac) to the folder containing this file
2) Open a terminal (Powershell on Windows; Terminal on Mac) to the folder containing this file.
3) Inside that terminal, run `docker build -t compilers_image .`
4) Once that completes, run `docker run --rm -it -v "${PWD}:/code" -w "/code" --name "compilers_env" compilers_image`
4) Once that completes, run `docker run --rm -it -v "${PWD}:/code" -w "/code" --name "compilers_env" compilers_image`.
5) You should now be inside the LangProc tools container, where you can run `./scripts/toolchain_test.sh` inside the `/code` folder to check that your tools are working correctly. Note that the folder containing this file, as well as any subdirectories, are mounted inside this container under the path `/code`. The output of running the command should look as follows:


```console
root@ad12f00322f6:/code# ./scripts/toolchain_test.sh

Binary file added docs/int_main_return_tree.png
17 changes: 17 additions & 0 deletions include/ast.hpp
@@ -0,0 +1,17 @@
#ifndef AST_HPP
#define AST_HPP

#include <iostream>
#include <string>
#include <vector>

#include "ast_direct_declarator.hpp"
#include "ast_function_definition.hpp"
#include "ast_identifier.hpp"
#include "ast_jump_statement.hpp"
#include "ast_node.hpp"
#include "ast_type_specifier.hpp"

extern Node* parseAST(std::string file_name);

#endif
9 changes: 9 additions & 0 deletions include/ast_context.hpp
@@ -0,0 +1,9 @@
#ifndef AST_CONTEXT
#define AST_CONTEXT

// An object of class Context is passed between AST nodes during compilation to provide adequate context
class Context {
/* TODO decide what goes inside here */
};

#endif
13 changes: 13 additions & 0 deletions include/ast_direct_declarator.hpp
@@ -0,0 +1,13 @@
#ifndef AST_DIRECT_DECLARATOR
#define AST_DIRECT_DECLARATOR

#include "ast_node.hpp"

class DirectDeclarator : public Node {
public:
DirectDeclarator(Node* identifier);
~DirectDeclarator() {};
void emitRISC(std::ostream &stream, Context &context) const;
};

#endif
13 changes: 13 additions & 0 deletions include/ast_function_definition.hpp
@@ -0,0 +1,13 @@
#ifndef AST_FUNCTION_DEFINITION_HPP
#define AST_FUNCTION_DEFINITION_HPP

#include "ast_node.hpp"

class FunctionDefinition : public Node {
public:
FunctionDefinition(Node* declaration_specifiers, Node* declarator, Node* compound_statement);
~FunctionDefinition() {};
void emitRISC(std::ostream &stream, Context &context) const;
};

#endif
15 changes: 15 additions & 0 deletions include/ast_identifier.hpp
@@ -0,0 +1,15 @@
#ifndef AST_IDENTIFIER
#define AST_IDENTIFIER

#include "ast_node.hpp"

class Identifier : public Node {
private:
std::string* identifier;
public:
Identifier(std::string* _identifier) : identifier(_identifier) {};
~Identifier() {delete identifier;};
void emitRISC(std::ostream &stream, Context &context) const;
};

#endif
13 changes: 13 additions & 0 deletions include/ast_jump_statement.hpp
@@ -0,0 +1,13 @@
#ifndef AST_JUMP_STATEMENT
#define AST_JUMP_STATEMENT

#include "ast_node.hpp"

class JumpStatement : public Node {
public:
JumpStatement() {};
~JumpStatement() {};
void emitRISC(std::ostream &stream, Context &context) const;
};

#endif
19 changes: 19 additions & 0 deletions include/ast_node.hpp
@@ -0,0 +1,19 @@
#ifndef AST_NODE_HPP
#define AST_NODE_HPP

#include <iostream>
#include <vector>

#include "ast_context.hpp"

class Node {
protected:
std::vector<Node*> branches;

public:
Node() {};
virtual ~Node();
virtual void emitRISC(std::ostream &stream, Context &context) const = 0;
};

#endif
15 changes: 15 additions & 0 deletions include/ast_type_specifier.hpp
@@ -0,0 +1,15 @@
#ifndef AST_TYPE_SPECIFIER
#define AST_TYPE_SPECIFIER

#include "ast_node.hpp"

class TypeSpecifier : public Node {
private:
std::string type;
public:
TypeSpecifier(std::string _type) : type(_type) {};
~TypeSpecifier() {};
void emitRISC(std::ostream &stream, Context &context) const {};
};

#endif
2 changes: 1 addition & 1 deletion include/cli.h
@@ -4,6 +4,6 @@
#include <iostream>
#include <unistd.h>

int parse_command_line_args(int argc, char **argv, std::string &sourcePath, std::string &outputPath);
int parseCommandLineArgs(int argc, char **argv, std::string &source_path, std::string &output_path);

#endif
1 change: 1 addition & 0 deletions src/ast_context.cpp
@@ -0,0 +1 @@
#include "ast_context.hpp"
11 changes: 11 additions & 0 deletions src/ast_direct_declarator.cpp
@@ -0,0 +1,11 @@
#include "ast_direct_declarator.hpp"

DirectDeclarator::DirectDeclarator(Node* identifier) {
branches.insert(branches.end(), {identifier});
}

void DirectDeclarator::emitRISC(std::ostream &stream, Context &context) const {
// Emit identifier
branches[0]->emitRISC(stream, context);
stream << ":" << std::endl;
}
13 changes: 13 additions & 0 deletions src/ast_function_definition.cpp
@@ -0,0 +1,13 @@
#include "ast_function_definition.hpp"

FunctionDefinition::FunctionDefinition(Node* declaration_specifiers, Node* declarator, Node* compound_statement) {
branches.insert(branches.end(), {declaration_specifiers, declarator, compound_statement});
}

void FunctionDefinition::emitRISC(std::ostream &stream, Context &context) const {
// Emit declarator
branches[1]->emitRISC(stream, context);

// Emit compound_statement
branches[2]->emitRISC(stream, context);
}
5 changes: 5 additions & 0 deletions src/ast_identifier.cpp
@@ -0,0 +1,5 @@
#include "ast_identifier.hpp"

void Identifier::emitRISC(std::ostream &stream, Context &context) const {
stream << *identifier;
}