CIR is a C source-to-source compiler that comes with a JIT. The JIT allows arbitrary functions to be evaluated at compile time. In addition, compile-time-evaluated functions can manipulate code as data and call back into the compiler API, which enables compile-time metaprogramming.
This project is being developed as an undergraduate final year project at the National University of Singapore.
This project is still in its infancy and under heavy development. Don’t expect to be able to use it in real C projects yet.
Currently, the JIT only compiles for x86-64 systems that use the System V ABI. This means it currently only works on Linux (and theoretically macOS as well).
The only build requirement is GCC or Clang (a version that supports the C11 standard). MSVC is not supported because the source code currently uses some GCC extensions.
Clone the repository:
git clone https://github.com/pyokagan/cir.git
and compile it:
cd cir
make
Here is a program which prints the 20th number in the Fibonacci sequence:
fib.c
#include <stdio.h>

static int fib(int n) {
    int a = 0, b = 1, i = 0;
    while (i < n) {
        int c = b;
        b = a + b;
        a = c;
        i = i + 1;
    }
    return a;
}

int main(void) {
    printf("The answer is: %d\n", fib(20));
    return 0;
}
To use it with CIR, we first need to pass it through the C preprocessor to process #includes:
gcc -E -o fib.cpp.c fib.c
💡 CIR requires source files to first be run through the C preprocessor. For the rest of this document, we will assume that this has been done.
We can then pass the preprocessed file (fib.cpp.c) into CIR:
./cir fib.cpp.c >fib.output.c
The output from CIR is:
fib.output.c
static int vid57_fib(int vid58_n)
{
    int vid59_a;
    int vid60_b;
    int vid61_i;
    int vid62_c;
    int vid63;
    int vid64;
    vid59_a = 0; /* sid1 */
    vid60_b = 1; /* sid2 */
    vid61_i = 0; /* sid3 */
sid4:
    if (vid61_i < vid58_n) goto sid6; /* sid4 */
    goto sid13; /* sid5 */
sid6:
    vid62_c = vid60_b; /* sid6 */
    vid63 = vid59_a + vid60_b; /* sid7 */
    vid60_b = vid63; /* sid8 */
    vid59_a = vid62_c; /* sid9 */
    vid64 = vid61_i + 1; /* sid10 */
    vid61_i = vid64; /* sid11 */
    goto sid4; /* sid12 */
sid13:
    /* nop */; /* sid13 */
    return vid59_a; /* sid14 */
}
extern int printf(char *__restrict __format, ...);
int main(void)
{
    int vid66;
    int vid67;
    vid66 = vid57_fib(20); /* sid15 */
    vid67 = printf("The answer is: %d\n", vid66); /* sid16 */
    return 0; /* sid17 */
}
As we can see, CIR has compiled fib.cpp.c into a three-address-code-like representation that is a subset of the C programming language.
This representation is called "CIR", meaning "C Intermediate Representation".
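To make the "three-address-code-like" shape concrete, here is a small hand-written sketch in plain C (not actual CIR output). It lowers one nested expression by hand so that each statement applies at most one operator and stores its result in a temporary. The function names and the temporaries t1 to t3 are illustrative assumptions; CIR generates its own names, as seen in the listing above.

```c
/* Ordinary C: a single nested expression. */
static int direct(int a, int b, int c) {
    return a + b * c - 1;
}

/* Hand-lowered, three-address style: one operator per statement,
 * with every intermediate result stored in its own temporary.
 * t1..t3 are illustrative names, not names CIR would produce. */
static int lowered(int a, int b, int c) {
    int t1;
    int t2;
    int t3;
    t1 = b * c;   /* innermost sub-expression first */
    t2 = a + t1;
    t3 = t2 - 1;
    return t3;
}
```

Both functions compute the same value; for example, calling either with (2, 3, 4) returns 13.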
While it may seem at first that the CIR representation is horribly inefficient, modern optimizing compilers such as GCC will actually compile both fib.c and fib.output.c to the same assembly code:
.LC0:
        .string "The answer is: %d\n"
main:
        sub     rsp, 8
        mov     eax, 20
        mov     esi, 1
        mov     edx, 0
        jmp     .L2
.L3:
        mov     esi, ecx
.L2:
        lea     ecx, [rdx+rsi]
        mov     edx, esi
        sub     eax, 1
        jne     .L3
        mov     edi, OFFSET FLAT:.LC0
        mov     eax, 0
        call    printf
        mov     eax, 0
        add     rsp, 8
        ret
However, as can be seen from the assembly listing, we are actually still computing the value of fib(20) at runtime.
Can we do better?
CIR extends the C programming language with the compile-time evaluation operator, @.
Function calls that are prefixed with @ will be evaluated at compile time.
Here is the modified fib.c source file with the @ operator added:
#include <stdio.h>

static int fib(int n) {
    int a = 0, b = 1, i = 0;
    while (i < n) {
        int c = b;
        b = a + b;
        a = c;
        i = i + 1;
    }
    return a;
}

int main(void) {
    printf("The answer is: %d\n", @fib(20)); // @ operator added
    return 0;
}
CIR now outputs:
extern int printf(char *__restrict __format, ...);
int main(void)
{
    int vid66;
    vid66 = printf("The answer is: %d\n", 6765); /* sid15 */
    return 0; /* sid16 */
}
As we can see, the call to fib(20) has been replaced with the constant 6765, which is indeed the 20th number in the Fibonacci sequence.
So what happened?
CIR JIT-compiled the fib() function into x86-64 machine code, executed it, and then inlined the result (6765) into the call site.
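The effect of the rewrite can be pictured in plain C, without CIR. The sketch below is only an illustration of the before and after semantics: the two wrapper functions are hypothetical names introduced here, and fib() is copied from fib.c.

```c
/* Same iterative definition as in fib.c. */
static int fib(int n) {
    int a = 0, b = 1, i = 0;
    while (i < n) {
        int c = b;
        b = a + b;
        a = c;
        i = i + 1;
    }
    return a;
}

/* Without @: the loop runs every time the program is executed. */
static int answerAtRuntime(void) {
    return fib(20);
}

/* With @: the call has been replaced by its result, so only the
 * constant is left in the emitted program. */
static int answerFolded(void) {
    return 6765;
}
```

Both wrappers return the same value, but only the first one does any work when the program runs.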
The JIT is a full-featured C compiler [1]. This means that you can use any C language construct you want, such as conditionals, loops, and calls to other functions.
Furthermore, JIT-compiled code can call external functions and libraries.
This includes C standard library APIs such as malloc(), free(), fopen(), fwrite(), printf() etc.
For example, here is a compile-time function that reads a file using C standard library APIs and returns it as a string constant:
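#include <stdio.h>
#include <stdlib.h>
#include "../cir.h" // include compiler API

// Reads a file and returns it as a string constant
static CirCodeId readFile(char *path) {
    FILE *fp = fopen(path, "r");
    if (!fp)
        cir_fatal("failed to open %s", path);
    // Find the file size by seeking to the end
    fseek(fp, 0, SEEK_END);
    long len = ftell(fp); // ftell() returns long, or -1 on error
    if (len < 0)
        cir_fatal("failed to get size of %s", path);
    fseek(fp, 0, SEEK_SET);
    char *buffer = malloc(len + 1);
    if (!buffer)
        cir_fatal("out of memory reading %s", path);
    if (len > 0 && fread(buffer, len, 1, fp) != 1)
        cir_fatal("failed to read %s", path);
    buffer[len] = 0; // NUL-terminate the contents
    fclose(fp);
    return CirCode_ofExpr(CirValue_ofString(buffer, len + 1));
}

int main(void) {
    puts(@readFile("fileToBeRead.txt"));
    return 0;
}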
However, this also means that compile-time evaluation may not halt, or may even crash. With great power comes great responsibility: developers need to exercise care when writing compile-time functions.
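One defensive habit, offered here as a suggestion and not as something CIR provides or requires, is to give loops in compile-time functions an explicit step budget so that a bad argument fails quickly instead of hanging the build. The sketch below is plain C; collatzSteps and maxSteps are hypothetical names, and it returns -1 on failure only to stay independent of the compiler API.

```c
/* Hypothetical helper: counts Collatz steps from n down to 1, but gives up
 * after maxSteps iterations instead of looping indefinitely.
 * Returns the step count, or -1 if n is 0 or the budget is exhausted. */
static int collatzSteps(unsigned long n, int maxSteps) {
    int steps = 0;
    if (n == 0)
        return -1; /* 0 never reaches 1 */
    while (n != 1) {
        if (steps >= maxSteps)
            return -1; /* budget exhausted: report failure, do not hang */
        n = (n % 2 == 0) ? n / 2 : 3 * n + 1;
        steps = steps + 1;
    }
    return steps;
}
```

For example, collatzSteps(6, 100) returns 8, while collatzSteps(6, 3) returns -1 because the budget runs out first. Inside an actual compile-time function, the failure branch could presumably report the problem with cir_fatal(), as the readFile() example does.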
Code evaluated at compile-time can call back into the compiler API.
CIR will examine the types of compile-time-evaluated functions:
- When a compile-time function declares its argument(s) to take a code object (CirCodeId), CIR will pass the raw code (in IR form) into the function.
- When a compile-time function declares its return type to be a code object (CirCodeId), CIR will inline the returned code object as-is into the call site.
For example, here is a function that receives CirCodeId as an argument, and examines the IR contained within:
#include <stdbool.h>
#include "../cir.h" // include compiler API

// Returns true if code calls a function, otherwise returns false
static bool callsAFunction(CirCodeId code) {
    CirStmtId stmt = CirCode_getFirstStmt(code);
    while (stmt) {
        if (CirStmt_isCall(stmt))
            return true;
        stmt = CirStmt_getNext(stmt);
    }
    return false;
}
It can be used as follows:
@callsAFunction(puts("Hi")); // evaluates to 1
@callsAFunction(42); // evaluates to 0
ℹ️
|
Notice how a simple |
Here is another function that returns a code object containing a string constant.
#include <stdio.h>
#include "cir.h" // include compiler API

// Returns a code object containing a string constant
static CirCodeId generateCode(void) {
    return CirCode_ofExpr(CirValue_ofCString("Inlined String"));
}

int main(void) {
    puts(@generateCode());
    return 0;
}
And the result is:
extern int puts(char *__s);
int main(void)
{
    int vid260;
    vid260 = puts("Inlined String"); /* sid4 */
    return 0; /* sid5 */
}
This project was initially prototyped with CIL before being ported to C. Certain parts of CIL still remain, such as the representation of types, structs, typedefs, attributes etc.