The Futhark compiler is a significant body of code with a not entirely straightforward design. The main source of documentation is the Haddock comments in the source code itself, including this general overview of the compiler architecture.
To build the compiler, you need a recent version of
GHC, which can be installed via
ghcup. Alternatively, if you
install Nix then you
can run nix-shell
to get a shell environment in which all necessary
tools are installed.
After that, run make docs
to generate internal compiler
documentation in HTML format. The last few lines of output will tell
you the name of an index.html
file which you should then open. Go
to the documentation for the module named Futhark, which contains an
introduction to the compiler architecture.
For contributing code, see the Haskell style guide.
If you feel that the documentation is incomplete, or something lacks an explanation, then feel free to report it as an issue. Documentation bugs are bugs too.
We include a Makefile with the following targets.
- make build (or just make): builds the compiler.
- make install: builds the compiler and copies the resulting binaries to $HOME/.local/bin, or $PREFIX/bin if the PREFIX environment variable is set.
- make docs: builds internal compiler documentation. For the user documentation, see the docs/ subdirectory.
- make check: style-checks all code. Requires GNU Parallel.
- make check-commit: style-checks all code staged for a commit. Requires GNU Parallel.
You can also use cabal
directly if you are familiar with it. In
particular, cabal run futhark -- args...
is useful for running the
Futhark compiler with the provided args.
Asking GHC to generate profiling information is useful not just for the obvious purpose of gathering profiling information, but also so that stack traces become more informative. Run
make configure-profile
to turn on profiling. This setting will be stored in the file
cabal.project.local
and all future builds will be with profiling
information. Note that the compiler runs significantly slower this
way.
To produce a profiling report when running the compiler, add +RTS -p
to the end of the command line.
See also the chapter on profiling in the GHC User's Guide.
Note that GHC's code generator is sometimes slightly buggy in its handling of profiled code. If you encounter a compiler crash with an error message like "PAP object entered", then this is a GHC bug.
By default, Haskell does not produce very good stack traces. If you
compile with make configure-profile
as mentioned above, you can pass
+RTS -xc
to the Futhark compiler in order to get better stack
traces. You will see that you actually get multiple stack traces,
as the Haskell runtime system will print a stack trace for every
signal it receives, and several of these occur early, when the program
is read from disk. Also, the final stack trace is often some
diagnostic artifact. Usually the second-to-last stack trace is what
you are looking for.
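Picking out the second-to-last trace by eye gets tedious when the output is long. The following is a minimal sketch of a filter that does it, assuming each trace printed by +RTS -xc starts with a line beginning with "*** Exception" (the exact header text varies between GHC versions, so adjust the pattern if needed). The function name second_to_last_trace is made up for this example.

```shell
# Print the second-to-last stack trace found on standard input.
# Assumption: every trace starts with a line beginning "*** Exception".
second_to_last_trace() {
  awk '
    /^[*][*][*] Exception/ { n++ }            # a new trace begins here
    n > 0 { trace[n] = trace[n] $0 "\n" }     # collect the lines of each trace
    END {
      if (n >= 2) printf "%s", trace[n - 1]
      else if (n == 1) printf "%s", trace[1]  # only one trace: print that one
    }
  '
}
```

Since the runtime system writes the traces to standard error, a typical use would be something like futhark c prog.fut +RTS -xc 2>&1 | second_to_last_trace, where prog.fut is a placeholder.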
This command tests compilation without compiling the generated C code, which speeds up testing for internal compiler errors:
futhark test -C tests --pass-compiler-option=--library
To rerun only a specific failing unit test:
cabal run unit -- -p '/reshape . fix . iota 3d/'
The argument to -p is the name of the test that fails, as reported
by cabal test. You may have to scroll through the output a bit to
find it.
The Futhark compiler uses a typed core language, and the type checker
is run after every pass. If a given pass produces a program with
inconsistent typing, the compiler will report an error and
abort. While not every compiler bug will manifest itself as a core
language type error (unfortunately), many will. To write the erroneous
core program to filename
in case of type error, pass -vfilename
to
the compiler. This will also enable verbose output, so you can tell
which pass fails. The -v
option is also useful when the compiler
itself crashes, as you can at least tell where in the pipeline it got
to.
Hacking on the compiler will often involve inspecting the quality of
the generated code. The recommended way to do this is to use
futhark c
or futhark opencl
to compile a Futhark program to an
executable. These backends insert various forms of instrumentation
that can be enabled by passing run-time options to the generated
executable.
- As a first resort, use the -t option to enable the built-in runtime measurements. A nice trick is to pass -t /dev/stderr while redirecting standard output to /dev/null. This will print the runtime on the screen, but not the execution result.
- Optionally use -r to ask for several runs, e.g. -r 10. If combined with -t, this will cause several runtimes to be printed (one per line).
- Pass -D to have the program print information on allocation and deallocation of memory.
- (futhark opencl and futhark cuda only) Use the -D option to enable synchronous execution. clFinish() or the CUDA equivalent will be called after most OpenCL operations, and a running log of kernel invocations will be printed. At the end of execution, the program prints a table summarising all kernels and their total runtime and average runtime.
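When -r and -t are combined, the per-run timings can be summarised with a small filter. This is only a sketch: it assumes the timing output holds one integer runtime per line (microseconds, as far as I know), and the name mean_runtime is invented for this example.

```shell
# Read one runtime per line on standard input and print the mean,
# truncated to an integer.
# Assumption: each non-empty line holds a single integer runtime.
mean_runtime() {
  awk 'NF { sum += $1; n++ } END { if (n > 0) printf "%d\n", sum / n }'
}
```

For instance, after running ./prog -r 10 -t times.txt < prog.in > /dev/null (prog, prog.in and times.txt being placeholders), mean_runtime < times.txt prints the average over the ten runs.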
For debugging specific compiler passes, the futhark dev
subcommand
allows you to tailor your own compilation pipeline using command line
options. It is also useful for seeing what the AST looks like after
specific passes.
You can set the level of debug verbosity via the environment variable
FUTHARK_COMPILER_DEBUGGING. It has the following effects:
- FUTHARK_COMPILER_DEBUGGING=1:
  - The frontend prints internal names. (This may affect code generation in some cases, so turn it off when actually generating code.)
  - Tools that talk to server-mode executables will print the messages sent back and forth on the standard error stream.
- FUTHARK_COMPILER_DEBUGGING=2:
  - All of the effects of FUTHARK_COMPILER_DEBUGGING=1.
  - The frontend prints explicit type annotations.
You can run the various compiler passes in whatever order you wish. There are also various shorthands for running entire standard pipelines:
- --gpu: pipeline used for GPU backends (stopping just before adding memory information).
- --gpu-mem: pipeline used for GPU backends, with memory information. This will show the IR that is passed to ImpGen.
- --seq: pipeline used for sequential backends (stopping just before adding memory information).
- --seq-mem: pipeline used for sequential backends, with memory information. This will show the IR that is passed to ImpGen.
- --mc: pipeline used for multicore backends (stopping just before adding memory information).
- --mc-mem: pipeline used for multicore backends, with memory information. This will show the IR that is passed to ImpGen.
By default, futhark dev
will print the resulting IR. You can switch to
a different action with one of the following options:
- --compile-imp-seq: generate sequential ImpCode and print it.
- --compile-imp-gpu: generate GPU ImpCode and print it.
- --compile-imp-multicore: generate multicore ImpCode and print it.
You must use the appropriate pipeline as well (e.g. --gpu-mem for
--compile-imp-gpu).
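To make the pairing concrete, here is a small sketch that maps a backend family to the pipeline and action options listed above. The function name imp_flags and the short family names (seq, gpu, mc) are hypothetical, and pairing --mc-mem with --compile-imp-multicore is inferred from the naming rather than stated outright.

```shell
# Print the futhark dev options for dumping ImpCode for one backend family.
imp_flags() {
  case "$1" in
    seq) echo "--seq-mem --compile-imp-seq" ;;
    gpu) echo "--gpu-mem --compile-imp-gpu" ;;
    mc)  echo "--mc-mem --compile-imp-multicore" ;;  # pairing is inferred
    *)   echo "unknown backend family: $1" >&2; return 1 ;;
  esac
}
```

It could then be used as futhark dev $(imp_flags gpu) prog.fut, with prog.fut standing in for your own program.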
You can also use e.g. --backend=c
to run the same code generation
and compilation as futhark c
. This is useful for experimenting with
other compiler pipelines, but still producing an executable or
library.
When using the cuda
backend, you can use the --dump-ptx
runtime
option to dump PTX, a kind of high-level assembly for NVIDIA GPUs,
corresponding to the GPU kernels. This can be used to investigate why
the generated code isn't running as fast as you expect (not fun), or
even whether NVIDIA's compiler is miscompiling something (extremely not
fun). With the OpenCL backend, --dump-opencl-binary
does the same
thing.
On AMD platforms, --dump-opencl-binary
tends to produce an actual
binary of some kind, and it is pretty tricky to obtain a debugger for it
(they are available and open source, but the documentation and
installation instructions are terrible). Instead, AMD's OpenCL kernel
compiler accepts a -save-temps=foo
build option, which will make it
write certain intermediate files, prefixed with foo
. In particular, it
will write an .s
file that contains what appears to be HSA assembly
(at least when using ROCm). If you find yourself having to do this,
then you are definitely going to have a bad day, and probably evening
and night as well.
Sometimes you have a program that produces the wrong results rather
than crashing the compiler. These are some of the most difficult bugs
to handle. If the result is at least deterministic and you have some
way of compiling the program that does work (either an older version
or a different backend), then the following procedure is useful for
reducing the program as much as possible. Suppose that we are trying
to debug a miscompilation for the opencl backend where the c backend
works, the failing program is prog.fut, and the input data is prog.in.
Write the following script test.sh:
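#!/bin/sh
set -x  # echo each command as it runs
set -e  # stop at the first failing command
# Compile with the known-good and the suspect backend.
futhark c prog.fut -o prog-c
futhark opencl prog.fut -o prog-opencl
# Run both on the same input, writing results in the binary format (-b).
./prog-c -b < prog.in > output-c
./prog-opencl -b < prog.in > output-opencl
# Compare the two results.
futhark datacmp output-c output-opencl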
This compares the results obtained from running the program with the
two compilers. You can now (manually) start removing parts of
prog.fut
while regularly rerunning test.sh
to verify that it still
fails. In particular, you can easily remove program return values,
which is not the case if you are comparing against a fixed expected
output. Eventually you will have a hopefully small program that
produces different results with the two compilers, and you can look in
detail at the IR to figure out what goes wrong.
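The manual loop can be made a little safer by keeping a copy of the last version of the program that still triggers the bug. The sketch below does that; reduce_step and the .keep suffix are names invented here. It treats any nonzero exit status of the test command as "the bug still reproduces", so an edit that merely breaks compilation will also be saved, and you should glance at the output before moving on.

```shell
# Usage: reduce_step FILE COMMAND [ARGS...]
# Run COMMAND after you have edited FILE. If COMMAND fails (the bug still
# shows), remember this version of FILE. If COMMAND succeeds (the bug is
# gone), put back the last remembered version, if there is one.
reduce_step() {
  rs_file=$1
  shift
  if "$@"; then
    if [ -f "$rs_file.keep" ]; then
      cp "$rs_file.keep" "$rs_file"
      echo "bug gone: restored last failing version"
    else
      echo "bug gone: no saved version to restore"
    fi
  else
    cp "$rs_file" "$rs_file.keep"
    echo "bug still present: saved current version"
  fi
}
```

After each edit to prog.fut you would run reduce_step prog.fut sh test.sh and continue trimming from whatever version it leaves in place.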
Some passes can prettyprint internal representations in GraphViz format. For example, to see the fusion graph (prior to fusion), do
$ futhark dev -e --inline-aggr -e foo.fut --fusion-graph > foo.dot
and then to render foo.dot
as foo.dot.pdf
with GraphViz:
$ dot foo.dot -Tpdf -O
Oclgrind is an OpenCL simulator similar to Valgrind that can help find memory and synchronisation errors. It runs code somewhat slowly, but it allows testing of OpenCL code on systems that are not otherwise capable of executing OpenCL.
It is very easy to run a program in Oclgrind:
oclgrind ./foo
For use in futhark test
, we have a wrapper
script that returns with a nonzero exit
code if Oclgrind detects a memory error. You use it as follows:
futhark test foo.fut --backend=opencl --runner=tools/oclgrindrunner.sh
Some versions of Oclgrind have an unfortunate habit of generating code they don't know how to execute. To work around this, disable optimisations in the OpenCL compiler:
futhark test foo.fut --backend=opencl --runner=tools/oclgrindrunner.sh --pass-option=--build-option=-O0
The futhark script
command is a handy way to run (server-mode)
executables with arbitrary input, while also seeing logging output in
real time. This is particularly useful for programs whose benchmarking
inputs are complicated FutharkScript expressions.
If you have a program infinite.fut
containing
entry main n = iterate 1000000000 (map (+1)) (iota n)
then you can run
$ futhark script -D infinite.fut 'main 10i64'
to run it with debug prints. You can also use -L
instead of -D
to
just enable logging. The main 10i64
can be an arbitrary FutharkScript
expression.
The above will compile infinite.fut
using the c
backend before
running it. Pass a --backend
option to futhark script
to use a
different backend, or pass an already compiled program instead of a
.fut file (e.g., infinite).
See the manpages for futhark script
and futhark literate
for more
information.