Releases: diku-dk/futhark

nightly

20 Nov 21:55
nightly Pre-release

Commits

  • f346bf4: Better tracing of AD values in interpreter. (Troels Henriksen)

0.25.24

11 Nov 14:54

Added

  • futhark doc now produces better (and stable) anchor IDs.

  • futhark profile now supports multiple JSON files.

  • futhark fmt, by William Due and Therese Lyngby.

  • Lambdas can now be passed as the last argument to a function application.

Fixed

  • Negation of floating-point positive zero now produces a negative
    zero.

  • Necessary inlining of functions used inside AD constructs.

  • A compile time regression for programs that used higher order
    functions very aggressively.

  • Uniqueness bug related to slice simplification.
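
The negative-zero fix above concerns IEEE 754 signed zeros: positive and negative zero compare equal, so the sign has to be inspected explicitly. The following is a minimal sketch in Python (not Futhark), used only to illustrate the IEEE 754 behaviour the fix aligns with. The helper name is_negative_zero is hypothetical and not part of any Futhark API.

```python
import math


def is_negative_zero(x: float) -> bool:
    """Return True only for IEEE 754 negative zero (hypothetical helper)."""
    # 0.0 == -0.0 is True, so the sign must be read with copysign.
    return x == 0.0 and math.copysign(1.0, x) < 0.0


pos_zero = 0.0
neg_zero = -pos_zero  # negating positive zero yields negative zero

print(is_negative_zero(pos_zero))  # False
print(is_negative_zero(neg_zero))  # True
print(pos_zero == neg_zero)        # True: the two zeros compare equal
```

Because the two zeros compare equal, a plain equality test cannot detect whether a negation preserved the sign; a sign-aware check like the one sketched here is needed.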

0.25.23

15 Oct 08:33

Added

  • Trailing commas are now allowed for arrays, records, and tuples in
    the textual value format and in FutharkScript.

  • Faster floating-point atomics with OpenCL backend on AMD and NVIDIA
    GPUs. This affects histogram workloads.

  • AD is now supported by the interpreter (thanks to Marcus Jensen).

Fixed

  • Some instances of invalid copy removal. (Again.)

  • An issue related to entry points with nontrivial sizes in their
    arguments, where the entry points were also used as normal functions
    elsewhere. (#2184)

0.25.22

10 Sep 08:15

Added

  • futhark script now supports an -f option.

  • futhark script now supports the builtin procedure $store.

Fixed

  • An error in tuning file validation.

  • Constant folding for loops that produce floating point results could
    result in different numerical behaviour.

  • Compiler crash in memory short circuiting (#2176).
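
As general background for the constant-folding fix above: floating-point addition is not exact, so replacing a loop with a closed-form expression can change the result. The release notes do not say which transformation was at fault. The sketch below is a generic Python illustration under that assumption, with hypothetical function names, and is not a reproduction of the Futhark bug.

```python
def accumulate(step: float, n: int) -> float:
    """Add `step` to a running total n times, rounding after every addition."""
    total = 0.0
    for _ in range(n):
        total += step
    return total


def folded(step: float, n: int) -> float:
    """Closed form an optimiser might substitute for the loop (assumption)."""
    return step * n


looped = accumulate(0.1, 10)
closed = folded(0.1, 10)

print(looped == closed)             # False: the two results differ
print(abs(looped - closed) < 1e-12)  # True: they differ only by rounding error
```

The two results agree mathematically but not bit for bit, which is why a folding optimisation on floating-point loops has to reproduce the loop's rounding steps to keep numerical behaviour unchanged.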

0.25.21

01 Sep 13:24

Added

  • Logging now prints more GPU information on context initialisation.

  • GPU cache size can now be configured (tuning param: default_cache).

  • GPU shared memory can now be configured (tuning param: default_shared_memory).

  • GPU register capacity can now be configured.

  • futhark script now accepts a -b option for producing binary
    output.

Fixed

  • Type names for element types of array indexing functions in the C
    interface are now often better, although there are still cases
    where you end up with hashed names. (#2172)

  • In some cases, GPU failures would not be reported properly if a
    previous failure was pending.

  • auto output didn't work if the .fut file did not have any path
    components.

  • Improved detection of malformed tuning files.

0.25.20

15 Aug 18:51

Added

  • Better error message when in-place updates fail at runtime due to a
    shape mismatch.

Fixed

  • #[unroll] on an outer loop no longer causes unrolling of all
    loops nested inside the loop body.

  • Obscure issue related to replications of constants in complex
    intrablock kernels.

  • Interpreter no longer crashes on attributes in patterns.

  • Fixes to array indexing through C API when using GPU backends.

0.25.19

26 Jul 17:11

Added

  • The compiler now does slightly less aggressive inlining. Use the
    #[inline] attribute if you want to force inlining of some
    function.

  • Arrays of opaque types now support indexing through the C API.
    Arrays of records can also be constructed. (#2082)

Fixed

  • The opencl backend now always passes
    -cl-fp32-correctly-rounded-divide-sqrt to the kernel compiler, in
    order to match CUDA and HIP behaviour.

0.25.18

19 Jul 09:54

Added

  • New prelude function: rep, an implicit form of replicate.

  • Improved handling of large monomorphic single-dimensional array
    literals (#2160).

Fixed

  • futhark repl no longer asks for confirmation on EOF.

  • Obscure oversight related to abstract size-lifted types (#2120).

  • Accidental exponential-time algorithm in layout optimisation for
    multicore backends (#2151).

0.25.17

12 Jun 08:49

Added

  • Faster device-to-device copies on CUDA.

  • "More correctly" detect L2 cache size for OpenCL backend on AMD GPUs.

Fixed

  • Handling of .. in import paths (again).

  • Detection of impossible loop parameter sizes (#2144).

  • Rare case where GPU histograms would use slightly too much shared
    memory and fail at run-time.

  • Rare crash in layout optimisation.

0.25.16

01 May 11:49

Added

  • futhark test: --no-terminal now prints status messages even when
    no failures occur.

  • futhark test no longer runs structure tests by default. Pass
    -s to run them.

  • Rewritten array layout optimisation pass by Bjarke Pedersen and
    Oscar Nelin. Minor speedup for some programs, but more
    importantly it is a principled foundation for further improvements.

  • Better error message when exceeding shared memory limits.

  • Better dead code removal for the GPU representation (minor impact on
    some programs).

Fixed

  • Bugs related to deduplication of array payloads in sum types.
    Unfortunately, fixed by just not deduplicating in those cases.

  • Frontend bug related to turning size expressions into variables
    (#2136).

  • Another exotic monomorphisation bug.