Releases: GlobalArrays/ga
v5.8.2
[5.8.2]
- Known Bugs
- The MPI RMA port still shows spotty behavior and many tests in the test suite
are failing for many MPI implementations. Currently, the Open MPI
implementation in version 4.1.4 is working well and all tests are passing.
- Added
- Setting ARMCI_VERBOSE=1 at runtime will also dump configuration details for
ComEx runtime
- Changed
- Updated compiler settings in CMake build if Fujitsu compilers are detected
- Fixed
- Fixed gcc toolchain checks in CMake for clang build
- Fixed tiled arrays so that they work with restricted arrays and fixed some
additional bugs in block cyclic distributions
- Removed several memory leaks
- Modified check on the number of processors that was being performed in the GA
create process. Previously this check was failing since it was possible that
the check was being performed before a process group had been assigned to
the global array.
- Fixed some issues with hidden string length argument in Fortran interface
v5.8.1
- Known Bugs
- Added
- Added support in MA for CUDA managed memory. Provided by Jeff Hammond.
- Added a GA_Deallocate function that deallocates memory but leaves GA in
place. GA_Allocate can be called later on the handle. This can be used for
memory management.
- Changed
- Fixed
- Slurm conflict for free_buf symbol in DRA library. Fixed by Michael Klemm.
- Deallocate GA_MPI_World_comm_dup in GA_Terminate.
- Dependency of CMake build on C++ is configurable.
- Improved CMake integration of linear algebra libraries
v5.8
- Known Bugs
- The MPI RMA port remains unreliable for many MPI implementations. Open MPI
still reports many failures in the test suite. Intel MPI is better but still
reports several failures. It is recommended to use the latest MPI
implementations available.
- Added
- Version function that can be used to report the current version, subversion
and patch numbers of the current release
- Overlay option for creating new GAs on top of existing GAs
- The number of progress ranks per node in the progress ranks runtime is now
configurable
- Functions for duplicating process groups and returning a process group that
only contains the calling process
- 64-bit versions of block-cyclic data distribution functions to
C interface
- Non-blocking test function
- Read-only property based on caching
- GA name can be recovered from handle
- Added profiling capabilities to the GA branch that automatically generates
a log file in the running directory. This can be controlled with GAW_FILE_PREFIX
environment variable to add a prefix for the log files and the GAW_FMT
environment variable to create a CSV format or human readable format. The
default format is human readable.
- For autotools, add --enable-profile=1 in the configure line
- For CMake add -DENABLE_PROFILING=ON
- Changed
- Non-blocking handle management was completely revamped. This simplifies
implementation and removes some bugs. The number of outstanding non-blocking
calls was increased to 256
- Modified internal function that computes rank of processors on the world
communicator so that it does not use the MPI_Comm_translate_ranks function.
This function is implemented with a loop that scales as the square of the
number of processors and is very slow at large processor counts
- Modified internal iterators so that block cyclic data distributions work on
processor groups
- Improved CMake build
- Modified ga_print_distribution so that it works on block-cyclic data
distributions
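The scaling argument behind the rank-computation change can be illustrated with a small Python sketch. This is not GA's code and it does not call MPI: a process group is modeled as a plain list of world ranks, and `translate_quadratic` and `translate_with_table` are hypothetical names chosen for this illustration. The first function searches the group once per requested rank, which is quadratic when both lists grow with the processor count; the second builds a lookup table once, so the total work is linear.

```python
from typing import Dict, List, Optional


def translate_quadratic(ranks: List[int], group: List[int]) -> List[Optional[int]]:
    """Find each world rank's position in `group` by scanning the whole group.

    With len(ranks) and len(group) both proportional to the processor
    count P, the nested loops do on the order of P * P comparisons.
    """
    result: List[Optional[int]] = []
    for rank in ranks:
        position = None
        for index, member in enumerate(group):
            if member == rank:
                position = index
                break
        result.append(position)
    return result


def translate_with_table(ranks: List[int], group: List[int]) -> List[Optional[int]]:
    """Same result, but build a world-rank -> group-position table once.

    Building the table is O(P) and each lookup is O(1) on average,
    so the total work is linear in P.
    """
    table: Dict[int, int] = {member: index for index, member in enumerate(group)}
    return [table.get(rank) for rank in ranks]


if __name__ == "__main__":
    group = [4, 2, 7, 0]   # world ranks that belong to the group
    ranks = [0, 7, 5]      # world ranks to translate; 5 is not a member
    print(translate_quadratic(ranks, group))
    print(translate_with_table(ranks, group))
```

Both functions return `None` for a rank that is not in the group. How GA actually computes the ranks internally is not described in these notes; the sketch only shows why avoiding a per-rank search matters at large processor counts.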
- Fixed
- Fixed a non-blocking error that was showing up in nbtest.x
v5.7.2
v5.7.1
- Added
- Added NOUSE_MMAP for 32-bit Linux
- Fixed
- pgcc: need to rename f77 object to cfortran_test.o to avoid pgcc overwriting conftest.o during linking
http://www.nwchem-sw.org/index.php/Special:AWCforum/st/id2788
- fixes for ga_diag_std_seq 32-bit integer interface
- fix for MKL error "PDSTEDC parameter number 10 had an illegal value" http://www.nwchem-sw.org/index.php/Special:AWCforum/st/id2660
- fix for MPI-2 deprecated MPI_Type_struct and MPI_Errhandler_set
- Closed Issues
- [#157] add -fallow-argument-mismatch for gfortran 10
v5.7
- Known Bugs
- Some combinations of MPI implementations with the MPI RMA and PR
ports fail. Recommended to use latest MPI implementations available.
- Added
- Tiled data layout
- Read-only property type using replication across SMP nodes
- Changed
- GA is now thread safe
- MPI3 implementation based on MPI RMA now uses data types in MPI
calls by default. This is higher performing but not as reliable as
using multiple contiguous data transfers. The build can be
configured to use contiguous transfers if data types are not working
for your MPI implementation.
- ComEx MPI-PR now uses MPI data types in strided put and get calls
by default. To enable the old packed behavior, set the following
environment variables to 0.
- COMEX_ENABLE_PUT_DATATYPE
- COMEX_ENABLE_GET_DATATYPE
Additionally, the original packing implementation is faster for smaller
messages. Two new environment variables control at which point the MPI
data types are used.
- COMEX_PUT_DATATYPE_THRESHOLD. Default 8192.
- COMEX_GET_DATATYPE_THRESHOLD. Default 8192.
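How the ComEx MPI-PR data type switches and thresholds listed above might interact can be sketched in Python. This models only the decision described in these notes and is not ComEx code: `choose_strided_path` and its helpers are hypothetical names, the threshold is assumed to be a message size in bytes, and the switch to data types is assumed to happen at sizes greater than or equal to the threshold (the notes do not say whether the comparison is inclusive).

```python
import os
from typing import Mapping, Optional

DEFAULT_THRESHOLD = 8192  # documented default for both threshold variables
VALID_OPS = ("put", "get")


def _env(env: Optional[Mapping[str, str]]) -> Mapping[str, str]:
    """Use the given mapping, or the real process environment if None."""
    return os.environ if env is None else env


def datatype_enabled(op: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Data types are on by default; setting the variable to 0 turns them off."""
    return _env(env).get(f"COMEX_ENABLE_{op.upper()}_DATATYPE", "1") != "0"


def datatype_threshold(op: str, env: Optional[Mapping[str, str]] = None) -> int:
    """Read COMEX_<OP>_DATATYPE_THRESHOLD, falling back to the default."""
    name = f"COMEX_{op.upper()}_DATATYPE_THRESHOLD"
    return int(_env(env).get(name, DEFAULT_THRESHOLD))


def choose_strided_path(op: str, nbytes: int,
                        env: Optional[Mapping[str, str]] = None) -> str:
    """Return "packed" or "datatype" for a strided put/get of `nbytes`."""
    if op not in VALID_OPS:
        raise ValueError(f"op must be one of {VALID_OPS}, got {op!r}")
    if not datatype_enabled(op, env):
        return "packed"
    # Assumption: inclusive comparison; small messages stay on the packed path.
    return "datatype" if nbytes >= datatype_threshold(op, env) else "packed"


if __name__ == "__main__":
    print(choose_strided_path("put", 1024, {}))
    print(choose_strided_path("put", 65536, {}))
    print(choose_strided_path("get", 65536, {"COMEX_ENABLE_GET_DATATYPE": "0"}))
```

Passing an explicit mapping instead of reading `os.environ` keeps the sketch easy to experiment with; in a real run the variables would be set in the job's environment before launch.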
- Fixed
- Message sizes exceeding 2GB now work correctly
- Mirrored Arrays now distribute data across SMP nodes for
ComEx-based runtimes
- Matrix multiply works for non-standard data layouts (may not be
performant)
- Closed Issues
- [#48] Message sizes exceeding 2GB may not work correctly
v5.6.5
- Known Bugs
- [#48] Message sizes exceeding 2GB may not work correctly
- Added
- Environment variables to control internal ComEx MPI-PR settings
- COMEX_MAX_NB_OUTSTANDING. Default 8. The maximum number of concurrent non-blocking operations.
- COMEX_STATIC_BUFFER_SIZE. Default 2097152 bytes. Some ComEx operations require a temporary buffer. Any message larger than this size will dynamically allocate and free a new buffer to hold the larger message.
- COMEX_EAGER_THRESHOLD. Default -1. Small messages can be sent as part of other internal ComEx operations. Recommended to set this to less than or equal to the corresponding MPI eager/rendezvous threshold cutoff.
- COMEX_ENABLE_PUT_SELF. Default 1 (on). Contiguous put will use memcpy when target is same as originator.
- COMEX_ENABLE_GET_SELF. Default 1 (on). Contiguous get will use memcpy when target is same as originator.
- COMEX_ENABLE_ACC_SELF. Default 1 (on). Contiguous acc will use memcpy when target is same as originator.
- COMEX_ENABLE_PUT_SMP. Default 1 (on). Contiguous put will use memcpy when target is on the same host via shared memory.
- COMEX_ENABLE_GET_SMP. Default 1 (on). Contiguous get will use memcpy when target is on the same host via shared memory.
- COMEX_ENABLE_ACC_SMP. Default 1 (on). Contiguous acc will use memcpy when target is on the same host via shared memory.
- COMEX_ENABLE_PUT_PACKED. Default 1 (on). Strided put will pack the data into a contiguous buffer.
- COMEX_ENABLE_GET_PACKED. Default 1 (on). Strided get will pack the data into a contiguous buffer.
- COMEX_ENABLE_ACC_PACKED. Default 1 (on). Strided acc will pack the data into a contiguous buffer.
- COMEX_ENABLE_PUT_IOV. Default 1 (on). Vector put will pack the data into a contiguous buffer.
- COMEX_ENABLE_GET_IOV. Default 1 (on). Vector get will pack the data into a contiguous buffer.
- COMEX_ENABLE_ACC_IOV. Default 1 (on). Vector acc will pack the data into a contiguous buffer.
- COMEX_MAX_MESSAGE_SIZE. Default INT_MAX. All use of MPI will keep buffers less than this size. Sometimes useful in conjunction with eager thresholds to force all use of MPI below the eager threshold.
- armci-config and comex-config added
- --blas_size
- --use_blas
- --network_ldflags
- --network_libs
- ga-config added
- --blas_size
- --scalapack_size
- --use_blas
- --use_lapack
- --use_scalapack
- --use_peigs
- --use_elpa
- --use_elpa_2015
- --use_elpa_2016
- --network_ldflags
- --network_libs
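The ComEx MPI-PR environment variables listed above can be collected into a launch environment with a few lines of Python. This is a sketch under stated assumptions, not a GA or ComEx tool: `COMEX_DEFAULTS` and `comex_environment` are hypothetical names, the defaults are copied from the list above, and COMEX_MAX_MESSAGE_SIZE is accepted as a known name but given no numeric default here because INT_MAX is platform dependent.

```python
import os
from typing import Dict, Mapping, Optional

# Defaults as documented in the list above (stored as strings, like env values).
COMEX_DEFAULTS: Dict[str, str] = {
    "COMEX_MAX_NB_OUTSTANDING": "8",
    "COMEX_STATIC_BUFFER_SIZE": "2097152",
    "COMEX_EAGER_THRESHOLD": "-1",
}
# The twelve on/off switches all default to 1 (on).
for _op in ("PUT", "GET", "ACC"):
    for _kind in ("SELF", "SMP", "PACKED", "IOV"):
        COMEX_DEFAULTS[f"COMEX_ENABLE_{_op}_{_kind}"] = "1"

# Known names: everything with a default, plus the size cap (default INT_MAX).
KNOWN_NAMES = set(COMEX_DEFAULTS) | {"COMEX_MAX_MESSAGE_SIZE"}


def comex_environment(overrides: Mapping[str, int],
                      base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (or os.environ) with ComEx overrides applied.

    Unknown variable names raise KeyError so that typos are caught before
    a job is launched; values are converted to integer strings.
    """
    env = dict(os.environ if base is None else base)
    for name, value in overrides.items():
        if name not in KNOWN_NAMES:
            raise KeyError(f"unknown ComEx setting: {name}")
        env[name] = str(int(value))
    return env


if __name__ == "__main__":
    env = comex_environment(
        {"COMEX_MAX_NB_OUTSTANDING": 16, "COMEX_ENABLE_PUT_PACKED": 0},
        base={"PATH": "/usr/bin"},
    )
    for key in sorted(env):
        print(f"{key}={env[key]}")
```

The returned dictionary could be passed as the `env` argument of `subprocess.run` when starting an MPI launcher; the launcher command itself is site specific and is left out of the sketch.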
- Changed
- Removed case statement from install-autotools.sh
- Fixed
- install-autotools.sh works on FreeBSD
- patch locally built m4 for OSX High Sierra
- Closed Issues
v5.6.4
- Known Bugs
- [#48] Message sizes exceeding 2GB may not work correctly
- Added
- armci-config and comex-config scripts to install.
- Changed
- install-autotools.sh installs all autotools regardless of existing versions
- configure tests needing mixed C/Fortran code now use C linker
- Fixed
- Test suite was broken when GA was cross-compiled
- eliop FreeBSD patch from Debichem
- Locally installed automake is patched to work with newer perl versions
- MPI-PR increased limit on number of possible comex_malloc invocations
- Closed Pull Requests
- [#92] eliop FreeBSD patch from Debian maintainers of the NWChem Package
- Closed Issues
- [#82] Fortran failure on theta
- [#88] Automake regex expression broken for Perl versions >=5.26.0
- [#89] autogen fails on Mac 10.12
- [#90] configure script fails when using clang-4/5 + gfortran 6.3 compilers on Linux
- [#95] comex/src-mpi-pr/comex.c:996: _generate_shm_name: Assertion 'snprintf_retval < (int)31' failed
v5.6.3
v5.6.2
- Known Bugs
- [#48] Message sizes exceeding 2GB may not work correctly
- Fixed
- Bug in MPI-PT comex_malloc().
- Revert ARMCI contiguous check due to regression.
- ELPA updates.
- ScaLAPACK updates, including case for large matrices.
- ComEx OFI updates from Intel.
- Improved configure tests for LAPACK.
- Improved travis tests.