Releases: kimwalisch/primecount
primecount-backup-6.2
See the ChangeLog for what's new.
primecount-6.1
The main focus of this release has been on polishing the code and improving the documentation. I also tried many things to improve the scaling on servers with a large number of CPU cores, however I only achieved minor speed ups. The only meaningful improvement is that the same threads are now reused throughout the entire AC computation. This improves the scaling for small to medium sized computations up to 1020. GCC benefits most from this change whereas Clang performance is mostly unchanged.
- Xavier Gourdon's algorithm has been distributed using MPI so that computations can now run on HPC clusters.
CMakeLists.txt
: New WITH_JEMALLOC option (default OFF).AC.cpp
: Reuse the same threads throughout the computation.AC.cpp
: Improve upper bound of C2 formula.AC.cpp
: Avoid branch inside hot loop of A formula.SegmentedPiTable.cpp
: Reuse threads from AC.cpp.LoadBalancerP2.cpp
: New load balancer for P2 & B formulas.phi.cpp
: Reduce caching for tiny numbers.generate_phi.hpp
: Reduce caching for tiny numbers.pod_vector.hpp
: Like std::vector, but without default initialization (useful when allocating 100s of GiB).PiTable.cpp
: Multi-threaded initialization.Status.cpp
: Avoid thread synchronization when printing in order to improve scaling of AC and S2_easy.Status.cpp
: Improve S2_hard & D status accuracy.StatusAC.cpp
: More accurate status for AC formula.cmdoptions.cpp
: Add -B & -D options.
primecount-6.0
This version fixes a scaling issue that has been present in primecount since the first version. The computation of the hard special leaves (S2_hard & D formulas) used a flawed optimization that deteriorated the runtime complexity of the algorithm. The Special-Leaves.md document contains more information. The new version should run much faster for large computations β₯ 10^23.
Sieve.cpp
: Implemented O(sqrt(sieve_size)) counting, previously counting was O(sieve_size).AC_libdivide.cpp
: Up to 20% faster using GCC.S2_easy_libdivide.cpp
: Up to 20% faster using GCC.primecount.h
: New C API. primecount now has both a C API (primecount.h) and a C++ API (primecount.hpp).LoadBalancer.cpp
: Refactor using newThreadSettings
struct.find_optimal_alpha_*.sh
: New scripts that find optimal alpha tuning factors using the Linux perf tool.
primecount-backup-6.0
See the ChangeLog for what's new.
primecount-5.3
libprimesieve has been updated to the latest version which provides a minor speedup for all formulas that use primesieve::iterator
e.g. P2, B, pi_lehmer, ...
Sieve.hpp
: UseNOINLINE
.S2Status.hpp
: UseNOINLINE
.D4.cpp
: Simplify bounds.S2_hard.cpp
: Simplify bounds.LoadBalancer.cpp
: Simplify bounds.LoadBalancer.cpp
: Improve thread load balancing.RiemannR.cpp
: Add support for__float128
type.popcnt.hpp
: Faster ARM64 popcount implementation.fast_div.hpp
: AddENABLE_DIV32
macro.S2Status.cpp
: Fix non-critical data race.
primecount-backup-5.3
See the ChangeLog for what's new.
primecount-5.2
primecount-backup-5.2
I have now implemented Xavier Gourdon's algorithm in the primecount-backup version which runs up to 2x faster than the Deleglise-Rivat algorithm. See the ChangeLog for more information.
primecount-5.1
The memory usage of Xavier Gourdon's algorithm has been reduced from O(x^(1/2)) to O(x^(1/3) * log^3 x). Xavier Gourdon's algorithm is now fully implemented and luckily it scales well up to a very large number of CPU cores. Xavier Gourdon's algorithm is now enabled by default for numbers > 10^7.
main.cpp
: New options: --AC, --B, --D, --Phi0, --Sigma.SegmentedPiTable.cpp
: New PiTable implementation.AC.cpp
: Combine the A & C formulas to improve scaling.D4.cpp
: Tighter bounds, minor speedup.Sigma.cpp
: Reduce initialization overhead.Sieve.cpp
: Improved pre-sieving.S2_hard.cpp
: Improved pre-sieving.print.cpp
: Updated for Gourdon's algorithm.
C++ API Changes
In order to simplify the API the following functions have been removed from the primecount.hpp
header:
int64_t pi_deleglise_rivat(int64_t x);
int64_t pi_legendre(int64_t x);
int64_t pi_lehmer(int64_t x);
int64_t pi_lmo(int64_t x);
int64_t pi_meissel(int64_t x);
int64_t pi_primesieve(int64_t x);
You should use pi(x) instead which is an alias for the fastest prime counting function implementation in primecount.
primecount-5.0
I have finally implemented Xavier Gourdon's prime counting algorithm! Xavier Gourdon's algorithm is an improved version of the Deleglise-Rivat algorithm. According to my benchmarks Gourdon's algorithm runs up to 2x faster than the Deleglise-Rivat algorithm.
src/gourdon
: Implementation of Xavier Gourdon's algorithm.LoadBalancer.cpp
: Updated for Gourdon's algorithm.primecount.cpp
: Updated for Gourdon's algorithm.print.cpp
: Updated for Gourdon's algorithm.generate.cpp
: Generate array with largest prime factors.test.cpp
: Test Gourdon's algorithm.README.md
: Update benchmarks.