[version] Bump rapidgzip version to 0.13.0
mxmlnkn committed Mar 4, 2024
1 parent 1d00dff commit b9d8635
Showing 3 changed files with 40 additions and 4 deletions.
38 changes: 37 additions & 1 deletion python/rapidgzip/CHANGELOG.md
@@ -1,4 +1,40 @@

# Version 0.13.0 built on 2024-02-04

## Added

- Use ISA-L CRC32 computation, which uses PCLMULQDQ if available.
- Improve profiling output on `--verbose`.
- Add support for bzip2 decompression via the `ParallelGzipReader` architecture.
This is one small step to a unified parallelized and seekable decoder for multiple formats.
- Expose chunk size and I/O read method to Python interface.

## Performance

- Compress windows for chunks with large compression ratios in memory to reduce the memory footprint.
This reduces the memory usage for working with `wikidata-20220103-all.json.gz`
from 20 GB down to 12 GB and can have even larger effects for larger files.
The compression ratio threshold and the compression being done in parallel keep the overhead
for this memory optimization to a minimum.
- Avoid temporary allocations for internal `SharedFileReader::getLock` calls.
- Automatically adjust chunk size for "small" files and large parallelizations.
- Use faster short-/long-LUT Huffman decoder if compiled without ISA-L.
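
The window-compression entry above can be illustrated with a minimal Python sketch. It is not rapidgzip's C++ implementation: the names `store_window` and `load_window`, the threshold value, and the choice of zlib as the in-memory compressor are all assumptions made for illustration. The changelog only states that a compression ratio threshold decides which windows get compressed.

```python
import zlib

WINDOW_SIZE = 32 * 1024   # maximum Deflate back-reference distance
RATIO_THRESHOLD = 8.0     # hypothetical value; the changelog does not state the real one


def store_window(window: bytes, compressed_size: int, decompressed_size: int):
    """Return (is_compressed, payload) for a chunk's window.

    The window is compressed only when the chunk it belongs to had a large
    compression ratio, so that incompressible data does not pay for it.
    """
    ratio = decompressed_size / max(compressed_size, 1)
    if ratio >= RATIO_THRESHOLD:
        # Level 1 keeps the CPU cost low; this choice is an assumption.
        return True, zlib.compress(window, 1)
    return False, window


def load_window(is_compressed: bool, payload: bytes) -> bytes:
    """Recover the original window bytes before seeding a decoder with them."""
    return zlib.decompress(payload) if is_compressed else payload


# A highly repetitive window, as one might see in a chunk with a large ratio.
window = b"abc" * (WINDOW_SIZE // 3)
flag, payload = store_window(window, compressed_size=1_000, decompressed_size=100_000)
print(flag, len(window), len(payload))
```

Gating on the chunk's ratio means the decision needs no trial compression: the ratio is already known once the chunk has been decoded.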

## API

- Change template parameter `ENABLE_STATISTICS` into a member.
- Move `ChunkData` statistics into a subclass.

## Fixes

- Return only an appropriate exit code instead of showing a Python stacktrace in case of a broken pipe signal.
- Avoid segfault when exporting the index for an empty, invalid gzip file.
- Use `isatty` instead of poll with 100ms timeout to determine whether rapidgzip is piped to.
- Fix build error on macOS when no wheels are available.
- Many smaller adjustments to the profiling output with `--verbose`.
- Do not terminate with an error when trying to unlock the GIL during Python finalization.
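
The `isatty` entry above replaces a timed poll with a direct question to the stream. A rough Python sketch of that idea follows; rapidgzip's own check lives in its C++ code, and the helper name `has_piped_input` is hypothetical.

```python
import io
import sys


def has_piped_input(stream=None) -> bool:
    """Return True when the stream is not attached to a terminal.

    Unlike polling with a timeout, this does not wait for data to arrive,
    so a slow producer on the other end of the pipe is not misclassified.
    """
    stream = sys.stdin if stream is None else stream
    return not stream.isatty()


class FakeTerminal:
    """Stand-in for an interactive terminal, for demonstration only."""

    def isatty(self) -> bool:
        return True


print(has_piped_input(io.StringIO("piped data")))
print(has_piped_input(FakeTerminal()))
```

A poll with a 100 ms timeout can report "no input" when the upstream process simply has not written anything yet, which is presumably what motivated the change.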


# Version 0.12.1 built on 2024-01-08

## Fixes
@@ -88,7 +124,7 @@
- Fix possible GIL deadlock when calling many `RapidgzipFile` methods in quick succession.
- Fix many issues with the GIL acquirement code logic.
- Avoid segfault when exporting the index for an empty, invalid gzip file.
-- Use `isattay` instead of poll with 100ms timeout to determine whether rapidgzip is piped to.
+- Use `isatty` instead of poll with 100ms timeout to determine whether rapidgzip is piped to.
- Fix build error on macOS when no wheels are available.


2 changes: 1 addition & 1 deletion python/rapidgzip/setup.cfg
@@ -1,6 +1,6 @@
[metadata]
name = rapidgzip
-version = 0.12.1
+version = 0.13.0

description = Parallel random access to gzip files
url = https://github.com/mxmlnkn/rapidgzip
4 changes: 2 additions & 2 deletions src/rapidgzip/rapidgzip.hpp
@@ -11,8 +11,8 @@


#define RAPIDGZIP_VERSION_MAJOR 0
-#define RAPIDGZIP_VERSION_MINOR 12
-#define RAPIDGZIP_VERSION_PATCH 1
+#define RAPIDGZIP_VERSION_MINOR 13
+#define RAPIDGZIP_VERSION_PATCH 0
#define RAPIDGZIP_VERSION_FROM_SEMVER( a, b, c ) ( a * 0x10000 + b * 0x100 + c )
#define RAPIDGZIP_VERSION \
RAPIDGZIP_VERSION_FROM_SEMVER( RAPIDGZIP_VERSION_MAJOR, RAPIDGZIP_VERSION_MINOR, RAPIDGZIP_VERSION_PATCH )
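
As a worked example of the encoding in `RAPIDGZIP_VERSION_FROM_SEMVER`, the sketch below mirrors the macro's arithmetic in Python and shows the packed values before and after this bump. The function names are hypothetical, and the inverse assumes that the minor and patch numbers each fit in one byte, which the macro itself does not enforce.

```python
from typing import Tuple


def version_from_semver(major: int, minor: int, patch: int) -> int:
    """Same arithmetic as the macro: a * 0x10000 + b * 0x100 + c."""
    return major * 0x10000 + minor * 0x100 + patch


def semver_from_version(version: int) -> Tuple[int, int, int]:
    """Unpack a version number; assumes minor and patch are below 256."""
    return version >> 16, (version >> 8) & 0xFF, version & 0xFF


OLD = version_from_semver(0, 12, 1)
NEW = version_from_semver(0, 13, 0)
print(hex(OLD), hex(NEW))  # 0xc01 0xd00
```

Packing the components into one integer lets callers compare versions with an ordinary `<` or `>=`, for example in a preprocessor check against `RAPIDGZIP_VERSION`.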