[version] Bump rapidgzip version to 0.11.0
mxmlnkn committed Dec 19, 2023
1 parent 1aa6a55 commit d5e144c
Showing 4 changed files with 45 additions and 3 deletions.
42 changes: 42 additions & 0 deletions python/rapidgzip/CHANGELOG.md
@@ -1,4 +1,46 @@

# Version 0.11.0 built on 2023-12-19

## Added

- Make parallel decompression work from stdin and other non-seekable inputs.
- The setup.py file now offers fine-grained dependency control via the environment variables:
`RAPIDGZIP_BUILD_CXXOPT`, `RAPIDGZIP_BUILD_ISAL`, `RAPIDGZIP_BUILD_RPMALLOC`, `RAPIDGZIP_BUILD_ZLIB`,
which can be set to `enable`, `disable`, or `system`. Cxxopts and zlib may not be disabled.
- Include `indexed_bzip2` classes and CLI method with the rapidgzip Python module. This only adds ~15%
space overhead to the precompiled binaries. This is a step towards one Python module offering seekable
access to many different file formats.
- Add import/export timings with `--verbose`.
- Enable checksum verification by default. This adds ~5 % overhead.
- Show a message about mismatching CRC32 during `--analyze` but try to read further.
- Track symbol usage in windows and show information with `--analyze`.
- Reorganize output of `--help`.
- Add `--io-read-method=...` option, which can be set to `pread`, `sequential`, or `locked-read`.
`--io-read-method=sequential` is advisable when decompressing from files on slow I/O devices such as HDDs.
- Add `RapidgzipFile.peek` method.
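
Peeking matters most for the non-seekable inputs mentioned at the top of this list, because bytes consumed from a pipe cannot be re-read by seeking back. The changelog does not show the signature of `RapidgzipFile.peek`, so the following is only a sketch that uses the standard library (`io.BufferedReader.peek` and `gzip`) as a stand-in. The helper names `detect_format` and `read_from_pipe` are hypothetical.

```python
import gzip
import io
import os
import threading
from typing import Tuple

GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def detect_format(stream: io.BufferedReader) -> str:
    """Guess the compression format from the leading bytes without consuming them."""
    # peek() may return more or fewer bytes than requested, so slice explicitly.
    head = stream.peek(3)[:3]
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head.startswith(BZIP2_MAGIC):
        return "bzip2"
    return "unknown"


def read_from_pipe(payload: bytes) -> Tuple[str, bytes]:
    """Send gzip-compressed payload through an OS pipe, peek at it, then decode it."""
    read_fd, write_fd = os.pipe()

    def feed() -> None:
        # Closing the write end signals end-of-file to the reader.
        with os.fdopen(write_fd, "wb") as sink:
            sink.write(gzip.compress(payload))

    writer = threading.Thread(target=feed)
    writer.start()
    with os.fdopen(read_fd, "rb") as source:
        # A pipe is not seekable, so rewinding after a read is impossible.
        if source.seekable():
            raise RuntimeError("expected a non-seekable pipe")
        kind = detect_format(source)
        # The peeked bytes are still buffered, so the decoder sees the full header.
        with gzip.GzipFile(fileobj=source, mode="rb") as decoded:
            data = decoded.read()
    writer.join()
    return kind, data
```

Because the peek leaves the read position untouched, the format check and the actual decompression can share one stream without any buffering logic in the caller.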

## Performance

- Clear seek points / windows when they are not needed, e.g., for one-pass sequential decompression
without `--export-index`. This reduces the memory usage for decompressing `wikidata-20220103-all.json.gz`
from 20 GB down to 10 GB and can have even larger effects for larger files.
- Avoid doubling memory usage during index import and export by streaming the data directly to the output file
without an internal copy.

## Fixes

- Show a better error message when quitting via SIGINT during a long-running read loop over a RapidgzipFile
object working on a Python file object without using Python context managers / the `with` statement.
This situation leaves the decompression threads running, and they then try to acquire a non-existing GIL
after Python interpreter finalization has already started.
- Fix compile error when compiling with Conda because it defines `__linux__` while not having `F_GETPIPE_SZ`.
- Improve error messages on EOF, for ISA-L and Zlib wrappers, and when file seeking fails.
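
The SIGINT item above concerns readers that are not closed through a context manager. As a minimal sketch of the recommended pattern, the following uses the standard-library `gzip` module as a stand-in for a `RapidgzipFile` wrapping a Python file object; the assumption is that the rapidgzip reader is used with `with` in the same way. `make_sample_gzip` and `count_lines` are hypothetical helper names.

```python
import gzip
import os
import tempfile


def make_sample_gzip(line_count: int) -> str:
    """Write a small gzip file with line_count lines and return its path."""
    fd, path = tempfile.mkstemp(suffix=".gz")
    with os.fdopen(fd, "wb") as raw, gzip.GzipFile(fileobj=raw, mode="wb") as out:
        for index in range(line_count):
            out.write(f"line {index}\n".encode())
    return path


def count_lines(path: str) -> int:
    """Iterate over the decompressed lines inside a with-block."""
    # Both objects are closed when the block exits, including when a
    # KeyboardInterrupt (SIGINT) is raised in the middle of the loop.
    with open(path, "rb") as raw, gzip.GzipFile(fileobj=raw, mode="rb") as decoded:
        return sum(1 for _ in decoded)
```

With a reader that owns background threads, the `with` block is what ensures those threads are stopped before interpreter shutdown begins, instead of relying on garbage collection order.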

## API

- Change `size_t FileReader::size()` to `std::optional<size_t> FileReader::size()`
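
The changelog does not state the reason for this change. A plausible reading, given the new support for non-seekable inputs, is that the total size of such an input is unknown until it has been read completely. The following Python analogue illustrates that idea with `Optional[int]`; it is not the C++ interface, and `stream_size` is a hypothetical helper.

```python
import io
import os
from typing import BinaryIO, Optional


def stream_size(stream: BinaryIO) -> Optional[int]:
    """Return the total size in bytes, or None when it cannot be known up front."""
    if not stream.seekable():
        # Pipes and similar inputs have no size until fully consumed.
        return None
    position = stream.tell()
    size = stream.seek(0, os.SEEK_END)
    stream.seek(position)  # restore the caller's read position
    return size
```

Callers then have to handle the "unknown" case explicitly, for example by skipping progress percentages, rather than receiving a misleading size of zero.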


# Version 0.10.4 built on 2023-11-25

## Fixes
2 changes: 1 addition & 1 deletion python/rapidgzip/rapidgzip.pyx
@@ -687,4 +687,4 @@ def ibzip2_cli():
PyBuffer_Release(&buffer)


-__version__ = '0.10.4'
+__version__ = '0.11.0'
2 changes: 1 addition & 1 deletion python/rapidgzip/setup.cfg
@@ -1,6 +1,6 @@
[metadata]
name = rapidgzip
-version = 0.10.4
+version = 0.11.0

description = Parallel random access to gzip files
url = https://github.com/mxmlnkn/rapidgzip
2 changes: 1 addition & 1 deletion src/tools/rapidgzip.cpp
@@ -251,7 +251,7 @@ rapidgzipCLI( int argc,

if ( parsedArgs.count( "version" ) > 0 ) {
std::cout << "rapidgzip, CLI to the parallelized, indexed, and seekable gzip decoding library rapidgzip "
-<< "version 0.10.4.\n";
+<< "version 0.11.0.\n";
return 0;
}
