diff --git a/python/rapidgzip/README.md b/python/rapidgzip/README.md
index 15072383..1f719c79 100644
--- a/python/rapidgzip/README.md
+++ b/python/rapidgzip/README.md
@@ -131,7 +131,7 @@ This benchmarks uses gzip-compressed [FASTQ data](http://ftp.sra.ebi.ac.uk/vol1/
 That's why the TAR file is repeated as often as there are number of cores in the benchmark to hold the decompression times roughly constant in order to make the benchmark over this large a range feasible.
 This is almost the worst case for rapidgzip because it contains many LZ77 back-references over very long ranges.
 This means that a fallback to ISA-L is not possible and it means that the costly two-staged decoding has to be done for almost all the data.
-This is also the reason why if fails to scale above 64 cores, i.e, to teh second CPU socket.
+This is also the reason why if fails to scale above 64 cores, i.e, to the second CPU socket.
 The first and second decompression stages are completely independently submitted to a thread pool, which on this NUMA architecture means, that data needs to be costly transferred from one processor socket to the other if the second step for a chunk is not done on the same processor as the first.
 This should be fixable by making the ThreadPool NUMA-aware.
 
diff --git a/src/rapidgzip/ChunkData.hpp b/src/rapidgzip/ChunkData.hpp
index 2a12099b..0c4a0c97 100644
--- a/src/rapidgzip/ChunkData.hpp
+++ b/src/rapidgzip/ChunkData.hpp
@@ -85,7 +85,7 @@ struct ChunkData :
  * might not work perfectly and might already have read some of the next block.
  * Currently, the unit tests, test that all possibilities to derive the footer offsets: GzipReader, decodeBlock,
  * decodeBlockWithInflateWrapper with ISA-L or zlib, return the same value.
- * That value is currently the footer end because it seemed easier to implement. This might be subjecft to
+ * That value is currently the footer end because it seemed easier to implement. This might be subject to
  * change until it is actually used for something (e.g. smarter block splitting).
  * The most complicated to implement but least ambiguous solution would be to add all three boundaries to
  * this struct.
@@ -354,7 +354,7 @@ struct ChunkData :
  }
 
  /**
- * Appends gzip footer information at the given offset.
+ * Appends generic footer information at the given offset.
  */
  void
  appendFooter( ChunkData::Footer&& footer )
diff --git a/src/rapidgzip/GzipChunkFetcher.hpp b/src/rapidgzip/GzipChunkFetcher.hpp
index d34fc17a..f9a606df 100644
--- a/src/rapidgzip/GzipChunkFetcher.hpp
+++ b/src/rapidgzip/GzipChunkFetcher.hpp
@@ -1406,7 +1406,7 @@ class GzipChunkFetcher :
  * However, igzip -0 can compress the whole file in a single deflate block.
  * Decompressing such a file is not supported (yet). It would require some heavy
  * refactoring of the ChunkData class to support resuming the decompression so that
- * we can simply break and return here insteda of throwing an exception. This would basically
+ * we can simply break and return here instead of throwing an exception. This would basically
  * require putting a whole GzipReader in the ChunkData so that even random access is supported
  * in an emulated manner. */
  if ( blockBytesRead > 256_Mi ) {