Allow BYTE_ARRAY_STOP to work on non-zero STOP code with TOK3.
Our htscodecs name tokeniser decoder always adds NUL bytes between
names.  This happens to match the default STOP byte used in htslib's
CRAM implementation, but nothing requires it to be 0, and indeed the
Java implementation uses 9 (tab).

This is an oversight.  Ideally we'd change the name tokeniser decode
function to take an additional parameter specifying the stop byte, but
that would change the API.  The easiest fix is to recognise this case
on-the-fly and correct the error by looking for a different stop byte.
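
A minimal sketch of the on-the-fly correction, written as standalone C rather than the htslib code itself. The names `demo_method`, `effective_stop` and `next_name` are hypothetical, and the method values are assumed to follow the CRAM 3.1 numbering. The idea is that data decoded by the name tokeniser is always NUL-delimited, so the requested stop byte is ignored for such blocks:

```c
#include <stddef.h>

/* Hypothetical stand-ins for the block method codes (assumed values). */
enum demo_method { DEMO_RANSPR = 5, DEMO_TOK3 = 8 };

/* Choose the delimiter actually present in the decoded block.
 * Name tokeniser output is NUL-separated regardless of what the
 * BYTE_ARRAY_STOP codec asked for. */
static int effective_stop(enum demo_method orig_method, int requested_stop) {
    return orig_method == DEMO_TOK3 ? 0 : requested_stop;
}

/* Scan one name starting at in[*idx].  Returns the name length and
 * advances *idx past the stop byte, or returns -1 if the stop byte is
 * never found before the end of the buffer. */
static long next_name(const unsigned char *in, size_t len, size_t *idx,
                      int stop) {
    size_t start = *idx, i = start;
    while (i < len && in[i] != stop)
        i++;
    if (i == len)
        return -1;      /* ran off the end: wrong stop byte or bad data */
    *idx = i + 1;       /* skip the stop byte itself */
    return (long)(i - start);
}
```

With a NUL-delimited buffer and a requested stop byte of tab, `next_name` fails for a non-TOK3 block but succeeds once `effective_stop` substitutes 0 for a TOK3 block.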

Also fixed cram_uncompress_block setting of b->orig_method.  This was
only correct when the original prototype definitions of RANS_PR0 were
in use, and with the RANSPR official numbering the calculation caused
RLE+O1 to be mislabelled as TOK3.

This field isn't used in anything else anyway during decode (but has
some diagnostic usage during encode).  The official API is via
cram_block_get_method and cram_expand_method.
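
The mislabelling described above can be reproduced with a little arithmetic. This is a sketch under stated assumptions: the constants below are hypothetical stand-ins that follow the CRAM 3.1 method numbering (RANS_PR0 = 5, TOK3 = 8), and the flag bits 0x01, 0x40 and 0x80 are taken to mean order-1, RLE and bit-packing, as the "RLE+O1" wording suggests. `old_orig_method` is an illustrative name for the removed expression.

```c
/* Hypothetical constants, assumed to match the official numbering. */
enum { M_RANS_PR0 = 5, M_ARITH_PR0 = 6, M_FQZ = 7, M_TOK3 = 8 };

/* The removed calculation: the base method plus offsets derived from
 * the first byte of the compressed data. */
static int old_orig_method(int base, unsigned char flags) {
    return base + (flags & 1)
         + 2 * ((flags & 0x40) > 0) + 4 * ((flags & 0x80) > 0);
}
```

For an RLE plus order-1 stream the first byte has bits 0x40 and 0x01 set, so the old expression gives 5 + 1 + 2 = 8, which collides with TOK3. Plain order-1 gives 6, which collides with ARITH_PR0 in the same way. Assigning the fixed RANSPR or ARITH value avoids both collisions.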
jkbonfield committed Jan 7, 2025
1 parent c705bec commit 329e794
Showing 2 changed files with 6 additions and 5 deletions.
5 changes: 4 additions & 1 deletion cram/cram_codecs.c
@@ -3613,7 +3613,10 @@ int cram_byte_array_stop_decode_block(cram_slice *slice, cram_codec *c,
     cp = b->data + b->idx;
     cp_end = b->data + b->uncomp_size;
 
-    stop = c->u.byte_array_stop.stop;
+    // STOP byte is hard-coded as zero by our name tokeniser decoder
+    // implementation, so we may ignore what was requested.
+    stop = b->orig_method == TOK3 ? 0 : c->u.byte_array_stop.stop;
+
     if (cp_end - cp < out->alloc - out->byte) {
         unsigned char *out_cp = BLOCK_END(out);
         while (cp != cp_end && *cp != stop)
6 changes: 2 additions & 4 deletions cram/cram_io.c
@@ -1698,8 +1698,7 @@ int cram_uncompress_block(cram_block *b) {
             free(uncomp);
             return -1;
         }
-        b->orig_method = RANS_PR0 + (b->data[0]&1)
-            + 2*((b->data[0]&0x40)>0) + 4*((b->data[0]&0x80)>0);
+        b->orig_method = RANSPR;
         free(b->data);
         b->data = (unsigned char *)uncomp;
         b->alloc = usize2;
@@ -1718,8 +1717,7 @@ int cram_uncompress_block(cram_block *b) {
             free(uncomp);
             return -1;
         }
-        b->orig_method = ARITH_PR0 + (b->data[0]&1)
-            + 2*((b->data[0]&0x40)>0) + 4*((b->data[0]&0x80)>0);
+        b->orig_method = ARITH;
         free(b->data);
         b->data = (unsigned char *)uncomp;
         b->alloc = usize2;
