RFC: Switch from start-code framing to Low overhead bitstream format #43

Open
bilboed opened this issue Mar 7, 2023 · 2 comments

bilboed commented Mar 7, 2023

A "start-code based" framing, along with other requirements, was introduced by MR #5.

This RFC is to discuss:

  • Dropping that custom framing and using the Low overhead bitstream format from the base AV1 specification
  • Mandating the presence of obu_size (as required by the "low overhead bitstream" format)
  • Mandating the presence of OBU_TEMPORAL_DELIMITER and OBU_REDUNDANT_FRAME_HEADER

The goal is to provide:

  • The least difference with the bitstream expected/provided by other AV1 handlers (software or hardware)
  • While still providing easy access through/across OBUs (i.e. framing)

Specificities of AV1 bitstream

Unlike most video codec bitstreams, the AV1 specification has provisioned a flag (obu_has_size_field) in OBU headers and a variable-length field (obu_size) to be able to specify the size of the OBU payload.

This allows elements and hardware that process an AV1 bitstream to easily skip/scan through OBUs without requiring any other form of packing, provided the container format specifies the beginning of one OBU.
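As an illustration, the flag can be read from the first byte of an OBU header. This is a minimal sketch, assuming the one-byte header layout of the AV1 specification (forbidden bit, 4-bit obu_type, extension flag, obu_has_size_field, reserved bit); the `ObuHeader` type and `parse_obu_header_byte` function are hypothetical names used only for this example.

```python
from typing import NamedTuple


class ObuHeader(NamedTuple):
    """Fields of the first OBU header byte (hypothetical helper type)."""
    obu_type: int
    has_extension: bool
    has_size_field: bool


def parse_obu_header_byte(b: int) -> ObuHeader:
    """Decode the first byte of an OBU header.

    Assumed bit layout, most significant bit first:
    forbidden(1) | obu_type(4) | extension_flag(1) | has_size_field(1) | reserved(1)
    """
    if b & 0x80:
        raise ValueError("obu_forbidden_bit must be 0")
    return ObuHeader(
        obu_type=(b >> 3) & 0x0F,
        has_extension=bool(b & 0x04),
        has_size_field=bool(b & 0x02),
    )


# 0x12 is the header byte of a temporal delimiter carrying a size field.
header = parse_obu_header_byte(0x12)
```

Here `header.obu_type` is 2 (OBU_TEMPORAL_DELIMITER) and `header.has_size_field` is true, so a leb128 obu_size follows the header.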

This "size" feature is not present in any other major video codec bitstream, which explains why those codecs resort to a "startcode-based" system and provision their bitstreams to support it (by inserting "emulation-prevention" bytes within the bitstream).

Lower overhead

The obu_size feature of AV1 bitstream provides a more compact bitstream than the "startcode-based" proposal:

  • Specifying the presence of an obu_size has no cost (the flag is included in the header)
  • The obu_size field, being a leb128, uses less space on average than the mandatory 4 bytes of a startcode
    • 1 byte for payloads under 128 bytes (2^7), e.g. Temporal Delimiter, Frame Header and small metadata
    • 2 bytes for payloads under 16 KiB (2^14)
    • 3 bytes for payloads under 2 MiB (2^21)
    • 4 bytes (equivalent to the 4 bytes of a startcode) for payloads under 256 MiB (2^28)
    • More seems unlikely for now
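The byte counts above follow from leb128 carrying 7 payload bits per byte. A minimal sketch of the encoding, to check the thresholds (the function name `leb128_encode` is hypothetical, chosen for this example):

```python
def leb128_encode(value: int) -> bytes:
    """Encode a non-negative integer as unsigned little-endian base 128."""
    if value < 0:
        raise ValueError("leb128 encodes non-negative values only")
    out = bytearray()
    while True:
        byte = value & 0x7F      # low 7 bits go into this byte
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # last byte, continuation bit clear
            return bytes(out)


# Size of the obu_size field at each threshold from the list above.
sizes = {n: len(leb128_encode(n)) for n in (127, 128, 2**14 - 1, 2**14, 2**21 - 1, 2**28 - 1)}
```

A payload of 127 bytes needs a 1-byte obu_size, while 128 bytes already needs 2; the field only reaches the 4 bytes of a startcode for payloads of 2 MiB and above.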

The "startcode-based" format also requires modifying the bitstream to insert emulation-prevention bytes where needed, further increasing the payload size.
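To give a sense of what that modification involves, here is a sketch of emulation prevention as done in H.264/H.265-style bitstreams, where a 0x03 byte is inserted after two zero bytes whenever the next byte is 0x03 or lower. This is an assumption for illustration only: the exact rule used by the av1-mpegts draft is not restated in this issue, and `add_emulation_prevention` is a hypothetical name.

```python
def add_emulation_prevention(payload: bytes) -> bytes:
    """Insert 0x03 after any 0x00 0x00 pair followed by a byte <= 0x03.

    H.264/H.265-style rule, used here as an assumption; it guarantees the
    output never contains the sequences 00 00 00, 00 00 01 or 00 00 02.
    """
    out = bytearray()
    zeros = 0  # number of consecutive zero bytes just written
    for b in payload:
        if zeros >= 2 and b <= 3:
            out.append(0x03)  # emulation-prevention byte
            zeros = 0
        out.append(b)
        zeros = zeros + 1 if b == 0 else 0
    return bytes(out)


escaped = add_emulation_prevention(b"\x00\x00\x01\xaa")
```

Every byte of the payload has to be inspected, and the output grows by one byte per escaped sequence; the receiver has to run the inverse pass before handing the data to a standard AV1 decoder.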

Not scanning whole bitstream

Because the OBU size is specified in the bitstream, a parser can seek directly past OBUs instead of scanning for a startcode.

The only requirement for this is for the container to specify where a single OBU starts, which is easily done by mandating that the AV1 PES payload starts with an OBU header.
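A minimal sketch of such a walk, assuming the payload starts on an OBU header and every OBU carries obu_size (as this RFC proposes to mandate). The names `leb128_decode` and `iter_obus` are hypothetical; the header layout and the meaning of obu_size (payload bytes, excluding the header and the size field itself) follow the base AV1 specification.

```python
from typing import Iterator, Tuple


def leb128_decode(buf: bytes, pos: int) -> Tuple[int, int]:
    """Return (value, number_of_bytes_read) for the leb128 at buf[pos:]."""
    value = 0
    for i in range(8):  # AV1 limits leb128 to 8 bytes
        if pos + i >= len(buf):
            raise ValueError("truncated leb128")
        byte = buf[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, i + 1
    raise ValueError("leb128 longer than 8 bytes")


def iter_obus(payload: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (obu_type, obu_payload) by jumping from one OBU to the next."""
    pos = 0
    while pos < len(payload):
        header = payload[pos]
        if not header & 0x02:
            raise ValueError("obu_has_size_field is required in this sketch")
        # Skip the optional one-byte extension header.
        size_pos = pos + 1 + (1 if header & 0x04 else 0)
        obu_size, n = leb128_decode(payload, size_pos)
        start = size_pos + n
        end = start + obu_size
        if end > len(payload):
            raise ValueError("OBU extends past end of payload")
        yield (header >> 3) & 0x0F, payload[start:end]
        pos = end  # direct jump, no byte-by-byte scan


# Temporal delimiter (empty) followed by a 3-byte OBU of type 6 (dummy payload).
obus = list(iter_obus(bytes([0x12, 0x00, 0x32, 0x03, 0xAA, 0xBB, 0xCC])))
```

Only the header and size bytes of each OBU are read; the payload bytes are never inspected, which is the difference from a startcode scan.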

Compatibility with existing hardware and software

Existing hardware and software that are fully compliant with the AV1 specification would require extra processing in order to be compatible with the proposed "start-code" format:

  • Full parsing of the bitstream in order to insert/remove the emulation-prevention bytes
  • Re-computation of the presence of OBU_TEMPORAL_DELIMITER and OBU_REDUNDANT_FRAME_HEADER
    • Note: The current av1-mpegts specification doesn't specify when/how they should be removed or re-inserted.

In addition, the proposed "startcode-based" framing does not mandate the presence of obu_size (which the standard "Low Overhead bitstream format" mandates). Although less complex than the emulation-byte handling, this would also require rewriting the OBU header to re-insert (or remove) that mandatory obu_size.

Using the "Low overhead bitstream format" from the base specification avoids this complexity overhead and avoids potential issues/pitfalls when transforming the bitstream.

Informational: Why the Annex B "Length-delimited Bitstream Format" is not suitable

While tempting and slightly less complex, the Annex B formatting requires handling at the "Temporal Unit" level, which is not compatible with the proposed Access Unit PES framing, which is at the "Frame Unit" level.

Creating such a bitstream would require accumulating the various "Frame Units" into a "Temporal Unit" in order to compute the temporal_unit_size, introducing excessive latency.
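The dependency can be seen in a minimal sketch of Annex B packing, assuming the nesting of temporal_unit_size, frame_unit_size and obu_length prefixes from the AV1 specification. The function names are hypothetical and the OBU bytes in the usage line are dummies.

```python
from typing import List


def leb128_encode(value: int) -> bytes:
    """Unsigned little-endian base 128 encoding."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def pack_temporal_unit(frame_units: List[List[bytes]]) -> bytes:
    """Pack one temporal unit in Annex B style.

    frame_units is a list of frame units, each a list of raw OBUs.
    """
    body = bytearray()
    for obus in frame_units:
        frame_unit = bytearray()
        for obu in obus:
            frame_unit += leb128_encode(len(obu)) + obu       # obu_length
        body += leb128_encode(len(frame_unit)) + frame_unit   # frame_unit_size
    # temporal_unit_size comes first on the wire, but can only be computed
    # once every frame unit of the temporal unit is known.
    return leb128_encode(len(body)) + bytes(body)


packed = pack_temporal_unit([[b"\x10"], [b"\x30\xaa"]])
```

The very first byte of the output depends on the last frame unit, so nothing can be sent until the whole temporal unit has been collected.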

bilboed commented Dec 20, 2024

It should also be noted that H.222.0 (2018) Amendment 1 (11/19), which adds the JPEG XS mapping (also available in H.222.0 2021 and more recent editions), makes it 100% clear that adding formats which don't have emulation prevention is fine:

NOTE – As there is no emulation prevention byte in JPEG XS elementary stream, there can be misdetection of PES start code. To avoid this, and as the PES_packet_length is set to 'undefined' (0x0000), the start of a PES packet can be signalled using the Payload Unit Start Indicator in the transport packet header.

kierank commented Dec 20, 2024

I think this is a practical consideration the authors of H.222 made because JPEG-XS is unlikely to be used in legacy systems that do processing in the PES domain. It's a very high bitrate (300 Mbps+) codec, unlike AV1, which is therefore likely to be used in legacy systems.
