Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

avcc param sets with emulation prevention bytes don't parse #3

Closed
scottlamb opened this issue Mar 18, 2020 · 6 comments
Closed

avcc param sets with emulation prevention bytes don't parse #3

scottlamb opened this issue Mar 18, 2020 · 6 comments

Comments

@scottlamb
Copy link
Collaborator

In the new AVCC module (thank you! super helpful), an SPS with an emulation prevention doesn't parse. e.g., this panics:

    #[test]
    fn hikvision() {
        // From a Hikvision 2CD2032-I.
        let avcc_data = hex!("014d401e ffe10017 674d401e 9a660a0f
                              ff350101 01400000 fa000003 01f40101
                              000468ee 3c80");
        let avcc = AvcDecoderConfigurationRecord::try_from(&avcc_data[..]).unwrap();
        let sps_data = avcc.sequence_parameter_sets().next().unwrap().unwrap();
        dbg!(sps_data);
        let ctx = avcc.create_context(()).unwrap();
        let sps = ctx.sps_by_id(ParamSetId::from_u32(0).unwrap())
            .expect("missing sps");
    }

I'm not sure how you'd like this to work. Is ParamSetIter supposed to yield an encoded NAL unit or RBSP? If the former, Avcc::create_context should pass it through the RBSP decoder before calling SeqParameterSet::from_bytes. If the latter, ParamSetIter should pass it through the RBSP (and I suppose needs to return a Vec or Cow<[u8]> or some such).

@dholroyd
Copy link
Owner

Seems I've also left some dbg!() in my haste to push code I worked on a few months back. 😞

My preference is to have ParamSetIter yield data which is still encoded, and have the caller do the extra steps.

It's a shame that the RbspDecoder interface makes simple cases like this really complicated! I can't see an easy way to produce Cow<[u8]> via this interface either, although that would be lovely.

dholroyd added a commit that referenced this issue Mar 18, 2020
This is an ugly fix to get things working.  Look into RbspDecoder
refactor to make these cases simpler in future.

As reported by Scott Lamb in #3
@dholroyd
Copy link
Owner

Commit 7ef5c74 is a bit unsatisfactory, but gets this case working; I'll probably not have time in the short term to work out how to refactor RbspDecoder to support this better. Sorry, it's a bit hairy!

I'd be happy to re-do these interfaces -- any suggestions would be welcome!

@scottlamb
Copy link
Collaborator Author

Thank you!

Seems I've also left some dbg!() in my haste to push code I worked on a few months back. 😞

Did you mean to fix that in 7ef5c74? It's still there.

I'd be happy to re-do these interfaces -- any suggestions would be welcome!

fwiw, the "simple cases" are all my software sees right now. It currently gets all its H.264 data from ffmpeg, which always gives it a complete "extra data" (sps + pps) or frame. So my own, not-terribly-robust NAL splitting code is super easy to use—give it some data, it calls a closure once per NALU, no dealing with state between calls. So I haven't thought through the other cases nearly as much as you.

I want to replace the usage of ffmpeg here with a (to-be-determined/written) Rust rtsp crate, which may behave differently. So I may end up caring about the complicated cases as well. But I haven't given it much thought yet. Here are some ideas off the top of my head:

I don't think it's a problem RAM-wise to buffer a whole NALU at once. I assume the concern is CPU?

How much does the naive implementation that always copies to accumulate a whole NALU in a (reused) Vec cost? Maybe you already started with this and discarded it, but I'm catching up.

If that's too expensive, how often does a single NALU get split into two push calls in practice? Maybe the mux/"outer" parsers (RTP / .mp4 / MPEG-TS / Annex-B) could have a single method that pushes some bytes with an end flag. If the whole thing is in one call, it can pass the buffer through as-is; otherwise it can copy to an internal buffer.

The RBSP layer could always expect a complete NAL. Perhaps (if the outer parser's buffer stores only one NALU), it could take a mutable buffer and shift over all the non-"emulation prevention" bytes, sort of like Vec::retain does.

I also wonder if when doing this with network protocols and tokio and such, it might be better to use the bytes crate rather than plain &[u8] / Cow<[u8]> / Vec<u8>. But I haven't actually worked with tokio yet, so I don't really know.

Speaking of the bytes crate, one of its selling points is to "facilitate zero-copy network programming by allowing multiple Bytes objects to point to the same underlying memory." So maybe my "if the outer parser's buffer stores only one NALU" caveat could be removed by using it as a convenient way of starting with some subset of the buffer, then shrinking it a bit as the emulation prevention bytes are removed.

@dholroyd
Copy link
Owner

Now removed the dbg!() in a follow-up commit 😅

moonfire-nvr looks really cool, I will be taking a look!

My aim with the design (maybe not 100% realised yet!) was to avoid another portion of memory write+read workload at more-or-less the source video bitrate. It's true that, depending on the size of the 'lump' that need to be accumulated, the writes+reads might not get all the way through the cache hierarchy down to RAM, but wanting to avoid cpu 'back-end stalls' (I think that's the right term?) was the motivator. As you already know, those costs aren't a deal-breaker for lots of workloads.

I confess I've not really measured the result yet for this crate, but this is mirroring the approach in mpeg2ts-reader, which avoids accumulating the full PES-packets for the caller, and instead hands on the little fragments of PES packet from each TS-packet payload. Benchmark results vs ffmpeg appear good (but you should never trust benchmarks!).

It all pales into insignificance if the video actually has to be decoded of course! 😱

I sometimes use this code with higher-bitrate streams (40-50Mbps is common; 100Mbps is sometimes suggested). My hope is that when the app is just proxying / extracting metadata, the network will be the limiting factor, rather than CPU / cache / RAM.

So in the end, the complexity arises from my desire to parse the header from slice_*_rbsp() without having to accumulate those whole NALUs. For SPS/ PPS etc my code already accumulates the complete NALU -- I've no idea how I'd parse a partial one :)

It would be nice for this crate to support both a simple interface and a complex 'partial-input' one, with a common implementation. Not sure how yet.

I have looked briefly at bytes a while back but couldn't quite work out if it would solve my problems. I'll have to look again! I took the impression that bytes adds the cost of an Rc on taking sub-slices, so I'd want to measure the cost to my precious inner-loops 😄

@dholroyd
Copy link
Owner

I've raised #4 to track the desire to support better APIs for some of these things.

dholroyd added a commit that referenced this issue Mar 31, 2020
Existing SeqParameterSetNalHandler / PicParameterSetNalHandler types,
when used together with RbspDecoder, already have the functionality of
the decode() function, so we can kill that function and just use the
existing types.

Revision to the fix for #3
@dholroyd
Copy link
Owner

FWIW, I revised the fix for this problem in 2ce960b, since there was actually already a way mechanism to perform the required decode, but I'd forgotten about it!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants