Cuda accelerated tonemap Filter #5

FCLC · 2021-01-10T15:57:43Z

The initial idea of using POCL as a CUDA translation layer isn't viable, because POCL doesn't work with image formats on CUDA.

Currently reaching out to Yaroslav Pogrebnyak, the developer of the VF_overlay_cuda ffmpeg filter.

Also reaching out to nyanmisaka, who seems to have a lot of experience working on FFmpeg filters and frankly knows more than I do.

In addition to this, collaborating with Ed Borasky to confirm function on Jetson platforms.

vf_tonemap_cuda.txt
(renamed from .c to .txt to make GitHub happy)

Missing: tonemap.cu with the proper kernel-side code. This is easy once I know how to properly call the CUDA kernel side from the ffmpeg side.

Standard strided blocks should work: define the total number of blocks from the frame height. Most resolutions will be 16:9, so by using the height parameter we have a higher chance of the count dividing cleanly by 3, which lets us take advantage of CUDA's data structures.

The other option is to work on the R, G and B values of a given pixel, whose count is guaranteed to be a multiple of 3. This might also help with other tone mapping algorithms that use the relative offset from the local peak luma as input for the tonemapping output.
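As a rough illustration of the two indexing ideas above, here is a host-side sketch in plain C. The helper names `blocks_for` and `rgb_offset` are hypothetical and come from neither FFmpeg nor CUDA, and the packed 3-bytes-per-pixel RGB layout is an assumption (hardware frames are often planar or semi-planar YUV instead). Rounding the block count up means the height does not have to divide evenly by the block size; the kernel would then need a bounds check for the leftover threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of blocks needed to cover `count` rows
 * (or pixels) with `block_size` threads per block, rounded up so that a
 * count that is not a multiple of block_size is still fully covered. */
static unsigned blocks_for(unsigned count, unsigned block_size)
{
    assert(block_size > 0);
    return (count + block_size - 1) / block_size;
}

/* Hypothetical helper: byte offset of channel c (0 = R, 1 = G, 2 = B)
 * of pixel (x, y) in a packed RGB buffer, 3 bytes per pixel, where
 * `pitch` is the number of bytes per row (may include padding). */
static size_t rgb_offset(unsigned x, unsigned y, unsigned c, size_t pitch)
{
    assert(c < 3);
    return (size_t)y * pitch + (size_t)x * 3 + c;
}
```

For a 2160-row frame and 16 threads per block, `blocks_for(2160, 16)` gives 135 blocks; for 1080 rows it gives 68, with the last block only partly used.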

FCLC · 2021-01-14T20:43:13Z

Major bodge, but the new version of /cuda_filter/vf_scale_cuda.cu should be able to do Reinhard tonemapping. Below is the relevant extract from the ffmpeg devel mailing list:

For ease of development, I've kept everything the same, including the name of the filter, only changing the function within the file. This is very much a bodge to facilitate development. As such, for testing, this file should replace the vf_scale_cuda.cu file in ffmpeg/libavfilter/vf_scale_cuda.cu

FFmpeg should then be compiled as standard for cuda filters and should be called as you would call the standard vf_scale_cuda filter.
The command would be similar to:
ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf scale_cuda=Source_width:Source_Height -c:a copy -c:v h264_nvenc -b:v 5M output.mp4

The above should decode in hardware, tonemap the frame on the GPU, and re-encode in hardware at the given bitrate.
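For readers unfamiliar with the operator, here is a minimal CPU reference of basic Reinhard tonemapping in plain C. It is a sketch of the textbook form x / (1 + x), not the code in the modified vf_scale_cuda.cu; the function names are hypothetical, and the input is assumed to be non-negative linear-light values stored as packed float RGB.

```c
#include <assert.h>
#include <stddef.h>

/* Basic Reinhard curve: maps [0, inf) into [0, 1).
 * Assumes x is a non-negative linear-light value. */
static float reinhard(float x)
{
    assert(x >= 0.0f);
    return x / (1.0f + x);
}

/* Apply the curve in place to n_pixels packed RGB pixels
 * (3 floats per pixel). Each channel is mapped independently. */
static void reinhard_rgb(float *rgb, size_t n_pixels)
{
    for (size_t i = 0; i < n_pixels * 3; i++)
        rgb[i] = reinhard(rgb[i]);
}
```

A GPU version would do the same per-element work with one thread per pixel or per channel. Mapping each channel independently is the simplest choice; it can shift hue in bright areas, which is one reason other variants operate on luminance instead.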

FCLC · 2021-01-20T16:20:23Z

Used the overlay filter as the base instead of scale; it seems much better for my purposes. Reaching out to @znmeb to ask for a test on his side.

The syntax will be:
ffmpeg -i INPUT -i INPUT -filter_complex 'hwupload_cuda,overlay_cuda' OUTPUT

This is a bodge for now, since it only modifies the output of the CUDA kernel itself.

To use it, replace ffmpeg/libavfilter/vf_overlay_cuda.cu with this file: https://github.com/Camofelix/Jetson_ffmpeg_trancode_cluster/blob/master/cuda_filter/vf_overlay_cuda.cu

It compiles fine, but I can't test actual usage without a Nano.

Testing requires building FFmpeg yourself:

fetch the ffmpeg source code with git

make clean

./configure --enable-nonfree --enable-cuda

mv ~/path/to/new/file ~/path/to/ffmpeg/libavfilter/vf_overlay_cuda.cu

make -j

get coffee

ffmpeg -i INPUT -i INPUT -filter_complex 'hwupload_cuda,overlay_cuda' OUTPUT

ffplay OUTPUT

Is the output different from the input?

test file: https://4kmedia.org/lg-new-york-hdr-uhd-4k-demo/

FCLC · 2021-01-26T03:10:24Z

Further update:
Currently concerned about whether the Jetson will be able to use the standard library of CUDA filters in tandem with decoding and encoding, ideally without making too many memory copies.

AnterCreeper · 2021-11-21T15:06:06Z

In my testing, OpenCL is necessary.
The CUDA-accelerated filter is usable.

AnterCreeper · 2021-11-21T15:08:54Z

You just need to:

git clone nv-codec-headers
build ffmpeg with --enable-cuda --enable-cuda-nvcc
enjoy

However, the only usable filters are scale_cuda and yadif_cuda 😅
Maybe I need to backport some features, like tonemap_cuda etc.

FCLC · 2021-11-21T15:14:39Z

I haven't looked at this project in a while; I've been working on other GPGPU-related things.

Have they ported the cuda filters to work on the nano?

AnterCreeper · 2021-11-21T15:15:43Z

I don't know. I just tested it and finally got it to build successfully.

AnterCreeper · 2021-11-21T15:18:46Z

The speed......😅 Not too bad. And the GPU usage is low.
h264 1080p -> h265 720p 6Mbps

AnterCreeper · 2021-11-21T15:20:59Z

Command:

sudo ffmpeg -init_hw_device cuda=gpu:0.0 -filter_hw_device gpu -c:v h264_nvmpi -i /mnt/source/Paprika.2006.JAPANESE.1080p.BluRay.x264.DTS-FGT.mkv -vf "format=yuv420p,hwupload,scale_cuda=1280:720,hwdownload,format=yuv420p" -c:v hevc_nvmpi -b:v 6000k -preset medium -profile:v high -acodec ac3 output.mp4

AnterCreeper · 2021-11-21T15:24:50Z

I should also say that my device is throttled due to low current and voltage. 😅 The power is utter garbage.

AnterCreeper · 2021-11-21T15:52:10Z

cuvid is unusable because libnvcuvid.so.1 is missing.
Due to the special architecture of the Jetson(?), things are quite different.

FCLC mentioned this issue Jan 10, 2021

Tonemapping function relying on OpenCL filter and NVENC HEVC decoder jellyfin/jellyfin#3442

Merged

FCLC self-assigned this Jan 14, 2021

FCLC linked a pull request Jan 14, 2021 that will close this issue

new version of vf scale cuda is a major bodge- but should be able to … #6

Merged

FCLC added this to the Create compilable filter that *should* work milestone Jan 20, 2021

FCLC added the WIP Work in Progress label Jan 22, 2021
