Cuda accelerated tonemap Filter #5

Open
FCLC opened this issue Jan 10, 2021 · 11 comments · Fixed by #6
Labels
WIP Work in Progress

Comments

FCLC (Owner) commented Jan 10, 2021

The initial idea of using POCL as a CUDA translation layer isn't viable, because POCL doesn't work with image formats on CUDA.

Currently reaching out to Yaroslav Pogrebnyak, the developer of the vf_overlay_cuda FFmpeg filter.

Also reaching out to nyanmisaka, who seems to have a lot of experience working on FFmpeg filters and frankly knows more than I do.

In addition, collaborating with Ed Borasky to confirm that it works on Jetson platforms.

vf_tonemap_cuda.txt
(renamed from .c to .txt to make GitHub happy)

Missing: tonemap.cu with the proper kernel-side code. This is easy once I know how to properly call the CUDA kernel from the FFmpeg side.
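For context on the FFmpeg side: the existing CUDA filters round the frame size up to whole blocks (a DIV_UP-style macro) and hand the kernel its arguments through cuLaunchKernel's `kernelParams`, an array of pointers with one entry per parameter. The C++ sketch below emulates that launch on the CPU so the indexing can be checked without a GPU. The block shape, `tonemap_kernel`, `launch_2d` and `run_tonemap` are hypothetical stand-ins, and the kernel body is an identity copy rather than a real tonemap.

```cpp
#include <cstddef>

// Number of blocks needed to cover n items, rounding up.
constexpr int div_up(int n, int block) { return (n + block - 1) / block; }

// Illustrative block shape (an assumption, not a tuned value).
constexpr int kBlockX = 32;
constexpr int kBlockY = 16;

// Hypothetical kernel body. Like a kernel started with cuLaunchKernel, it
// receives its arguments as an array of pointers, one per parameter.
static void tonemap_kernel(int x, int y, void **args)
{
    auto *dst = *static_cast<unsigned char **>(args[0]);
    auto *src = *static_cast<const unsigned char **>(args[1]);
    const int width = *static_cast<int *>(args[2]);
    const int height = *static_cast<int *>(args[3]);
    const int pitch = *static_cast<int *>(args[4]);

    // Threads in the last, partially filled block fall outside the frame.
    if (x >= width || y >= height)
        return;
    const std::size_t i = static_cast<std::size_t>(y) * pitch + x;
    dst[i] = src[i]; // placeholder: identity instead of a real tonemap
}

// CPU stand-in for a 2D launch: run every thread of every block.
static void launch_2d(int grid_x, int grid_y, void **args)
{
    for (int by = 0; by < grid_y; ++by)
        for (int bx = 0; bx < grid_x; ++bx)
            for (int ty = 0; ty < kBlockY; ++ty)
                for (int tx = 0; tx < kBlockX; ++tx)
                    tonemap_kernel(bx * kBlockX + tx, by * kBlockY + ty, args);
}

// Host side: size the grid from the frame and pack pointers to the arguments.
static void run_tonemap(unsigned char *dst, const unsigned char *src,
                        int width, int height, int pitch)
{
    void *args[] = {&dst, &src, &width, &height, &pitch};
    launch_2d(div_up(width, kBlockX), div_up(height, kBlockY), args);
}
```

The bounds check in the kernel is what makes rounding the grid up safe: a 100x50 frame gets a 4x4 grid of 32x16 blocks, and the surplus threads simply return.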

Standard stride blocks should work; define the total number of blocks using the height. Most resolutions will be 16:9, so by using the height parameter we have a higher chance of dividing cleanly by 3, which lets us take advantage of CUDA's built-in data structures.
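One way to read the stride-block idea is as a grid-stride loop: each thread starts at its global index and advances by the total number of launched threads until it runs past the last pixel. Below is a minimal CPU emulation, assuming one block per row (block count = height, as proposed above) and an arbitrary threads-per-block value; `grid_stride_visits` is a hypothetical helper that only counts how often each pixel is touched.

```cpp
#include <vector>

// CPU emulation of a 1D grid-stride loop over every pixel of a frame.
// The block count comes from the frame height; threads_per_block is an
// arbitrary assumption.
static std::vector<int> grid_stride_visits(int width, int height, int threads_per_block)
{
    const int num_blocks = height;                        // gridDim.x
    const int grid_size = num_blocks * threads_per_block; // gridDim.x * blockDim.x
    const int total = width * height;
    std::vector<int> visits(total, 0);

    for (int block = 0; block < num_blocks; ++block) {               // blockIdx.x
        for (int thread = 0; thread < threads_per_block; ++thread) { // threadIdx.x
            // Start at the thread's global index, then stride by the whole grid,
            // so coverage does not depend on the pixel count dividing evenly.
            for (int i = block * threads_per_block + thread; i < total; i += grid_size)
                ++visits[i]; // a real kernel would tonemap pixel i here
        }
    }
    return visits;
}
```

Every pixel is visited exactly once whatever the thread count, so a clean division by 3 only affects how evenly the last pass is spread across threads, not correctness.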

The other option is to take the R, G and B values of a given pixel together, since their count is guaranteed to be a multiple of 3. This might also help for other tone mapping algorithms that use the relative offset from the local peak luma as input for the tonemapping output.
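A sketch of the per-pixel RGB option, assuming interleaved linear-light float RGB. It swaps in the extended Reinhard operator on BT.709 luma and uses a single frame-wide peak in place of the local peak luma mentioned above (a simplification). The function names are hypothetical.

```cpp
#include <cstddef>

// BT.709 luma weights, applied to linear-light RGB.
static float luma709(float r, float g, float b)
{
    return 0.2126f * r + 0.7152f * g + 0.0722f * b;
}

// Extended Reinhard on luminance: l_peak maps to exactly 1.0.
static float reinhard_extended(float l, float l_peak)
{
    return l * (1.0f + l / (l_peak * l_peak)) / (1.0f + l);
}

// Tonemap interleaved linear RGB (three floats per pixel) in place.
// All three channels of a pixel are scaled by one factor derived from its
// luma, which keeps the R:G:B ratios (and so the hue) unchanged.
static void tonemap_rgb(float *rgb, std::size_t num_pixels, float l_peak)
{
    for (std::size_t p = 0; p < num_pixels; ++p) {
        float *px = rgb + 3 * p; // px[0] = R, px[1] = G, px[2] = B
        const float l = luma709(px[0], px[1], px[2]);
        if (l <= 0.0f)
            continue; // black stays black; avoids dividing by zero
        const float scale = reinhard_extended(l, l_peak) / l;
        px[0] *= scale;
        px[1] *= scale;
        px[2] *= scale;
    }
}
```

Working on whole triplets is what makes the luma-relative scaling possible: per-channel processing alone cannot see a pixel's luma.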

FCLC (Owner, Author) commented Jan 14, 2021

Major bodge, but the new version of /cuda_filter/vf_scale_cuda.cu should be able to do Reinhard tonemapping. Below is a relevant extract from the ffmpeg-devel mailing list:

For ease of development, I've kept everything the same, including the name of the filter, and only changed the function within the file. This is very much a bodge to facilitate development. As such, for testing, this file should replace ffmpeg/libavfilter/vf_scale_cuda.cu.

FFmpeg should then be compiled as standard for CUDA filters, and the filter should be called as you would call the standard scale_cuda filter.
The command would be similar to:
ffmpeg -y -vsync 0 -hwaccel cuda -hwaccel_output_format cuda -i input.mp4 -vf scale_cuda=Source_width:Source_Height -c:a copy -c:v h264_nvenc -b:v 5M output.mp4

The above should decode in hardware, tonemap the frame on gpu and re-encode in hardware at a given bitrate.
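Reinhard's operator, x / (1 + x), expects linear light, whereas HDR10 sources are normally PQ-encoded (SMPTE ST 2084), so the signal has to be linearized first. A minimal per-channel sketch, assuming normalized PQ input in [0, 1] and a caller-chosen reference white; it ignores limited-range scaling, gamut conversion and re-encoding to the output transfer function, and is not the code in the modified vf_scale_cuda.cu. The helper names are hypothetical.

```cpp
#include <algorithm>
#include <cmath>

// SMPTE ST 2084 (PQ) EOTF constants.
constexpr double kPqM1 = 2610.0 / 16384.0;
constexpr double kPqM2 = 2523.0 / 4096.0 * 128.0;
constexpr double kPqC1 = 3424.0 / 4096.0;
constexpr double kPqC2 = 2413.0 / 4096.0 * 32.0;
constexpr double kPqC3 = 2392.0 / 4096.0 * 32.0;

// Normalized PQ signal e in [0, 1] -> absolute luminance in cd/m^2 (nits).
static double pq_to_nits(double e)
{
    const double ep = std::pow(e, 1.0 / kPqM2);
    const double num = std::max(ep - kPqC1, 0.0);
    const double den = kPqC2 - kPqC3 * ep;
    return 10000.0 * std::pow(num / den, 1.0 / kPqM1);
}

// Basic (global) Reinhard operator: maps [0, inf) into [0, 1).
static double reinhard(double x)
{
    return x / (1.0 + x);
}

// Linearize one PQ-coded channel, express it relative to an assumed reference
// white (ref_white_nits is a tuning parameter, e.g. 100), then compress it.
static double tonemap_pq_channel(double e, double ref_white_nits)
{
    return reinhard(pq_to_nits(e) / ref_white_nits);
}
```

Because the result stays below 1.0 even for the 10,000-nit PQ maximum, no highlight clips; the trade-off is that a pixel exactly at the reference white lands at 0.5, so some output gain or an extended operator is usually added on top.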

FCLC (Owner, Author) commented Jan 20, 2021

Used the overlay filter as the base instead of scale; it seems much better for my purposes. Reaching out to @znmeb to ask for a test on his side.

The syntax will be:
ffmpeg -i INPUT -i INPUT -filter_complex 'hwupload_cuda,overlay_cuda' OUTPUT

This is a bodge for now, since it only modifies the output of the CUDA kernel itself.

To use it, replace ffmpeg/libavfilter/vf_overlay_cuda.cu with this file: https://github.com/Camofelix/Jetson_ffmpeg_trancode_cluster/blob/master/cuda_filter/vf_overlay_cuda.cu

It compiles fine, but I can't test actual usage without a Nano.

It will require a self-build:

  1. Fetch the FFmpeg source code with git
  2. make clean
  3. ./configure --enable-nonfree --enable-cuda
  4. mv ~/path/to/new/file ~/path/to/ffmpeg/libavfilter/vf_overlay_cuda.cu
  5. make -j
  6. Get coffee
  7. ffmpeg -i INPUT -i INPUT -filter_complex 'hwupload_cuda,overlay_cuda' OUTPUT
  8. ffplay OUTPUT
  9. Is the output different?
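To answer the last step numerically rather than by eye, FFmpeg's own psnr filter can compare two videos. As a minimal sketch of what that measures, assuming corresponding 8-bit planes have already been dumped to equal-sized, non-empty buffers (`psnr_8bit` is a hypothetical helper):

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// PSNR in dB between two 8-bit planes of equal, non-zero size.
// Returns +infinity when the planes are identical (MSE of zero).
static double psnr_8bit(const std::vector<std::uint8_t> &a,
                        const std::vector<std::uint8_t> &b)
{
    double sum_sq = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = static_cast<double>(a[i]) - static_cast<double>(b[i]);
        sum_sq += d * d;
    }
    const double mse = sum_sq / static_cast<double>(a.size());
    if (mse == 0.0)
        return std::numeric_limits<double>::infinity();
    return 10.0 * std::log10(255.0 * 255.0 / mse);
}
```

An infinite result means the two outputs are bit-identical, i.e. the modified kernel made no visible change; any finite value means the output did change.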

Test file: https://4kmedia.org/lg-new-york-hdr-uhd-4k-demo/

FCLC added the WIP (Work in Progress) label Jan 22, 2021
FCLC (Owner, Author) commented Jan 26, 2021

Further update:
I'm currently concerned about whether the Jetson will be able to use the standard library of CUDA filters in tandem with decoding and encoding, ideally without making too many memory copies.

AnterCreeper commented

In my test, OpenCL is necessary.
CUDA-accelerated filters are usable.

AnterCreeper commented Nov 21, 2021

You just need to:

  1. git clone nv-codec-headers
  2. build FFmpeg with --enable-cuda --enable-cuda-nvcc
  3. enjoy

However, the only usable filters are scale_cuda and yadif_cuda 😅
Maybe I need to backport some features, like tonemap_cuda etc.

FCLC (Owner, Author) commented Nov 21, 2021

I haven't looked at this project in a while; I've been working on other GPGPU-related things.

Have they ported the CUDA filters to work on the Nano?

AnterCreeper commented

I don't know. I just tested it and finally got it to build successfully.

AnterCreeper commented Nov 21, 2021


The speed... 😅 Not too bad, and the GPU usage is low.
H.264 1080p -> H.265 720p at 6 Mbps

AnterCreeper commented

Command:

sudo ffmpeg -init_hw_device cuda=gpu:0.0 -filter_hw_device gpu -c:v h264_nvmpi -i /mnt/source/Paprika.2006.JAPANESE.1080p.BluRay.x264.DTS-FGT.mkv -vf "format=yuv420p,hwupload,scale_cuda=1280:720,hwdownload,format=yuv420p" -c:v hevc_nvmpi -b:v 6000k -preset medium -profile:v high -acodec ac3 output.mp4
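The command above moves every frame through system memory twice: hwupload copies each decoded 1080p frame to the GPU and hwdownload copies the 720p result back, which is the kind of copying worried about earlier in the thread. A rough estimate of the data volume, assuming tightly packed planes (real strides may add padding); the helpers are hypothetical:

```cpp
#include <cstdint>

// Bytes in one 4:2:0 frame: a full-size luma plane plus two chroma planes
// subsampled by 2 in each direction (odd sizes round up).
// bytes_per_sample is 1 for yuv420p and 2 for 10-bit formats such as p010.
static std::uint64_t yuv420_frame_bytes(std::uint64_t width, std::uint64_t height,
                                        std::uint64_t bytes_per_sample)
{
    const std::uint64_t luma = width * height;
    const std::uint64_t chroma = 2 * ((width + 1) / 2) * ((height + 1) / 2);
    return (luma + chroma) * bytes_per_sample;
}

// Bytes per second moved in one direction (e.g. by hwupload) at a given frame rate.
static double copy_bytes_per_second(std::uint64_t frame_bytes, double fps)
{
    return static_cast<double>(frame_bytes) * fps;
}
```

At an assumed 24 fps, that is about 75 MB/s uploaded (3,110,400 bytes per 1080p frame) and about 33 MB/s downloaded (1,382,400 bytes per 720p frame); a 10-bit 4K frame is 24,883,200 bytes, eight times the 1080p figure.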

AnterCreeper commented

I should also mention that my device is throttled due to low current and voltage 😅. The power supply is utter garbage.

AnterCreeper commented

cuvid is unusable because libnvcuvid.so.1 is missing.
Due to the special architecture of the Jetson (?), things are quite different.
