
Added a decode+resize benchmark and cuda decoder #378

Merged (6 commits) on Nov 19, 2024

Conversation

@ahmadsharif1 (Contributor) commented on Nov 15, 2024

  1. Make the generate_readme_data.py script skip regenerating the videos when the parent directory already exists.
  2. Add a benchmark column for a "dataloader-style" benchmark that measures throughput over a batch of videos that are decoded and resized.
  3. Add a bar for the CUDA decoder, which is simply torchcodec_public with device="cuda".

Benchmark results show that the CUDA decoder is faster than the CPU decoder on the dataloader-style benchmark.
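The "dataloader-style" measurement described above can be sketched as a small, dependency-free harness (the function and `fake_decode_and_resize` below are hypothetical illustrations, not the actual benchmark script): run the decode+resize workload over the batch repeatedly until a minimum runtime has elapsed, then report frames per second.

```python
import time

def run_dataloader_style_benchmark(decode_and_resize, videos, min_runtime_seconds=2.0):
    """Repeatedly decode and resize a batch of videos until at least
    min_runtime_seconds has elapsed; return throughput in frames/second."""
    frames_processed = 0
    start = time.perf_counter()
    while time.perf_counter() - start < min_runtime_seconds:
        for video in videos:
            frames = decode_and_resize(video)
            frames_processed += len(frames)
    elapsed = time.perf_counter() - start
    return frames_processed / elapsed

# Dummy workload standing in for a real CPU or CUDA decoder:
def fake_decode_and_resize(video_path):
    return [None] * 10  # pretend 10 frames were decoded and resized

fps = run_dataloader_style_benchmark(
    fake_decode_and_resize, ["a.mp4", "b.mp4"], min_runtime_seconds=0.05
)
```

Running until a minimum wall-clock time (rather than a fixed iteration count) keeps short workloads from being dominated by timer noise, which is also why the `min_runtime_seconds` knob appears in the diff below.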

@facebook-github-bot added the "CLA Signed" label (managed by the Meta Open Source bot) on Nov 15, 2024
import torchvision # noqa: F401
from torchvision.transforms import v2 as transforms_v2

self.torchvision = torchvision
A reviewer (Contributor) commented:
I don't think we actually use self.torchvision? We should be able to remove it.

@ahmadsharif1 (Author) replied on Nov 18, 2024:

Good point, done.

@@ -535,6 +595,8 @@ def run_benchmarks(
results = []
df_data = []
verbose = False
# TODO: change this back before landing.
min_runtime_seconds = 0.1

frame = next(reader)
frames.append(frame["data"].permute(1, 2, 0))
frames = [frame.to(device) for frame in frames]
frames = self.transforms_v2.functional.resize(frames, (height, width))
@scotts (Contributor) commented on Nov 16, 2024:

Naive question that applies to all implementations that use the transformation: how do we ensure it's done on the GPU?

Realized after walking away from my laptop that it's controlled by where the data lives, not by some parameter on the transform. So, to answer my own question: in the frame.to(device) call.
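That observation, that the transform has no device argument and simply runs wherever its input lives, can be illustrated with a toy sketch (FakeFrame and resize here are hypothetical stand-ins for a tensor and a v2 transform, not the real torch/torchvision APIs):

```python
from dataclasses import dataclass

@dataclass
class FakeFrame:
    # Stand-in for a tensor: only tracks where the data lives and its shape.
    device: str
    shape: tuple

    def to(self, device):
        # Like tensor.to(device): return a copy of the data on the target device.
        return FakeFrame(device=device, shape=self.shape)

def resize(frame, size):
    # No device parameter: the computation happens wherever the input lives.
    h, w = size
    return FakeFrame(device=frame.device, shape=(h, w, frame.shape[-1]))

frame = FakeFrame(device="cpu", shape=(720, 1280, 3))
gpu_frame = frame.to("cuda")          # moving the data...
resized = resize(gpu_frame, (256, 256))  # ...is what moves the computation
```

Here `resized.device` is `"cuda"`, mirroring how the benchmark's `frame.to(device)` call is what determines whether the subsequent resize runs on the GPU.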

@ahmadsharif1 (Author) replied:

Correct

@ahmadsharif1 ahmadsharif1 marked this pull request as ready for review November 18, 2024 21:13
@scotts (Contributor) commented on Nov 19, 2024:

Ideally we would also update the README description, since once this PR is merged the chart is live. But we can also do that in a follow-up. That is, note that it's a 22-core Linux system with whatever kind of GPU.

@ahmadsharif1 (Author) replied:

I tweaked the README.md file as well. Later on we can put the machine info in the chart itself so it doesn't have to be kept up to date manually.

@ahmadsharif1 ahmadsharif1 merged commit 425c94f into pytorch:main Nov 19, 2024
37 of 47 checks passed
NicolasHug pushed a commit to NicolasHug/torchcodec that referenced this pull request Nov 19, 2024