Added a decode+resize benchmark and cuda decoder #378
Conversation
import torchvision  # noqa: F401
from torchvision.transforms import v2 as transforms_v2

self.torchvision = torchvision
I don't think we actually use self.torchvision? We should be able to remove it.
Good point. Done.
@@ -535,6 +595,8 @@ def run_benchmarks(
    results = []
    df_data = []
    verbose = False
    # TODO: change this back before landing.
    min_runtime_seconds = 0.1
I have an unmerged change in PR #362 to make this sort of benchmark testing easier: https://github.com/pytorch/torchcodec/pull/362/files#diff-c378bd5d03e7daa116eaeeeb86921e8134f0feefb0fcdf9f020f872676c5c00dR31-R34
frame = next(reader)
frames.append(frame["data"].permute(1, 2, 0))
frames = [frame.to(device) for frame in frames]
frames = self.transforms_v2.functional.resize(frames, (height, width))
Naive question that applies to all implementations that use the transformation: how do we ensure it's done on the GPU?
Realized after walking away from my laptop that it's controlled by where the data lives, not by some parameter on the transform. So to answer my own question: it happens in the frame.to(device) call.
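The device-placement point can be sketched with plain torch ops. This is a minimal illustration, not the benchmark's actual code: torch.nn.functional.interpolate stands in for torchvision's resize so the snippet stays self-contained, but both execute wherever their input tensor lives.

```python
import torch
import torch.nn.functional as F

# Pick the GPU when available; the identical code runs on CPU otherwise.
device = "cuda" if torch.cuda.is_available() else "cpu"

# A stand-in for one decoded frame: a C x H x W uint8 tensor.
frame = torch.zeros(3, 270, 480, dtype=torch.uint8)

# This .to(device) call is what decides where the resize runs:
# the resize op itself takes no device parameter and simply
# executes on the device its input tensor lives on.
frame = frame.to(device)
resized = F.interpolate(
    frame.unsqueeze(0).float(), size=(135, 240), mode="bilinear"
)
print(resized.shape, resized.device)
```

So in the benchmark, moving the frames with frame.to(device) before calling resize is what makes the transform run on the GPU.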
Correct.
Ideally we would also update the README description, since once this PR is merged, the chart is live. But we can also do that in a follow-up. That is, it's a 22-core Linux system with whatever kind of GPU.
I tweaked the README.md file as well. Later on we can put the machine info in the chart itself so it doesn't have to be kept up-to-date manually.
Benchmark results show the CUDA decoder is faster than the CPU decoder in the dataloader-style benchmark.