Is this usable with a serverless architecture? #131
Yes, so a lot of the optimizations here use torch.compile and autotuning to get good runtime performance. You could try torch.export (using the latest nightlies), similar to this code here: https://github.com/pytorch/ao/blob/5d1444bdef6df15eb89c4c5716ede1c5f8677798/examples/sam2_amg_server/compile_export_utils.py#L147
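For anyone following along, the basic shape of the torch.export API looks like this (a minimal sketch with a toy module; the module, names, and shapes are made up for illustration, not taken from the linked file):

```python
import torch
from torch.export import export

# Toy stand-in module; the real use case would be a SAM component.
class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1

model = Toy().eval()
example_args = (torch.randn(1, 3, 64, 64),)

# export() takes the module plus example inputs and returns an
# ExportedProgram that captures the traced graph.
ep = export(model, example_args)
print(ep)  # inspect the captured graph
```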
So the optimizations run only on the first pass?
I thought it would just make things faster, but the compile and autotuning step is making it much slower. Thanks for the help 🙏
@quantumcode-martin - ah, so torch.export lets you save out the code that comes from running with compile and max-autotune. So you run export on a GPU, store the resulting binary somewhere, and then load it back up. Like here: https://github.com/pytorch/ao/blob/d0e434c8d825f7ac69e26585cb2ceb002a287f24/examples/sam2_amg_server/cli_on_modal.py#L156-L165 or, more generically, here: https://github.com/pytorch/ao/blob/d0e434c8d825f7ac69e26585cb2ceb002a287f24/examples/sam2_amg_server/compile_export_utils.py#L285-L308
@quantumcode-martin and yes, it is a drop-in replacement, but torch.compile does have compile overhead on the first run; that's just part of how the compiler works. With export you can do the compiler work once, store the result, and skip it on every start. export is new, so it's not part of SAM-fast, but it is part of SAM2-fast :)
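To make the store-and-reload idea concrete, a round trip might look like the sketch below (the toy module and file name are hypothetical; note that the linked compile_export_utils.py additionally runs the exported program through the compiler with max-autotune before saving, which this skeleton omits):

```python
import torch
from torch.export import export, save, load

# Toy stand-in for the model being exported.
class Toy(torch.nn.Module):
    def forward(self, x):
        return torch.relu(x) + 1

model, example_args = Toy().eval(), (torch.randn(1, 3, 64, 64),)

# One-time step, typically on a machine with the target GPU:
# export and persist the program.
ep = export(model, example_args)
save(ep, "encoder.pt2")  # hypothetical filename

# On every cold start: load the artifact instead of re-exporting.
runner = load("encoder.pt2").module()  # a callable nn.Module drop-in
out = runner(*example_args)
```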
Hey @cpuhrsch, thanks a lot for the help! Sorry if I'm bothering you with noob questions, but I got a bit outside of my comfort zone trying to run this model faster. 😅 Here is my attempt to compile the model to get it to run fast right after a cold start:

```python
from segment_anything_fast import (
    SamPredictor,
    sam_model_fast_registry,
    SamAutomaticMaskGenerator,
)
import torch
from torch.export import export

sam = sam_model_fast_registry["vit_h"]()
device = "cuda"
sam.to(device)
predictor = SamPredictor(sam)

# compiling
predictor.set_image(image=image)
export(
    predictor,
    "/workspace/exports/",
)
```

But I get:
I understand the error, but I'm not sure where I can find a …
Ah, so you might want to try exporting the …
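(Filling in the likely intent with a sketch: torch.export operates on nn.Module instances rather than wrapper classes like SamPredictor, so the call would target the underlying model. The attribute names and input shape below are assumptions about segment-anything-fast's internals, not verified against the source.)

```python
import torch
from torch.export import export, save

# `predictor` as constructed in the snippet above. Assumed attributes:
# SamPredictor keeps the Sam model on `.model`, whose backbone is
# `.image_encoder` (check the library source for the real names).
encoder = predictor.model.image_encoder.eval()

# The ViT-H image encoder is assumed to take a 1024x1024 RGB batch.
example = (torch.randn(1, 3, 1024, 1024, device="cuda"),)

ep = export(encoder, example)
save(ep, "/workspace/exports/image_encoder.pt2")
```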
I want to speed up SAM on an A100 as much as possible, but within a serverless GPU architecture.
I can load the model during a cold start and then run it during the job.
I made this change:
(I have a predictor and a mask_generator because I use SAM in different ways: segment-everything and with a point prompt; see the sketch below.)
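(For reference, a sketch of what sharing one model instance between both wrappers looks like, mirroring the imports from the snippet above; the checkpoint argument is omitted as in the earlier code:)

```python
from segment_anything_fast import (
    SamPredictor,
    SamAutomaticMaskGenerator,
    sam_model_fast_registry,
)

# One model, loaded once at cold start, shared by both entry points.
sam = sam_model_fast_registry["vit_h"]()
sam.to("cuda")

predictor = SamPredictor(sam)                    # point/box prompts
mask_generator = SamAutomaticMaskGenerator(sam)  # "segment everything"
```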
I am wondering if I am missing something, as my job is taking extremely long. I believe the autotuning is what's running; I don't fully understand what it does, but can't I use pre-optimized functions for the A100?
Here are the logs: