Stable Diffusion Inference Overhaul #254

Merged: 8 commits merged into master from amercurio/sd-inference-fix on Sep 26, 2023

Conversation

@harubaru (Contributor) commented on Sep 14, 2023

As per #251, this PR introduces several fixes and improvements to the older Stable Diffusion Inference example. Most importantly, it makes the example work again.

  • Updated Tensorizer support. Current Stable Diffusion models have been added to the public s3://tensorizer bucket to allow fast loading of the SD base models with Tensorizer.
  • Dropped PVC loading. Stable Diffusion models are now loaded only from CoreWeave S3 object storage via Tensorizer. This simplifies the example significantly by removing the extra download and PVC creation steps.
  • Replaced base image with CoreWeave Torch image. This enables faster deployments by using a smaller image that already contains all of the prerequisite dependencies.
  • Dropped KServe dependency. KServe conflicts with Tensorizer, so it has been removed entirely and replaced with FastAPI.
  • Simpler serialization & S3 upload example. Since Tensorizer has built-in support for pushing to S3 storage, s3cmd is no longer required, which further simplifies the example.
  • Single Docker image to rule them all. Instead of separate Docker images for the serializer, S3 upload, downloader, and inference service, everything has been coalesced into one image, since they all share the same dependencies.

To run the inference example as-is:

  1. The inference service loads a public tensorized model by default, so it can be started simply by running kubectl apply -f 02-inference-service.yaml, as shown below.
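
A minimal sketch of that step; watching the Knative service afterward is optional, but shows when the service reports a ready URL:

```bash
# Deploy the inference service with the default public tensorized model.
kubectl apply -f 02-inference-service.yaml

# Optionally, wait for the Knative service to become Ready and report a URL.
kubectl get ksvc --watch
```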

To run the inference example with a custom serialized SD model:

  1. An S3 key must be generated through the Cloud app. Once this is done, a bucket has to be created; with s3cmd, this can be done by running s3cmd mb s3://YOURBUCKET.
  2. Next, the S3 secrets have to be installed into Kubernetes. In 00-optional-s3-secret.yaml, replace each secret's placeholder with your base64-encoded keys. This can be done by running echo -n "YOURKEYHERE" | base64 for each key, as well as for the host URL, which is the S3 endpoint. Once this is done, install the secrets by running kubectl apply -f 00-optional-s3-secret.yaml.
  3. To serialize the model, modify the command arguments in 01-optional-s3-serialize-job.yaml, replacing --dest-bucket with the bucket you are serializing to and --hf-model-id with the custom model you would like to serialize. Once this is done, run the job with kubectl apply -f 01-optional-s3-serialize-job.yaml.
  4. To run the inference service, replace the model URI in 02-inference-service.yaml with the S3 URI pointing to your custom model. After that, start the service by running kubectl apply -f 02-inference-service.yaml. The full sequence is sketched below.
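
Putting the four steps together, a minimal sketch of the full sequence. The bucket name, key values, and model ID are placeholders, and object.ord1.coreweave.com is only an assumed example of the S3 endpoint:

```bash
# 1. Create a destination bucket (assumes s3cmd is already configured
#    with the S3 key generated in the Cloud app).
s3cmd mb s3://YOURBUCKET

# 2. Base64-encode each key and the host URL (the S3 endpoint), paste the
#    output into 00-optional-s3-secret.yaml, then install the secrets.
echo -n "YOURACCESSKEY" | base64
echo -n "YOURSECRETKEY" | base64
echo -n "object.ord1.coreweave.com" | base64   # assumed example endpoint
kubectl apply -f 00-optional-s3-secret.yaml

# 3. After pointing --dest-bucket and --hf-model-id at your bucket and model,
#    run the serialization job.
kubectl apply -f 01-optional-s3-serialize-job.yaml

# 4. After replacing the model URI with your serialized model's S3 URI,
#    start the inference service.
kubectl apply -f 02-inference-service.yaml
```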

To test the inference endpoint:

  • You can run the command below, replacing the base of the URL with the link to your own ksvc, which can be found by listing the Knative services with kubectl get ksvc. A scripted variant is sketched after the command.
  • curl -X POST 'http://sd.tenant-sta-amercurio-amercurio.knative.ord1.coreweave.cloud/generate' -H 'accept: application/json' -H 'Content-Type: application/json' -d '{"prompt": "a cat sleeping comfortably on a bed", "guidance_scale": 7, "num_inference_steps": 28, "seed": 42, "width": 768, "height": 512}' -o cat.png
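
One way to script the URL substitution, assuming the Knative service is named sd as in the example URL above:

```bash
# Pull the service URL from the ksvc status (service name "sd" is assumed).
SD_URL=$(kubectl get ksvc sd -o jsonpath='{.status.url}')

# Request an image and save it to cat.png.
curl -X POST "$SD_URL/generate" \
  -H 'accept: application/json' \
  -H 'Content-Type: application/json' \
  -d '{"prompt": "a cat sleeping comfortably on a bed", "guidance_scale": 7, "num_inference_steps": 28, "seed": 42, "width": 768, "height": 512}' \
  -o cat.png
```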

@harubaru requested a review from wbrown on September 14, 2023 at 00:48
@rtalaricw (Contributor) left a comment:

LGTM

@rtalaricw (Contributor) commented:

@harubaru Tested and works with both the custom and regular models. Please merge when you can.

@harubaru merged commit f4a5946 into master on Sep 26, 2023
@harubaru deleted the amercurio/sd-inference-fix branch on September 26, 2023 at 22:55