
add support for setting scaleToZeroTimeout with create_inference_endpoint/update_inference_endpoint #2462

Closed
hommayushi3 opened this issue Aug 19, 2024 · 1 comment

Comments

@hommayushi3
Contributor

Is your feature request related to a problem? Please describe.
I would like to be able to set the scaleToZeroTimeout parameter programmatically in HfApi.create_inference_endpoint and HfApi.update_inference_endpoint. Currently, the only ways to do this are through the HF Inference Endpoints UI or by sending a POST request directly to /v2/endpoint/{namespace}.

Describe the solution you'd like

from huggingface_hub import HfApi
api = HfApi()
endpoint = api.create_inference_endpoint(
    "aws-zephyr-7b-beta-0486",
    repository="HuggingFaceH4/zephyr-7b-beta",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",
    region="us-east-1",
    type="protected",
    instance_size="x1",
    instance_type="nvidia-a10g",
    min_replica=0,
    max_replica=1,
    scale_to_zero_timeout=30,  # in minutes; this is the requested parameter
    custom_image={
        "health_route": "/health",
        "env": {
            "MAX_BATCH_PREFILL_TOKENS": "2048",
            "MAX_INPUT_LENGTH": "1024",
            "MAX_TOTAL_TOKENS": "1512",
            "MODEL_ID": "/repository"
        },
        "url": "ghcr.io/huggingface/text-generation-inference:1.1.0",
    },
)

Describe alternatives you've considered

  • The HF Inference Endpoints UI
  • A direct POST request to the /v2/endpoint/{namespace} endpoint
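For reference, the direct-request alternative can be sketched as below. This is a minimal, hedged sketch, not the huggingface_hub implementation: the base URL `https://api.endpoints.huggingface.cloud/v2`, the payload layout, and the placement of `scaleToZeroTimeout` under `compute.scaling` (in minutes) are assumptions about the Inference Endpoints API, and the helper names `build_create_payload` and `build_request` are hypothetical. It only builds the request; nothing is sent.

```python
import json
import urllib.request
from typing import Optional

# Assumed base URL of the Inference Endpoints REST API (not stated in this issue).
API_BASE = "https://api.endpoints.huggingface.cloud/v2"


def build_create_payload(
    name: str,
    repository: str,
    *,
    framework: str,
    task: str,
    accelerator: str,
    instance_size: str,
    instance_type: str,
    vendor: str,
    region: str,
    endpoint_type: str = "protected",
    min_replica: int = 0,
    max_replica: int = 1,
    scale_to_zero_timeout: Optional[int] = None,
) -> dict:
    """Hypothetical helper: build a JSON body for POST /v2/endpoint/{namespace} (assumed schema)."""
    scaling = {"minReplica": min_replica, "maxReplica": max_replica}
    if scale_to_zero_timeout is not None:
        # Assumed field name and location: compute.scaling.scaleToZeroTimeout, in minutes.
        scaling["scaleToZeroTimeout"] = scale_to_zero_timeout
    return {
        "name": name,
        "type": endpoint_type,
        "compute": {
            "accelerator": accelerator,
            "instanceSize": instance_size,
            "instanceType": instance_type,
            "scaling": scaling,
        },
        "model": {"repository": repository, "framework": framework, "task": task},
        "provider": {"vendor": vendor, "region": region},
    }


def build_request(namespace: str, token: str, payload: dict) -> urllib.request.Request:
    """Hypothetical helper: wrap the payload in an authenticated POST request (not sent)."""
    return urllib.request.Request(
        f"{API_BASE}/endpoint/{namespace}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would send it. When `scale_to_zero_timeout` is omitted, the field is left out of the body so that, presumably, the server-side default applies.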


@Wauplin
Contributor

Wauplin commented Aug 20, 2024

Thanks for the suggestion and PR @hommayushi3! I just reviewed and merged it: #2463. Clean, concise and documented like I like ❤️

Wauplin closed this as completed Aug 20, 2024