From e52b8ab0d302f63b84afe9a5f19d0e49dd57b0a1 Mon Sep 17 00:00:00 2001
From: Arthur Leung <arcyleung@gmail.com>
Date: Fri, 4 Oct 2024 16:34:04 -0400
Subject: [PATCH 1/2] Add user defined scheduling scaling proposal

---
 ...4-10-03-user-defined-scheduling-scaling.md | 43 +++++++++++++++++++
 1 file changed, 43 insertions(+)
 create mode 100644 reps/2024-10-03-user-defined-scheduling-scaling.md

diff --git a/reps/2024-10-03-user-defined-scheduling-scaling.md b/reps/2024-10-03-user-defined-scheduling-scaling.md
new file mode 100644
index 0000000..afa765c
--- /dev/null
+++ b/reps/2024-10-03-user-defined-scheduling-scaling.md
@@ -0,0 +1,43 @@
+# User-defined scheduling and scaling policies for Ray Serve deployments
+
+## Summary
+Provide support for user-defined scheduling and scaling policies
+
+## Motivation
+
+### General Motivation
+
+The default scheduling is based on power of 2 choices and the default scaling is based on target ongoing requests/ max ongoing requests. Users of Ray Serve may have different needs such as SLAs for the requests they serve. This proposal aims to provide a way to configure custom scheduling and scaling policies for Ray Serve deployments, with a user-defined policy config.
+
+
+=============== 
+
+What use cases is this proposal supposed to enhance. If possible, please include details like the environment and scale.
+### Should this change be within `ray` or outside?
+From a software layering perspective, should this change be part of the main `ray` project, part of an ecosystem project under `ray-project`, or a new ecosystem project?
+
+When reviewing the REP, the reviewers and the shepherd should apply the following judgements:
+- If an author proposes a change to be within the `ray` repo, the reviewers and the shepherd should assess whether the change can be layered on top of `ray` instead. 
+If so we should try to make the change in a separate repo. 
+- For a change proposed as an ecosystem project under `ray-project`: the reviewers and the shepherd should make sure that the technical quality
+meets the bar of (at least) a good "experimental" or "alpha" feature -- we should be comfortable welcoming Ray users with similar use cases to try this project.
+- For a change proposed as a new ecosystem project (outside of `ray-project`): then this REP is just serving as a "request for comments". 
+We don't need to go through the voting process, since it's not Ray committers' decision to approve the change. 
+
+## Stewardship
+### Required Reviewers
+The proposal will be open to the public, but please suggest a few experienced Ray contributors in this technical domain whose comments will help this proposal. Ideally, the list should include Ray committers. 
+### Shepherd of the Proposal (should be a senior committer)
+To make the review process more productive, the owner of each proposal should identify a **shepherd** (should be a senior Ray committer). The shepherd is responsible for working with the owner and making sure the proposal is in good shape (with necessary information) before marking it as ready for broader review.
+
+## Design and Architecture
+The proposal should include sufficient technical details for reviewers to determine the anticipated benefits and risks.
+
+## Compatibility, Deprecation, and Migration Plan
+An important part of the proposal is to explicitly point out any compability implications of the proposed change. If there is any, we should thouroughly discuss a plan to deprecate existing APIs and migration to the new one(s).
+
+## Test Plan and Acceptance Criteria
+The proposal should discuss how the change will be tested **before** it can be merged or enabled. It should also include other acceptance criteria including documentation and examples. 
+
+## (Optional) Follow-on Work
+Optionally, the proposal should discuss necessary follow-on work after the change is accepted.

From 5e88f0feaf9dbaf5c3dcdcc628d79eeb67bca2be Mon Sep 17 00:00:00 2001
From: Arthur Leung <arcyleung@gmail.com>
Date: Thu, 17 Oct 2024 02:02:19 -0400
Subject: [PATCH 2/2] Add proposed ray serve API change

---
 ...4-10-03-user-defined-scheduling-scaling.md | 63 +++++++++++++------
 1 file changed, 45 insertions(+), 18 deletions(-)

diff --git a/reps/2024-10-03-user-defined-scheduling-scaling.md b/reps/2024-10-03-user-defined-scheduling-scaling.md
index afa765c..ca58836 100644
--- a/reps/2024-10-03-user-defined-scheduling-scaling.md
+++ b/reps/2024-10-03-user-defined-scheduling-scaling.md
@@ -4,40 +4,67 @@
 Provide support for user-defined scheduling and scaling policies
 
 ## Motivation
-
 ### General Motivation
+By default, Ray Serve schedules requests with the power-of-two-choices algorithm and scales replicas based on the `target_ongoing_requests` and `max_ongoing_requests` settings. Users of Ray Serve may have different needs, such as latency-based SLAs (for example, p99 or p95 requirements) for the requests they serve. This proposal aims to let users configure custom scheduling and scaling policies for Ray Serve deployments through a user-defined policy config. The goal is mostly convenience and out-of-the-box support, similar to what [multiplexed](https://docs.ray.io/en/latest/serve/model-multiplexing.html) deployments provide for multi-model serving.
 
-The default scheduling is based on power of 2 choices and the default scaling is based on target ongoing requests/ max ongoing requests. Users of Ray Serve may have different needs such as SLAs for the requests they serve. This proposal aims to provide a way to configure custom scheduling and scaling policies for Ray Serve deployments, with a user-defined policy config.
-
-
-=============== 
-
-What use cases is this proposal supposed to enhance. If possible, please include details like the environment and scale.
 ### Should this change be within `ray` or outside?
-From a software layering perspective, should this change be part of the main `ray` project, part of an ecosystem project under `ray-project`, or a new ecosystem project?
-
-When reviewing the REP, the reviewers and the shepherd should apply the following judgements:
-- If an author proposes a change to be within the `ray` repo, the reviewers and the shepherd should assess whether the change can be layered on top of `ray` instead. 
-If so we should try to make the change in a separate repo. 
-- For a change proposed as an ecosystem project under `ray-project`: the reviewers and the shepherd should make sure that the technical quality
-meets the bar of (at least) a good "experimental" or "alpha" feature -- we should be comfortable welcoming Ray users with similar use cases to try this project.
-- For a change proposed as a new ecosystem project (outside of `ray-project`): then this REP is just serving as a "request for comments". 
-We don't need to go through the voting process, since it's not Ray committers' decision to approve the change. 
+Inside Ray Serve (`ray.serve`), because the scheduling and autoscaling policies will be used for out-of-the-box deployments. External schedulers such as YuniKorn and Volcano exist for batch jobs, but this proposal is specific to application-level deployments.
 
 ## Stewardship
 ### Required Reviewers
-The proposal will be open to the public, but please suggest a few experienced Ray contributors in this technical domain whose comments will help this proposal. Ideally, the list should include Ray committers. 
+The proposal will be open to the public, but please suggest a few experienced Ray contributors in this technical domain whose comments will help this proposal. Ideally, the list should include Ray committers.
+
+
 ### Shepherd of the Proposal (should be a senior committer)
 To make the review process more productive, the owner of each proposal should identify a **shepherd** (should be a senior Ray committer). The shepherd is responsible for working with the owner and making sure the proposal is in good shape (with necessary information) before marking it as ready for broader review.
 
+
 ## Design and Architecture
 The proposal should include sufficient technical details for reviewers to determine the anticipated benefits and risks.
 
+The proposed change adds two parameters, `scaling_policy` and `scheduling_policy`, to `ray/serve/deployment.py` so that they can be passed to the `@serve.deployment` decorator as Python `Callable`s.
+
+```python
+# Current: power-of-two-choices scheduling, scaling on target ongoing requests
+@serve.deployment(max_ongoing_requests=1, max_queued_requests=1)
+# Proposed: user-defined policies passed as callables
+@serve.deployment(scaling_policy=MyAutoscaler, scheduling_policy=MyScheduler)
+```
+
 ## Compatibility, Deprecation, and Migration Plan
 An important part of the proposal is to explicitly point out any compability implications of the proposed change. If there is any, we should thouroughly discuss a plan to deprecate existing APIs and migration to the new one(s).
 
+The following is a backwards-compatible option:
+- If the existing parameters `max_ongoing_requests` and `max_queued_requests` are specified without either `scaling_policy` or `scheduling_policy`, default to initializing the deployment with the default pow2 / target ongoing requests policies.
+
+- The existing `AutoscalingConfig` class should be refactored to include only the basic parameters shared by all autoscaling algorithms, such as the min/max replica limits. Anything specific to `replica_queue_length_autoscaling_policy` (for example `max_queued_requests`, `max_ongoing_requests`, and target ongoing requests) should move to its own subclass (e.g. `RequestLengthPolicy`) or into `autoscaling_config`.
+- `get_decision_num_replicas` should accept custom arguments describing what is monitored. `get_policy` already returns a `Callable`, so the call site only needs to be updated to pass variable parameters, depending on which metrics the user's custom autoscaling policy needs to monitor:
+
+```python
+        decision_num_replicas = self._policy(
+            curr_target_num_replicas=curr_target_num_replicas,
+            total_num_requests=self.get_total_num_requests(),
+            num_running_replicas=len(self._running_replicas),
+            config=self._config,
+            capacity_adjusted_min_replicas=self.get_num_replicas_lower_bound(),
+            capacity_adjusted_max_replicas=self.get_num_replicas_upper_bound(),
+            policy_state=self._policy_state,
+        )
+```
+to
+```python
+        decision_num_replicas = self._policy(
+            config=self._config,
+            policy_state=self._policy_state,
+            capacity_adjusted_min_replicas=self.get_num_replicas_lower_bound(),
+            capacity_adjusted_max_replicas=self.get_num_replicas_upper_bound(),
+            # Autoscaling policy may be based on queue length, SLA violations on each replica, or another heuristic
+            **current_metrics,
+        )
+```
+
 ## Test Plan and Acceptance Criteria
-The proposal should discuss how the change will be tested **before** it can be merged or enabled. It should also include other acceptance criteria including documentation and examples. 
+The proposal should discuss how the change will be tested **before** it can be merged or enabled. It should also include other acceptance criteria including documentation and examples.
 
 ## (Optional) Follow-on Work
 Optionally, the proposal should discuss necessary follow-on work after the change is accepted.
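
Reviewer note on the `**current_metrics` call shape proposed under "Compatibility, Deprecation, and Migration Plan": a user-defined autoscaling policy that consumes it might look roughly like the sketch below. This is only an illustration under stated assumptions, not the proposed implementation. `SlaConfig`, `latency_sla_policy`, and the metric keys `num_running_replicas` and `p99_latency_ms` are hypothetical names; the REP does not yet define what `current_metrics` contains.

```python
import math
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class SlaConfig:
    """Hypothetical stand-in for a policy-specific autoscaling config."""

    target_p99_ms: float = 200.0


def latency_sla_policy(
    config: SlaConfig,
    policy_state: Dict[str, Any],
    capacity_adjusted_min_replicas: int,
    capacity_adjusted_max_replicas: int,
    **current_metrics: Any,
) -> int:
    """Scale replicas in proportion to observed p99 latency over the target."""
    # Assumed metric keys; a real policy would use whatever Serve passes in.
    running = current_metrics.get("num_running_replicas", 0)
    p99_ms = current_metrics.get("p99_latency_ms")

    if running == 0 or p99_ms is None:
        # No latency signal yet: fall back to the lower bound.
        desired = capacity_adjusted_min_replicas
    else:
        desired = math.ceil(running * p99_ms / config.target_p99_ms)

    # Clamp to the capacity-adjusted bounds supplied by the caller.
    decision = max(
        capacity_adjusted_min_replicas,
        min(capacity_adjusted_max_replicas, desired),
    )
    # Remember the decision so later calls could smooth or rate-limit changes.
    policy_state["last_decision"] = decision
    return decision
```

Because the metrics arrive as keyword arguments, a policy only reads the keys it cares about, which is what lets the call site stay the same for queue-length, SLA-based, or other heuristics.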