forked from openkruise/kruise-game
Feat: add external scaler (openkruise#39)
* Feat: add external scaler
  Signed-off-by: ChrisLiu <[email protected]>
* Add docs for autoscaling
  Signed-off-by: ChrisLiu <[email protected]>

---------

Signed-off-by: ChrisLiu <[email protected]>
1 parent 1cb53de, commit 7c072bc
Showing 16 changed files with 1,212 additions and 1 deletion.
@@ -0,0 +1,2 @@
resources:
- service.yaml
@@ -0,0 +1,12 @@
---
apiVersion: v1
kind: Service
metadata:
  name: external-scaler
  namespace: kruise-game-system
spec:
  ports:
    - port: 6000
      targetPort: 6000
  selector:
    control-plane: controller-manager
@@ -0,0 +1,99 @@
## Feature overview

Compared with stateless services, game servers place higher demands on autoscaling, especially on scaling down.

Because game servers are strongly stateful, the differences between them grow over time, so scale-down must be very precise. A coarse-grained scale-down mechanism can easily disconnect players and cause heavy losses for the business.

The horizontal scaling mechanism in native Kubernetes is shown in the following figure:

![autoscaling-k8s-en.png](../../images/autoscaling-k8s-en.png)

In game scenarios, its main problems are:

- At the pod level, it cannot perceive the business state of a game server, so it cannot set a deletion priority based on that state.
- At the workload level, it cannot choose which objects to scale down based on business state.
- At the autoscaler level, it cannot calculate an appropriate replica count based on the business state of the game servers.

As a result, autoscaling based on native Kubernetes causes two major problems in game scenarios:

- The scale-down count is inaccurate. It is easy to delete too many or too few game servers.
- The scale-down targets are inaccurate. It is easy to delete game servers that carry a high load.

The autoscaling mechanism of OKG is shown in the following figure:

![autoscaling-okg-en.png](../../images/autoscaling-okg-en.png)

- At the game server level, each game server can report its own state and expose, through custom service quality or external components, whether it is in the WaitToBeDeleted state.
- At the workload level, the GameServerSet chooses the scale-down targets based on the business state reported by the game servers. As described in Game Server Horizontal Scaling, game servers in the WaitToBeDeleted state have the highest deletion priority and are removed first when scaling down.
- At the autoscaler level, the autoscaler counts exactly the game servers in the WaitToBeDeleted state and uses that count as the scale-down quantity, which avoids accidental deletion.

In this way, OKG's autoscaler deletes only game servers in the WaitToBeDeleted state during the scale-down window, achieving targeted and precise scaling down.

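The rule described above can be sketched in a few lines of Go. This is a minimal illustration, not OKG's implementation: the `GameServer` struct and the `scaleDownPlan` function are hypothetical simplifications, and only the `WaitToBeDeleted` state name comes from the text.

```go
package main

import "fmt"

// GameServer is a hypothetical, simplified stand-in for the real resource.
type GameServer struct {
	Name     string
	OpsState string
}

// scaleDownPlan returns how many replicas to keep and which servers to
// delete. Only servers in the WaitToBeDeleted state are candidates.
func scaleDownPlan(servers []GameServer) (int, []string) {
	var victims []string
	for _, gs := range servers {
		if gs.OpsState == "WaitToBeDeleted" {
			victims = append(victims, gs.Name)
		}
	}
	return len(servers) - len(victims), victims
}

func main() {
	servers := []GameServer{
		{Name: "minecraft-0", OpsState: "WaitToBeDeleted"},
		{Name: "minecraft-1", OpsState: "None"},
		{Name: "minecraft-2", OpsState: "None"},
	}
	keep, victims := scaleDownPlan(servers)
	fmt.Println(keep, victims) // prints "2 [minecraft-0]"
}
```

With three servers and one marked WaitToBeDeleted, the plan keeps two replicas and names only the marked server for deletion.
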
## Usage Example

_**Prerequisite: install [KEDA](https://keda.sh/docs/2.10/deploy/) in the cluster.**_

Deploy a ScaledObject to set the autoscaling strategy. See the [ScaledObject API](https://github.com/kedacore/keda/blob/main/apis/keda/v1alpha1/scaledobject_types.go) for the meaning of each field.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: minecraft # Fill in the name of the corresponding GameServerSet
spec:
  scaleTargetRef:
    name: minecraft # Fill in the name of the corresponding GameServerSet
    apiVersion: game.kruise.io/v1alpha1
    kind: GameServerSet
  pollingInterval: 30
  minReplicaCount: 0
  advanced:
    horizontalPodAutoscalerConfig:
      behavior: # Inherited from HPA behavior, see https://kubernetes.io/zh-cn/docs/tasks/run-application/horizontal-pod-autoscale/#configurable-scaling-behavior
        scaleDown:
          stabilizationWindowSeconds: 45 # Set the scale-down stabilization window to 45 seconds
          policies:
            - type: Percent
              value: 100
              periodSeconds: 15
  triggers:
    - type: external
      metricType: Value
      metadata:
        scalerAddress: kruise-game-external-scaler.kruise-game-system:6000
```

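To see how a `Value` trigger like this could translate into a replica count, the sketch below applies the formula documented for the Kubernetes HPA, desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), together with the HPA tolerance check (0.1 by default). It assumes that the external scaler reports the current replica count minus the WaitToBeDeleted count as the metric and the GameServerSet's spec replicas as the target. `desiredReplicas` is a hypothetical helper, and pod readiness and the `behavior` policies are ignored.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas is a hypothetical helper mirroring the documented HPA
// formula. It returns current unchanged when metric/target lies within
// the given tolerance of 1.0.
func desiredReplicas(current, metric, target int64, tolerance float64) int64 {
	if target <= 0 {
		return current // no usable target, leave the replica count alone
	}
	ratio := float64(metric) / float64(target)
	if math.Abs(ratio-1.0) <= tolerance {
		return current
	}
	// Multiply before dividing so exact results are not lost to rounding.
	return int64(math.Ceil(float64(current*metric) / float64(target)))
}

func main() {
	// 3 game servers, 1 marked WaitToBeDeleted: metric = 3-1 = 2, target = 3.
	fmt.Println(desiredReplicas(3, 2, 3, 0.1)) // prints 2
}
```

Under this simplified model a ratio very close to 1, for example 19/20, would fall inside the tolerance and leave the replica count unchanged. Whether that matters in a real cluster depends on how the HPA is configured.
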
After deployment, change the opsState of the game server minecraft-0 to WaitToBeDeleted (see [Custom Service Quality](service_qualities.md) for setting the game server state automatically).

```bash
kubectl edit gs minecraft-0

...
spec:
  deletionPriority: 0
  opsState: WaitToBeDeleted # Initially None; change it to WaitToBeDeleted
  updatePriority: 0
...
```

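When interactive editing is inconvenient, the same change can presumably be applied with a JSON merge patch. The sketch below only builds the patch body and prints a `kubectl patch` command. `opsStatePatch` is a hypothetical helper, and the assumption is that the GameServer resource accepts a merge patch on `spec.opsState`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// opsStatePatch builds a JSON merge patch that sets spec.opsState.
func opsStatePatch(state string) (string, error) {
	patch := map[string]any{
		"spec": map[string]any{"opsState": state},
	}
	b, err := json.Marshal(patch)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	p, err := opsStatePatch("WaitToBeDeleted")
	if err != nil {
		panic(err)
	}
	// Prints a command that could be run against the cluster.
	fmt.Printf("kubectl patch gs minecraft-0 --type merge -p '%s'\n", p)
}
```

The printed patch body is `{"spec":{"opsState":"WaitToBeDeleted"}}`, which touches only the opsState field and leaves deletionPriority and updatePriority alone.
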
After the scale-down window has passed, the game server minecraft-0 is deleted automatically.

```bash
kubectl get gs
NAME          STATE      OPSSTATE          DP    UP
minecraft-0   Deleting   WaitToBeDeleted   0     0
minecraft-1   Ready      None              0     0
minecraft-2   Ready      None              0     0

# After a while
...

kubectl get gs
NAME          STATE   OPSSTATE   DP    UP
minecraft-1   Ready   None       0     0
minecraft-2   Ready   None       0     0
```

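@@ -0,0 +1,94 @@
## Feature Overview

Game servers differ from stateless workloads and place higher demands on autoscaling, mainly on scaling down.

Because games are strongly stateful, the differences between game servers grow over time, and scale-down must be extremely precise. A crude scale-down mechanism can easily cause negative effects such as player disconnections and heavy losses for the business.

The horizontal scaling mechanism in native Kubernetes is shown in the following figure:

![autoscaling-k8s.png](../../images/autoscaling-k8s.png)
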
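In game scenarios, its main problems are:

- At the pod level, it cannot perceive the business state of a game server, so it cannot set a deletion priority based on that state.
- At the workload level, it cannot choose scale-down targets based on business state.
- At the autoscaler level, it cannot specifically observe the business state of game servers to calculate an appropriate replica count.

As a result, autoscaling based on native Kubernetes causes two major problems in game scenarios:

- The scale-down count is inaccurate. It is easy to delete too many or too few game servers.
- The scale-down targets are inaccurate. It is easy to delete game servers that carry a high load.
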
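The autoscaling mechanism of OKG is shown below:

![autoscaling-okg.png](../../images/autoscaling-okg.png)

- At the game server level, each game server can report its own state and expose, through custom service quality or external components, whether it is in the WaitToBeDeleted state.
- At the workload level, the GameServerSet chooses the scale-down targets based on the business state reported by the game servers. As described in [Game Server Horizontal Scaling](../快速开始/游戏服水平伸缩.md), game servers in the WaitToBeDeleted state have the highest deletion priority and are removed first when scaling down.
- At the autoscaler level, the autoscaler counts exactly the game servers in the WaitToBeDeleted state and uses that count as the scale-down quantity, so no game server is deleted by mistake.

In this way, OKG's autoscaler deletes only game servers in the WaitToBeDeleted state during the scale-down window, achieving targeted and precise scaling down.
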
## Usage Example

_**Prerequisite: install [KEDA](https://keda.sh/docs/2.10/deploy/) in the cluster.**_

Deploy a ScaledObject to set the autoscaling strategy. See the [ScaledObject API](https://github.com/kedacore/keda/blob/main/apis/keda/v1alpha1/scaledobject_types.go) for the meaning of each field.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: minecraft # Fill in the name of the corresponding GameServerSet
spec:
  scaleTargetRef:
    name: minecraft # Fill in the name of the corresponding GameServerSet
    apiVersion: game.kruise.io/v1alpha1
    kind: GameServerSet
  pollingInterval: 30
  minReplicaCount: 0
  advanced:
    horizontalPodAutoscalerConfig:
      behavior: # Inherited from HPA behavior, see https://kubernetes.io/zh-cn/docs/tasks/run-application/horizontal-pod-autoscale/#configurable-scaling-behavior
        scaleDown:
          stabilizationWindowSeconds: 45 # Set the scale-down stabilization window to 45 seconds
          policies:
            - type: Percent
              value: 100
              periodSeconds: 15
  triggers:
    - type: external
      metricType: Value
      metadata:
        scalerAddress: kruise-game-external-scaler.kruise-game-system:6000
```

After deployment, change the opsState of the game server minecraft-0 to WaitToBeDeleted (see [Custom Service Quality](自定义服务质量.md) for setting the game server state automatically).

```bash
kubectl edit gs minecraft-0

...
spec:
  deletionPriority: 0
  opsState: WaitToBeDeleted # Initially None; change it to WaitToBeDeleted
  updatePriority: 0
...
```

After the scale-down window has passed, the game server minecraft-0 is deleted automatically.

```bash
kubectl get gs
NAME          STATE      OPSSTATE          DP    UP
minecraft-0   Deleting   WaitToBeDeleted   0     0
minecraft-1   Ready      None              0     0
minecraft-2   Ready      None              0     0

# After a while
...

kubectl get gs
NAME          STATE   OPSSTATE   DP    UP
minecraft-1   Ready   None       0     0
minecraft-2   Ready   None       0     0
```
@@ -0,0 +1,75 @@
package externalscaler

import (
	"context"
	"fmt"

	gamekruiseiov1alpha1 "github.com/openkruise/kruise-game/apis/v1alpha1"
	"k8s.io/apimachinery/pkg/types"
	"k8s.io/klog/v2"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ExternalScaler serves the KEDA external scaler gRPC API for GameServerSets.
type ExternalScaler struct {
	client client.Client
}

func (e *ExternalScaler) mustEmbedUnimplementedExternalScalerServer() {
}

// IsActive always reports the scaler as active.
func (e *ExternalScaler) IsActive(ctx context.Context, scaledObject *ScaledObjectRef) (*IsActiveResponse, error) {
	return &IsActiveResponse{
		Result: true,
	}, nil
}

// StreamIsActive is not used and returns immediately.
func (e *ExternalScaler) StreamIsActive(scaledObject *ScaledObjectRef, epsServer ExternalScaler_StreamIsActiveServer) error {
	return nil
}

// GetMetricSpec reports the GameServerSet's spec.replicas as the metric target.
func (e *ExternalScaler) GetMetricSpec(ctx context.Context, scaledObjectRef *ScaledObjectRef) (*GetMetricSpecResponse, error) {
	name := scaledObjectRef.GetName()
	ns := scaledObjectRef.GetNamespace()
	gss := &gamekruiseiov1alpha1.GameServerSet{}
	err := e.client.Get(ctx, types.NamespacedName{Namespace: ns, Name: name}, gss)
	if err != nil {
		klog.Error(err)
		return nil, err
	}
	desireReplicas := gss.Spec.Replicas
	// Guard against a nil pointer before dereferencing spec.replicas.
	if desireReplicas == nil {
		return nil, fmt.Errorf("GameServerSet %s/%s has no spec.replicas set", ns, name)
	}
	klog.Infof("GameServerSet %s/%s TargetSize is %d", ns, name, *desireReplicas)
	return &GetMetricSpecResponse{
		MetricSpecs: []*MetricSpec{{
			MetricName: "gssReplicas",
			TargetSize: int64(*desireReplicas),
		}},
	}, nil
}

// GetMetrics reports current replicas minus those in the WaitToBeDeleted state.
func (e *ExternalScaler) GetMetrics(ctx context.Context, metricRequest *GetMetricsRequest) (*GetMetricsResponse, error) {
	name := metricRequest.ScaledObjectRef.GetName()
	ns := metricRequest.ScaledObjectRef.GetNamespace()
	gss := &gamekruiseiov1alpha1.GameServerSet{}
	err := e.client.Get(ctx, types.NamespacedName{Namespace: ns, Name: name}, gss)
	if err != nil {
		klog.Error(err)
		return nil, err
	}
	currentReplicas := gss.Status.CurrentReplicas
	numWaitToBeDeleted := gss.Status.WaitToBeDeletedReplicas
	// The status is not populated until the GameServerSet has been reconciled.
	if numWaitToBeDeleted == nil || currentReplicas == 0 {
		return nil, fmt.Errorf("GameServerSet %s/%s has not been initialized", ns, name)
	}
	klog.Infof("GameServerSet %s/%s desire replicas is %d", ns, name, currentReplicas-*numWaitToBeDeleted)
	return &GetMetricsResponse{
		MetricValues: []*MetricValue{{
			MetricName:  "gssReplicas",
			MetricValue: int64(currentReplicas - *numWaitToBeDeleted),
		}},
	}, nil
}

// NewExternalScaler returns an ExternalScaler backed by the given client.
func NewExternalScaler(client client.Client) *ExternalScaler {
	return &ExternalScaler{
		client: client,
	}
}