1 parent a763109, commit 467d8ad
Showing 9 changed files with 721 additions and 423 deletions.
105 changes: 105 additions & 0 deletions
105
docs/ecosystem/doris-operator/doris-operator-overview.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,105 @@ | ||
--- | ||
{ | ||
"title": "Doris Kubernetes Operator", | ||
"language": "en" | ||
} | ||
--- | ||
|
||
<!-- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
|
||
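Doris Operator, the [Kubernetes Operator](https://github.com/apache/doris-operator) for Doris, was created to meet users' needs for efficient deployment and operation of Doris on Kubernetes. | ||
It integrates the complex management capabilities of native Kubernetes resources with experience in distributed coordination among Doris components and on-demand customization of cluster topologies, giving users a simpler, more efficient, and easier-to-use containerized deployment solution. | ||
It aims to provide efficient management and control of Doris on Kubernetes, reducing operational and learning costs while offering powerful features and flexible configuration. | ||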
|
||
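Doris Operator implements the configuration, management, and scheduling of Doris on Kubernetes based on Kubernetes CustomResourceDefinitions (CRD). From the desired state defined by the user, Doris Operator automatically creates Pods and other resources to start the services. Through an automatic registration mechanism, all started services are assembled into a complete Doris cluster. This significantly reduces the complexity and learning cost of operations that are essential in production, such as handling configuration, node discovery and registration, access communication, and health checks. | ||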
|
||
## Doris Operator Architecture | ||
|
||
The design of Doris Operator is based on the principle of a two-layer scheduler. The first-layer scheduling of each component uses native StatefulSet and Service resources to directly manage the corresponding Pod service, which makes it fully compatible with open source Kubernetes clusters, including public clouds, private clouds, and self-built Kubernetes platforms. | ||
|
||
Based on the deployment definition provided by Doris Operator, users can customize the Doris deployment state and send it to the Kubernetes cluster through the kubectl management command of Kubernetes. Doris Operator converts the deployment of each service into StatefulSet and its affiliated resources (such as Service) according to the customized state, and then schedules the desired Pods through StatefulSet. It simplifies unnecessary configuration in the StatefulSet specification by abstracting the final state of the Doris cluster, thereby reducing the user's learning cost. | ||
|
||
## Key capabilities | ||
|
||
- **Final state deployment**: | ||
|
||
Kubernetes manages services declaratively, by their desired (final) state, and Doris Operator defines a resource type that describes a Doris cluster: DorisCluster. Users can configure the cluster they need by following the related documents and usage examples. | ||
Users submit the configuration to the Kubernetes cluster with the Kubernetes command-line tool kubectl. Doris Operator then builds the required cluster automatically and keeps the cluster status on the corresponding resource up to date in real time. This makes the cluster efficient to manage and monitor and greatly simplifies operations. | ||
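
As a minimal sketch of this declarative workflow, the script below writes a small DorisCluster manifest and prints the kubectl command that would submit it. The API group and version `doris.selectdb.com/v1`, the resource name `doriscluster-sample`, and the registry host are assumptions for illustration; verify them against the CRD installed in your cluster.

```shell
#!/bin/bash
# Sketch only: API version, names, and registry host are illustrative assumptions.
manifest="${TMPDIR:-/tmp}/doriscluster-sample.yaml"

cat > "$manifest" <<'EOF'
apiVersion: doris.selectdb.com/v1   # assumed API group/version
kind: DorisCluster
metadata:
  name: doriscluster-sample         # hypothetical cluster name
spec:
  feSpec:
    replicas: 3
    image: your-registry.example.com/doris.fe-ubuntu:3.0.3   # hypothetical registry
  beSpec:
    replicas: 3
    image: your-registry.example.com/doris.be-ubuntu:3.0.3   # hypothetical registry
EOF

# Submitting the desired state is a single kubectl call. It is only printed
# here so that the sketch can be run without a cluster.
echo "kubectl apply -f $manifest"
```

Once applied, the Operator is expected to reconcile the cluster toward this description; editing the file and applying it again is how changes would be rolled out.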
|
||
- **Easy to expand**: | ||
|
||
Doris Operator supports concurrent, real-time horizontal scaling in cloud disk-based environments. All Doris component services are deployed and managed through Kubernetes StatefulSets. When deploying or scaling out, Pods are created using the StatefulSet Parallel mode, so in theory all replicas can be started in the time it takes to start a single node. Replicas start independently of each other, and a service that fails to start does not block the startup of the others. | ||
Because Doris Operator starts services concurrently and has a built-in distributed architecture, scaling is greatly simplified: users only need to set the number of replicas, and the operational complexity is handled for them. | ||
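
To illustrate that scaling reduces to changing a replica count, here is a hedged sketch that builds a JSON merge patch for `kubectl patch`. The resource type name `doriscluster`, the cluster name `doriscluster-sample`, and the namespace `doris` are hypothetical.

```shell
#!/bin/bash
# Sketch only: the resource kind/name and namespace are illustrative assumptions.

# Build a JSON merge patch that sets the replica count of one component.
# $1: component spec key (feSpec or beSpec), $2: desired replica count.
replica_patch() {
  printf '{"spec":{"%s":{"replicas":%d}}}' "$1" "$2"
}

# Scaling BE to 5 replicas is then a single declarative change. The command is
# printed rather than executed so that the sketch runs without a cluster.
echo "kubectl patch doriscluster doriscluster-sample -n doris --type merge -p '$(replica_patch beSpec 5)'"
```

Editing `replicas` in the manifest and re-applying it achieves the same result; the patch form is just convenient for one-off changes.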
|
||
- **Unnoticeable changes**: | ||
|
||
In a distributed environment, service restarts may cause temporary instability of services. Especially for services such as databases that have extremely high requirements for stability, how to ensure the stability of services during the restart process is a very important topic. Doris uses the following three mechanisms on Kubernetes to ensure the stability of the service restart process, thereby achieving an imperceptible experience for the business during the restart and upgrade process. | ||
|
||
1. Graceful exit | ||
2. Rolling restart | ||
3. Actively stop query allocation | ||
|
||
- **Host system configuration**: | ||
|
||
In some scenarios, host system parameters must be configured for Apache Doris to reach its ideal performance. In containerized deployments, it is uncertain which host a Pod lands on and host parameters are hard to modify, which creates challenges for users. To solve this problem, Doris Operator uses Kubernetes init containers to make the host parameters configurable. | ||
Doris Operator lets users configure commands to be executed on the host and applies them through init containers. To make this easier to use, Doris Operator abstracts the configuration of Kubernetes init containers, so that setting host commands is simpler and more intuitive. | ||
|
||
- **Persistent configuration**: | ||
|
||
Doris Operator uses the Kubernetes StorageClass mode to provide storage configuration for each service. It allows users to customize the mount directory. When customizing the startup configuration, if the storage directory is modified, the directory can be set as a persistent location in the custom resource, so that the service uses the specified directory in the container to store data. | ||
|
||
- **Runtime debugging**: | ||
|
||
One of the biggest challenges when troubleshooting containerized services is how to debug at runtime. While pursuing availability and ease of use, Doris Operator also makes problem location more convenient. The Doris base image ships with a variety of pre-installed tools for problem location. When you need to view the status in real time, you can enter the container with the `exec` command provided by kubectl and use the built-in tools for troubleshooting. | ||
When a service cannot be started for unknown reasons, Doris Operator provides a Debug running mode. When a Pod is set to Debug startup mode, the container automatically enters the running state. You can then enter the container with the `exec` command, start the service manually, and locate the problem. For details, please refer to [this document](../../install/cluster-deployment/k8s-deploy/compute-storage-coupled/cluster-operation.md#How to enter the container in the case of service-crash). | ||
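
For illustration, entering a container with `kubectl exec` could look like the sketch below. The Pod name `doriscluster-sample-be-0` and the namespace `doris` are hypothetical, and the presence of `/bin/bash` in the image is an assumption.

```shell
#!/bin/bash
# Sketch only: the Pod name and namespace are hypothetical.

# Build the kubectl command that opens an interactive shell in a Pod.
# $1: Pod name, $2: namespace.
exec_cmd() {
  printf 'kubectl exec -it %s -n %s -- /bin/bash\n' "$1" "$2"
}

# Printed rather than executed so that the sketch runs without a cluster.
exec_cmd doriscluster-sample-be-0 doris
```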
|
||
## Compatibility | ||
|
||
Doris Operator is developed in accordance with standard K8s specifications and is compatible with all standard K8s platforms, including those provided by mainstream cloud vendors, self-built K8s platforms based on standards, and user-built platforms. | ||
### Cloud vendor compatibility | ||
|
||
Fully compatible with the containerized service platforms of mainstream cloud vendors. For environment preparation and usage suggestions for Doris Operator, please refer to the following documents: | ||
|
||
- [Alibaba Cloud](./on-alibaba) | ||
|
||
- [AWS](./on-aws) | ||
|
||
## Installation and management | ||
|
||
### Prerequisites | ||
|
||
Before deployment, you need to check the host system. Refer to [Operating System Check](../../install/preparation/os-checking.md) | ||
|
||
### Deploy Doris Operator | ||
|
||
Deploying Doris Operator on Kubernetes consists of installing the Doris Operator CRD and the Doris Operator management components. | ||
|
||
* For detailed installation documents, please refer to: [Doris Operator Installation](../../install/cluster-deployment/k8s-deploy/compute-storage-coupled/install-doris-operator.md) | ||
|
||
### Deploy Doris cluster | ||
|
||
* For cluster configuration documents, please refer to: [Doris Operator Cluster Configuration](../../install/cluster-deployment/k8s-deploy/compute-storage-coupled/install-config-cluster.md) | ||
* For installation documents, please refer to: [Doris Cluster Installation](../../install/cluster-deployment/k8s-deploy/compute-storage-coupled/install-doris-cluster.md) | ||
|
||
### Cluster operation and maintenance | ||
|
||
* For cluster operation and maintenance documents, please refer to: [Doris Operator Cluster Operation](../../install/cluster-deployment/k8s-deploy/compute-storage-coupled/cluster-operation.md) | ||
* For cluster access documents, please refer to: [Doris Operator Cluster Access](../../install/cluster-deployment/k8s-deploy/compute-storage-coupled/access-cluster.md) | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,158 @@ | ||
--- | ||
{ | ||
"title": "Recommendations on Alibaba Cloud", | ||
"language": "en" | ||
} | ||
--- | ||
|
||
<!-- | ||
Licensed to the Apache Software Foundation (ASF) under one | ||
or more contributor license agreements. See the NOTICE file | ||
distributed with this work for additional information | ||
regarding copyright ownership. The ASF licenses this file | ||
to you under the Apache License, Version 2.0 (the | ||
"License"); you may not use this file except in compliance | ||
with the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, | ||
software distributed under the License is distributed on an | ||
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY | ||
KIND, either express or implied. See the License for the | ||
specific language governing permissions and limitations | ||
under the License. | ||
--> | ||
|
||
## Alibaba ACK | ||
|
||
Alibaba Cloud Container Service ACK is a managed container service that runs on ECS instances you purchase, so you have full access to the nodes and can adjust the relevant system parameters. Use the instance image Alibaba Cloud Linux 3. Its default system parameters meet most of the requirements for running Doris; those that do not can be corrected from within the container through K8s privileged mode to ensure stable operation. | ||
**When a Doris cluster is deployed on Alibaba Cloud ACK with Doris Operator, most environment requirements are met by the ECS default configuration. Any that are not can be corrected by Doris Operator itself**. Users can also correct them manually, as follows: | ||
|
||
### Existing cluster | ||
|
||
If the Container Service cluster has already been created, you can adjust it by referring to this document: [Cluster Environment OS Checking](../../install/preparation/os-checking.md) | ||
Focus on the BE startup parameter requirements: | ||
1. Disable swap: `swapon --show` prints nothing when swap is not enabled | ||
2. Check the maximum number of open file handles in the system: `ulimit -n` | ||
3. Check and modify the number of virtual memory areas: `sysctl vm.max_map_count` | ||
4. Check whether transparent huge pages are disabled: in the output of `cat /sys/kernel/mm/transparent_hugepage/enabled`, the active value is shown in brackets, so `[never]` means disabled | ||
The default values of these parameters on a node are as follows (note that transparent huge pages are set to `[always]` by default): | ||
|
||
```shell | ||
[root@iZj6c12a1czxk5oer9rbp8Z ~]# swapon --show | ||
[root@iZj6c12a1czxk5oer9rbp8Z ~]# ulimit -n | ||
65535 | ||
[root@iZj6c12a1czxk5oer9rbp8Z ~]# sysctl vm.max_map_count | ||
vm.max_map_count = 262144 | ||
[root@iZj6c12a1czxk5oer9rbp8Z ~]# cat /sys/kernel/mm/transparent_hugepage/enabled | ||
[always] madvise never | ||
``` | ||
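
The four checks above can be bundled into a small script. This is a sketch, not an official tool: the target values are taken from the reference script later in this document and are assumptions rather than documented minimums, and the helper names are hypothetical.

```shell
#!/bin/bash
# Sketch only: thresholds and helper names are illustrative assumptions.

# Print the active transparent-huge-page mode, i.e. the bracketed word in
# a line such as "[always] madvise never".
thp_active_mode() {
  echo "$1" | sed -n 's/.*\[\([a-z]*\)\].*/\1/p'
}

# Report OK/WARN for a numeric setting compared with a desired minimum.
check_min() {
  local name="$1" actual="$2" wanted="$3"
  if [ "$actual" -ge "$wanted" ]; then
    echo "OK   $name=$actual"
  else
    echo "WARN $name=$actual (want >= $wanted)"
  fi
}

# Run the checks against the local host where the kernel files are readable.
if [ -r /proc/swaps ]; then
  # /proc/swaps has one header line; any further line is an active swap area.
  if [ "$(wc -l < /proc/swaps)" -gt 1 ]; then
    echo "WARN swap is enabled"
  else
    echo "OK   swap is disabled"
  fi
fi
if [ -r /sys/kernel/mm/transparent_hugepage/enabled ]; then
  echo "THP mode: $(thp_active_mode "$(cat /sys/kernel/mm/transparent_hugepage/enabled)")"
fi
if [ -r /proc/sys/vm/max_map_count ]; then
  check_min vm.max_map_count "$(cat /proc/sys/vm/max_map_count)" 2000000
fi
nofile="$(ulimit -n)"
if [ "$nofile" != "unlimited" ]; then
  check_min "ulimit -n" "$nofile" 1000000
fi
```

Run it on each node; any `WARN` line points at a parameter to adjust before starting BE.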
|
||
### Create a new cluster | ||
|
||
If the cluster has not been purchased and created, you can click "Create Cluster" in the Alibaba Cloud Container Service ACK console to purchase it. You can adjust the configuration as needed. The above parameters can be added to the system adjustment script in "Instance Pre-customized Data" in the "Node Pool Configuration" step of creating a cluster. | ||
After the cluster is started, restart the node to complete the configuration. The reference script is as follows: | ||
|
||
```shell | ||
#!/bin/bash | ||
# Make rc.local executable so the commands appended below run at every boot. | ||
chmod +x /etc/rc.d/rc.local | ||
# Stop and disable the firewall. | ||
echo "sudo systemctl stop firewalld.service" >> /etc/rc.d/rc.local | ||
echo "sudo systemctl disable firewalld.service" >> /etc/rc.d/rc.local | ||
# Raise the virtual memory area limit and turn swap off. | ||
echo "sysctl -w vm.max_map_count=2000000" >> /etc/rc.d/rc.local | ||
echo "swapoff -a" >> /etc/rc.d/rc.local | ||
# Raise the open-file limit if it is not already at the desired value. | ||
current_limit=$(ulimit -n) | ||
desired_limit=1000000 | ||
config_file="/etc/security/limits.conf" | ||
if [ "$current_limit" -ne "$desired_limit" ]; then | ||
  echo "* soft nofile 1000000" >> "$config_file" | ||
  echo "* hard nofile 1000000" >> "$config_file" | ||
fi | ||
``` | ||
|
||
## Alibaba ACS | ||
|
||
The ACS service is a cloud computing service that uses K8s as the user interface to provide container computing resources, providing elastic computing resources that are billed on demand. Unlike the above ACK, you do not need to pay attention to the specific use of ECS. | ||
The following points should be noted when using ACS: | ||
|
||
### Image repository | ||
|
||
When using ACS, it is recommended to use the supporting Alibaba [Container Registry](https://www.alibabacloud.com/en/product/container-registry)(ACR). The personal and enterprise versions are enabled on demand. | ||
|
||
After configuring the ACR and image transfer environment, you need to migrate the official image provided by Doris to the corresponding ACR. | ||
|
||
If you use a private ACR with authentication enabled, you can refer to the following steps: | ||
|
||
1. Create a `secret` of type `docker-registry` in advance to hold the authentication information for accessing the image registry. | ||
|
||
```shell | ||
kubectl create secret docker-registry image-hub-secret --docker-server={your-server} --docker-username={your-username} --docker-password={your-pwd} | ||
``` | ||
|
||
2. Reference the secret created above in the DorisCluster resource (DCR): | ||
|
||
```yaml | ||
spec: | ||
feSpec: | ||
replicas: 1 | ||
image: crpi-4q6quaxa0ta96k7h-vpc.cn-hongkong.personal.cr.aliyuncs.com/selectdb-test/doris.fe-ubuntu:3.0.3 | ||
imagePullSecrets: | ||
- name: image-hub-secret | ||
beSpec: | ||
replicas: 3 | ||
image: crpi-4q6quaxa0ta96k7h-vpc.cn-hongkong.personal.cr.aliyuncs.com/selectdb-test/doris.be-ubuntu:3.0.3 | ||
imagePullSecrets: | ||
- name: image-hub-secret | ||
systemInitialization: | ||
initImage: crpi-4q6quaxa0ta96k7h-vpc.cn-hongkong.personal.cr.aliyuncs.com/selectdb-test/alpine:latest | ||
``` | ||
### BE systemInitialization | ||
Alibaba Cloud is gradually rolling out the ability to enable privileged mode on fully managed ACS services (it may not be available in some regions yet; you can submit a work order to request it). | ||
Starting a Doris BE node requires some special environment parameters, such as raising the number of virtual memory areas with `sysctl -w vm.max_map_count=2000000`. | ||
Setting this parameter from inside a container modifies the host configuration, so on regular K8s clusters the pod must run in privileged mode. Operator adds an `InitContainer` to the BE pod through `systemInitialization` to perform such operations. | ||
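
As a hedged sketch of how such a setting might be expressed, the script below writes a DCR fragment that configures `systemInitialization` for BE. The `initImage` key appears in the example above; the `command` key and the registry host are assumptions that should be verified against the DorisCluster CRD installed in your cluster.

```shell
#!/bin/bash
# Sketch only: the `command` field and the registry host are assumptions to be
# checked against the DorisCluster CRD installed in your cluster.
fragment="${TMPDIR:-/tmp}/be-system-init.yaml"

cat > "$fragment" <<'EOF'
spec:
  beSpec:
    systemInitialization:
      initImage: your-registry.example.com/alpine:latest   # hypothetical registry
      # Assumed field: command run by the privileged init container on BE Pods.
      command: ["/sbin/sysctl", "-w", "vm.max_map_count=2000000"]
EOF

echo "wrote $fragment"
```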
|
||
:::tip Tip | ||
**If the current cluster cannot use privileged mode, the BE node cannot be started**. You can choose ACK container service + host to deploy the cluster. | ||
::: | ||
|
||
### Service | ||
|
||
Because ACS provides container computing resources through the K8s interface, its nodes are virtual computing resources that users do not need to manage. Resources are billed by usage and can be scaled without a fixed node limit. In other words, there is no physical concept of conventional nodes: | ||
|
||
```shell | ||
$ kubectl get nodes | ||
NAME STATUS ROLES AGE VERSION | ||
virtual-kubelet-cn-hongkong-d Ready agent 27h v1.31.1-aliyun.1 | ||
``` | ||
|
||
Therefore, when deploying a Doris cluster on ACS, the NodePort serviceType cannot be used; the ClusterIP and LB modes are available. | ||
|
||
- ClusterIP mode: | ||
|
||
ClusterIP mode is the default network mode of Operator. For specific usage and access methods, please refer to [this document](https://kubernetes.io/docs/concepts/services-networking/service/#type-clusterip) | ||
|
||
- Load balancing mode: | ||
|
||
Load balancing can be configured in either of the following ways: | ||
|
||
- Configure LB access through the DCR service annotations provided by Operator. The steps are as follows: | ||
1. Make sure a CLB or NLB instance has been created in the load balancing console, in the same region as the ACK cluster. If you haven't created one yet, see [Create and manage a CLB instance](https://www.alibabacloud.com/help/en/slb/classic-load-balancer/user-guide/create-and-manage-a-clb-instance) and [Create and manage an NLB instance](https://www.alibabacloud.com/help/en/slb/network-load-balancer/user-guide/create-and-manage-an-nlb-instance). | ||
2. In the DCR, set the service type to LoadBalancer and add the access annotations for the LB in the following format: | ||
```yaml | ||
feSpec: | ||
replicas: 3 | ||
image: crpi-4q6quaxa0ta96k7h-vpc.cn-hongkong.personal.cr.aliyuncs.com/selectdb-test/doris.fe-ubuntu:3.0.3 | ||
service: | ||
type: LoadBalancer | ||
annotations: | ||
service.beta.kubernetes.io/alibaba-cloud-loadbalancer-address-type: "intranet" | ||
``` | ||
|
||
- Create the LB service through the ACS console and bind it to the StatefulSet that manages FE or BE. | ||
The steps are as follows: | ||
1. Leave serviceType as ClusterIP (the default policy). | ||
2. Create a load balancing service in the Alibaba Cloud console: Container Compute Service ACS -> Cluster List -> Cluster -> Service, then click the `Create` button. | ||
3. In the `service` creation dialog, select the newly created LB. It is bound to the `service` and is deregistered together with it. Note that this `service` is not managed by Doris Operator. | ||
|