feature: Optimize Agent configuration reload (closed #1769) #1861

Closed
39 commits
d5df9a4
minor: start new version 2.4.2
ZhuoZhuoCrayon Aug 30, 2023
199242b
feature: Plugin package description file supports defining parameters for filling in configuration templates (closed #654)
neko12583 Aug 15, 2023
2527251
feat: Plugin package description file supports defining parameters for filling in configuration templates (closed #654)
GONGONGONG Sep 5, 2023
b5b81f6
feat: 2.0 Proxy supports configuring the temporary file transfer path (closed #1757) (#1795)
GONGONGONG Sep 6, 2023
9bb624e
feature: Support an extra configuration directory for Agent (closed #1756) (#1786)
CohleRustW Sep 6, 2023
6ac3747
feature: 2.0 Proxy supports configuring the temporary file transfer path (closed #1757) (#1794)
CohleRustW Sep 7, 2023
8e3e1a8
sprintfix: Fix incorrect default value of the gse Windows environment variable (closed #1756) (#1807)
CohleRustW Sep 8, 2023
f968da3
feature: Provide an access point selection strategy when syncing hosts (closed #1790) (#1801)
wyyalt Sep 11, 2023
5c5ef23
feature: Agent / Proxy operations support Agent operations across multiple access points (closed #1714) (#1791)
wyyalt Sep 11, 2023
3b13913
bugfix: Incorrect plugin tag mapping (closed #1799)
CohleRustW Sep 6, 2023
773282d
feature: Access point shielding scheme (closed #1796)
CohleRustW Sep 8, 2023
9765a4c
feature: Install channels are compatible with scenarios that have multiple access points (closed #1784)
wyyalt Sep 7, 2023
fcf3bf7
bugfix: Fix intermittently failing unit test for the task history list (fixed #1631)
neko12583 Sep 12, 2023
89743ca
feature: Support fine-grained control over the process startup & hosting mode rendered into Agent packages (closed #1810)
ZhuoZhuoCrayon Sep 13, 2023
f150c40
feat: Access point shielding scheme (closed #1796)
GONGONGONG Sep 14, 2023
1945117
bugfix: Change the default path of GSE_ENVIRON_WIN_DIR (closed #1829)
wyyalt Sep 19, 2023
a509bb0
sprintfix: Adjust the configured path of the Agent extra configuration directory (closed #1756)
CohleRustW Sep 19, 2023
9a7c764
sprintfix: Fix incorrect extra configuration directory path (closed #1756)
CohleRustW Sep 20, 2023
03468e0
bugfix: Fix access point adaptation when delivering the install package to the Proxy in install channel scenarios (closed #1834)
wyyalt Sep 22, 2023
688d73b
sprintfix: Fix configuration rendering failure (fixed #654)
ZhuoZhuoCrayon Sep 25, 2023
95ba208
feature: Allow newly added businesses to automatically join the GSE 2.0 grayscale rollout (closed #1805)
wyyalt Sep 18, 2023
281abb8
feature: Dockerfile for Py36 builds (closed #1827)
CohleRustW Sep 19, 2023
6d572a9
bugfix: Fix scripts hanging on some Windows machine models (closed #1839)
wyyalt Sep 25, 2023
a53b110
sprintfix: Fix incorrect Proxy cache directory when installing an extra Agent
CohleRustW Sep 22, 2023
fa05e30
bugfix: Install script does not report an error when creating the related directories fails (closed #1823)
CohleRustW Sep 18, 2023
a207837
bugfix: Fix redundant probing of port 20020 during Windows P-Agent 2.0 installation (fixed #1841)
ZhuoZhuoCrayon Oct 10, 2023
fce1736
bugfix: Exception when adding or updating host information (closed #1845)
CohleRustW Oct 9, 2023
2ad6dbb
feature: Retry mechanism for install script health checks (closed #1854)
CohleRustW Oct 12, 2023
3a7aea1
bugfix: Add unit tests for the missing control_info variable when rendering plugin configuration files (fixed #654)
neko12583 Oct 8, 2023
30ef612
sprintfix: Fix syntax error in the health check retry mechanism of the setup_proxy install script (closed #1854)
CohleRustW Oct 16, 2023
19dd095
bugfix: Fix package download exception in the Windows install script (closed #1873)
wyyalt Oct 19, 2023
f8344e6
feature: Observability build-out (closed #1852)
ZhuoZhuoCrayon Oct 23, 2023
4b38911
bugfix: Fix duplicated process start time retrieved by the gsectl script (closed #1870)
wyyalt Oct 19, 2023
d456910
sprintfix: Add missing dependencies (fixed #1852)
ZhuoZhuoCrayon Oct 23, 2023
16b39b0
minor: auto push 2.4.2 release log
ZhuoZhuoCrayon Oct 24, 2023
cd3c323
minor: start new version 2.4.3
ZhuoZhuoCrayon Oct 24, 2023
176f530
bugfix: Fix task status query when installing an extra Agent (closed #1881)
wyyalt Oct 24, 2023
79d60af
feature: plugin_search API supports returning partial access point information (closed #1858)
ping15 Oct 19, 2023
cc843c6
feature: Optimize Agent configuration reload (closed #1769)
wyyalt Oct 16, 2023
2 changes: 1 addition & 1 deletion app.yml
@@ -8,7 +8,7 @@ introduction: 通过节点管理,可以对蓝鲸体系中的gse agent进行管
introduction_en: NodeMan can be used to manage the gse agent in the BlueKing system.
Its functions include agent installation, status query, version update, plugin management,
health check, process control, and so on.
version: 2.4.1
version: 2.4.3
category: 运维工具
language_support: 英语,中文
desktop:
15 changes: 14 additions & 1 deletion apps/backend/agent/artifact_builder/base.py
@@ -259,10 +259,23 @@ def _inject_dependencies(self, extract_dir: str):
# TODO: if 1.0 Agent packages are also brought under management later, they need to be distinguished here
# The gsectl of 1.0 Agent / Proxy uses a fixed rclocal mode
if gsectl_filename == "gsectl" and "{{ AUTO_TYPE }}" in gsectl_file_content:
auto_type: str = models.GlobalSettings.get_config(
auto_type_strategy: typing.Union[str, typing.Dict[str, str]] = models.GlobalSettings.get_config(
models.GlobalSettings.KeyEnum.GSE2_LINUX_AUTO_TYPE.value,
constants.GseLinuxAutoType.RCLOCAL.value,
)
logger.info(f"get auto_type_strategy -> {auto_type_strategy} from global settings")

if isinstance(auto_type_strategy, dict):
try:
auto_type = auto_type_strategy[self.NAME]
except KeyError:
auto_type = constants.GseLinuxAutoType.RCLOCAL.value
else:
if auto_type_strategy not in constants.GseLinuxAutoType.list_member_values():
auto_type = constants.GseLinuxAutoType.RCLOCAL.value
else:
auto_type = auto_type_strategy

logger.info(f"apply auto_type -> {auto_type} to gsectl")
gsectl_file_content = gsectl_file_content.replace("{{ AUTO_TYPE }}", auto_type)
if "{{ AUTO_TYPE }}" in gsectl_file_content:
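Review note: the hunk above lets the `GSE2_LINUX_AUTO_TYPE` global setting be either a single string or a dict keyed by builder name, falling back to `rclocal` when the key is missing or the string is not a known member. A minimal standalone sketch of that resolution follows. The enum below is a stand-in for `constants.GseLinuxAutoType`; the string values, the `CRONTAB` member, the function name `resolve_auto_type`, and the builder names used in the usage note are assumptions for illustration, not taken from the diff.

```python
import enum
from typing import Dict, List, Union


class GseLinuxAutoType(enum.Enum):
    # Stand-in for constants.GseLinuxAutoType. Only RCLOCAL appears in the diff;
    # the string values and the CRONTAB member are assumed for illustration.
    RCLOCAL = "rclocal"
    CRONTAB = "crontab"

    @classmethod
    def list_member_values(cls) -> List[str]:
        return [member.value for member in cls]


def resolve_auto_type(strategy: Union[str, Dict[str, str]], builder_name: str) -> str:
    """Pick the auto type for one builder, falling back to rclocal."""
    default = GseLinuxAutoType.RCLOCAL.value
    if isinstance(strategy, dict):
        # Per-builder strategy: a missing key falls back to the default.
        return strategy.get(builder_name, default)
    # Single global value: anything unrecognized falls back to the default.
    if strategy not in GseLinuxAutoType.list_member_values():
        return default
    return strategy
```

With these assumptions, `resolve_auto_type({"gse_agent": "crontab"}, "gse_proxy")` falls back to `"rclocal"`. As in the diff, values taken from the dict are not checked against the known members, so a typo in a per-builder entry would be substituted into gsectl unchanged; validating both branches may be worth considering.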
17 changes: 17 additions & 0 deletions apps/backend/agent/solution_maker.py
@@ -29,6 +29,7 @@
from apps.core.script_manage.base import ScriptHook
from apps.node_man import constants, models
from apps.utils import basic
from apps.utils.files import PathHandler


class ExecutionSolutionStepContent:
@@ -101,6 +102,14 @@ def choose_script_file(cls, host: models.Host, is_execute_on_target: bool) -> st
script_file_name = constants.SCRIPT_FILE_NAME_MAP[host.os_type]
return script_file_name

@staticmethod
def get_gse_extra_config_dir(os_type: str):
extra_config_sub_dir: str = "user_conf"
if os_type.upper() == constants.OsType.WINDOWS:
return json.dumps(PathHandler(os_type).join(settings.GSE_ENVIRON_WIN_DIR, extra_config_sub_dir))[1:-1]
else:
return PathHandler(os_type).join(settings.GSE_ENVIRON_DIR, extra_config_sub_dir)


class BaseExecutionSolutionMaker(metaclass=abc.ABCMeta):
# Whether to execute directly on the target machine
@@ -349,11 +358,19 @@ def get_create_pre_dirs_step(self, is_shell_adapter: bool = False) -> ExecutionS
"""
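# Path-related configs currently fall into two kinds: 1) file paths, for which the parent directory is created; 2) directory paths, not needed yet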
filepath_config_names: typing.List[str] = []
filepath_necessary_names: typing.List[str] = []

if self.host.os_type != constants.OsType.WINDOWS:
filepath_config_names.extend(["dataipc"])

if not self.agent_setup_info.is_legacy:
# GSE 1.0 does not need the extra configuration directory to be created
filepath_necessary_names.append(ExecutionSolutionTools.get_gse_extra_config_dir(self.host.os_type))

dirs_to_be_created: typing.Set[str] = {self.dest_dir}
for filepath_necessary_name in filepath_necessary_names:
dirs_to_be_created.add(filepath_necessary_name)

# Get the agent config for the corresponding operating system
agent_config: typing.Dict[str, typing.Any] = self.host_ap.get_agent_config(self.host.os_type)
for filepath_config_name in filepath_config_names:
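Review note: `get_gse_extra_config_dir` returns the Windows path via `json.dumps(...)[1:-1]`, which doubles every backslash and then strips the surrounding quotes. The diff does not say why; presumably the path is later embedded in a quoted or escaped context. A minimal sketch of the effect, using the standard-library `ntpath` and `posixpath` modules as stand-ins for the project's `PathHandler`, with hypothetical base directories in place of `settings.GSE_ENVIRON_WIN_DIR` and `settings.GSE_ENVIRON_DIR`:

```python
import json
import ntpath
import posixpath


def extra_config_dir(os_type: str, base_dir: str, sub_dir: str = "user_conf") -> str:
    """Join base_dir and sub_dir; on Windows, return the backslash-escaped form."""
    if os_type.upper() == "WINDOWS":
        joined = ntpath.join(base_dir, sub_dir)
        # json.dumps wraps the string in double quotes and escapes each backslash;
        # slicing with [1:-1] drops the quotes and keeps the escaped body.
        return json.dumps(joined)[1:-1]
    return posixpath.join(base_dir, sub_dir)


if __name__ == "__main__":
    # Hypothetical base directories, for illustration only.
    print(extra_config_dir("windows", r"C:\gse"))
    print(extra_config_dir("linux", "/usr/local/gse"))
```

One consequence for `get_create_pre_dirs_step`: the escaped string is what gets added to `dirs_to_be_created`, so whatever consumes that set on Windows has to expect doubled backslashes.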
2 changes: 1 addition & 1 deletion apps/backend/agent/tools.py
@@ -257,7 +257,7 @@ def gen_commands(
# In batch scenarios, pass in the Optional objects to avoid n+1 queries and improve execution efficiency
host_ap = host_ap or host.ap
identity_data = identity_data or host.identity
install_channel = install_channel or host.install_channel
install_channel = install_channel or host.install_channel(ap_id=host_ap.id)
proxies = proxies if proxies is not None else fetch_proxies(host, host_ap)
# TODO: in the extra-install scenario, the jumpserver here should be chosen from machines available in the current job platform's all-business set
gse_servers_info: Dict[str, Any] = fetch_gse_servers_info(
@@ -97,7 +97,7 @@ def static_ip_selector(
except_bk_biz_id: int = sub_inst.instance_info["host"]["bk_biz_id"]
if except_bk_biz_id != cmdb_host["bk_biz_id"]:
self.move_insts_to_failed(
sub_inst.id,
[sub_inst.id],
log_content=_("主机期望注册到业务【ID:{except_bk_biz_id}】,但实际存在于业务【ID: {actual_biz_id}】,请前往该业务进行安装").format(
except_bk_biz_id=except_bk_biz_id, actual_biz_id=cmdb_host["bk_biz_id"]
),
@@ -132,7 +132,7 @@ def dynamic_ip_selector(
break
else:
self.move_insts_to_failed(
sub_inst.id,
[sub_inst.id],
log_content=_(
"主机期望注册到业务【ID:{except_bk_biz_id}】,但实际存在于业务【ID: {actual_biz_id}】,请前往该业务进行安装"
).format(except_bk_biz_id=except_bk_biz_id, actual_biz_id=cmdb_host["bk_biz_id"]),
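Review note: both hunks above change the first argument of `move_insts_to_failed` from `sub_inst.id` to `[sub_inst.id]`. The method body is not part of this diff, but other call sites in the PR also pass a list, which suggests it iterates over its first argument. A minimal sketch of why a bare integer would fail under that assumption; the function below is a hypothetical stand-in, not the project's implementation:

```python
from typing import List, Set, Union


def move_insts_to_failed(sub_inst_ids: Union[List[int], Set[int]], log_content: str = "") -> Set[int]:
    # Hypothetical stand-in: assumes the real method iterates over the IDs.
    failed: Set[int] = set()
    for sub_inst_id in sub_inst_ids:
        failed.add(sub_inst_id)
    return failed


if __name__ == "__main__":
    print(move_insts_to_failed([7], log_content="demo"))
    try:
        move_insts_to_failed(7)  # a bare int is not iterable
    except TypeError as exc:
        print(f"TypeError: {exc}")
```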
47 changes: 31 additions & 16 deletions apps/backend/components/collections/agent_new/base.py
@@ -29,11 +29,25 @@
from apps.backend.subscription.steps.agent_adapter.adapter import AgentStepAdapter
from apps.node_man import constants, models
from apps.node_man.exceptions import AliveProxyNotExistsError
from apps.prometheus import metrics
from apps.prometheus.helper import SetupObserve

from .. import job
from ..base import BaseService, CommonData


@dataclass
class AgentCommonData(CommonData):
# Default access point
default_ap: models.AccessPoint
# Host ID -> access point mapping
host_id__ap_map: Dict[int, models.AccessPoint]
# AgentStep adapter
agent_step_adapter: AgentStepAdapter
# Injected AP ID
injected_ap_id: int


class AgentBaseService(BaseService, metaclass=abc.ABCMeta):
"""
Base class for Agent installation
@@ -47,6 +61,7 @@ def sub_inst_failed_handler(self, sub_inst_ids: Union[List[int], Set[int]]):
pass

@classmethod
@SetupObserve(histogram=metrics.app_task_engine_get_common_data_duration_seconds, labels={"step_type": "AGENT"})
def get_common_data(cls, data):
"""
Initialize commonly used data. Note that this data must not be stored in attributes on self, otherwise a large process snapshot is produced,
@@ -145,15 +160,14 @@ def get_agent_pkg_name(

return (agent_pkg_name.format(cpu_arch=host.cpu_arch), agent_pkg_name)[return_name_with_cpu_tmpl]

@classmethod
def get_agent_pkg_dir(cls, common_data: "AgentCommonData", host: models.Host) -> str:
def get_agent_pkg_dir(self, common_data: "AgentCommonData", host: models.Host) -> str:
"""
Get the Agent install package directory
:param common_data: AgentCommonData
:param host: models.Host
:return:
"""
host_ap = common_data.host_id__ap_map[host.bk_host_id]
host_ap: models.AccessPoint = self.get_host_ap(common_data=common_data, host=host)
download_path = host_ap.nginx_path or settings.DOWNLOAD_PATH
if common_data.agent_step_adapter.is_legacy:
# Legacy Agent install packages are located in the download directory
@@ -169,12 +183,12 @@ def get_host_id__install_channel_map(
hosts: List[models.Host],
host_id__sub_inst_id: Dict[int, int],
cloud_id__proxies_map: Dict[int, List[models.Host]],
common_data: AgentCommonData,
) -> Dict[int, Tuple[Optional[models.Host], Dict[str, List]]]:
install_channel_ids: List[int] = list({host.install_channel_id for host in hosts})
install_channel_id__jump_servers_map: Dict[
int, List[models.Host]
] = models.InstallChannel.install_channel_id__host_objs_map(install_channel_ids)

# Build the channel ID -> channel mapping
id__install_channel_obj_map: Dict[int, models.InstallChannel] = {}
for install_channel_obj in models.InstallChannel.objects.filter(id__in=install_channel_ids):
@@ -188,6 +202,7 @@ def get_host_id__install_channel_map(

host_id__install_channel_map: Dict[int, Tuple[Optional[models.Host], Dict[str, List]]] = {}
for host in hosts:
host_ap: models.AccessPoint = self.get_host_ap(common_data=common_data, host=host)
sub_inst_id = host_id__sub_inst_id[host.bk_host_id]
install_channel_obj = id__install_channel_obj_map.get(host.install_channel_id)
if install_channel_obj:
@@ -198,7 +213,10 @@ def get_host_id__install_channel_map(
[sub_inst_id], log_content=_("所选安装通道「{name}」 没有可用跳板机".format(name=install_channel_obj.name))
)
else:
host_id__install_channel_map[host.bk_host_id] = (jump_server, install_channel_obj.upstream_servers)
host_id__install_channel_map[host.bk_host_id] = (
jump_server,
install_channel_obj.get_upstream_servers_by_ap_id(host_ap.id),
)
elif host.bk_cloud_id != constants.DEFAULT_CLOUD and host.node_type != constants.NodeType.PROXY:
# Validation only
alive_proxies = cloud_id__alive_proxies_map.get(host.bk_cloud_id, [])
@@ -233,6 +251,7 @@ def get_host_id__installation_tool_map(
hosts=hosts_need_gen_commands,
host_id__sub_inst_id=common_data.host_id__sub_inst_id_map,
cloud_id__proxies_map=cloud_id__proxies_map,
common_data=common_data,
)

id__sub_inst_obj_map: Dict[int, models.SubscriptionInstanceRecord] = {
@@ -305,17 +324,13 @@ def maintain_agent_proc_status_uniqueness(self, bk_host_ids: Set[int]) -> None:
proc_statuses_to_be_created.append(models.ProcessStatus(bk_host_id=host_id, **self.agent_proc_common_data))
models.ProcessStatus.objects.bulk_create(proc_statuses_to_be_created, batch_size=self.batch_size)


@dataclass
class AgentCommonData(CommonData):
# Default access point
default_ap: models.AccessPoint
# Host ID -> access point mapping
host_id__ap_map: Dict[int, models.AccessPoint]
# AgentStep adapter
agent_step_adapter: AgentStepAdapter
# Injected AP ID
injected_ap_id: int
def get_host_ap(self, common_data: AgentCommonData, host: models.Host) -> models.AccessPoint:
# Prefer the injected AP ID
if common_data.injected_ap_id:
host_ap: models.AccessPoint = common_data.ap_id_obj_map[common_data.injected_ap_id]
else:
host_ap: models.AccessPoint = common_data.host_id__ap_map[host.bk_host_id]
return host_ap


class RetryHandler:
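Review note: the new `get_host_ap` helper centralizes access point selection. An injected AP ID, when set, takes precedence over the per-host mapping. A minimal standalone sketch of that precedence using plain dataclasses follows; the `AccessPoint` and `CommonData` classes here are simplified stand-ins for the project's models, and the function takes a host ID rather than a `models.Host` to stay self-contained.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class AccessPoint:
    # Simplified stand-in for models.AccessPoint.
    id: int
    name: str


@dataclass
class CommonData:
    # Simplified stand-in for AgentCommonData; field names follow the diff.
    host_id__ap_map: Dict[int, AccessPoint]
    ap_id_obj_map: Dict[int, AccessPoint]
    injected_ap_id: Optional[int] = None


def get_host_ap(common_data: CommonData, bk_host_id: int) -> AccessPoint:
    # Prefer the injected AP ID when it is set (truthy), as in the diff.
    if common_data.injected_ap_id:
        return common_data.ap_id_obj_map[common_data.injected_ap_id]
    # Otherwise fall back to the access point mapped to this host.
    return common_data.host_id__ap_map[bk_host_id]
```

Because the check is a truthiness test, an `injected_ap_id` of `None` or `0` falls through to the host mapping, while any other value, including a negative sentinel if the project uses one, is looked up in `ap_id_obj_map` and raises `KeyError` when absent.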
@@ -31,7 +31,8 @@ def get_script_content(self, data, common_data: AgentCommonData, host: models.Ho
path_handler: PathHandler = PathHandler(host.os_type)
ctl_exe_name: str = ("gse_agent", "gse_agent.exe")[host.os_type == constants.OsType.WINDOWS]
general_node_type: str = self.get_general_node_type(host.node_type)
setup_path: str = common_data.host_id__ap_map[host.bk_host_id].get_agent_config(host.os_type)["setup_path"]
host_ap: models.AccessPoint = self.get_host_ap(common_data=common_data, host=host)
setup_path: str = host_ap.get_agent_config(host.os_type)["setup_path"]
agent_path: str = path_handler.join(setup_path, general_node_type, "bin", ctl_exe_name)

return f"{agent_path} --version"