Merge pull request opensciencegrid#1953 from matyasselmeci/pr/SOFTWARE-4704.cacher

Topology Cacher (SOFTWARE-4704)
matyasselmeci authored Aug 27, 2021
2 parents 76354b2 + 0059a65 commit c6ca431
Showing 5 changed files with 601 additions and 2 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/validate-data.yml
@@ -37,3 +37,6 @@ jobs:
run: |
./src/tests/verify_authfile.sh
./src/tests/verify_origin_authfile.sh
- name: Validate cacher
run: |
./src/topology_cacher.py --outdir=/tmp/topology-cacher
27 changes: 25 additions & 2 deletions rpm/topology.spec
@@ -1,32 +1,55 @@
%define __python /usr/bin/python3

Summary: Client tools for OSG Topology
Name: topology-client
Version: 1.4.6
Release: 1%{?dist}
Version: 1.8.0
Release: 0.1%{?dist}
Source: topology-%{version}.tar.gz
License: Apache 2.0
BuildArch: noarch
Url: https://github.com/opensciencegrid/topology/
Requires: python-gnupg
Requires: python-requests


%description
Client tools that interact with OSG Topology data


%package -n topology-cacher
Summary: A utility for periodically downloading OSG Topology data

%description -n topology-cacher
A utility for periodically downloading OSG Topology data.


%prep
%setup -q -n topology-%{version}

%install
install -D -m 0755 bin/osg-notify %{buildroot}/%{_bindir}/osg-notify
install -D -m 0644 src/net_name_addr_utils.py %{buildroot}/%{python_sitelib}/net_name_addr_utils.py
install -D -m 0644 src/topology_utils.py %{buildroot}/%{python_sitelib}/topology_utils.py
install -D -m 0755 src/topology_cacher.py %{buildroot}/%{python_sitelib}/topology_cacher.py
install -D -m 0644 topology-cacher.cron %{buildroot}/etc/cron.d/topology-cacher.cron

%files
%{_bindir}/osg-notify
%{python_sitelib}/net_name_addr_utils.py*
%{python_sitelib}/topology_utils.py*
%{python_sitelib}/__pycache__/net_name_addr_utils*
%{python_sitelib}/__pycache__/topology_utils*

%files -n topology-cacher
%{python_sitelib}/topology_cacher.py*
%{python_sitelib}/__pycache__/topology_cacher*
%config(noreplace) /etc/cron.d/topology-cacher.cron


%changelog
* Mon Aug 16 2021 Mátyás Selmeci <[email protected]> 1.8.0-0.1
- Add topology-cacher (SOFTWARE-4704)

* Thu Mar 18 2021 Mátyás Selmeci <[email protected]> 1.4.6-1
- Fix crash when writing unsigned messages (SOFTWARE-4538)

178 changes: 178 additions & 0 deletions src/README.md
@@ -34,3 +34,181 @@ XML consumers
- [Perfsonar ETF](https://topology.opensciencegrid.org/rgsummary/xml?summary_attrs_showservice=on&summary_attrs_showfqdn=on&gip_status_attrs_showtestresults=on&downtime_attrs_showpast=&account_type=cumulative_hours&ce_account_type=gip_vo&se_account_type=vo_transfer_volume&bdiitree_type=total_jobs&bdii_object=service&bdii_server=is-osg&start_type=7daysago&start_date=11%2F17%2F2014&end_type=now&all_resources=on&facility_sel%5B%5D=10009&gridtype=on&gridtype_1=on&active=on&active_value=1&disable_value=0) (OSG)
- SAM (WLCG)
- SiteDB (CMS), soon to be CRIC


## Topology cacher

The topology cacher (`topology_cacher.py`) is a script, designed to be run from cron, that downloads topology XML information,
saves it locally, and combines some of the information into JSON files.

It queries the `/rgsummary/xml` and `/miscproject/xml` endpoints (as-is, no arguments).
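
As a rough illustration of the download step, the sketch below fetches both endpoints and writes them to a directory. It is not the code in `topology_cacher.py`: the function names, the `TOPOLOGY_BASE` host constant, and the write-to-temp-then-rename approach are assumptions made for the example. Only the two endpoint paths and the XML file names come from this README.

```python
import os
import urllib.request

# Assumption: the public Topology host. Only the endpoint paths are from the README.
TOPOLOGY_BASE = "https://topology.opensciencegrid.org"
ENDPOINTS = {
    "rgsummary.xml": "/rgsummary/xml",
    "miscproject.xml": "/miscproject/xml",
}


def endpoint_urls(base=TOPOLOGY_BASE):
    """Map each local file name to the URL it would be downloaded from."""
    return {name: base.rstrip("/") + path for name, path in ENDPOINTS.items()}


def download_all(outdir, base=TOPOLOGY_BASE, timeout=60):
    """Download each endpoint (no query arguments) and save it under outdir."""
    os.makedirs(outdir, exist_ok=True)
    for name, url in endpoint_urls(base).items():
        with urllib.request.urlopen(url, timeout=timeout) as response:
            data = response.read()
        # Write to a temporary file first so that readers never see a partial file.
        tmp_path = os.path.join(outdir, name + ".tmp")
        with open(tmp_path, "wb") as fh:
            fh.write(data)
        os.replace(tmp_path, os.path.join(outdir, name))
```

Calling `download_all("/tmp/topology-cacher")` would then leave `rgsummary.xml` and `miscproject.xml` in that directory. Replacing the file in a single rename is one possible way to keep a cron-driven job from exposing half-written data.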

In addition to saving the XML files, it creates two JSON files:

- `project_resource_allocations.json` is for looking up resource allocations for projects.
- `resource_info_lookups.json` contains dicts for easier lookups of common queries, such as "resource name by FQDN."
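
A consumer might use `resource_info_lookups.json` roughly as sketched below to answer the "resource name by FQDN" query. The helper names `load_lookups` and `resource_name_by_fqdn` are hypothetical; the `resources_by_fqdn` key and the record fields follow the example file shown later in this README.

```python
import json


def load_lookups(path):
    """Load a resource_info_lookups.json file written by the cacher."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def resource_name_by_fqdn(lookups, fqdn):
    """Return the resource name registered for fqdn, or None if it is unknown."""
    resource = lookups.get("resources_by_fqdn", {}).get(fqdn)
    return resource["name"] if resource else None


# A tiny in-memory stand-in for the real file, using a record from the example.
sample = {
    "resources_by_fqdn": {
        "squid.aglt2.org": {
            "fqdn": "squid.aglt2.org",
            "group_name": "AGLT2",
            "name": "AGLT2-squid",
            "service_ids": ["138"],
            "tags": [],
        }
    }
}
print(resource_name_by_fqdn(sample, "squid.aglt2.org"))  # AGLT2-squid
```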

### project_resource_allocations.json

This file is produced by `TopologyData.get_project_resource_allocations()`, which converts XML from miscproject.xml such as
```xml
<Projects>
<Project>
<Name>MyProject</Name>
<ResourceAllocations>
<ResourceAllocation>
<Type>Other</Type>
<SubmitResources>
<SubmitResource>Submit1</SubmitResource>
<SubmitResource>Submit2</SubmitResource>
</SubmitResources>
<ExecuteResourceGroups>
<ExecuteResourceGroup>
<GroupName>ExampleNetCEs</GroupName>
<LocalAllocationID>ID1</LocalAllocationID>
</ExecuteResourceGroup>
</ExecuteResourceGroups>
</ResourceAllocation>
</ResourceAllocations>
</Project>
</Projects>
```
into a Python dict like
```python
{
"MyProject": [
{
"type": "Other",
"submit_resources": [
{ "group_name": "ExampleNetSubmits", "name": "Submit1", "fqdn": "submit1.example.net" },
{ "group_name": "ExampleNetSubmits", "name": "Submit2", "fqdn": "submit2.example.net" }
],
"execute_resource_groups": [
{
"group_name": "ExampleNetCEs",
"local_allocation_id": "ID1",
"ces": [
{ "name": "CE1", "fqdn": "ce1.example.net" },
{ "name": "CE2", "fqdn": "ce2.example.net" }
]
}
]
}
]
}
```
Resource names, resource group names, CEs, and FQDNs are all taken from rgsummary.xml.
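
The XML-to-dict step could be sketched with the standard library as follows. This is an illustration only, not the body of `TopologyData.get_project_resource_allocations()`: the function name `parse_project_allocations` is hypothetical, and the sketch deliberately stops at what miscproject.xml itself contains, so submit resources carry only a name and execute resource groups have no `ces` list yet.

```python
import xml.etree.ElementTree as ET


def parse_project_allocations(xml_text):
    """Turn miscproject-style XML into {project name: [allocation dicts]}."""
    root = ET.fromstring(xml_text)
    result = {}
    for project in root.findall("Project"):
        allocations = []
        for alloc in project.findall("ResourceAllocations/ResourceAllocation"):
            allocations.append({
                "type": alloc.findtext("Type"),
                # Only the name is available here; group name and FQDN
                # would have to be filled in from rgsummary.xml.
                "submit_resources": [
                    {"name": (sr.text or "").strip()}
                    for sr in alloc.findall("SubmitResources/SubmitResource")
                ],
                "execute_resource_groups": [
                    {
                        "group_name": erg.findtext("GroupName"),
                        "local_allocation_id": erg.findtext("LocalAllocationID"),
                    }
                    for erg in alloc.findall(
                        "ExecuteResourceGroups/ExecuteResourceGroup"
                    )
                ],
            })
        # Projects without allocations map to an empty list.
        result[project.findtext("Name")] = allocations
    return result
```

Feeding it the XML sample above yields a `"MyProject"` entry with one allocation of type `"Other"`, two submit resources, and one execute resource group.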


For example, the generated `project_resource_allocations.json` may look like:

```json
{
"ACE_LIAID": [],
"CHTC-Staff": [ {
"execute_resource_groups": [ {
"ces": [ {
"fqdn": "itb-slurm-ce.osgdev.chtc.io",
"name": "CHTC-ITB-SLURM-CE"
}
],
"group_name": "CHTC-ITB",
"local_allocation_id": "glow"
}
],
"submit_resources": [ {
"fqdn": "submittest0000.chtc.wisc.edu",
"group_name": "CHTC-ITB",
"name": "CHTC-ITB-submittest0000"
}
],
"type": "Other"
}
]
}
```

Project data lists execute resources only by resource group, but we need to know which CEs a job might run on, so the CEs in each group are added as well.
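
One way to picture that enrichment step: given resources grouped by resource group (the same shape as `resource_lists_by_group` in `resource_info_lookups.json`), pick out the CEs in each execute resource group and attach them. The function names are hypothetical, and the sketch assumes that service ID `"1"` denotes a CE; check `services.yaml` before relying on that.

```python
# Assumption: service ID "1" is the CE service in services.yaml.
CE_SERVICE_ID = "1"


def ces_for_group(resource_lists_by_group, group_name, ce_service_id=CE_SERVICE_ID):
    """Return name/FQDN pairs for the resources in a group that provide a CE."""
    return [
        {"name": res["name"], "fqdn": res["fqdn"]}
        for res in resource_lists_by_group.get(group_name, [])
        if ce_service_id in res.get("service_ids", [])
    ]


def add_ces(allocations, resource_lists_by_group):
    """Attach a "ces" list to every execute resource group, in place."""
    for alloc in allocations:
        for erg in alloc.get("execute_resource_groups", []):
            erg["ces"] = ces_for_group(resource_lists_by_group, erg["group_name"])
    return allocations
```

A group that is not present in the resource data simply gets an empty `ces` list in this sketch; whether the real script handles that case the same way is not stated here.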


### resource_info_lookups.json example

```json
{
"resource_lists_by_group": {
"AGLT2": [
{
"fqdn": "squid.aglt2.org",
"group_name": "AGLT2",
"name": "AGLT2-squid",
"service_ids": [
"138"
],
"tags": []
},
{
"fqdn": "sl-um-es3.slateci.io",
"group_name": "AGLT2",
"name": "AGLT2-squid-2",
"service_ids": [
"138"
],
"tags": []
}
],
"AMNH": [
{
"fqdn": "hosted-ce22.opensciencegrid.org",
"group_name": "AMNH",
"name": "AMNH-ARES",
"service_ids": [
"1"
],
"tags": [
"CC*"
]
}
]
},
"resources_by_fqdn": {
"249cc.yeg.rac.sh": {
"fqdn": "249cc.yeg.rac.sh",
"group_name": "CyberaEdmonton",
"name": "CYBERA_EDMONTON",
"service_ids": [
"1"
],
"tags": []
},
"40.119.41.40": {
"fqdn": "40.119.41.40",
"group_name": "UCSDT2",
"name": "UCSDT2-Cloud-3-squid",
"service_ids": [
"138"
],
"tags": []
}
},
"resources_by_name": {
"AGLT2-squid": {
"fqdn": "squid.aglt2.org",
"group_name": "AGLT2",
"name": "AGLT2-squid",
"service_ids": [
"138"
],
"tags": []
},
"AGLT2-squid-2": {
"fqdn": "sl-um-es3.slateci.io",
"group_name": "AGLT2",
"name": "AGLT2-squid-2",
"service_ids": [
"138"
],
"tags": []
}
}
}
```
`service_ids` are numeric IDs; see `services.yaml` in the Topology data for the corresponding service names.
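
The three top-level keys in the example are different indexes over the same resource records. A minimal sketch of building them from a flat list of records is shown below; `build_lookups` is a hypothetical name, and the sketch assumes each record already has the `name`, `fqdn`, and `group_name` fields seen above.

```python
from collections import defaultdict


def build_lookups(resources):
    """Index a flat list of resource dicts by group, by FQDN, and by name."""
    by_group = defaultdict(list)
    by_fqdn = {}
    by_name = {}
    for res in resources:
        by_group[res["group_name"]].append(res)
        by_fqdn[res["fqdn"]] = res
        by_name[res["name"]] = res
    return {
        "resource_lists_by_group": dict(by_group),
        "resources_by_fqdn": by_fqdn,
        "resources_by_name": by_name,
    }
```

Because FQDNs and names are used as dict keys, a later record with the same FQDN or name would overwrite an earlier one in this sketch.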
