Add [user|VO|cluster] stats to API (#50)
* feat: add deployments stats endpoint

* fix: mount logs in docker compose file

* fix: add localhost:8080 to CORS

* update readme

* update readme

* fix: `user_stats` bug

* Fix stats cache issue

* refactor: change varnames

---------

Co-authored-by: Marta Obregón <[email protected]>

* feat: add cluster stats

* Add endpoint to get cluster stats

* Add function to get gpu flavours

* Get the resources without using flavours csv

* refactor: clean implementation

* fix: cache response

* fix: properly count gpus

---------

Co-authored-by: Marta Obregón <[email protected]>

* fix: stats VO timeseries should only return last three months

* refactor(ClusterStats): Adapt stats to dashboard

* feat: run stats computation in a background task

* perf(ClusterStats): add periodic background task to compute cluster stats

* perf(ClusterStats): use repeat_every annotation instead of while

* fix(ClusterStats): fix asynchronous function and cache

* fix(ClusterStats): clear cache in main

* build: add fastapi-utils to requirements

* fix: upgrade fastapi version

* style: style fixes

---------

Co-authored-by: Marta Obregón <[email protected]>

* feat: add stats for each gpu model (#41)

* feat(Stats): show gpu stats per model

* refactor: syntax fix

---------

Co-authored-by: Marta Obregón <[email protected]>

* feat: add datacenter stats (#48)

* feat(datacenters): add datacenter info, and job number and gpu models of each node

* Add datacenters info csv

* feat: add federated cluster datacenters

* feat: allow for VO specific stats

* feat: ignore datacenters with no nodes

* refactor: avoid hardcoding headers

* fix: avoid breaking if info not found

* fix: fix datacenter file path

---------

Co-authored-by: Marta Obregón <[email protected]>

* fix: slightly modify IFCA locations to avoid collapsing in the same point

* fix: fix disk used

---------

Co-authored-by: Marta Obregón <[email protected]>
IgnacioHeredia and MartaOB authored May 7, 2024
1 parent 3733159 commit bc19170
Showing 10 changed files with 402 additions and 4 deletions.
2 changes: 1 addition & 1 deletion .gitignore
@@ -18,7 +18,7 @@ lib/
lib64/
parts/
sdist/
var/
# var/ #################################### DO NOT IGNORE IN PAPI ################
*.egg-info/
.installed.cfg
*.egg
15 changes: 15 additions & 0 deletions README.md
@@ -202,6 +202,21 @@ More details can be found in the [API docs](https://api.cloud.ai4eosc.eu/docs).
* `/v1/deployments/`: (🔒)
deploy modules/tools in the platform to perform trainings

* `/v1/stats/deployments/`: (🔒)
  retrieve usage stats for individual users and for the overall platform.

<details>
<summary>Requirements</summary>

This endpoint requires an environment variable that points to the local copy
of the Nomad cluster logs repo:
```bash
export ACCOUNTING_PTH="/your/custom/path/ai4-accounting"
```
It will serve the contents of the `ai4-accounting/summaries` folder.
</details>
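
A minimal sketch of how the served folder could be located from that variable. The helper name `summaries_dir` and its error handling are hypothetical and not taken from the package; only `ACCOUNTING_PTH` and the `summaries` folder name come from the text above.

```python
import os
from pathlib import Path


def summaries_dir(env=os.environ) -> Path:
    """Return the `summaries` folder inside the accounting repo.

    Hypothetical helper: the real package may resolve this path differently.
    """
    root = env.get("ACCOUNTING_PTH")
    if not root:
        # Fail early with a clear message instead of serving a wrong folder
        raise RuntimeError("ACCOUNTING_PTH is not set")
    return Path(root) / "summaries"


# Example with an explicit mapping instead of the real environment
print(summaries_dir({"ACCOUNTING_PTH": "/your/custom/path/ai4-accounting"}))
```

Passing the environment as an argument keeps the helper easy to test without modifying `os.environ`.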


<details>
<summary>The API methods can also be accessed by interacting directly with
the Python package.</summary>
25 changes: 24 additions & 1 deletion ai4papi/main.py
@@ -2,16 +2,19 @@
Create an app with FastAPI
"""

from contextlib import asynccontextmanager
import fastapi
import uvicorn

from ai4papi.conf import MAIN_CONF, paths
from fastapi.responses import FileResponse
from ai4papi.routers import v1
from ai4papi.routers.v1.stats.deployments import get_cluster_stats_bg
from fastapi.middleware.cors import CORSMiddleware
from fastapi_utils.tasks import repeat_every


description=(
description = (
    "<img"
    " src='https://ai4eosc.eu/wp-content/uploads/sites/10/2023/01/horizontal-bg-dark.png'"
    " width=200 alt='' />"
@@ -39,9 +42,19 @@

)

@asynccontextmanager
async def lifespan(app: fastapi.FastAPI):
    # on startup
    await get_cluster_stats_thread()
    yield
    # on shutdown
    # (nothing to do)


app = fastapi.FastAPI(
    title="AI4EOSC Platform API",
    description=description,
    lifespan=lifespan,
)

app.add_middleware(
@@ -111,5 +124,15 @@ def run(
)


# Compute cluster stats in background task
@repeat_every(seconds=30)
async def get_cluster_stats_thread():
    """
    Recompute cluster stats
    """
    get_cluster_stats_bg.cache_clear()
    get_cluster_stats_bg()


if __name__ == "__main__":
    run()
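
The periodic task above clears a cache and immediately recomputes, so requests arriving between refreshes read a warm cached value. A standard-library sketch of that pattern follows. It is not the project's code: `compute_stats` and `refresh_periodically` are hypothetical names, a plain `asyncio` loop stands in for `repeat_every`, and `functools.lru_cache` is assumed as the cache because the diff only shows that `get_cluster_stats_bg` exposes `cache_clear()`.

```python
import asyncio
from functools import lru_cache

calls = 0  # counts how many times the expensive function actually ran


@lru_cache(maxsize=1)
def compute_stats():
    """Hypothetical stand-in for an expensive cluster stats computation."""
    global calls
    calls += 1
    return {"run": calls}


async def refresh_periodically(interval: float, iterations: int):
    """Clear the cache and recompute `iterations` times, sleeping in between."""
    for _ in range(iterations):
        compute_stats.cache_clear()
        compute_stats()  # repopulate the cache so readers get a warm value
        await asyncio.sleep(interval)


# Three refresh cycles with a short interval, for demonstration only
asyncio.run(refresh_periodically(interval=0.01, iterations=3))
```

In this sketch the synchronous call runs on the event loop, so a slow computation would delay other coroutines for its duration. Offloading it with `asyncio.to_thread` is one option if that becomes a problem.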
3 changes: 2 additions & 1 deletion ai4papi/routers/v1/__init__.py
@@ -1,11 +1,12 @@
import fastapi

from . import catalog, deployments, secrets
from . import catalog, deployments, secrets, stats

app = fastapi.APIRouter()
app.include_router(catalog.app)
app.include_router(deployments.app)
app.include_router(secrets.router)
app.include_router(stats.app)


@app.get(
10 changes: 10 additions & 0 deletions ai4papi/routers/v1/stats/__init__.py
@@ -0,0 +1,10 @@
import fastapi

from . import deployments


app = fastapi.APIRouter()
app.include_router(
    router=deployments.router,
    prefix='/deployments',
)