[Chore] Automatically cleanup old resources each night (#400)
## Problem

Sometimes when test cleanup steps fail, indexes and collections get left
behind.

## Solution

Create a nightly job to clean up leftover indexes and collections. Inspect
the name of each index and collection to see whether it is more than 24
hours old before deleting it. This should prevent deleting resources out
from underneath any tests that may be running at the same time as the
delete job.
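
The name-based age check described above can be sketched roughly as follows. This is a minimal illustration, not the committed script: the helper names (`stamped_date`, `is_old`), the sample resource names, and the injected `now` argument are assumptions made so the arithmetic is easy to follow and test. Unlike the committed code, this sketch also treats an eight-digit run that is not a valid calendar date as "no stamp" instead of letting `strptime` raise.

```python
import re
from datetime import datetime, timedelta

# A date stamp delimited by hyphens, e.g. "-20241019-" (format taken from the script).
STAMP = re.compile(r"-(\d{8})-")


def stamped_date(name):
    """Return the date embedded in a resource name, or None if there is none."""
    match = STAMP.search(name)
    if match is None:
        return None
    try:
        return datetime.strptime(match.group(1), "%Y%m%d")
    except ValueError:
        # Eight digits that do not form a valid calendar date.
        return None


def is_old(name, now, max_age=timedelta(hours=24)):
    """True only when the name carries a stamp older than max_age."""
    stamp = stamped_date(name)
    return stamp is not None and now - stamp > max_age


# Hypothetical names, evaluated at the nightly run time (22:05).
now = datetime(2024, 10, 21, 22, 5)
for name in ["test-20241019-abc", "test-20241021-abc", "my-index"]:
    print(name, is_old(name, now))
```

Names without a recognizable stamp are never reported as old, so resources that do not follow the naming convention are left alone by this kind of check.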

## Type of Change

- [x] Infrastructure change (CI configs, etc)
jhamon authored Oct 21, 2024
1 parent c13a249 commit 3780924
Showing 3 changed files with 82 additions and 4 deletions.
17 changes: 17 additions & 0 deletions .github/workflows/cleanup-nightly.yaml
@@ -0,0 +1,17 @@
+name: 'Cleanup All Indexes/Collections (Nightly)'
+
+on:
+  schedule:
+    - cron: '5 22 * * *' # 5 minutes after 10pm UTC, every day
+
+jobs:
+  cleanup-all:
+    name: Cleanup all indexes/collections
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+      - name: Cleanup all
+        uses: ./.github/actions/cleanup-all
+        with:
+          PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
+          DELETE_ALL: false
3 changes: 2 additions & 1 deletion .github/workflows/cleanup.yaml
@@ -12,4 +12,5 @@ jobs:
       - name: Cleanup all
         uses: ./.github/actions/cleanup-all
         with:
-          PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
+          PINECONE_API_KEY: ${{ secrets.PINECONE_API_KEY }}
+          DELETE_ALL: true
66 changes: 63 additions & 3 deletions scripts/cleanup-all.py
@@ -1,10 +1,10 @@
 import os
+import re
 from pinecone import Pinecone
+from datetime import datetime, timedelta


-def main():
-    pc = Pinecone(api_key=os.environ.get("PINECONE_API_KEY", None))
-
+def delete_everything(pc):
     for collection in pc.list_collections().names():
         try:
             print("Deleting collection: " + collection)
@@ -22,5 +22,65 @@ def main():
             pass


+def parse_date(resource_name):
+    match = re.search(r"-\d{8}-", resource_name)
+    if match:
+        date_string = match.group(0).strip("-")
+        return datetime.strptime(date_string, "%Y%m%d")
+    else:
+        return None
+
+
+def is_resource_old(resource_name):
+    print(f"Checking resource name: {resource_name}")
+    resource_datetime = parse_date(resource_name)
+    if resource_datetime is None:
+        return False
+    current_time = datetime.now()
+
+    # Calculate the difference
+    time_difference = current_time - resource_datetime
+
+    # Check if the time difference is greater than 24 hours
+    print(f"Resource timestamp: {resource_datetime}")
+    print(f"Time difference: {time_difference}")
+    return time_difference > timedelta(hours=24)
+
+
+def delete_old(pc):
+    for collection in pc.list_collections().names():
+        if is_resource_old(collection):
+            try:
+                print("Deleting collection: " + collection)
+                pc.delete_collection(collection)
+            except Exception as e:
+                print("Failed to delete collection: " + collection + " " + str(e))
+                pass
+        else:
+            print("Skipping collection, not old enough: " + collection)
+
+    for index in pc.list_indexes().names():
+        if is_resource_old(index):
+            try:
+                print("Deleting index: " + index)
+                pc.delete_index(index)
+            except Exception as e:
+                print("Failed to delete index: " + index + " " + str(e))
+                pass
+        else:
+            print("Skipping index, not old enough: " + index)
+
+
+def main():
+    pc = Pinecone(api_key=os.environ.get("PINECONE_API_KEY", None))
+
+    if os.environ.get("DELETE_ALL", None) == "true":
+        print("Deleting everything")
+        delete_everything(pc)
+    else:
+        print("Deleting old resources")
+        delete_old(pc)
+
+
 if __name__ == "__main__":
     main()
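
One property of the check in `is_resource_old` is worth spelling out: the stamp in the name has day resolution, so `parse_date` yields midnight of that day, and a resource created late in the day looks up to 24 hours older than it really is. A more conservative variant could measure age from the end of the stamped day instead. The sketch below is only an illustration of that idea, not part of the commit; the function name `is_definitely_old`, the injected `now` argument, and the assumption that stamps are written in UTC are all hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone


def is_definitely_old(name, now, max_age=timedelta(hours=24)):
    """True when even the latest possible creation time is older than max_age."""
    match = re.search(r"-(\d{8})-", name)
    if match is None:
        return False
    try:
        # Assumption: the stamp records a UTC calendar day.
        day_start = datetime.strptime(match.group(1), "%Y%m%d").replace(
            tzinfo=timezone.utc
        )
    except ValueError:
        return False
    # The resource could have been created at any moment up to the end of that day.
    latest_creation = day_start + timedelta(days=1)
    return now - latest_creation > max_age


# Hypothetical names, evaluated at the nightly run time (22:05 UTC).
now = datetime(2024, 10, 21, 22, 5, tzinfo=timezone.utc)
print(is_definitely_old("test-20241020-abc", now))
print(is_definitely_old("test-20241019-abc", now))
```

With this variant a resource stamped the previous day survives the 22:05 UTC run and is removed one night later, which trades slower cleanup for a firm 24-hour minimum age.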
