Slack Notification when importing urls #1229

saifrk
Collaborator

@saifrk saifrk commented Feb 21, 2025

Fixes Issue #1014.

@saifrk saifrk changed the title Slack Notifications for Collection Imports into COSMOS Slack Notification for Collection Import into COSMOS Feb 21, 2025
Your Name and others added 5 commits February 21, 2025 03:53
…ere-expected-how-many-succeeded-and-how-many-failed' of https://github.com/NASA-IMPACT/COSMOS into 1014-add-logs-when-importing-urls-so-we-know-how-many-were-expected-how-many-succeeded-and-how-many-failed
…ere-expected-how-many-succeeded-and-how-many-failed' of https://github.com/NASA-IMPACT/COSMOS into 1014-add-logs-when-importing-urls-so-we-know-how-many-were-expected-how-many-succeeded-and-how-many-failed
@saifrk saifrk changed the title Slack Notification for Collection Import into COSMOS Slack Notification when importing urls Feb 21, 2025
Comment on lines 637 to 652
def count_curated_urls(self):
    """Return the count of Curated URLs for the collection."""
    return CuratedUrl.objects.filter(collection=self).count()

def count_dump_urls(self):
    """Return the count of all Dump URLs for the collection."""
    return DumpUrl.objects.filter(collection=self).count()

def count_delta_urls(self):
    """Return the count of Delta URLs identified."""
    return DeltaUrl.objects.filter(collection=self).count()

def count_marked_for_deletion_urls(self):
    """Return the count of Delta URLs marked for deletion."""
    return DeltaUrl.objects.filter(collection=self, to_delete=True).count()

Collaborator

I don't think we need these as model functions. Can we move them to the tasks file?
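One way to act on this suggestion is sketched below, using in-memory stand-ins rather than the Django ORM so the snippet runs on its own. The helper name `count_urls_for_collection` and the `UrlRecord` type are hypothetical and do not come from the PR; in the real tasks file each count would presumably come from the `CuratedUrl`, `DumpUrl`, and `DeltaUrl` querysets shown above.

```python
from dataclasses import dataclass


@dataclass
class UrlRecord:
    """Hypothetical stand-in for a CuratedUrl/DumpUrl/DeltaUrl row."""
    collection_id: int
    to_delete: bool = False


def count_urls_for_collection(collection_id, curated, dump, delta):
    """Return URL counts for one collection as a plain dict.

    In a Django tasks module each count would instead be an ORM call such as
    DeltaUrl.objects.filter(collection=collection, to_delete=True).count().
    """
    def owned(rows):
        # Keep only the rows that belong to the requested collection.
        return [r for r in rows if r.collection_id == collection_id]

    delta_rows = owned(delta)
    return {
        "curated": len(owned(curated)),
        "dump": len(owned(dump)),
        "delta": len(delta_rows),
        "marked_for_deletion": sum(1 for r in delta_rows if r.to_delete),
    }
```

Keeping the counts in a module-level helper leaves the model free of reporting logic, which seems to be the point of the review comment.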

@@ -215,6 +216,14 @@ def fetch_and_replace_full_text(collection_id, server_name):
     collection.reindexing_status = ReindexingStatusChoices.REINDEXING_READY_FOR_CURATION
     collection.save()

+    curated_count = collection.count_curated_urls()
+    dump_count = collection.count_dump_urls()
Collaborator

This line should run before the dump-to-delta migration. The value is presented to the end user as "num urls successfully imported", or something similarly descriptive.

Collaborator

urls imported
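The ordering concern in this thread can be illustrated with a small sketch. It assumes that the migration moves rows out of the dump table, which the comment implies but the diff excerpt does not show. `FakeCollection` and `import_summary` are hypothetical names, not COSMOS code.

```python
class FakeCollection:
    """Hypothetical stand-in for a collection; not the COSMOS model."""

    def __init__(self, dump_urls):
        self.dump_urls = list(dump_urls)
        self.delta_urls = []

    def migrate_dump_to_delta(self):
        # Assumption: the migration moves rows out of the dump table.
        self.delta_urls.extend(self.dump_urls)
        self.dump_urls.clear()


def import_summary(collection):
    """Capture the imported-URL count before migrating dump to delta."""
    # Count first: after the migration the dump table no longer
    # reflects how many URLs were imported.
    imported = len(collection.dump_urls)
    collection.migrate_dump_to_delta()
    return {
        "urls_imported": imported,
        "dump_after_migration": len(collection.dump_urls),
    }
```

Under that assumption, counting after the migration would report zero imported URLs, which is why the count has to be taken first.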

Your Name and others added 10 commits February 23, 2025 22:19
…how-many-were-expected-how-many-succeeded-and-how-many-failed
…ere-expected-how-many-succeeded-and-how-many-failed' of https://github.com/NASA-IMPACT/COSMOS into 1014-add-logs-when-importing-urls-so-we-know-how-many-were-expected-how-many-succeeded-and-how-many-failed
…how-many-were-expected-how-many-succeeded-and-how-many-failed
@@ -257,7 +257,7 @@ def get_full_texts(
     if total_count is None:
         total_count = response.get("TotalRowCount", 0)

-    yield self._process_rows_to_records(rows)
+    yield (self._process_rows_to_records(rows), total_count)
Collaborator

Can we make a dedicated function, def get_total_count(), which returns the total count as an int?

Collaborator Author

When processing the records, we are already looping through batches. In that context, capturing the total count from the first response is efficient and avoids an extra API call. If our goal were to get the total_server_count without processing the records, then a dedicated query (or a separate API call) that returns only the count could be considered. Since the Slack notification is triggered when we import URLs for a collection, we are already processing these records, so capturing the count during that process seems to be the better approach.

Collaborator Author

Update: since it is not an expensive function to call and it adds modularity, I have incorporated it.
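A minimal sketch of what the dedicated function discussed in this thread might look like. Only the `TotalRowCount` key and the `get_total_count` / `get_full_texts` names come from the conversation; the `FullTextClient` class, the `fetch_page` callable, and the `Rows` key are assumptions made so the example is self-contained.

```python
class FullTextClient:
    """Hypothetical client; `fetch_page` stands in for the real API call."""

    def __init__(self, fetch_page):
        # fetch_page(offset, limit) -> {"Rows": [...], "TotalRowCount": int}
        self._fetch_page = fetch_page

    def get_total_count(self):
        """Return the server-side row count as an int, requesting no rows."""
        response = self._fetch_page(0, 0)
        return int(response.get("TotalRowCount", 0))

    def get_full_texts(self, batch_size=2):
        """Yield batches of rows only, without the count in the yield."""
        offset = 0
        while True:
            rows = self._fetch_page(offset, batch_size).get("Rows", [])
            if not rows:
                break
            yield rows
            offset += len(rows)
```

With the count split out, the generator can keep yielding plain batches, at the cost of one extra request, which the author judged to be inexpensive.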

Your Name and others added 4 commits February 28, 2025 14:20
…how-many-were-expected-how-many-succeeded-and-how-many-failed
…ere-expected-how-many-succeeded-and-how-many-failed' of https://github.com/NASA-IMPACT/COSMOS into 1014-add-logs-when-importing-urls-so-we-know-how-many-were-expected-how-many-succeeded-and-how-many-failed
@dhanur-sharma dhanur-sharma merged commit 16aae30 into dev Feb 28, 2025
6 checks passed
@dhanur-sharma dhanur-sharma deleted the 1014-add-logs-when-importing-urls-so-we-know-how-many-were-expected-how-many-succeeded-and-how-many-failed branch February 28, 2025 20:40
Development

Successfully merging this pull request may close these issues.

Add logs when importing URLs so we know how many were expected, how many succeeded, and how many failed
4 participants