Slack Notification when importing urls #1229
Conversation
sde_collections/models/collection.py
Outdated
def count_curated_urls(self):
    """Return the count of Curated URLs for the collection."""
    return CuratedUrl.objects.filter(collection=self).count()

def count_dump_urls(self):
    """Return the count of all Dump URLs for the collection."""
    return DumpUrl.objects.filter(collection=self).count()

def count_delta_urls(self):
    """Return the count of Delta URLs identified."""
    return DeltaUrl.objects.filter(collection=self).count()

def count_marked_for_deletion_urls(self):
    """Return the count of Delta URLs marked for deletion."""
    return DeltaUrl.objects.filter(collection=self, to_delete=True).count()
I don't think we need these as model functions. Can we move them to the tasks file?
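As a rough illustration of that suggestion, the counts could be gathered by a small helper in sde_collections/tasks.py instead of living on the Collection model. This is only a sketch: the import path and helper name are assumptions, and only the querysets come from the diff above.

# Hypothetical helper in sde_collections/tasks.py replacing the model methods.
# The import path is an assumption; the querysets mirror the diff above.
from sde_collections.models.delta_url import CuratedUrl, DeltaUrl, DumpUrl


def _collection_url_counts(collection) -> dict[str, int]:
    """Gather the URL counts used in the import notification."""
    return {
        "curated": CuratedUrl.objects.filter(collection=collection).count(),
        "dump": DumpUrl.objects.filter(collection=collection).count(),
        "delta": DeltaUrl.objects.filter(collection=collection).count(),
        "marked_for_deletion": DeltaUrl.objects.filter(collection=collection, to_delete=True).count(),
    }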
sde_collections/tasks.py
Outdated
@@ -215,6 +216,14 @@ def fetch_and_replace_full_text(collection_id, server_name):
    collection.reindexing_status = ReindexingStatusChoices.REINDEXING_READY_FOR_CURATION
    collection.save()

    curated_count = collection.count_curated_urls()
    dump_count = collection.count_dump_urls()
This line should be executed prior to migrate dump to delta. It is advertised to the end user as "num urls successfully imported" or something to that effect.
urls imported
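A minimal ordering sketch of what is being asked for here: capture the dump count before the dump-to-delta migration runs so the number really reflects "URLs imported". migrate_dump_to_delta() and send_slack_message() are assumed names for illustration, not necessarily the repo's actual API.

# Sketch only: count before migration so the figure means "URLs imported".
dump_count = collection.count_dump_urls()      # number of URLs imported from the server
collection.migrate_dump_to_delta()             # dump rows are consumed/transformed here (assumed name)
delta_count = collection.count_delta_urls()

send_slack_message(                             # assumed Slack helper
    f"{collection.name}: {dump_count} URLs imported, {delta_count} delta URLs identified."
)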
sde_collections/sinequa_api.py
Outdated
@@ -257,7 +257,7 @@ def get_full_texts(
    if total_count is None:
        total_count = response.get("TotalRowCount", 0)

-   yield self._process_rows_to_records(rows)
+   yield (self._process_rows_to_records(rows), total_count)
Can we make a dedicated function, def get_total_count(), that returns the total count as an int?
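A sketch of what that suggested helper might look like on the Sinequa API client. The self.query(...) call and its parameters are placeholders; only the "TotalRowCount" key comes from the existing get_full_texts code above.

# Hypothetical helper in sde_collections/sinequa_api.py; request details are assumed.
def get_total_count(self, collection_config_folder: str) -> int:
    """Return only the total number of rows available for a collection."""
    response = self.query(page=1, collection_config_folder=collection_config_folder)
    return response.get("TotalRowCount", 0)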
When processing the records, we are already looping through batches. In that context, capturing the total count from the first response is efficient and avoids making an extra API call. If our goal were to get the total_server_count without processing the records, then a dedicated query (or a separate API call) that returns only the count could be considered. Since the Slack notification is triggered when we're importing URLs for a collection, we're already processing these records, so capturing the count during that loop seems to be the better approach.
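For reference, a sketch of the batch-capture approach described here, assuming get_full_texts yields (records, total_count) tuples as in the diff above; the api handle and its arguments are illustrative.

# Sketch: take TotalRowCount from the first yielded batch while already iterating,
# so no extra API call is needed. Variable names are illustrative.
total_server_count = None
imported = 0
for records, total_count in api.get_full_texts(collection_config_folder):
    if total_server_count is None:
        total_server_count = total_count  # TotalRowCount from the first response
    imported += len(records)
# total_server_count vs. imported gives "expected" vs. "succeeded" for the notification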
Update: since it is not an expensive function to call and it adds modularity, I have incorporated it.
Fixes Issue #1014.