-
Notifications
You must be signed in to change notification settings - Fork 14.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(thumbnail cache): Enabling force parameter on screenshot/thumbnail cache #31757
Conversation
Based on your review schedule, I'll hold off on reviewing this PR until it's marked as ready for review. If you'd like me to take a look now, comment
|
f05f59a
to
5ac969c
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #31757 +/- ##
===========================================
+ Coverage 0 83.45% +83.45%
===========================================
Files 0 544 +544
Lines 0 38998 +38998
===========================================
+ Hits 0 32547 +32547
- Misses 0 6451 +6451
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
22f419d
to
3112c16
Compare
adding CacheScreenshotPayload state machine
a63c5ec
to
896b029
Compare
/korbit-review |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've completed my review and didn't find any issues.
Files scanned
File Path | Reviewed |
---|---|
superset/tasks/thumbnails.py | ✅ |
superset/models/slice.py | ✅ |
superset/dashboards/schemas.py | ✅ |
superset/utils/screenshots.py | ✅ |
superset/utils/webdriver.py | ✅ |
superset/charts/api.py | ✅ |
superset/charts/schemas.py | ✅ |
superset/dashboards/api.py | ✅ |
superset/config.py | ✅ |
Explore our documentation to understand the languages and file types we support and the files we ignore.
Need a new review? Comment
/korbit-review
on this PR and I'll review your latest changes.Korbit Guide: Usage and Customization
Interacting with Korbit
- You can manually ask Korbit to review your PR using the
/korbit-review
command in a comment at the root of your PR.- You can ask Korbit to generate a new PR description using the
/korbit-generate-pr-description
command in any comment on your PR.- Too many Korbit comments? I can resolve all my comment threads if you use the
/korbit-resolve
command in any comment on your PR.- Chat with Korbit on issues we post by tagging @korbit-ai in your reply.
- Help train Korbit to improve your reviews by giving a 👍 or 👎 on the comments Korbit posts.
Customizing Korbit
- Check out our docs on how you can make Korbit work best for you and your team.
- Customize Korbit for your organization through the Korbit Console.
Current Korbit Configuration
General Settings
Setting Value Review Schedule Automatic excluding drafts Max Issue Count 10 Automatic PR Descriptions ❌ Issue Categories
Category Enabled Naming ✅ Database Operations ✅ Documentation ✅ Logging ✅ Error Handling ✅ Systems and Environment ✅ Objects and Data Structures ✅ Readability and Maintainability ✅ Asynchronous Processing ✅ Design Patterns ✅ Third-Party Libraries ✅ Performance ✅ Security ✅ Functionality ✅ Feedback and Support
Note
Korbit Pro is free for open source projects 🎉
Looking to add Korbit to your team? Get started with a free 2 week trial here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
First pass comments
current_user=get_current_user(), | ||
chart_id=target.id, | ||
force=True, | ||
current_user=get_current_user(), chart_id=target.id, force=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'd maybe be ok with forcefully invalidating thumbs on chart/dashboard change, but recaching it? I'd propose adding a config flag THUMBNAIL_ON_CHANGE: "invalidate" | "recalculate" | None
for admins to be able to specify if they want to just ignore changes, invalidate or recache every time the object changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
discussed on slack - out of scope
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Second pass comments. I'd propose breaking the status check into a method of CachePayload
to avoid duplication of the logic. Something like .should_trigger_task()
that returns bool
based on status, error TTL etc.
superset/utils/screenshots.py
Outdated
class ScreenshotCachePayload: | ||
def __init__(self, image: bytes | None = None): | ||
self._image = image | ||
self._timestamp = datetime.now().strftime("%Y/%m/%d-%H:%M:%S") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I'd prefer not to use a custom format here. Let's just use isoformat()
self._timestamp = datetime.now().strftime("%Y/%m/%d-%H:%M:%S") | |
self._timestamp = datetime.now().isoformat() |
superset/utils/screenshots.py
Outdated
self.status = StatusValues.UPDATED | ||
|
||
def update_timestamp(self) -> None: | ||
self._timestamp = datetime.now().strftime("%Y/%m/%d-%H:%M:%S") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
self._timestamp = datetime.now().strftime("%Y/%m/%d-%H:%M:%S") | |
self._timestamp = datetime.now().isoformat() |
superset/charts/api.py
Outdated
error_cache_ttl = config["THUMBNAIL_ERROR_CACHE_TTL"] | ||
error_cache_expired = ( | ||
datetime.now() | ||
- datetime.strptime(cache_payload.get_timestamp(), "%Y/%m/%d-%H:%M:%S") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
- datetime.strptime(cache_payload.get_timestamp(), "%Y/%m/%d-%H:%M:%S") | |
- datetime.fromisoformat(cache_payload.get_timestamp()) |
superset/charts/api.py
Outdated
202: | ||
description: Chart async result | ||
description: Chart async task created |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry about the back and forth here, but this seems more appropriate
description: Chart async task created | |
description: Chart screenshot task created |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few last non-blocking comments. Great to see this land before 5.0 cut, this will save considerable compute where enabled!
superset/utils/screenshots.py
Outdated
def is_error_cache_ttl_expired() -> bool: | ||
error_cache_ttl = app.config["THUMBNAIL_ERROR_CACHE_TTL"] | ||
return ( | ||
datetime.now() - datetime.fromisoformat(self.get_timestamp()) | ||
).total_seconds() > error_cache_ttl |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there some reason we're nesting this? Couldn't it just be a regular method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I figured that we don't use it anywhere else and we only need to check it inside of should_trigger_task
method. I can make it a regular method or a private one instead of nested if you think it's better
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer a separate method, as redefining it each time will incur some extra overhead and makes the main function slightly less readable.
superset/charts/api.py
Outdated
task_status=cache_payload.get_status(), | ||
) | ||
|
||
if cache_payload.should_trigger_task() or force: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very very nit, but flipping this around will likely save some CPU. To DRY it even more, we could maybe even consider making force
an optional parameter in should_trigger_task
that defaults to False
(not highly opinionated here)
if cache_payload.should_trigger_task() or force: | |
if force or cache_payload.should_trigger_task(): |
superset/dashboards/api.py
Outdated
|
||
def trigger_celery() -> WerkzeugResponse: | ||
if cache_payload.should_trigger_task() or force: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
if cache_payload.should_trigger_task() or force: | |
if force or cache_payload.should_trigger_task(): |
SUMMARY
Both the
/api/v1/chart/{{id}}/cache_screenshot/
and/api/v1/dashboard/{{id}}/cache_dashboard_screenshot/
endpoints enable generating and retrieve a screenshot from the respective resource.The endpoints accept sending force=true in the query parameters, but currently this parameter does nothing. It should instead invalidate the previous image and generate a new one.
This PR introduces a fix as well as some improvements to the functioning of the thumbnail cache, such that the status of the screenshot can be tracked by the relevant api endpoints as well.
Rather than storing plain bytes objects in the cache, this PR introduces a new class
ScreenshotCachePayoad
which can store the image bytes as well astimestamp
andstatus
values. Status values includePending
: screenshot is awaiting being processedComputing
: screenshot job has been picked up by the worker and is processingUpdated
: screenshot has been updatedError
: an error occured while the webdriver was processing the screenshot task.While in a
Computing
orUpdated
state, requests to regenerate the screenshot will be blocked unless theforce
parameter is set toTrue
Additional
update_status
andupdated_at
fields have been added to 202 response payload for endpoints as follows:How to reproduce the bug
Create a chart.
Perform a modification in the data.
Send a GET request to /api/v1/chart/{{id}}/cache_screenshot/?q=(force:!t).
Access the URL returned in the image_url value.
BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
TESTING INSTRUCTIONS
tests have been added in the
tests/unit_tests/utils/screenshot_tests.py
For manual testing
/api/v1/chart/<pk>/cache_screenshot/
endpoint. Ensure that a 202 response is received./api/v1/chart/cache_screenshot/<pk>/?q=(force:!t)
to ensure that the status changes to "Pending"Repeat the steps for the
POST /api/v1/dashboard/<pk>/cache_dashboard_screenshot/
GET /api/v1/chart/<pk>/thumbnail/<digest>
endpoint to ensure that a 202 response is received.GET /api/v1/dashboard/<pk>/thumbnail/<digest>
ADDITIONAL INFORMATION