perf(export): export generates unnecessary files content #26765

Always-prog
Contributor

SUMMARY

Currently, model export is slower than it needs to be.

Model export is slow because each model also exports its related models. A related model has no way to check whether it has already been added to the final archive, so it is exported again anyway, which generates unnecessary export files.
For example, exporting a dashboard with 2 charts produces 5 files in the archive: a dashboard, two charts, a dataset, and a database. "Behind the scenes", however, 7 files are actually generated, since each chart re-exports the dataset and the database, and only at the end does the highest-level export (the dashboard export) remove the duplicates.

To solve this problem, this PR proposes generating export file content lazily rather than immediately: file_content is converted into a function that generates the content only when it is called, that is, when the file is actually written to the archive.
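
The idea can be illustrated with a minimal, self-contained sketch. It is not the actual Superset code: the function names (`render`, `export_chart`, `export_dashboard`, `build_archive`), the file paths, and the call counter are all hypothetical and exist only to show that duplicate entries never have their content generated.

```python
from typing import Callable, Iterator

# Hypothetical counter, used only to show how often content is actually built.
render_calls = 0


def render(name: str) -> str:
    """Stand-in for the expensive serialization of a model."""
    global render_calls
    render_calls += 1
    return f"content of {name}\n"


def export_chart(chart_id: int) -> Iterator[tuple[str, Callable[[], str]]]:
    """Yield (file_name, content_factory) pairs for a chart and its related models."""
    yield f"charts/chart_{chart_id}.yaml", lambda: render(f"chart {chart_id}")
    # Each chart still yields the shared dataset and database,
    # but only as cheap callables, not as generated content.
    yield "datasets/dataset_1.yaml", lambda: render("dataset 1")
    yield "databases/database_1.yaml", lambda: render("database 1")


def export_dashboard(chart_ids: list[int]) -> Iterator[tuple[str, Callable[[], str]]]:
    """Yield the dashboard entry followed by the entries of every chart."""
    yield "dashboards/dashboard_1.yaml", lambda: render("dashboard 1")
    for chart_id in chart_ids:
        yield from export_chart(chart_id)


def build_archive(chart_ids: list[int]) -> dict[str, str]:
    """Deduplicate by file name and call each content factory at most once."""
    files: dict[str, str] = {}
    for file_name, get_content in export_dashboard(chart_ids):
        if file_name not in files:
            files[file_name] = get_content()
    return files


archive = build_archive([1, 2])
```

In this sketch, a dashboard with two charts yields seven entries, but `render` runs only five times, once per unique file, because the duplicated dataset and database entries are skipped before their callables are invoked.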

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Note the speedup in generating a dashboard export archive!

Before

without-perf.mp4

After

with-perf.mp4

TESTING INSTRUCTIONS

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

@geido
Member

geido commented Jan 25, 2024

Hello @Always-prog thanks for the PR! Would you mind taking a look at the failing CI steps?


codecov bot commented Jan 26, 2024

Codecov Report

Attention: Patch coverage is 87.82609% with 14 lines in your changes missing coverage. Please review.

Project coverage is 69.51%. Comparing base (4796484) to head (9d2e16e).
Report is 1562 commits behind head on master.

Files with missing lines                 Patch %   Lines
superset/commands/dashboard/export.py    59.25%    11 Missing ⚠️
superset/commands/export/models.py       80.00%    2 Missing ⚠️
superset/commands/export/assets.py       96.15%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #26765      +/-   ##
==========================================
+ Coverage   67.18%   69.51%   +2.33%     
==========================================
  Files        1900     1900              
  Lines       74443    74491      +48     
  Branches     8293     8293              
==========================================
+ Hits        50012    51785    +1773     
+ Misses      22376    20651    -1725     
  Partials     2055     2055              
Flag       Coverage Δ
hive       53.82% <46.08%> (?)
mysql      78.04% <78.26%> (+0.01%) ⬆️
postgres   78.14% <78.26%> (+0.01%) ⬆️
presto     53.77% <46.08%> (?)
python     83.09% <87.82%> (+4.83%) ⬆️
sqlite     77.66% <78.26%> (+0.01%) ⬆️
unit       56.49% <60.00%> (?)

@Always-prog
Contributor Author

@betodealmeida Hi! Thanks for the review! I have committed fixes addressing your comments.

The only remaining problem is that pre-commit fails on the file .github/workflows/update-monorepo-lockfiles.yml, which I have not fixed.

@geido
Member

geido commented Feb 7, 2024

Hi @Always-prog it seems you have a conflicting file. Can you fix it, please? I am adding myself as a reviewer and checking back on this asap. Thank you!

@geido geido self-requested a review February 7, 2024 17:17
@geido
Member

geido commented Feb 12, 2024

👀

@github-actions github-actions bot added the api Related to the REST API label Feb 13, 2024
@Always-prog
Contributor Author

@geido Hi! Rebased!

@geido
Member

geido commented Feb 19, 2024

/testenv up

Contributor

@geido Ephemeral environment spinning up at http://54.191.84.78:8080. Credentials are admin/admin. Please allow several minutes for bootstrapping and startup.

Member

@michael-s-molina michael-s-molina left a comment

I tested the PR and was able to successfully export/import dashboards.

Thank you for the work @Always-prog, the performance improvements are impressive!

@Always-prog
Contributor Author

@michael-s-molina Thank you for testing my PR. Could it be merged?

@rusackas rusackas merged commit 2e4f6d3 into apache:master Feb 21, 2024
40 checks passed
Contributor

Ephemeral environment shutdown and build artifacts deleted.

Labels
api Related to the REST API 🏷️ bot A label used by `supersetbot` to keep track of which PR where auto-tagged with release labels size/L 🚢 4.1.0
6 participants