patch psutil virtual memory for test suite in CI #424

Closed
wants to merge 1 commit into from

mattseddon
Member

@mattseddon mattseddon commented Sep 11, 2024

Related to #405 / #406

I have a theory that, in the Windows CI, the psutil.virtual_memory().percent > 80 condition is being hit often in process_udf_outputs (which is executed under parallel mode). The code is below:

def process_udf_outputs(
    warehouse: "AbstractWarehouse",
    udf_table: "Table",
    udf_results: Iterator[Iterable["UDFResult"]],
    udf: UDFBase,
    batch_size: int = INSERT_BATCH_SIZE,
    cb: Callback = DEFAULT_CALLBACK,
) -> None:
    rows: list[UDFResult] = []
    # Optimization: Compute row types once, rather than for every row.
    udf_col_types = get_udf_col_types(warehouse, udf)
    for udf_output in udf_results:
        if not udf_output:
            continue
        for row in udf_output:
            cb.relative_update()
            rows.append(adjust_outputs(warehouse, row, udf_col_types))
            if len(rows) >= batch_size or (
                len(rows) % 10 == 0 and psutil.virtual_memory().percent > 80
            ):
                for row_chunk in batched(rows, batch_size):
                    warehouse.insert_rows(udf_table, row_chunk)
                rows.clear()
    if rows:
        for row_chunk in batched(rows, batch_size):
            warehouse.insert_rows(udf_table, row_chunk)

This would mean that we flush to the warehouse as often as every 10 rows instead of every batch_size rows, so concurrent writes happen much more often and the UDF has far more chances to fail due to SQLite locks.
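To make the effect of that condition concrete, here is a minimal sketch that only counts flushes. It is not code from the project: count_flushes and the injected memory_percent probe are hypothetical names, and the batch size of 2,000 is an assumed value, not the real INSERT_BATCH_SIZE.

```python
from collections.abc import Callable


def count_flushes(
    n_rows: int,
    batch_size: int,
    memory_percent: Callable[[], float],
) -> int:
    """Count how many times buffered rows would be flushed to the warehouse."""
    flushes = 0
    buffered = 0
    for _ in range(n_rows):
        buffered += 1
        # Same condition as in process_udf_outputs, with the memory probe
        # injected instead of calling psutil.virtual_memory() directly.
        if buffered >= batch_size or (
            buffered % 10 == 0 and memory_percent() > 80
        ):
            flushes += 1
            buffered = 0
    if buffered:
        flushes += 1  # final partial flush after the loop
    return flushes


# Assumed numbers for illustration: 10,000 rows, batch size 2,000.
print(count_flushes(10_000, 2_000, lambda: 50.0))  # 5
print(count_flushes(10_000, 2_000, lambda: 90.0))  # 1000
```

Under these assumptions, a machine that reports more than 80% memory use writes 200 times as often, and each of those writes is another opportunity to collide with a concurrent writer.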

This PR patches psutil.virtual_memory so that it always reports 50% utilisation for the entirety of the test suite.
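The diff itself is not shown here, so the following is only a sketch of one way such a patch could look, using unittest.mock from the standard library. The helper name pinned_virtual_memory is hypothetical, and the module to patch is passed in as an argument so the helper can be tried without psutil installed.

```python
from contextlib import contextmanager
from types import SimpleNamespace
from unittest import mock


@contextmanager
def pinned_virtual_memory(psutil_module, percent: float = 50.0):
    """Make psutil_module.virtual_memory() report a fixed utilisation."""
    # The flush condition only reads .percent, so a small stand-in object
    # is enough; other fields of the real result are not reproduced.
    fake_stats = SimpleNamespace(percent=percent)
    with mock.patch.object(
        psutil_module, "virtual_memory", return_value=fake_stats
    ):
        yield
```

In a conftest.py, a session-scoped autouse fixture could enter pinned_virtual_memory(psutil) before its yield so that every test sees 50%. One thing to check with this approach: a mock patch lives in the current process only, so worker processes that are spawned rather than forked would presumably still see the real value.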

cloudflare-workers-and-pages bot commented Sep 11, 2024

Deploying datachain-documentation with Cloudflare Pages

Latest commit: 29c28a7
Status: ✅  Deploy successful!
Preview URL: https://2369a79a.datachain-documentation.pages.dev
Branch Preview URL: https://ensure-rows.datachain-documentation.pages.dev

codecov bot commented Sep 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 87.32%. Comparing base (aab3885) to head (29c28a7).
Report is 10 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #424   +/-   ##
=======================================
  Coverage   87.32%   87.32%           
=======================================
  Files          92       92           
  Lines        9986     9986           
  Branches     2041     2041           
=======================================
  Hits         8720     8720           
  Misses        911      911           
  Partials      355      355           
Flag Coverage Δ
datachain 87.27% <ø> (ø)

@mattseddon mattseddon self-assigned this Sep 11, 2024
@mattseddon mattseddon marked this pull request as ready for review September 11, 2024 05:05
@mattseddon mattseddon requested a review from a team September 11, 2024 05:05
@skshetry
Member

@mattseddon, thank you for investigating. Can you demonstrate that this fixes the timing-out issue?

Although it looks like a good thing that we should do anyway.

@mattseddon
Member Author

> @mattseddon, thank you for investigating. Can you demonstrate that this fixes the timing-out issue?
>
> Although it looks like a good thing that we should do anyway.

@skshetry PTAL at #427. It looks like those changes should fix the test suite running against Windows. I think the main thing is moving all of the tests that use _tmpfile fixtures onto the same worker.

@mattseddon
Member Author

Closing in favour of #427.

@mattseddon mattseddon closed this Sep 11, 2024
@mattseddon mattseddon deleted the ensure-rows branch September 11, 2024 23:13