Catching a CrateDB fluke: ShardCollectContext for {0,2} already added
#53
Comments
ShardCollectContext for 0 already added
Hi, I don't have permissions to assign myself. I will look into this issue. |
Thank you, I've just assigned you. Please note that it was only a single event, and it has not been observed since. Maybe a real SEU (single-event upset)? So, you will have to evaluate whether this is important or feels serious. If you decide that it may actually have been an SEU, or if you can't find anything suspicious, let's just close the issue again. |
Maybe related to crate/crate#11677 |
Another spot on behalf of GH-64. |
could it be that we need to have an |
Thank you for watching this conversation, Marios.
I am not sure what you are particularly referring to with |
Just to add more context here, to improve the original report: The flaw is apparently not happening on CrateDB startup, but at regular runtime.
|
I'm probably wrong. I was thinking of checking that all shards of a table are allocated before proceeding with statements, but we use only one CrateDB node, right? |
The output I've shared above clearly indicates that it is happening at the very same test case function This observation indicates that CrateDB might not always be ready to accept a We can try, and it may qualify as a workaround. However, you may also want to address this on behalf of CrateDB itself, when applicable.
|
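The idea of waiting for shard allocation before issuing statements could be sketched roughly as follows. This is only a minimal sketch, not what the test suite does: it assumes CrateDB's `sys.health` table, which reports a `health` value of GREEN, YELLOW, or RED per table. The helper name `wait_for_table_health` and the injected `fetch_rows` callable are hypothetical and were chosen so that the logic does not depend on a particular client. Whether such a check would avoid the error at all is an open question.

```python
import time
from typing import Callable, Sequence

# Assumption: sys.health yields one row per table (or partition), and its
# "health" column is one of GREEN, YELLOW, or RED.
HEALTH_QUERY = "SELECT health FROM sys.health WHERE table_name = ?"


def wait_for_table_health(
    fetch_rows: Callable[[str, Sequence[str]], list],
    table: str,
    acceptable: Sequence[str] = ("GREEN",),
    timeout: float = 10.0,
    interval: float = 0.2,
) -> bool:
    """Poll until all health rows for `table` are acceptable, or time out.

    `fetch_rows` is a hypothetical callable that runs a parameterized SQL
    statement and returns the result rows as a list of sequences.
    Returns True when the table is healthy, False when the timeout expires.
    """
    deadline = time.monotonic() + timeout
    while True:
        rows = fetch_rows(HEALTH_QUERY, (table,))
        # No rows means the table is not visible yet, so keep waiting.
        if rows and all(row[0] in acceptable for row in rows):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With a DB-API style cursor, `fetch_rows` could be a small wrapper that calls `cursor.execute(sql, params)` and returns `cursor.fetchall()`. On a cluster with unassigned replicas, passing `acceptable=("GREEN", "YELLOW")` may be the more practical choice.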
This is correct, the issue is about two flukes reported from pedantically observing CI runs where CrateDB Nightly is used purposely. |
To add more information about chronology: It has first happened here at GH-52, on Wed, 01 Nov 2023, with this release 1:
The second spot was:
|
It happened again, this time on a CI run triggered by a PR submitted by Dependabot. OriginSpots |
ShardCollectContext for 0 already added
ShardCollectContext for {0,2} already added
Now, it is about
|
Once more.
|
That patch will increase CrateDB's heap size on CI, in order to explore whether the problem originates from low-memory situations. |
Indeed. Decreasing heap size triggers the issue right away. In this spirit, creating a reproducer will be much easier.

docker run --rm -it --name=cratedb \
  --publish=4200:4200 --publish=5432:5432 \
  --env=CRATE_HEAP_SIZE=256m \
  crate/crate:nightly -Cdiscovery.type=single-node

time pytest -vvv -k "test_search_runs_returns_expected_results_with_large_experiment or test_search_runs_run_id"

mlflow.exceptions.MlflowException: (crate.client.exceptions.ProgrammingError) SQLParseException[ShardCollectContext for 0 already added]
[SQL: DELETE FROM metrics]
(Background on this error at: https://sqlalche.me/e/20/f405)
|
The issue has been reported to the upstream crate/crate repository. |
Coming from crate/crate#15518 (comment), and looking at recent nightly scheduled job executions of https://github.com/crate-workbench/mlflow-cratedb/actions, it looks like this problem has been mitigated. Therefore, I am closing this. Thanks, @jeeminso. |
Report
While trying to bring in GH-52, we caught an unusual error from CrateDB we haven't seen before.
It is happening on a DELETE FROM SQL statement.
Details
The DELETE FROM metrics is happening within a regular integration test scenario on the canonical setUp() method calling out to self.pruneTables(), in order to supply the test cases with a blank canvas. This is nothing special, we have been doing it like this for a while already, on behalf of different test suites we are maintaining. Also note it was really only a fluke: Re-running the test cases made them succeed on the first attempt already.
-- https://github.com/crate-workbench/mlflow-cratedb/actions/runs/6721484528/job/18267317481?pr=52#step:6:491
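Because re-running succeeded right away, a test-side stopgap could retry the table pruning only when this specific error shows up. The following is a minimal sketch under that assumption. The names `FLUKE_PATTERN`, `is_known_fluke`, and `retry_on_fluke` are hypothetical and not part of the test suite. The pattern is derived only from the two message variants quoted in this issue. A retry like this hides the symptom and does not replace a fix in CrateDB.

```python
import re
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Matches both message variants reported in this issue:
#   "ShardCollectContext for 0 already added"
#   "ShardCollectContext for {0,2} already added"
FLUKE_PATTERN = re.compile(
    r"ShardCollectContext for (\{[\d,\s]+\}|\d+) already added"
)


def is_known_fluke(error: BaseException) -> bool:
    """Return True if the error text contains the ShardCollectContext message."""
    return FLUKE_PATTERN.search(str(error)) is not None


def retry_on_fluke(action: Callable[[], T], attempts: int = 3, delay: float = 0.5) -> T:
    """Call `action`, retrying only when it fails with the known fluke.

    Any other exception, or the fluke on the final attempt, is re-raised.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as error:
            if attempt == attempts or not is_known_fluke(error):
                raise
            time.sleep(delay)
    raise ValueError("attempts must be at least 1")
```

Inside a `setUp()` method, this might be used as `retry_on_fluke(self.pruneTables)`. Matching on the message text keeps unrelated failures visible instead of retrying them silently.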
Thoughts
As we are using CrateDB nightly for all of our downstream integration tests, it may be a regression introduced just recently.
/cc @matriv, @seut