TSML: Error in timeseries-anomaly-detection.ipynb #426

Closed
amotl opened this issue Apr 18, 2024 · 5 comments · Fixed by #425
Labels
bug Something isn't working

Comments

amotl commented Apr 18, 2024

Problem

The timeseries-anomaly-detection.ipynb notebook errors out on both Python 3.10 and 3.11 [1][2].

ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by SimpleImputer.

Observations

Because it happens on both versions of Python, it is most probably unrelated to the change that first made CI trip.

Thoughts

Most probably another dependency flaw?

Footnotes

  1. https://github.com/crate/cratedb-examples/actions/runs/8743815972/job/23995257165?pr=425#step:6:1505

  2. https://github.com/crate/cratedb-examples/actions/runs/8743815972/job/23995257591?pr=425#step:6:1350

amotl added the bug label Apr 18, 2024

amotl commented Apr 18, 2024

Observations

scikit-learn 1.4.2 was released on Apr 9, 2024. Is it related?

-- https://pypi.org/project/scikit-learn/1.4.2/#history

Thoughts

If it is, the nightly CI runs, which are meant to validate functionality, most probably did not surface the failure earlier because dependencies are configured to be cached as long as the local requirements files do not change.

In this case, the nightly CI jobs do not catch updates to transitive dependencies that are not enumerated locally, and thus do not live up to their promise of giving you constant peace of mind in "on stage" situations. In this spirit, what is reflected on the Build Status page might not convey the whole truth, and I am sad about it.
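The caching effect described above can be illustrated with a minimal sketch. It assumes the cache key is derived only from the contents of the local requirements files; the function name `dependency_cache_key` and the package name are hypothetical, and the actual CI configuration may compute its key differently.

```python
import hashlib


def dependency_cache_key(requirements_files: dict[str, str]) -> str:
    """Derive a cache key from the contents of local requirements files only."""
    digest = hashlib.sha256()
    for name in sorted(requirements_files):
        digest.update(name.encode("utf-8"))
        digest.update(b"\0")
        digest.update(requirements_files[name].encode("utf-8"))
        digest.update(b"\0")
    return digest.hexdigest()


# Hypothetical file contents: the dependency is listed without a version pin.
requirements = {"requirements.txt": "some-ml-package\n"}

key_before_release = dependency_cache_key(requirements)
# A new upstream release of a transitive dependency does not touch the local file ...
key_after_release = dependency_cache_key(requirements)
# ... so the key is unchanged, and a previously cached environment would be reused.
cache_is_reused = key_before_release == key_after_release
```

Under this assumption, a nightly run keeps testing against the cached set of packages until someone edits a requirements file, which is why a fresh `pip install --upgrade` can fail while the nightly job stays green.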

/cc @marijaselakovic, @ckurze, @hammerhead, @simonprickett

amotl commented Apr 18, 2024

I am able to confirm this error on my workstation, using Python 3.11.

source .venv/bin/activate
pip install --upgrade scikit-learn
cd topic/timeseries
pytest -k timeseries-anomaly-detection.ipynb
ValueError: Found array with 0 sample(s) (shape=(0, 1)) while a minimum of 1 is required by SimpleImputer.

However, I am also seeing this second error, which might actually be a follow-up error.

ProgrammingError: (crate.client.exceptions.ProgrammingError) RelationAlreadyExists[Relation 'notebook.machine_data' already exists.]
[SQL: CREATE TABLE machine_data ("timestamp" TIMESTAMP, "value" DOUBLE PRECISION)]

amotl commented Apr 18, 2024

As part of GH-425, the RelationAlreadyExists error has been fixed with fdb91dd. However, despite downgrading scikit-learn with d244f43, the array shape error is still there, though now only on Python 3.10, and only on CI. On my workstation, the software tests also succeed on Python 3.10.13.

-- https://github.com/crate/cratedb-examples/actions/runs/8744363838/job/23997048918?pr=425#step:6:951

amotl commented Apr 18, 2024

Taking a closer look, ValueError: Found array with 0 sample(s) may also indicate that the problem is related to CrateDB's eventual consistency, so ab42144 adds a corresponding REFRESH TABLE "tablename"; SQL statement in order to synchronize writes.
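The write-then-refresh pattern can be sketched roughly as follows. This is not the code from ab42144: the helper name `insert_and_refresh` is hypothetical, the column names are taken from the CREATE TABLE statement quoted earlier, and the `?` placeholders assume a DB-API cursor with qmark parameter style.

```python
def insert_and_refresh(cursor, table, rows):
    """Insert rows, then ask the database to make them visible to queries."""
    # The table name is interpolated as a quoted identifier, so reject
    # names that would break out of the double quotes.
    if '"' in table:
        raise ValueError(f"Invalid table name: {table!r}")
    cursor.executemany(
        f'INSERT INTO "{table}" ("timestamp", "value") VALUES (?, ?)',
        rows,
    )
    # Without this statement, a SELECT issued right after the insert
    # may return zero rows, which then surface as an empty array downstream.
    cursor.execute(f'REFRESH TABLE "{table}"')
```

The function only relies on the `execute` and `executemany` methods of a DB-API cursor, so it should work with any cursor object that provides them; the notebook itself may issue the statement through a different layer.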

amotl commented Apr 18, 2024

Indeed, the cause apparently was the missing REFRESH TABLE statement: writes had not been synchronized, so the results were not visible to subsequent queries. Apparently, it is not related to scikit-learn 1.4.2 at all. GH-425 will improve the situation. d244f43 has been removed again.
