DuckDB's TypeMismatchException raised in CytoTable's convert() workflow due to nan values being stored as strings instead of expected types #38
Comments
Related to cytomining/pycytominer#79
Thank you for raising this, @axiomcura! This is worth considering for CytoTable. Contextually, it's related to conversations in cytomining/pycytominer#198 (comment), which resulted in work found within … "Like nulls" refers to data values that look like null types but are actually strings that have found their way into numeric-type columns (SQLite allows this flexibility). CytoTable might need to use …
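To make the "like nulls" point concrete, here is a minimal sketch using only Python's standard-library `sqlite3` module. The `Cells` table is a hypothetical stand-in (the column name is borrowed from the error message reported in this issue), and this only illustrates SQLite's type flexibility; it is not CytoTable code.

```python
import sqlite3

# In-memory database; the table is a hypothetical stand-in for a CellProfiler output table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Cells (ImageNumber INTEGER, Cells_Correlation_K_DNA_Mito FLOAT)"
)

# SQLite accepts the string "nan" in a column declared as FLOAT without complaint.
conn.executemany(
    "INSERT INTO Cells VALUES (?, ?)",
    [(1, 0.25), (2, "nan"), (3, None)],
)

# typeof() reports the storage class actually used for each stored value.
storage_types = [
    row[0]
    for row in conn.execute(
        "SELECT typeof(Cells_Correlation_K_DNA_Mito) FROM Cells ORDER BY ImageNumber"
    )
]
print(storage_types)
conn.close()
```

The second row is stored as text even though the column is declared as a float, which is the kind of value a strictly typed reader is likely to reject.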
Hi @d33bs, @MattsonCam and I are getting this same error when downloading SQLite files from AWS and converting them to Parquet files.

Code

We are using this file here, but below are the exact

%%time
what = cytotable.convert(
source_path="/".join(manifest_df.sqlite_file[2].split("/")[0:-1]),
dest_path="test2.parquet",
dest_datatype="parquet",
chunk_size=150000,
parsl_config=parsl_config,
# changed preset to this since compartments don't use prefix, but the CP version is not the same
preset="cellprofiler_sqlite"
)

Output

The error we are receiving is as follows:

TypeMismatchException: Mismatch Type Error: Invalid type in column "Cells_Correlation_K_DNA_Mito": expected float or integer, found "nan" of type "text" instead.

Solution

There is no solution we can come up with at this time, since even if we were to download all SQLite files from AWS onto our local machine, we would still have this error. We will likely have to use pycytominer. We hope to see a solution to this and are happy to explain more of the issue!
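One possible stopgap, sketched here under assumptions and not taken from CytoTable or from any of the linked changes, is to replace text-typed values with NULL while reading a numeric column out of SQLite. The helper name `select_numeric_or_null` and the `Cells` table are hypothetical; only the standard-library `sqlite3` module is used.

```python
import sqlite3


def select_numeric_or_null(conn, table, column):
    """Return a column's values, replacing text-typed entries (e.g. 'nan') with None.

    Hypothetical helper: table and column names are interpolated directly,
    so it assumes they come from a trusted source.
    """
    query = f"""
        SELECT CASE WHEN typeof("{column}") = 'text' THEN NULL ELSE "{column}" END
        FROM "{table}"
        ORDER BY rowid
    """
    return [row[0] for row in conn.execute(query)]


# Small in-memory example with one 'nan' string in a FLOAT column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Cells (Cells_Correlation_K_DNA_Mito FLOAT)")
conn.executemany("INSERT INTO Cells VALUES (?)", [(0.25,), ("nan",), (1.5,)])

cleaned = select_numeric_or_null(conn, "Cells", "Cells_Correlation_K_DNA_Mito")
print(cleaned)
conn.close()
```

This treats every text value in the column as missing, which may be too aggressive if a column legitimately mixes strings and numbers; it is meant only to show the shape of a read-time cleanup.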
Hi @jenna-tomkinson, thank you for adding to this issue, and sorry to hear this is giving you and @MattsonCam trouble. This hasn't yet been resolved with code additions. There are some open code changes in #50 that seek to resolve the issue. This work hasn't yet been merged into …
We just merged #50! 🎉 @MattsonCam - if you're able, please test the newest version and report back if it solves this issue. We can then close it :)
@MattsonCam - were you able to test the newest version? Can we close this issue?
Hi @MattsonCam, @jenna-tomkinson, and @axiomcura - I just wanted to double-check on this. Do you know if this issue may be closed (or does the challenge still occur)? I'm also working on validating this, but it's taking some time to process due to the large data size (will follow up).
I was able to confirm this is now addressed with a completed CytoTable run on … Please note: the confirmation relies on an incoming change found within #168 (which addresses a separate issue related to completion of data processing and not data type processing errors). Thanks again @axiomcura, @jenna-tomkinson, and @MattsonCam for your help with addressing this issue! Closing it for now. Please don't hesitate to reopen or reach out if you have any questions.
CytoTable's convert() function seems to capture nan's as string types within the cell-health dataset, causing duckdb to raise a duckdb.TypeMismatchException error.

Below is the code to replicate the problem:
link to download data
Traceback
From Prefect:
It seems that the second exception raised by Prefect is caused by the previous exception thrown by duckdb, which prevents Prefect from changing the state of the data.