I created a group of Python files to:

1. Convert the two CellProfiler SQLite outputs into parquet and merge single cells using CytoTable `convert`.
2. Merge the two parquet files into one using pandas `concat`, since the two outputs should have been a single file but were split when a power outage stopped the CellProfiler run.
3. Annotate the new combined parquet file with Pycytominer `annotate`.
4. Normalize the annotated data with Pycytominer `normalize`.
5. Perform feature selection with Pycytominer `feature_select`.

A sketch of the whole pipeline is shown below.
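For context, here is a minimal sketch of those steps. The file paths are assumptions for illustration, as is the CytoTable preset I believe matches CellProfiler SQLite outputs; the exact arguments in my scripts may differ:

```python
import cytotable
import pandas as pd
from pycytominer import annotate, feature_select, normalize

# 1. convert each CellProfiler SQLite output to parquet
#    (file names and preset are assumptions for illustration)
for run in ("run1", "run2"):
    cytotable.convert(
        source_path=f"{run}.sqlite",
        dest_path=f"{run}.parquet",
        dest_datatype="parquet",
        preset="cellprofiler_sqlite_pycytominer",
    )

# 2. merge the two halves of the interrupted run into one dataframe
single_cell_df = pd.concat(
    [pd.read_parquet("run1.parquet"), pd.read_parquet("run2.parquet")],
    ignore_index=True,
)
single_cell_df.to_parquet("merged_single_cells.parquet")

# 3. annotate with the platemap (the step where the kernel is killed)
platemap_df = pd.read_csv("platemap.csv")
annotated_df = annotate(
    profiles=single_cell_df,
    platemap=platemap_df,
    join_on=["Metadata_well_id", "Image_Metadata_Well"],
)

# 4. normalize, then 5. feature-select the annotated profiles
normalized_df = normalize(profiles=annotated_df, method="standardize")
selected_df = feature_select(profiles=normalized_df, operation="variance_threshold")
```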
When the two parquet files were merged, the resulting parquet file was 13.1 GB.
When attempting to run the scripts as described above, the kernel would be killed while running the `annotate` function. The error indicates that this function attempted to use about 102 GB of memory, while I only have about 49 GB available.
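Part of the gap is likely that parquet is compressed on disk, so a 13.1 GB file decompresses to a much larger in-memory dataframe before `annotate` makes any copies. A quick way to check the uncompressed size without loading the file (file name is a placeholder):

```python
import pyarrow.parquet as pq

# inspect parquet metadata without loading the data into memory;
# total_byte_size reports the uncompressed size of each row group
meta = pq.ParquetFile("merged_single_cells.parquet").metadata
uncompressed_gb = sum(
    meta.row_group(i).total_byte_size for i in range(meta.num_row_groups)
) / 1e9
print(f"uncompressed size: {uncompressed_gb:.1f} GB")
```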
After talking with @axiomcura, he believes the issue might arise in this part of the `annotate` function:
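Roughly speaking, the core of `annotate` joins the platemap onto the profiles with a pandas merge, and `pd.merge` materializes a full copy of the joined frame, so the profiles, the platemap, and the merged result can all be alive at once. A hedged illustration of that pattern (not the library's exact source):

```python
import pandas as pd

def annotate_like_merge(profiles: pd.DataFrame, platemap: pd.DataFrame, join_on):
    # pd.merge builds an entirely new dataframe: while it runs, the
    # profiles, the platemap, and the merged result all occupy memory,
    # so peak usage can be several times the input frame's size
    return platemap.merge(
        profiles,
        left_on=join_on[0],
        right_on=join_on[1],
        how="inner",
    )
```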
My apologies; I forgot to add the code that I was using. Here it is:
```python
from pycytominer import annotate
from pycytominer.cyto_utils import output

# add metadata from platemap file to extracted single cell features
annotated_df = annotate(
    profiles=single_cell_df,
    platemap=platemap_df,
    join_on=["Metadata_well_id", "Image_Metadata_Well"],
)

# move metadata well and single cell count to the front of the df
# (for easy visualization in python)
well_column = annotated_df.pop("Metadata_Well")
singlecell_column = annotated_df.pop("Metadata_number_of_singlecells")

# insert the columns as the second and third columns in the dataframe
annotated_df.insert(1, "Metadata_Well", well_column)
annotated_df.insert(2, "Metadata_number_of_singlecells", singlecell_column)

# save annotated df as parquet file
output(
    df=annotated_df,
    output_filename=output_file,
    output_type="parquet",
)
```
I assumed that neither `DataFrame.pop` nor `DataFrame.insert` would cause the memory issue. `output` also doesn't seem like the culprit, but I am open to ideas.
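For what it's worth, `pop` and `insert` both operate on the dataframe in place and should not meaningfully change its footprint; a small sketch to confirm that intuition:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(1_000, 5), columns=list("abcde"))
before = df.memory_usage(deep=True).sum()

col = df.pop("c")        # removes the column in place and returns it as a Series
df.insert(1, "c", col)   # re-inserts it in place at position 1

after = df.memory_usage(deep=True).sum()
print(before == after)   # True: the total reported footprint is unchanged
```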
This issue is related to issue #233.
Machine info
@gwaybio @d33bs