New cyto tool: create cell locations file #257
```bash
python \
  pycytominer/cyto_utils/cell_locations_cmd.py \
  --input_parquet_file pycytominer/tests/test_data/cell_locations_example_data/load_data_with_illum_subset.parquet \
  --sqlite_file pycytominer/tests/test_data/cell_locations_example_data/BR00126114_subset.sqlite \
  --output_parquet_file \
  ~/Desktop/load_data_with_illum_and_cell_location_subset.parquet
```
Codecov Report

```diff
@@            Coverage Diff             @@
##           master     #257      +/-   ##
==========================================
- Coverage   95.71%   95.47%   -0.25%
==========================================
  Files          53       57       +4
  Lines        2826     3048     +222
==========================================
+ Hits         2705     2910     +205
- Misses        121      138      +17
```

... and 3 files with indirect coverage changes.
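As a quick check on the report, the totals follow from hits / (hits + misses). This sketch assumes there are no partially covered lines (the report lists none) and that Codecov's default of rounding coverage down to two decimals applies; `codecov_percent` is a hypothetical helper, not part of any Codecov tooling.

```python
import math


def codecov_percent(hits: int, misses: int) -> float:
    """Line coverage in percent, rounded down to two decimals.

    Assumes no partially covered lines, so tracked lines = hits + misses.
    """
    raw = 100 * hits / (hits + misses)
    # Truncate rather than round, mirroring a "round down" precision setting.
    return math.floor(raw * 100) / 100


print(codecov_percent(2705, 121))  # master: 95.71
print(codecov_percent(2910, 138))  # this PR: 95.47
```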
This can be run on the command line using:

```bash
metadata_input="s3://cellpainting-gallery/test-cpg0016-jump/source_4/workspace/load_data_csv/2021_08_23_Batch12/BR00126114/load_data_with_illum_subset.parquet"
single_cell_input="s3://cellpainting-gallery/test-cpg0016-jump/source_4/workspace/backend/2021_08_23_Batch12/BR00126114/BR00126114_subset.sqlite"
augmented_metadata_output="s3://cellpainting-gallery/test-cpg0016-jump/source_4/workspace/load_data_csv/2021_08_23_Batch12/BR00126114/load_data_with_illum_and_cell_location_subset.parquet"

python \
  pycytominer/cyto_utils/cell_locations_cmd.py \
  --metadata_input ${metadata_input} \
  --single_cell_input ${single_cell_input} \
  --augmented_metadata_output ${augmented_metadata_output} \
  add_cell_location
```
Currently, one of the test fixtures (…) … For now:

```python
import pathlib
from typing import List

import boto3
import pytest
from moto import mock_s3
from moto.server import ThreadedMotoServer

# moto patterns are from
# https://github.com/cytomining/CytoTable/blob/415f2ecd94b66c979bfda595448264089f9a1c80/tests/conftest.py
# credits: @gwaygenomics and @d33bs


@pytest.fixture(scope="session", name="s3_session")
def fixture_s3_session() -> boto3.session.Session:
    """
    Yield a mocked boto session for s3 tests.
    """
    # start a moto server for use in testing
    server = ThreadedMotoServer()
    server.start()
    with mock_s3():
        yield boto3.session.Session()
    # stop the moto server once the test session is finished
    server.stop()


@pytest.fixture(name="example_local_sources")
def fixture_example_local_sources() -> List[pathlib.Path]:
    """
    Provide a list of example sources
    """
    return [
        pathlib.Path(__file__).parent.parent
        / "test_data"
        / "cell_locations_example_data"
        / "BR00126114_subset.sqlite",
        pathlib.Path(__file__).parent.parent
        / "test_data"
        / "cell_locations_example_data"
        / "load_data_with_illum_subset.parquet",
    ]


@pytest.fixture()
def example_s3_endpoint(
    s3_session: boto3.session.Session,
    example_local_sources: List[pathlib.Path],
) -> str:
    """
    Create a mocked bucket which includes example sources
    """
    # s3_session is a fixture defined above that yields a boto3 session.
    # Feel free to instantiate another boto3 S3 client; keep note of the region, though.
    endpoint_url = "http://localhost:5000"
    bucket_name = "example"

    # create s3 client
    s3_client = s3_session.client("s3", endpoint_url=endpoint_url)

    # create a bucket for content to land in
    s3_client.create_bucket(Bucket=bucket_name)

    # upload each example file to the mock bucket
    for source_path in example_local_sources:
        s3_client.upload_file(
            Filename=str(source_path),
            Bucket=bucket_name,
            Key=source_path.name,
        )

    # return endpoint url for use in testing
    return endpoint_url


@pytest.fixture(name="single_cell_input_file_s3")
def fixture_single_cell_input_file_s3(example_s3_endpoint: str) -> str:
    """
    Provide a single cell input file for cell_locations test data
    """
    return f"{example_s3_endpoint}/example/BR00126114_subset.sqlite"
```
Meanwhile, @Arkkienkeli, can you inspect the parquet file attached in #257 (comment) to see whether this kind of output works for you (for DeepProfiler) instead of creating a locations file? That is, if such a (parquet) file were available, would you no longer need to create a locations file? See the logic in https://github.com/shntnu/pycytominer/blob/bc2f01dfa0252bba5f3aa04cd8488f8dc7fe5de9/pycytominer/cyto_utils/cell_locations.py#L14-L38 to see what's happening.
Hi @shntnu, in the example you shared, …

So either we update DeepProfiler or I will need to convert those files into our format.
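If the conversion route is taken, converting one image's cell centers could look like the sketch below. It assumes a DeepProfiler-style layout of one CSV per site, named `<well>-<site>-Nuclei.csv` under a plate folder, with `Nuclei_Location_Center_X` and `Nuclei_Location_Center_Y` columns; verify that layout against the DeepProfiler documentation before relying on it. `to_locations_csv` and `locations_path` are hypothetical helpers.

```python
import csv
import io
from typing import List, Tuple


def to_locations_csv(cell_centers: List[Tuple[float, float]]) -> str:
    """Render one image's (X, Y) cell centers as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Assumed column names for a DeepProfiler-style locations file.
    writer.writerow(["Nuclei_Location_Center_X", "Nuclei_Location_Center_Y"])
    writer.writerows(cell_centers)
    return buffer.getvalue()


def locations_path(plate: str, well: str, site: str) -> str:
    """Assumed per-site file layout: <plate>/<well>-<site>-Nuclei.csv."""
    return f"{plate}/{well}-{site}-Nuclei.csv"
```

Writing one file per site keeps the conversion a simple loop over the rows of the augmented metadata.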
Thank you so much for the detailed review -- I learned a lot! I've resolved all your comments except this one, related to …

I skimmed the `SingleCells` class, and my initial impression was that the overlaps were fairly minimal. But it's possible that refactoring `SingleCells` would expose more reusable functionality that `CellLocations` could benefit from. For example, some of these …

- Move to a utils module: …
- Move to …
Nice work! Thank you for making changes and acknowledging my comments, @shntnu! I've provided some additional comments and would like to request further changes. I especially feel the example in the README and the anonymous S3 access items should be addressed (or documented).
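One common way to read a public bucket such as `cellpainting-gallery` without AWS credentials is to send unsigned requests with boto3. This is a minimal sketch, not the PR's implementation; `split_s3_uri` and `download_public_s3_object` are hypothetical helpers, and it assumes the bucket permits anonymous reads.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_s3_uri(uri: str) -> Tuple[str, str]:
    """Split 's3://bucket/key/parts' into ('bucket', 'key/parts')."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {uri!r}")
    return parsed.netloc, parsed.path.lstrip("/")


def download_public_s3_object(uri: str, destination: str) -> None:
    """Download an object from a public bucket without AWS credentials."""
    # Imported here so split_s3_uri stays usable without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    bucket, key = split_s3_uri(uri)
    # signature_version=UNSIGNED disables request signing (anonymous access).
    s3_client = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    s3_client.download_file(bucket, key, destination)
```

For example, `download_public_s3_object(single_cell_input, "BR00126114_subset.sqlite")` would fetch the SQLite subset referenced in the command-line example, assuming network access.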
Responding to your comment:

> But it's possible that refactoring SingleCells would expose more reusable functionality that CellLocations could benefit from.

Thank you for addressing this! Based on what you mentioned, I feel this may be reaching the limits of, or out of scope for, this PR. Perhaps this is a new issue for the project to explore later on?
Co-authored-by: Dave Bunten <[email protected]>
Thanks so much for the changes and for raising the additional issues, @shntnu! I had only one comment that could benefit from feedback in this review; overall, this LGTM.
Your feedback was really motivating!
All set now.
@d33bs If there's nothing further, please go ahead and merge and we'll launch a bunch of jobs that are waiting on this. Thank you once again for all the feedback along the way!
Description
This introduces a new tool to append the X, Y locations of all cells in an image to each row of a LoadData CSV file. This will make it a lot easier for deep learning feature extraction workflows to do their thing (e.g., DeepProfiler will no longer need to download single-cell SQLite files to extract locations).
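The core idea can be sketched with the standard library alone. This sketch is not the PR's implementation, which reads and writes Parquet. It assumes a CellProfiler-style SQLite backend with an `Image` table (`ImageNumber`, `Image_Metadata_Plate`, `Image_Metadata_Well`, `Image_Metadata_Site`) and a `Nuclei` table (`ImageNumber`, `ObjectNumber`, `Nuclei_Location_Center_X`, `Nuclei_Location_Center_Y`). `add_cell_locations` and the `cell_centers` column are hypothetical names.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Tuple

Key = Tuple[str, str, str]  # (plate, well, site)


def cell_locations_by_image(conn: sqlite3.Connection) -> Dict[Key, List[Tuple[float, float]]]:
    """Collect (X, Y) nucleus centers for every image, keyed by plate/well/site."""
    query = """
        SELECT i.Image_Metadata_Plate, i.Image_Metadata_Well, i.Image_Metadata_Site,
               n.Nuclei_Location_Center_X, n.Nuclei_Location_Center_Y
        FROM Nuclei AS n
        JOIN Image AS i ON n.ImageNumber = i.ImageNumber
        ORDER BY n.ImageNumber, n.ObjectNumber
    """
    locations: Dict[Key, List[Tuple[float, float]]] = defaultdict(list)
    for plate, well, site, x, y in conn.execute(query):
        # Site is stringified so integer and text sites compare equal.
        locations[(plate, well, str(site))].append((x, y))
    return dict(locations)


def add_cell_locations(load_data_rows: List[dict], conn: sqlite3.Connection) -> List[dict]:
    """Return copies of the LoadData rows, each with a list of cell centers appended."""
    locations = cell_locations_by_image(conn)
    augmented = []
    for row in load_data_rows:
        key = (row["Metadata_Plate"], row["Metadata_Well"], str(row["Metadata_Site"]))
        # Images without segmented cells get an empty list instead of being dropped.
        augmented.append({**row, "cell_centers": locations.get(key, [])})
    return augmented
```

The join is keyed on plate, well and site rather than `ImageNumber` because LoadData rows typically carry image metadata but not CellProfiler's internal image numbers.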
What is the nature of your change?
Checklist
Please ensure that all boxes are checked before indicating that a pull request is ready for review.