Feature/exe 1989 iterative leiden #188

Merged
merged 33 commits into from
Oct 2, 2024

@ptajvar ptajvar commented Sep 13, 2024

Description

To improve identification of potential technical multiplets, we are now implementing an iterative Leiden step: instead of a single Leiden run on the entire sample, we run Leiden iteratively. During each pass, if a subgraph is broken down into multiple communities, the new communities are added to a queue to go through community detection again. This process continues until the subgraphs are no longer broken down into further communities or a maximum depth given by "leiden_iterations" is reached.
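
The queue-driven refinement described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: `iterative_refinement`, `split_at_largest_gap`, and the `max_depth` parameter are hypothetical names, and the toy splitter merely stands in for a real Leiden run on a subgraph.

```python
from __future__ import annotations

from collections import deque
from typing import Callable, Iterable


def iterative_refinement(
    nodes: Iterable[int],
    detect: Callable[[frozenset], list],
    max_depth: int = 5,
) -> list:
    """Re-run `detect` on every new community until nothing splits any further.

    `detect` is a placeholder for community detection (e.g. one Leiden pass);
    it takes a set of nodes and returns a list of node sets.
    """
    queue = deque([(frozenset(nodes), 0)])
    final = []
    while queue:
        community, depth = queue.popleft()
        if depth >= max_depth:
            # Maximum depth reached: accept the community as it is.
            final.append(community)
            continue
        parts = detect(community)
        if len(parts) <= 1:
            # Not broken down any further: this community is final.
            final.append(community)
        else:
            # Every new community goes through detection again.
            queue.extend((part, depth + 1) for part in parts)
    return final


def split_at_largest_gap(community: frozenset) -> list:
    """Toy stand-in for Leiden: split integer nodes at their largest gap."""
    ordered = sorted(community)
    if len(ordered) < 2:
        return [community]
    gap, index = max(
        (ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)
    )
    if gap <= 1:
        return [community]
    return [frozenset(ordered[: index + 1]), frozenset(ordered[index + 1 :])]


if __name__ == "__main__":
    nodes = {1, 2, 3, 10, 11, 12, 30, 31}
    for community in iterative_refinement(nodes, split_at_largest_gap):
        print(sorted(community))
```

Using a queue rather than recursion keeps the depth bookkeeping explicit, so the depth limit applies per subgraph rather than to the whole run.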

In addition, the component names will now be generated using a hash of their pixels' barcodes and the generated components are deterministic. "components_modularity" is also now removed from edgelist_metrics as it is computationally expensive and the information is not frequently used.
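
One way such hash-based naming could work is sketched below. The hash function (SHA-256), the sorting of barcodes, the separator byte, and the truncation length are all assumptions made for illustration; the description does not state which scheme the PR actually uses, and `component_name` is a hypothetical helper.

```python
import hashlib
from typing import Iterable


def component_name(pixel_barcodes: Iterable[str], length: int = 16) -> str:
    """Derive a component name from the set of pixel barcodes it contains.

    Sorting the unique barcodes first makes the name independent of the
    order in which the pixels were encountered.
    """
    digest = hashlib.sha256()
    for barcode in sorted(set(pixel_barcodes)):
        digest.update(barcode.encode("utf-8"))
        # Separator so that ["AB", "C"] and ["A", "BC"] hash differently.
        digest.update(b"\n")
    return digest.hexdigest()[:length]


if __name__ == "__main__":
    first = component_name(["ACGT", "TTGA", "GGCC"])
    second = component_name(["GGCC", "ACGT", "TTGA"])
    print(first == second)
```

Because the name depends only on the component's contents, re-running the pipeline on the same data would give the same names, which is the property the description refers to.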

Fixes: EXE-1989, EXE-1884, EXE-1979, EXE-2015

Type of change

Please delete options that are not relevant.

  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

How Has This Been Tested?

The existing unit tests pass. The new graph step was run on multiple samples, confirming the intended detection of potential multiplets.

PR checklist:

  • This comment contains a description of changes (with reason).
  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • If a new tool or package is included, I have updated poetry.lock, and cited it properly
  • I have checked my code and documentation and corrected any misspellings
  • I have documented any significant changes to the code in CHANGELOG.md

@ptajvar ptajvar force-pushed the feature/exe-1989-iterative-leiden branch from 0602bb2 to 1e44d02 Compare September 13, 2024 12:51

@johandahlberg johandahlberg left a comment

Very nice. I had some suggestions. Just let me know if you have any questions.

    communities_graph = nx.from_edgelist(connected_communities)
    for cc in nx.connected_components(communities_graph):
        community_serie[community_serie.isin(cc)] = min(cc)
    return edgelist, community_serie


def recover_technical_multiplets(

I think this function could benefit from being broken up into some smaller private functions to help a bit with readability.

@ptajvar ptajvar force-pushed the feature/exe-1989-iterative-leiden branch from 61caad4 to 744aa90 Compare September 17, 2024 12:27
@ptajvar ptajvar marked this pull request as ready for review September 18, 2024 08:10

@ambarrio ambarrio left a comment

As we reviewed live, here are my suggestions; they are mostly about documentation, plus a few remarks.

@ptajvar ptajvar requested a review from ludvigla September 30, 2024 11:10

@ludvigla ludvigla left a comment

I think this looks really good! Just some minor comments

@@ -32,7 +38,7 @@ def connect_components(
    sample_name: str,
    metrics_file: str,
    multiplet_recovery: bool,
    leiden_iterations: int = 10,
    max_refinement_recursion_depth: int = 5,

I think 5 or 10 shouldn't make a huge difference. Perhaps it's a good idea to stick with the lower threshold of 5 for now, and then we can evaluate later whether this threshold needs to be refined.

@ptajvar ptajvar merged commit 2a43b72 into dev Oct 2, 2024
14 checks passed
@ptajvar ptajvar deleted the feature/exe-1989-iterative-leiden branch October 2, 2024 08:15