[Visualisation] Baysor-like output #69

Open
jpark27 opened this issue Nov 30, 2024 · 3 comments
Comments

@jpark27

jpark27 commented Nov 30, 2024

Hi @EliHei2! Hope you have been well.

I've been testing segger-dev on an in-house Xenium dataset for a head-to-head comparison with Baysor.
My final output looks like the following, but it is missing polygon masks.

segment(
    model,
    dm,
    save_dir='benchmarks',
    seg_tag='segger_embedding_1001',
    transcript_file='transcripts.parquet',
    receptive_field=receptive_field,
    min_transcripts=5,
    cell_id_col='segger_cell_id',
    use_cc=False,
    knn_method='cuda',
    verbose=True,
)
image

Is there a quick way to convert the current segger output into polygon masks like the ones Baysor provides?
[segmentation_borders.html]
image

Any guidance would be much appreciated!
J

@EliHei2
Owner

EliHei2 commented Dec 1, 2024

Hey @jpark27, thanks for reaching out. You can use the boundary module in segger.prediction, as shown below, to generate non-convex cell boundaries. It's the same algorithm as the one implemented by Baysor.

from functools import partial

from segger.prediction.boundary import generate_boundary
import geopandas as gpd
import dask.dataframe as dd
from tqdm import tqdm
from pqdm.processes import pqdm  # or use pqdm.threads for threading-based parallelism

# Load the transcripts into pandas: a dask groupby cannot be iterated group by group
df = dd.read_parquet('path/to/segger_transcripts.parquet').compute()

# Process a single (cell_id, transcripts) group so it can be used with pqdm
def process_group(group, x="x_location", y="y_location"):
    cell_id, t = group
    return {
        "cell_id": cell_id,
        "length": len(t),
        "geom": generate_boundary(t, x=x, y=y)
    }

def generate_boundaries(df, x="x_location", y="y_location", cell_id="segger_cell_id", n_jobs=10):
    # Group by cell_id (pandas drops rows whose cell_id is missing)
    group_df = df.groupby(cell_id)
    # Forward the coordinate column names to the per-group worker
    worker = partial(process_group, x=x, y=y)
    # Use pqdm to process each group in parallel
    results = pqdm(tqdm(group_df, desc="Processing Groups"), worker, n_jobs=n_jobs)
    # Convert results to GeoDataFrame
    return gpd.GeoDataFrame(
        data=[[res["cell_id"], res["length"]] for res in results],
        geometry=[res["geom"] for res in results],
        columns=["cell_id", "length"],
    )

bb = generate_boundaries(df, x="x_location", y="y_location", cell_id="segger_cell_id", n_jobs=8)
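
Once boundaries exist, they still need to be written to a polygon file before they can be loaded into a viewer. The sketch below is one possible way to do that, not something taken from segger or Baysor: it builds a GeoJSON FeatureCollection from plain coordinate lists using only the standard library. The helper name boundaries_to_geojson and the input layout (a dict mapping cell ID to a list of (x, y) vertices) are assumptions made for illustration, and whether the result matches Baysor's polygon output exactly has not been checked here.

```python
import json


def boundaries_to_geojson(boundaries):
    """Build a GeoJSON FeatureCollection from {cell_id: [(x, y), ...]}.

    Hypothetical helper for illustration; not part of segger or Baysor.
    """
    features = []
    for cell_id, ring in boundaries.items():
        if len(ring) < 3:
            # A polygon needs at least three vertices; skip degenerate cells.
            continue
        coords = [[float(x), float(y)] for x, y in ring]
        if coords[0] != coords[-1]:
            # GeoJSON linear rings must end on their starting vertex.
            coords.append(coords[0])
        features.append({
            "type": "Feature",
            "properties": {"cell_id": str(cell_id)},
            "geometry": {"type": "Polygon", "coordinates": [coords]},
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up coordinates: one valid rectangle and one cell with too few points.
example = {
    "cell_1": [(0, 0), (4, 0), (4, 3), (0, 3)],
    "cell_2": [(1, 1), (2, 2)],
}
collection = boundaries_to_geojson(example)
geojson_text = json.dumps(collection)
```

If the boundaries are already in a GeoDataFrame such as bb, geopandas can probably do the same job directly with bb.to_file('boundaries.geojson', driver='GeoJSON'); the output file name here is only an example.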

@jpark27
Author

jpark27 commented Dec 2, 2024

Hi @EliHei2! Many thanks for the suggestion.
I followed your script and ran into the following error, which I haven't been able to resolve yet. Have you seen it before with output from the current version of segger-dev?

image
image

@joaolsf

joaolsf commented Jan 17, 2025

Hi @EliHei2, I am running the "Introduction to Segger" tutorial from the Tutorial section with the Xenium example data. I had the exact same issue as @jpark27 when trying to plot the boundaries. I checked the values in the columns of the segger_transcripts.parquet file (loaded with the pandas read_parquet function), and the columns 'score', 'segger_cell_id' and 'bound' had either NaN or None values across all rows I checked (including rows with assigned cell_IDs). Could this be the cause of the issue? If so, is something going wrong in the segment function so that it does not write the segger_cell_IDs to the parquet file?

Thanks!
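
One way to check the whole column rather than a sample of rows is to count how many transcripts carry a usable cell ID. The sketch below is a minimal, standard-library version of that check. The helper names is_missing and summarize_assignment are made up for illustration, and the example values are synthetic rather than real segger output.

```python
import math


def is_missing(value):
    """Return True for None or a float NaN, the two 'empty' values reported above."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def summarize_assignment(cell_ids):
    """Count transcripts with and without a cell ID.

    Hypothetical helper for illustration; accepts any iterable of IDs.
    """
    cell_ids = list(cell_ids)
    assigned = [c for c in cell_ids if not is_missing(c)]
    total = len(cell_ids)
    return {
        "n_transcripts": total,
        "n_assigned": len(assigned),
        "n_cells": len(set(assigned)),
        "fraction_assigned": len(assigned) / total if total else 0.0,
    }


# Synthetic example: three assigned transcripts in two cells, two unassigned.
example_ids = ["a", "a", None, float("nan"), "b"]
summary = summarize_assignment(example_ids)
```

With the real file, the IDs could be passed in as pd.read_parquet('path/to/segger_transcripts.parquet')['segger_cell_id'].tolist(). If n_assigned comes back as 0, no groups would reach the boundary step at all, which would be consistent with the symptom described above, although the thread does not confirm that this is the cause.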
