Feature Request: Consistent access to genome ranges #5

Open
mnshgl0110 opened this issue Nov 8, 2022 · 1 comment

Comments

@mnshgl0110
Member

Currently, the pansyn object treats the reference genome differently than the other genomes. Consequently, different steps are required to access the coordinates of the reference and of the other genomes.

print(df.iloc[0][0])
Pansyn(Range(chm13, NC_060946.1, NaN, 10588391, 12801843), {'mat': Range(mat, CM039032.1, NaN, 3783261, 5889298), 'pat': Range(pat, CM039055.1, NaN, 1, 1723416)})

Is there a specific reason for this? If not, wouldn't it be better to treat all genomes the same way and allow a consistent parsing scheme? I guess this would also be useful when we start identifying crossyn between query genomes only.
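To illustrate the kind of consistent access meant here, a minimal sketch follows. It uses simplified stand-in classes rather than the actual implementation: the attribute names `ref` and `ranges_dict`, the helpers `get_range` and `iter_ranges`, and the reduced `Range` fields (the third field printed as NaN above is left out) are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Range:
    # Simplified stand-in; field names are assumptions.
    org: str
    chr: str
    start: int
    end: int


@dataclass
class Pansyn:
    # Simplified stand-in: one reference Range plus a dict of the other genomes.
    ref: Optional[Range]
    ranges_dict: Dict[str, Range] = field(default_factory=dict)


def get_range(pansyn: Pansyn, org: str) -> Range:
    """Return the Range for `org`, whether it is the reference or a dict entry."""
    if pansyn.ref is not None and pansyn.ref.org == org:
        return pansyn.ref
    return pansyn.ranges_dict[org]  # raises KeyError for an unknown organism


def iter_ranges(pansyn: Pansyn) -> Iterator[Tuple[str, Range]]:
    """Yield (organism, Range) pairs for every genome, reference first."""
    if pansyn.ref is not None:
        yield pansyn.ref.org, pansyn.ref
    yield from pansyn.ranges_dict.items()


# Coordinates taken from the printed example above.
ps = Pansyn(
    Range("chm13", "NC_060946.1", 10588391, 12801843),
    {
        "mat": Range("mat", "CM039032.1", 3783261, 5889298),
        "pat": Range("pat", "CM039055.1", 1, 1723416),
    },
)

for org, rng in iter_ranges(ps):
    print(org, rng.chr, rng.start, rng.end)
```

With helpers like these, the caller would not need to know which genome is stored as the reference.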

@lrauschning
Collaborator

Hi Manish,
the reason for this is that the reference genome has a special role in the synteny intersection algorithm, as it's the genome the regions are "joined" on.
The way I thought of handling reference-free multisynteny calling would be to leave the reference field as None. That will not work with the intersection algorithm as it is currently implemented, since it is inherently reference-based, but it would be possible to adapt it.
I've been thinking of writing a helper function that moves the reference Range into the main dict and leaves the reference field empty, to transition seamlessly between reference-based and non-reference-based identification.
A reverse function that takes an organism from the dict and moves it to the ref field could then be used to adapt to reference-free synteny intersection, perhaps imputing the CIGAR strings for each entry in the Cigars dict.
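Roughly, the two functions could look like the sketch below. This is only an illustration on simplified stand-in classes, not the existing code: the attribute names `ref` and `ranges_dict` and the function names `drop_ref` and `set_ref` are hypothetical, and the CIGAR imputation is left out entirely.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class Range:
    # Simplified stand-in; field names are assumptions.
    org: str
    chr: str
    start: int
    end: int


@dataclass
class Pansyn:
    # Simplified stand-in: `ref` may be None for reference-free records.
    ref: Optional[Range]
    ranges_dict: Dict[str, Range] = field(default_factory=dict)


def drop_ref(pansyn: Pansyn) -> Pansyn:
    """Move the reference Range into the dict and leave the ref field empty."""
    if pansyn.ref is None:
        return pansyn
    ranges = dict(pansyn.ranges_dict)  # copy so the input is not mutated
    ranges[pansyn.ref.org] = pansyn.ref
    return Pansyn(ref=None, ranges_dict=ranges)


def set_ref(pansyn: Pansyn, org: str) -> Pansyn:
    """Take `org` out of the dict and make it the reference."""
    pansyn = drop_ref(pansyn)  # any current reference goes back into the dict
    ranges = dict(pansyn.ranges_dict)
    new_ref = ranges.pop(org)  # raises KeyError if `org` is not present
    return Pansyn(ref=new_ref, ranges_dict=ranges)


ps = Pansyn(
    Range("chm13", "NC_060946.1", 10588391, 12801843),
    {
        "mat": Range("mat", "CM039032.1", 3783261, 5889298),
        "pat": Range("pat", "CM039055.1", 1, 1723416),
    },
)

ref_free = drop_ref(ps)           # all three genomes now live in the dict
on_mat = set_ref(ref_free, "mat")  # "mat" becomes the reference
```

Both functions return new objects instead of modifying the input, so the same record can be viewed in either form without side effects.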
Do you think this approach makes sense?
