Show the links and add results inspection #33

Open
gcroci2 opened this issue Sep 13, 2024 · 4 comments · May be fixed by #39
@gcroci2
Contributor

gcroci2 commented Sep 13, 2024

The last part of the genomics -> metabolomics tab consists of showing the links to the metabolomics data and adding the results inspection:

[Image: Screenshot 2024-09-13 at 15 01 46]
  • Move the button "Set Scoring" outside of the scoring container and rename it as e.g. "Show results".
  • If no rows are selected, clicking "Show spectra" will show a warning message instead of the candidate links table.

Blocked by #32

@gcroci2 gcroci2 added this to dev Sep 13, 2024
@github-project-automation github-project-automation bot moved this to Backlog in dev Sep 13, 2024
@gcroci2 gcroci2 moved this from Todo to In progress in dev Jan 21, 2025
@gcroci2 gcroci2 self-assigned this Jan 21, 2025
@gcroci2 gcroci2 linked a pull request Jan 21, 2025 that will close this issue
@gcroci2
Contributor Author

gcroci2 commented Jan 24, 2025

Before I finalize the table further and add the tests, I have some questions, @CunliangGeng. You can already see the results table by running the app from this branch.

  • What is the cutoff value we want to have by default for METCALF? Now it's 1. [SOLVED]
  • Do we want to display items with a cutoff value higher, lower, or equal to the value set in the scoring filter? Now it's >=. [SOLVED]
  • Which of the originally designed columns do we want to include (see the image at the top of this issue)? If there aren't many, I could avoid implementing the "Select column" button for now.
  • Which of the original hyperlinks in the table do we want to have now (see the image at the top of this issue)? For example, for the "Score" column, it doesn't make sense now to have a hyperlink since we can have only one scoring method.

@justinjjvanderhooft

Let me give my 5 cents here:

  • the default cut-off could be 0.05 - that would leave out the most obvious mismatches (and users can always put it on 0.00 if they think that helps their analysis)
  • I would go for items equal or higher than the specified value (as you have it now I think)
  • these 5 columns (still) look useful to me
  • true, though don't we have the regular and normalized Metcalf score? Anyway, if we have one value, no hyperlink is needed. Let's see if we can add the normalized correlation score asap, maybe as a part of the workshop we could make a start during the parallel sessions.

Thanks for the great work @gcroci2 :-)

@gcroci2
Contributor Author

gcroci2 commented Jan 31, 2025

  • the default cut-off could be 0.05 - that would leave out the most obvious mismatches (and users can always put it on 0.00 if they think that helps their analysis)

Updated to 0.05

  • I would go for items equal or higher than the specified value (as you have it now I think)

Indeed :)
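
The agreed behaviour (default cutoff 0.05, keep items with a score greater than or equal to the cutoff) could be sketched roughly as below. The dictionary layout and the `filter_links` helper are hypothetical and only illustrate the comparison; they are not taken from the NPLinker code-base.

```python
DEFAULT_CUTOFF = 0.05  # default value agreed on in this thread


def filter_links(links, cutoff=DEFAULT_CUTOFF):
    """Keep links whose score is greater than or equal to the cutoff.

    `links` is assumed to be a list of dicts with a "score" key
    (a hypothetical layout used only for this illustration).
    """
    return [link for link in links if link["score"] >= cutoff]


links = [
    {"gcf_id": "1", "spectrum_id": "a", "score": 0.04},
    {"gcf_id": "1", "spectrum_id": "b", "score": 0.05},
    {"gcf_id": "2", "spectrum_id": "c", "score": 1.2},
]

kept = filter_links(links)           # uses the 0.05 default
kept_all = filter_links(links, 0.0)  # users can lower the cutoff to 0.00
```

With `>=`, a link scoring exactly 0.05 stays in the table, which matches the "equal or higher" choice above.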

  • these 5 columns (still) look useful to me

About the possible columns, what are "Product", "Predicted BGC Class", and "Taxonometry"? I don't mean in general, but within the NPLinker code-base. So far I have added these columns to the table: GCF ID, # Links, Top 1 Spectrum ID, Average Score. @CunliangGeng

  • true, though don't we have the regular and normalized Metcalf score? Anyway, if we have one value, no hyperlink is needed. Let's see if we can add the normalized correlation score asap, maybe as a part of the workshop we could make a start during the parallel sessions.

Currently, for each GCF ID with n links, I calculate the average of the Metcalf scores across all n links.
What is the difference between the regular and normalized Metcalf scores? @justinjjvanderhooft
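
For reference, the per-GCF aggregation described above can be sketched as follows. This is a minimal illustration on plain `(gcf_id, spectrum_id, score)` tuples, not the actual app code. The `summarise_links` helper is hypothetical, and taking "Top 1 Spectrum ID" to mean the highest-scoring spectrum is an assumption.

```python
from collections import defaultdict


def summarise_links(links):
    """Build one table row per GCF from (gcf_id, spectrum_id, score) tuples."""
    by_gcf = defaultdict(list)
    for gcf_id, spectrum_id, score in links:
        by_gcf[gcf_id].append((spectrum_id, score))

    rows = []
    for gcf_id, pairs in by_gcf.items():
        # Assumption: "Top 1" is the spectrum with the highest score.
        top_spectrum, _ = max(pairs, key=lambda pair: pair[1])
        scores = [score for _, score in pairs]
        rows.append(
            {
                "GCF ID": gcf_id,
                "# Links": len(pairs),
                "Top 1 Spectrum ID": top_spectrum,
                # Average of the scores across all n links of this GCF.
                "Average Score": sum(scores) / len(scores),
            }
        )
    return rows


links = [("gcf1", "s1", 2.0), ("gcf1", "s2", 4.0), ("gcf2", "s3", 1.0)]
rows = summarise_links(links)
```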

I've noticed that LinkGraph objects (from npl.get_links(npl.gcfs, "metcalf")) can have 3 different relationship types:

  • GCF -> MolecularFamily
  • GCF -> Spectrum
  • Spectrum -> GCF

Questions:

  1. Why do we need all three types? Can't we just use GCF -> Spectrum and derive MolecularFamily relationships through Spectrum objects when needed?

  2. Analysis of the data shows:

    • Very few GCF IDs overlap between MF and Spectrum links
    • GCF IDs differ between GCF -> Spectrum and Spectrum -> GCF links (I was expecting these to match)
      What am I missing here?

For now, I'm only using GCF -> Spectrum links for the genomics -> metabolomics tab. Does this make sense?
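
The comparison behind question 2 can be reproduced with a sketch like the one below. It groups links by relationship type and collects the GCF IDs seen in each group. The `GCF`, `Spectrum` and `MolecularFamily` classes here are minimal stand-ins rather than the NPLinker classes, and the links are assumed to be available as `(source, target, score)` tuples, which is an assumption about the data layout.

```python
from collections import defaultdict
from dataclasses import dataclass


# Minimal stand-ins for illustration only; not the NPLinker classes.
@dataclass(frozen=True)
class GCF:
    id: str


@dataclass(frozen=True)
class Spectrum:
    id: str


@dataclass(frozen=True)
class MolecularFamily:
    id: str


def gcf_ids_by_relationship(links):
    """Map 'Source -> Target' type labels to the set of GCF IDs involved.

    Assumes every link has exactly one GCF endpoint.
    """
    buckets = defaultdict(set)
    for source, target, _score in links:
        key = f"{type(source).__name__} -> {type(target).__name__}"
        gcf = source if isinstance(source, GCF) else target
        buckets[key].add(gcf.id)
    return dict(buckets)


links = [
    (GCF("1"), Spectrum("a"), 0.8),
    (GCF("2"), MolecularFamily("m"), 0.3),
    (Spectrum("b"), GCF("3"), 0.6),
    (Spectrum("a"), GCF("1"), 0.8),
]

ids = gcf_ids_by_relationship(links)
forward = ids["GCF -> Spectrum"]
reverse = ids["Spectrum -> GCF"]
only_reverse = reverse - forward  # GCF IDs that appear in one direction only
```

Set differences such as `reverse - forward` make it easy to list exactly which GCF IDs appear in one direction but not the other.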

@justinjjvanderhooft

@gcroci2 regarding the different versions of the Metcalf correlation score, this was described here. Briefly: the standardised score for each prospective link takes into account the sizes of the GCF and the MF and adjusts the score accordingly. This makes the scores comparable between links involving strain sets of different sizes, as is necessary when, for example, comparing scores for different spectra or MFs for a particular GCF.
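
One way to sketch such a size adjustment is to standardise the raw score against what would be expected if strains were shared by chance. Assumptions, none of them taken from this thread: the weights below are illustrative, the overlap between the two strain sets is modelled as hypergeometric, and the function names are hypothetical. This is not necessarily how NPLinker implements it.

```python
import math

# Illustrative weights (assumption), one per strain category:
# (in both, spectrum/MF only, GCF only, in neither)
WEIGHTS = (10, -10, 0, 1)


def raw_score(n_total, n_gcf, n_spec, n_both, weights=WEIGHTS):
    """Weighted count of strains over the four presence/absence categories."""
    w_both, w_spec_only, w_gcf_only, w_neither = weights
    spec_only = n_spec - n_both
    gcf_only = n_gcf - n_both
    neither = n_total - n_gcf - n_spec + n_both
    return (
        w_both * n_both
        + w_spec_only * spec_only
        + w_gcf_only * gcf_only
        + w_neither * neither
    )


def standardised_score(n_total, n_gcf, n_spec, n_both, weights=WEIGHTS):
    """(raw - expected) / std, with the overlap modelled as hypergeometric."""
    if n_total < 2:
        return 0.0
    w_both, w_spec_only, w_gcf_only, w_neither = weights
    # The raw score is linear in n_both with this slope.
    slope = w_both - w_spec_only - w_gcf_only + w_neither
    mean_both = n_gcf * n_spec / n_total
    var_both = (
        n_gcf * n_spec * (n_total - n_gcf) * (n_total - n_spec)
        / (n_total**2 * (n_total - 1))
    )
    std = abs(slope) * math.sqrt(var_both)
    if std == 0:
        return 0.0  # no variation possible, e.g. the GCF occurs in every strain
    # Linearity means the expected score is the raw score at the mean overlap.
    expected = raw_score(n_total, n_gcf, n_spec, mean_both, weights)
    return (raw_score(n_total, n_gcf, n_spec, n_both, weights) - expected) / std


raw = raw_score(10, 3, 3, 3)
z = standardised_score(10, 3, 3, 3)
```

Because the expected value and spread depend on the sizes of both strain sets, two links with the same raw score can end up with different standardised scores.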

Regarding the different relationship types, I think it makes sense to keep both spectrum and MolecularFamily in, as at the level of MolecularFamily, the strain membership is different than at the level of Spectrum - akin to GCF versus BGC. Thinking of it now, we do not treat the BGCs in the same way as spectra, something we could consider in the future. Happy to explain further in person, as this can get confusing quite quickly....
