Support for the first Mascot submission #71
Comments
Great! Could you share the mzIdentML file with me, please? There was something I wanted to check in it (something I thought I saw in another Mascot-generated mzIdentML file recently, to do with repetition of the same peptide).
@colin-combe I copied the files to Dropbox and will share the FTP details with you.
Thanks. I think there's something not right in these mzIdentML files, but not something that will stop them working in our system. The mzid specification states:
There is a complication regarding peptide uniqueness when it comes to the crosslinked peptides. Setting that aside and just looking at the 'linear' (uncrosslinked) peptides, it seems that in the Mascot output they are not unique but are instead repeated every time they are identified. This is OK for us, it works, but it's suboptimal. It bloats the files unnecessarily, then our database, and then the xiview web page takes longer to load because it is being sent duplicates of all the peptides. I think it's worth taking this up with them to see what they say. (@vrkosk?)
@colin-combe Do you mean cases like:
I see what you mean. Mascot is currently taking a very PSM-centric view. The above are duplicate identifications of the same peptide in sequential Mascot queries. I agree it would be better if Mascot collated them into something like:
And where peptide_ref="peptide_162_1" is used in , replace it with peptide_ref="peptide_SPDKPGK". This would reduce duplication in elements as well, which currently repeat the start and end position and pre and post residues needlessly:
I'll add a change request.
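To illustrate the kind of collation discussed above, here is a minimal sketch, not Mascot's or xiview's actual code. It assumes linear peptides only (crosslinked pairs are not handled), treats two `Peptide` elements as the same peptide when their sequence and modifications match, and keeps the first `id` seen as the canonical one. The sample fragment is simplified and hypothetical apart from the `peptide_162_1` id and the `SPDKPGK` sequence mentioned in the thread; the function names are made up for this example.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix such as '{uri}Peptide' down to 'Peptide'."""
    return tag.rsplit("}", 1)[-1]


def peptide_key(peptide):
    """Build a hashable identity for a linear peptide: sequence plus modifications."""
    seq = ""
    mods = []
    for child in peptide:
        name = local(child.tag)
        if name == "PeptideSequence":
            seq = (child.text or "").strip()
        elif name == "Modification":
            mods.append((child.get("location", ""),
                         child.get("monoisotopicMassDelta", "")))
    return seq, tuple(sorted(mods))


def collate_peptides(xml_text):
    """Map every Peptide id to the id of the first identical peptide seen."""
    root = ET.fromstring(xml_text)
    canonical = {}  # peptide identity -> first id seen
    remap = {}      # every peptide id -> canonical id
    for el in root.iter():
        if local(el.tag) == "Peptide":
            first_id = canonical.setdefault(peptide_key(el), el.get("id"))
            remap[el.get("id")] = first_id
    return remap


# Hypothetical, simplified fragment: the same peptide reported for two queries.
SAMPLE = """
<SequenceCollection>
  <Peptide id="peptide_162_1"><PeptideSequence>SPDKPGK</PeptideSequence></Peptide>
  <Peptide id="peptide_163_1"><PeptideSequence>SPDKPGK</PeptideSequence></Peptide>
  <Peptide id="peptide_164_1"><PeptideSequence>AAGK</PeptideSequence></Peptide>
</SequenceCollection>
"""

remap = collate_peptides(SAMPLE)
duplicates = sorted(old for old, new in remap.items() if old != new)
print(duplicates)
```

A writer could use a mapping like `remap` to emit each peptide once and rewrite every `peptide_ref` to the canonical id; a reader could use it just to count how much duplication a file contains.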
Yes, cases like that.
It's a little more complicated with the crosslinked peptides, where it's the crosslinked pair of peptides that is meant to be unique.
Is it weird that in these files there are things like: so the rank is 3, but it has passThreshold = true? @vrkosk?
...I guess it's probably meant to be like this; I guess there's no reason why not.
A Mascot PSM is significant if expect value < sigthreshold. This is encoded as passThreshold = true in the mzIdentML export. It's perfectly possible for the rank 1, 2 and 3 matches to have a similar score and, thus, similar expect values, all of which are statistically significant. Because the ranks are ordered by score, if rank 3 has passThreshold = true, then ranks 1 and 2 must also have passThreshold = true. (I don't think this is a rule that needs to be coded anywhere, just pointing it out here.)
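The rule described above can be sketched as follows. This is only an illustration, not Mascot's implementation: the scores, expect values, the `0.05` threshold and the function name `annotate` are all hypothetical, and it assumes that a higher score always corresponds to a lower expect value within one query.

```python
def annotate(matches, sig_threshold=0.05):
    """Rank matches by descending score and flag those with expect < sig_threshold."""
    ranked = sorted(matches, key=lambda m: m["score"], reverse=True)
    return [
        {"rank": rank, **m, "passThreshold": m["expect"] < sig_threshold}
        for rank, m in enumerate(ranked, start=1)
    ]


# Hypothetical matches for one query: similar scores, so similar expect values.
matches = [
    {"peptide": "pep_b", "score": 51.8, "expect": 0.0011},
    {"peptide": "pep_a", "score": 52.1, "expect": 0.0010},
    {"peptide": "pep_c", "score": 50.9, "expect": 0.0013},
]

annotated = annotate(matches)
for item in annotated:
    print(item["rank"], item["peptide"], item["passThreshold"])

# If a lower-ranked match passes, every better-ranked match passes too.
flags = [item["passThreshold"] for item in annotated]
consistent = all(a or not b for a, b in zip(flags, flags[1:]))
```

Here all three ranks pass the threshold, which matches the situation raised in the thread: a rank 3 match with passThreshold = true is not an error in itself.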
OK, thanks.
@colin-combe @sureshhewabi, as soon as we are sure these files will work, let me know so we can prepare the submission for the PRIDE Archive. Excellent work. Thanks, @vrkosk, for your support; the Mascot team has always been responsive and helpful.
@sureshhewabi is working on the first Mascot submission with some data from the Mascot team. An issue was found while parsing the MGF file, and an issue has already been created in pyteomics: levitsky/pyteomics#153.
This issue is related to the support of the main search engines, #63.