Update percolator to pepxml rewriting #917
base: develop
120e1d3 to 019ccd4
final String basename = remove_rank_suffix(nameWithoutExt);
if(!basenames.add(basename))
//final String nameWithoutExt = FilenameUtils.removeExtension(pepxmlPath.getFileName().toString());
final String nameWithoutExt = PathUtils.removeExtension(pepxmlPath.getFileName().toString(), 2, 10);
@chhh Why create a new PathUtils.removeExtension() to replace the existing FilenameUtils.removeExtension?
This one has additional parameters: how many times to remove an extension (for files like file.raw.pep.xml) and a limit on the length of the extension, which catches cases where somebody puts a dot in the file name.
I didn't replace it just for the sake of replacing it; there was a real-life example where the original one from Apache Commons failed.
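To make the difference concrete, here is a minimal sketch of the behaviour described above. It is not the actual PathUtils.removeExtension from this pull request: the class name ExtensionSketch is hypothetical, and the meaning of the two numeric arguments (maximum number of extensions to strip, maximum extension length) is an assumption based on this comment.

```java
final class ExtensionSketch {
    private ExtensionSketch() {}

    /**
     * Strips up to maxTimes trailing extensions from a file name.
     * An extension is the text after the last dot; stripping stops as soon as
     * that text is empty or longer than maxExtLen characters.
     * (Assumed semantics, not the real PathUtils implementation.)
     */
    static String removeExtension(String name, int maxTimes, int maxExtLen) {
        String result = name;
        for (int i = 0; i < maxTimes; i++) {
            int dot = result.lastIndexOf('.');
            // No dot at all, or a leading dot (hidden file): nothing to strip.
            if (dot <= 0) {
                break;
            }
            int extLen = result.length() - dot - 1;
            // An empty or overly long "extension" is treated as part of the name.
            if (extLen == 0 || extLen > maxExtLen) {
                break;
            }
            result = result.substring(0, dot);
        }
        return result;
    }
}
```

Under these assumptions, removeExtension("sample.pep.xml", 2, 10) gives "sample", whereas a single-pass removal such as FilenameUtils.removeExtension leaves "sample.pep". For a name like "sample_v1.2 replicate three.pepXML", the length limit stops the second pass, so the dot inside the name is preserved.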
This is why I forced MSFragger to always generate <file name>.pepXML. If you want to support <file name>.pep.xml, there will be a lot of places to change and test. I think many places, including the other tools used by FragPipe, assume that the extension is everything after the last dot.
Since your case will never happen if you use MSFragger, I don't think it is necessary to implement this new function and make things more complicated.
@chhh, there are a lot of changes. I need more time to review and test them. I will revisit this pull request when I have time later. Thanks, Fengchao
@fcyu sure, the main changes are in the PercolatorOutputToPepXML.percolatorToPepXML() method. The logic is kept the same, but everything is split into smaller functions. There's a "test", PercolatorOutputToPepXMLTest, with which you can run the function to try it out. Comment out
Hi @chhh, I started to review the code but got interrupted by other things. I could merge it before next year, but I think we can do better since you are almost rewriting the whole module. I have always thought that reading the pepXML file into a class and writing the modified file using SAX is better than manipulating the strings in an ad hoc way. I left some comments at line 146 and line 218 of FragPipe/MSFragger-GUI/src/com/dmtavt/fragpipe/tools/percolator/PercolatorOutputToPepXML.java (both in 3e0189e).
Batmass-io has a module for reading pepXML files, so I think we could use that. I discussed it with Guo Ci, but no one had the time to do it. Do you think it would be a better idea, especially since you want to make the pepXML rewriting robust and support other flavors? Best, Fengchao
This conversion clearly splits the process into smaller functions, each responsible for reading information from files or writing to files.
I wouldn't use JAXB here because it has to parse the whole file into memory and is relatively slow. It is also very memory-intensive: the in-memory representation generated by the JAXB parser is often larger than the original file on disk (because of all the lists it creates internally). Combined interact files can be really huge in some cases, so I'd suggest resorting only to StAX parsing.
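As a rough illustration of the streaming approach, the sketch below copies an XML document event by event with the JDK's StAX API, so only the current event is held in memory at any time. The class name StaxCopySketch and the counting hook are hypothetical and only mark where a rewrite step could go; this is not the code in this pull request.

```java
import java.io.Reader;
import java.io.Writer;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLEventWriter;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

final class StaxCopySketch {
    private StaxCopySketch() {}

    /**
     * Streams XML from in to out one event at a time and returns how many
     * start tags with the given local name were seen along the way.
     */
    static int copyCounting(Reader in, Writer out, String elementName) {
        try {
            XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(in);
            XMLEventWriter writer = XMLOutputFactory.newInstance().createXMLEventWriter(out);
            int count = 0;
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (event.isStartElement()
                        && event.asStartElement().getName().getLocalPart().equals(elementName)) {
                    // A real rewriter would inspect or replace the event here.
                    count++;
                }
                // Pass the event through unchanged.
                writer.add(event);
            }
            writer.flush();
            writer.close();
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            // Wrapped in an unchecked exception only to keep the sketch short.
            throw new IllegalStateException("XML streaming failed", e);
        }
    }
}
```

Because events are forwarded as they arrive, memory use does not grow with the size of the input, which is the property that matters for large combined interact files.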
Hi @chhh, thank you for the explanation and effort. I have briefly reviewed the code. I think the changes can be classified into:
For change type 1, I think they need to be undone to make the other changes easier to track. Change type 2 also needs to be undone, since FragPipe doesn't support Comet. Leaving that code in FragPipe would confuse others, including me ;). I also don't think I have the bandwidth to support both search engines in the future. After types 1 and 2 are reverted, I will review the code and merge it if it passes the tests. Merry Christmas, Fengchao
bb0d811 to b5cfd5c
I recently got an error that seems related to this: Cannot find output_report_topN parameter from .... pepXML. This was after many previous files had run through Percolator successfully, which makes me think it is a bug. Log file attached.
Hi @asalt, I don't think your error is related to this pull request because it hasn't been merged. Please do not send questions or issues to pull requests. Please re-submit it to https://github.com/Nesvilab/FragPipe/issues. Best, Fengchao
I would greatly appreciate it if you merged this change; it would make maintaining my fork a lot easier.