Update percolator to pepxml rewriting #917

chhh · 2022-12-07T23:15:19Z

I would greatly appreciate it if you merged this change; it would make maintaining my fork a lot easier.

The main thing it does is add some structure to the spaghetti code in PercolatorOutputToPepXML.
It also adds StAX pepXML parsing and a test for what the rewriting actually does.

fcyu · 2022-12-08T01:18:30Z

MSFragger-GUI/src/com/dmtavt/fragpipe/cmd/CmdPercolator.java

-        final String basename = remove_rank_suffix(nameWithoutExt);
-        if(!basenames.add(basename))
+        //final String nameWithoutExt = FilenameUtils.removeExtension(pepxmlPath.getFileName().toString());
+        final String nameWithoutExt = PathUtils.removeExtension(pepxmlPath.getFileName().toString(), 2, 10);


@chhh Why create a new PathUtils.removeExtension() to replace the existing FilenameUtils.removeExtension?

chhh · 2022-12-08

This one has additional parameters: how many times to remove an extension (for files like file.raw.pep.xml) and a limit on the length of the extension, which catches cases where somebody puts a dot in the file name.
I didn't replace it just for the sake of replacing it; there was a real-life example where the original one from Apache Commons failed.
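
To make the two extra parameters concrete, here is a minimal sketch of how such a helper could behave. It is not the PathUtils.removeExtension from this pull request: the class name, the parameter names, and the exact rules (for example, stopping at the first suffix that is too long) are assumptions chosen only to illustrate "strip at most N extensions, each at most M characters long".

```java
// Hypothetical illustration; NOT the actual PathUtils.removeExtension from the pull request.
final class RemoveExtensionSketch {
    private RemoveExtensionSketch() {}

    /**
     * Strips up to maxCount trailing extensions from fileName.
     * A trailing ".xyz" is treated as an extension only if "xyz" is
     * between 1 and maxExtLen characters long (assumed rule).
     */
    static String removeExtension(String fileName, int maxCount, int maxExtLen) {
        String result = fileName;
        for (int i = 0; i < maxCount; i++) {
            int dot = result.lastIndexOf('.');
            if (dot <= 0) {
                break; // no dot, or only a leading dot (hidden file): nothing to strip
            }
            int extLen = result.length() - dot - 1;
            if (extLen < 1 || extLen > maxExtLen) {
                break; // empty or overly long suffix: assume the dot is part of the name
            }
            result = result.substring(0, dot);
        }
        return result;
    }
}
```

With these assumed rules, removeExtension("file.raw.pep.xml", 2, 10) returns "file.raw", while removeExtension("run.replicate_number_three.pepXML", 2, 10) returns "run.replicate_number_three" because the 22-character suffix exceeds the length limit. A short dotted fragment such as "v2" would still be stripped, so the length limit is a heuristic rather than a guarantee.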

fcyu · 2022-12-08

This is why I forced MSFragger to always generate <file name>.pepXML. If you want to support <file name>.pep.xml, there will be a lot of places to change and test. I think many places, including the other tools used by FragPipe, assume that the extension is everything after the last dot.

Since your case will never happen if you use MSFragger, I don't think it is necessary to add this new function and make things more complicated.

fcyu · 2022-12-08T01:23:50Z

@chhh, there are a lot of changes. I need more time to review and test them. I will revisit this pull request later, when I have time.

Thanks,

Fengchao

chhh · 2022-12-08T02:06:06Z

@fcyu sure, the main changes are in the PercolatorOutputToPepXML.percolatorToPepXML() method. The logic is kept the same, but everything is split into smaller functions.

There's a "test" PercolatorOutputToPepXMLTest with which you can run the function to try it out. Comment out Files.deleteIfExists(path) and you will get an interact- file in resources/percolator-to-pepxml/fragpipe-search-16_PXD022287_msfragger/interact-Human-Protein-Training_Trypsin.pep.xml

chhh · 2022-12-20T18:35:57Z

@guoci @fcyu Is there any hope that you will merge this as a Christmas present?

fcyu · 2022-12-20T18:52:00Z

Hi @chhh ,

I started to review the code but got interrupted by other things. I could merge it before next year, but I think we can do it better since you are almost re-writing the whole module.

I have always thought that reading the pepXML file into a class and writing the modified file using SAX is better than manipulating the strings in an ad hoc way. I left some comments in

FragPipe/MSFragger-GUI/src/com/dmtavt/fragpipe/tools/percolator/PercolatorOutputToPepXML.java

Line 146 in 3e0189e

    
           for (final String e : search_hit_line.split("\\s")) { // fixme: the code assumes that all attributes are in one line, which makes it not robust

and

FragPipe/MSFragger-GUI/src/com/dmtavt/fragpipe/tools/percolator/PercolatorOutputToPepXML.java

Line 218 in 3e0189e

    
           while (iterator.hasNext()) { // fixme: the code assumes that there are always <search_hit, massdiff=, and calc_neutral_pep_mass=, which makes it not robust

Batmass-io has a module for reading pepXML files, so I think we could use that. I discussed it with Guo Ci, but no one had the time to do it. Do you think that would be a better idea, especially since you want to make the pepXML rewriting robust and support other flavors?

Best,

Fengchao

chhh · 2022-12-20T19:54:36Z

Goal number one is to make it at least a little more readable and less error-prone.
Look at the current percolatorToPepxml() vs. the proposed percolatorToPepxml().
Instead of one huge inline function, 230 lines long, whose behavior is practically impossible to follow, the new version is only 70 lines despite being able to handle 2 different search engines.

This conversion clearly splits the process into smaller functions, each responsible for either reading information from files or writing to files.
I suggest first verifying that the results of this conversion are correct, and then merging it as is.

I wanted to remove all the string matching for XML, but that was too much to do in one go. I did make a start, though: some parameters from pepXML files are now parsed with a StAX parser; look at StoxParserPepxml (it should have been Stax, that's a typo) and its usage in PercolatorOutputToPepXML.

I wouldn't use JAXB here because it has to parse the whole file into memory and is relatively slow. It is also very memory-intensive: the in-memory representation generated by the JAXB parser is often larger than the original file on disk (because of all the lists it creates internally). Combined interact files can be really huge in some cases, so I'd suggest resorting only to StAX parsing.
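
As an illustration of the streaming approach, here is a minimal sketch that uses the JDK's javax.xml.stream API to collect the attributes of every search_hit element. It is not the StoxParserPepxml class from this pull request; the class and method names are hypothetical, and the input is taken as a String only to keep the example short (a real interact file would be read from a stream). Because the parser tokenizes the XML itself, it does not matter whether the attributes sit on one line or several, and an absent massdiff or calc_neutral_pep_mass simply yields no entry in the map.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration; NOT the StoxParserPepxml class from the pull request.
final class SearchHitStaxSketch {
    private SearchHitStaxSketch() {}

    /** Returns one attribute map per search_hit element, in document order. */
    static List<Map<String, String>> readSearchHits(String xml) {
        XMLInputFactory factory = XMLInputFactory.newFactory();
        factory.setProperty(XMLInputFactory.SUPPORT_DTD, false); // no DTD processing needed here
        List<Map<String, String>> hits = new ArrayList<>();
        try {
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    // Only start tags named search_hit are of interest; everything else is skipped.
                    if (reader.next() == XMLStreamConstants.START_ELEMENT
                            && "search_hit".equals(reader.getLocalName())) {
                        Map<String, String> attrs = new LinkedHashMap<>();
                        for (int i = 0; i < reader.getAttributeCount(); i++) {
                            attrs.put(reader.getAttributeLocalName(i), reader.getAttributeValue(i));
                        }
                        hits.add(attrs);
                    }
                }
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // Wrapped so that callers of this sketch need no checked-exception handling.
            throw new IllegalArgumentException("Malformed XML", e);
        }
        return hits;
    }
}
```

Only the attributes of the element currently under the cursor are held in memory, which is the property that matters for very large combined interact files.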

In the new code the whole "rewriting" part is contained in a single function, createInteractFile(), and actually in just one 80-line block inside it. So all of that "use SAX/StAX/JAXB etc. to rewrite the file" comes down to updating these 80 lines to use StAX instead of reading files line by line. But it still requires step one, this refactoring, to be done first.

fcyu · 2022-12-24T04:10:48Z

Hi @chhh ,

Thank you for the explanation and effort. I have briefly reviewed the code. I think the changes can be classified into:

1. Variable renaming and minor refactoring
2. Adding Comet support
3. Code refactoring to make it more robust (not so sure) and easier to maintain

I think the type 1 changes need to be reverted to make the other changes easier to track. The type 2 changes also need to be reverted, since FragPipe doesn't support Comet. Leaving that code in FragPipe would confuse others, including me ;). I also don't think I have the bandwidth to support both search engines in the future.

After types 1 and 2 are reverted, I will review the remaining code and merge it if it passes the tests.

Merry Christmas,

Fengchao

asalt · 2024-10-14T21:40:47Z

I recently got an error that seems related to this:

Cannot find output_report_topN parameter from .... pepXML
Process 'Percolator: Convert to pepxml' finished, exit code: 1
Process returned non-zero exit code, stopping.

This happened after many previous files had run through Percolator successfully, which makes me think it is a bug.

Log file attached:
log_2024-10-14_15-08-30.txt

fcyu · 2024-10-14T23:32:29Z

Hi @asalt, I don't think your error is related to this pull request because it hasn't been merged.

Please do not send questions or issues to pull requests. Please re-submit it to https://github.com/Nesvilab/FragPipe/issues

Best,

Fengchao

Bring some sanity to pin/pepxml rewriting

aaa0088

chhh requested review from guoci and fcyu December 7, 2022 23:15

chhh changed the base branch from master to develop December 7, 2022 23:15

fcyu force-pushed the develop branch 3 times, most recently from 120e1d3 to 019ccd4 Compare December 8, 2022 01:01

Remove comet test dir from test

58de27b

fcyu reviewed Dec 8, 2022

fcyu requested review from fcyu and removed request for guoci December 24, 2022 03:17

fcyu self-assigned this Dec 24, 2022

fcyu force-pushed the develop branch from 1c6d345 to dec9f54 Compare January 13, 2023 20:34

fcyu force-pushed the develop branch from d732dbb to 4a3d0d3 Compare July 3, 2023 01:08

fcyu force-pushed the develop branch 2 times, most recently from bb0d811 to b5cfd5c Compare September 29, 2023 02:51

fcyu force-pushed the develop branch from ac96509 to 3366efc Compare May 26, 2024 16:15

fcyu force-pushed the develop branch from ae89b0a to 20aba81 Compare August 18, 2024 18:38

fcyu force-pushed the develop branch from ae56090 to 56ca8c3 Compare October 11, 2024 21:53
