Update percolator to pepxml rewriting #917

Open: wants to merge 2 commits into base: develop
Changes from 1 commit
1 change: 1 addition & 0 deletions MSFragger-GUI/build.gradle
@@ -107,6 +107,7 @@ distributions {
dependencies {
implementation 'com.github.albfernandez:juniversalchardet:2.4.0'
compileOnly files('tools/batmass-io-1.28.8.jar')
implementation 'com.fasterxml.woodstox:woodstox-core:6.2.8'
implementation 'com.google.code.gson:gson:2.10'
implementation 'one.util:streamex:0.8.1'
implementation 'org.jooq:jool-java-8:0.9.14'
2 changes: 1 addition & 1 deletion MSFragger-GUI/src/com/dmtavt/fragpipe/FragpipeRun.java
@@ -1137,7 +1137,7 @@ private static boolean configureTaskGraph(JComponent parent, Path wd, Path jarPa
 addConfig.accept(cmdPercolator, () -> {
 if (cmdPercolator.isRun()) {
 final String percolatorCmd = percolatorPanel.getCmdOpts();
-if (!cmdPercolator.configure(parent, jarPath, percolatorCmd, isCombinedPepxml_percolator, sharedPepxmlFilesBeforePeptideValidation, crystalcPanel.isRun(), percolatorPanel.getMinProb())) {
+if (!cmdPercolator.configure(parent, jarPath, percolatorCmd, decoyTag, isCombinedPepxml_percolator, sharedPepxmlFilesBeforePeptideValidation, crystalcPanel.isRun(), percolatorPanel.getMinProb())) {
 return false;
 }
 }
60 changes: 29 additions & 31 deletions MSFragger-GUI/src/com/dmtavt/fragpipe/cmd/CmdPercolator.java
@@ -44,6 +44,8 @@
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 import java.util.stream.Collectors;
+
+import com.github.chhh.utils.PathUtils;
 import org.apache.commons.io.FilenameUtils;
 import org.jooq.lambda.Seq;
 import org.slf4j.Logger;
@@ -113,7 +115,7 @@ private static String remove_rank_suffix(final String s) {
 /**
 * @param pepxmlFiles Either pepxml files after search or after Crystal-C.
 */
-public boolean configure(Component comp, Path jarFragpipe, String percolatorCmd, boolean combine, Map<InputLcmsFile, List<Path>> pepxmlFiles, boolean hasCrystalC, double minProb) {
+public boolean configure(Component comp, Path jarFragpipe, String percolatorCmd, String decoyPrefix, boolean combine, Map<InputLcmsFile, List<Path>> pepxmlFiles, boolean hasCrystalC, double minProb) {
 PeptideProphetParams percolatorParams = new PeptideProphetParams();
 percolatorParams.setCmdLineParams(percolatorCmd);

@@ -133,55 +135,60 @@ public boolean configure(Component comp, Path jarFragpipe, String percolatorCmd,
 for (Entry<InputLcmsFile, List<Path>> e : pepxmlFiles.entrySet()) {
 for (Path pepxmlPath : e.getValue()) {
 final Path pepxmlDir = pepxmlPath.getParent();
-final String nameWithoutExt = FilenameUtils.removeExtension(pepxmlPath.getFileName().toString());
-final String basename = remove_rank_suffix(nameWithoutExt);
-if(!basenames.add(basename))
+//final String nameWithoutExt = FilenameUtils.removeExtension(pepxmlPath.getFileName().toString());
+final String nameWithoutExt = PathUtils.removeExtension(pepxmlPath.getFileName().toString(), 2, 10);
Member

@chhh Why create a new PathUtils.removeExtension() to replace the existing FilenameUtils.removeExtension()?

Collaborator Author

This one takes two additional parameters: how many times to remove an extension (for files like file.raw.pep.xml) and a limit on the length of the extension, which catches cases where somebody puts a dot in the file name.
I didn't replace it just for the sake of replacing it; there was a real-life example where the original one from Apache Commons failed.
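
For readers following the thread, here is a minimal sketch of what such a helper could look like. It is not the actual PathUtils code, whose exact semantics are not shown in this diff. The class name `ExtensionStripSketch` and the parameter names `maxCount` and `maxExtLen` are assumptions made for illustration only.

```java
// Hypothetical sketch, NOT the real PathUtils.removeExtension from this PR.
final class ExtensionStripSketch {
    private ExtensionStripSketch() {}

    /**
     * Strips up to maxCount trailing extensions. A suffix counts as an
     * extension only if it has between 1 and maxExtLen characters after
     * the dot (assumed meaning of the two extra parameters).
     */
    static String removeExtension(String fileName, int maxCount, int maxExtLen) {
        String result = fileName;
        for (int i = 0; i < maxCount; i++) {
            int dot = result.lastIndexOf('.');
            if (dot <= 0) {
                break; // no dot left, or the name starts with a dot
            }
            int extLen = result.length() - dot - 1;
            if (extLen == 0 || extLen > maxExtLen) {
                break; // empty or overly long suffix: treat it as part of the name
            }
            result = result.substring(0, dot);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(removeExtension("sample.pep.xml", 2, 10));
        System.out.println(removeExtension("sample.pepXML", 2, 10));
        System.out.println(removeExtension("run1.some_long_description.pepXML", 2, 10));
    }
}
```

Under these assumptions, a long dotted segment inside the name survives because it exceeds the length limit, but a short one (for example `sample.v2.pepXML`) would still be stripped as if it were an extension.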

Member

This is why I forced MSFragger to always generate <file name>.pepXML. If you want to support <file name>.pep.xml, there will be a lot of places to change and test. I think many places, including the other tools used by FragPipe, assume that the extension is everything after the last dot.

Since your case will never happen if MSFragger is used, I don't think it is necessary to implement this new function and make things more complicated.
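
To make the "everything after the last dot" assumption concrete, the sketch below applies that rule to both naming styles and derives a `.pin` name from the result, the way the diff builds names from the stripped base. The helper `stripAfterLastDot` and the class name are hypothetical stand-ins, not code from FragPipe or Apache Commons.

```java
// Hypothetical illustration of the last-dot convention discussed above.
final class LastDotSketch {
    private LastDotSketch() {}

    /** Removes everything from the last dot onward; returns the input if there is no dot. */
    static String stripAfterLastDot(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? fileName : fileName.substring(0, dot);
    }

    public static void main(String[] args) {
        for (String name : new String[] {"sample.pepXML", "sample.pep.xml"}) {
            // Derived names follow whatever base the rule produces.
            System.out.println(name + " -> " + stripAfterLastDot(name) + ".pin");
        }
    }
}
```

With this rule, `sample.pepXML` yields the base `sample`, while `sample.pep.xml` yields `sample.pep`, so any tool that derives sibling file names from the base would look for different files in the two cases.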

+final String fnBase = remove_rank_suffix(nameWithoutExt);
+if(!basenames.add(fnBase))
 continue;
 // Percolator
 List<String> cmdPp = new ArrayList<>();
 final String percolator_bin = OsUtils.isUnix() ? "percolator-305/percolator" :
 OsUtils.isWindows() ? "percolator-305/percolator.exe" : null;
 cmdPp.add(FragpipeLocations.checkToolsMissing(Seq.of(percolator_bin)).get(0).toString());

-String strippedBaseName;
+final String fnStripped;
 if (hasCrystalC) {
-strippedBaseName = basename.replaceFirst("_c$", "");
+fnStripped = fnBase.replaceFirst("_c$", "");
 } else {
-strippedBaseName = basename;
+fnStripped = fnBase;
 }

 addFreeCommandLineParams(percolatorParams, cmdPp);
 TabWorkflow tabWorkflow = Fragpipe.getStickyStrict(TabWorkflow.class);
 cmdPp.add("--num-threads");
 cmdPp.add("" + tabWorkflow.getThreads());
+cmdPp.add("--protein-decoy-pattern");
+cmdPp.add(decoyPrefix);
 cmdPp.add("--results-psms");
-cmdPp.add(strippedBaseName + "_percolator_target_psms.tsv");
+cmdPp.add(fnStripped + "_percolator_target_psms.tsv");
 cmdPp.add("--decoy-results-psms");
-cmdPp.add(strippedBaseName + "_percolator_decoy_psms.tsv");
+cmdPp.add(fnStripped + "_percolator_decoy_psms.tsv");

 if (msboosterPanel.isRun()) {
-cmdPp.add(Paths.get(strippedBaseName + "_edited.pin").toString());
+cmdPp.add(Paths.get(fnStripped + "_edited.pin").toString());
 } else {
-cmdPp.add(Paths.get(strippedBaseName + ".pin").toString());
+cmdPp.add(Paths.get(fnStripped + ".pin").toString());
 }

 ProcessBuilder pbPp = new ProcessBuilder(cmdPp);
 setupEnv(pepxmlDir, pbPp);
 pbisParallel.add(new PbiBuilder()
 .setPb(pbPp)
-.setParallelGroup(basename).create());
+.setParallelGroup(fnBase).create());

 // convert the percolator output tsv to PeptideProphet's pep.xml format
-ProcessBuilder pbRewrite = pbConvertToPepxml(jarFragpipe, "interact-" + basename, strippedBaseName, basename, e.getKey().getDataType().contentEquals("DDA"), minProb);
+final String fnOutBase = "interact-" + fnBase;
+final boolean isDda = e.getKey().getDataType().contentEquals("DDA");
+ProcessBuilder pbRewrite = pbConvertToPepxml(jarFragpipe, fnOutBase, fnStripped, fnBase, isDda, minProb);
 pbRewrite.directory(pepxmlPath.getParent().toFile());
 pbisPostParallel.add(new PbiBuilder().setName("Percolator: Convert to pepxml").setPb(pbRewrite).setParallelGroup(ProcessBuilderInfo.GROUP_SEQUENTIAL).create());

 // delete intermediate files
 PercolatorPanel percolatorPanel = Fragpipe.getStickyStrict(PercolatorPanel.class);
 if (!percolatorPanel.isKeepTsvFiles()) {
 final List<Path> temp = new ArrayList<>();
-temp.add(pepxmlDir.resolve(strippedBaseName + "_percolator_target_psms.tsv"));
-temp.add(pepxmlDir.resolve(strippedBaseName + "_percolator_decoy_psms.tsv"));
+temp.add(pepxmlDir.resolve(fnStripped + "_percolator_target_psms.tsv"));
+temp.add(pepxmlDir.resolve(fnStripped + "_percolator_decoy_psms.tsv"));
 List<ProcessBuilder> pbsDeleteTemp = ToolingUtils
 .pbsDeleteFiles(jarFragpipe, temp);
 pbisPostParallel.addAll(pbsDeleteTemp.stream()
@@ -221,26 +228,17 @@ public ProcessBuildersDescriptor getBuilderDescriptor() {
 return b;
 }

-private static ProcessBuilder pbConvertToPepxml(Path jarFragpipe, String outBaseName, String stripedBasename, String basename, boolean isDDA, double minProb) {
+private static ProcessBuilder pbConvertToPepxml(Path jarFragpipe, String fnOutBase, String fnStripped, String fnBase, boolean isDDA, double minProb) {
 if (jarFragpipe == null) {
 throw new IllegalArgumentException("jar can't be null");
 }
-final List<String> cmd = new ArrayList<>();
-cmd.add(Fragpipe.getBinJava());
-cmd.add("-cp");
-Path root = FragpipeLocations.get().getDirFragpipeRoot();
-String libsDir = root.resolve("lib") + "/*";
-if (Files.isDirectory(jarFragpipe)) {
-libsDir = jarFragpipe.getParent().getParent().getParent().getParent().resolve("build/install/fragpipe/lib") + "/*";
-log.debug("Dev message: Looks like FragPipe was run from IDE, changing libs directory to: {}", libsDir);
-}
-cmd.add(libsDir);
+final List<String> cmd = ToolingUtils.cmdStubForJar(jarFragpipe);
 cmd.add(PercolatorOutputToPepXML.class.getCanonicalName());
-cmd.add(stripedBasename + ".pin");
-cmd.add(basename);
-cmd.add(stripedBasename + "_percolator_target_psms.tsv");
-cmd.add(stripedBasename + "_percolator_decoy_psms.tsv");
-cmd.add(outBaseName);
+cmd.add(fnStripped + ".pin");
+cmd.add(fnBase);
+cmd.add(fnStripped + "_percolator_target_psms.tsv");
+cmd.add(fnStripped + "_percolator_decoy_psms.tsv");
+cmd.add(fnOutBase);
 cmd.add(isDDA ? "DDA" : "DIA");
 cmd.add(minProb + "");
 return new ProcessBuilder(cmd);