Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Nmr bucketing3 #243

Open
wants to merge 47 commits into
base: master
Choose a base branch
from
Open

Nmr bucketing3 #243

wants to merge 47 commits into from

Conversation

lecorguille
Copy link
Member

Merge the repo https://github.com/workflow4metabolomics/nmr_bucketing to tools-metabolomics

Replace: #187

FOR CONTRIBUTOR:

  • - I have read the CONTRIBUTING.md document and this tool is appropriate for the tools-iuc repo.
  • - License permits unrestricted use (educational + commercial)
  • - This PR adds a new tool or tool collection
  • - This PR updates an existing tool or tool collection
  • - This PR does something else (explain below)

lecorguille and others added 30 commits July 26, 2023 17:51
Fix a syntax error
remove R deps
bump version
add log change
@lecorguille lecorguille requested a review from mtremblayfr as a code owner July 26, 2023 16:00
@lecorguille lecorguille mentioned this pull request Jul 26, 2023
@lecorguille
Copy link
Member Author

ping @mtremblayfr

Copy link
Contributor

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lecorguille and @mtremblayfr.

I have added some comments inline. Greeting from the Australian Cofest!

@@ -0,0 +1,302 @@
<tool id="NmrBucketing" name="NMR_Bucketing" version="2.0.3">
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<tool id="NmrBucketing" name="NMR_Bucketing" version="2.0.3">
<tool id="NmrBucketing" name="NMR_Bucketing" version="2.0.3" profile="21.05">

@@ -0,0 +1,302 @@
<tool id="NmrBucketing" name="NMR_Bucketing" version="2.0.3">

<description> Bucketing and integration of NMR Bruker raw data</description>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<description> Bucketing and integration of NMR Bruker raw data</description>
<description>Bucketing and integration of NMR Bruker raw data</description>

Comment on lines +10 to +14
<stdio>
<exit_code range="1:" level="fatal" />
</stdio>

<command>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
<stdio>
<exit_code range="1:" level="fatal" />
</stdio>
<command>
<command detect_errors="exit_code">

<inputs>
<conditional name="inputs">
<param name="input" type="select" label="Choose your inputs method" >
<option value="zip_file" selected="true">Zip file from your history containing your Bruker directories</option>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

</when>
</conditional>

<param name="bucket_width" label="Bucket width" type="float" value="0.04" help="Default value is 0.04 ppm"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add a min/max attribute here and below?


<outputs>
<data format="txt" name="logOut" label="${tool.name}_log" />
<data format="tabular" name="sampleOut" label="${tool.name}_sampleMetadata" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


<tests>
<test>
<param name="inputs|input" value="zip_file" />
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use the more explicit way of defining nested structures. E.g.

Suggested change
<param name="inputs|input" value="zip_file" />
<conditional name="inputs">
<param name="input" value="zip_file" />
</conditional>


.. class:: infomark

**Authors** Marie Tremblay-Franco ([email protected]), Marion Landi ([email protected]) and Franck Giacomoni ([email protected])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use the more explicit schema.org annotations for this: https://docs.galaxyproject.org/en/latest/dev/schema.html#tool-creator-person

The advantage is that Galaxy can nicely render this content and you can access it via the API and such ....


---------------------------------------------------

Changelog/News
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove the changelog its already in the Readme file and its hard to keep it in sync.

@@ -0,0 +1,4 @@
<?xml version="1.0"?>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove this file

Copy link
Contributor

@hechth hechth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to see that the metabolomics tools are being consolidated in one place! I think this is also a great opportunity for some overall cleanup and refreshing the tools!

long_description: 'Part of the W4M project: http://workflow4metabolomics.org'
name: nmr_bucketing
owner: marie-tremblay-metatoul
remote_repository_url: https://github.com/workflow4metabolomics/nmr_bucketing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be updated to now point to this repository?

@@ -0,0 +1,7 @@
categories: [Metabolomics]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
categories: [Metabolomics]
categories: Metabolomics

Comment on lines +1 to +13
################################################################################################
# SPECTRA BUCKETING AND INTEGRATION FROM RAW BRUKER FILES #
# User : Galaxy #
# Original data : -- #
# Starting date : 20-10-2014 #
# Version 1 : 18-12-2014 #
# Version 2 : 07-01-2015 #
# Version 3 : 24-10-2016 #
# #
# Input files : modification on october 2016 #
# - Raw bruker files included in user-defined fileName #
# - Preprocessed files (alignment, ...) included in p x n dataframe #
################################################################################################
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
################################################################################################
# SPECTRA BUCKETING AND INTEGRATION FROM RAW BRUKER FILES #
# User : Galaxy #
# Original data : -- #
# Starting date : 20-10-2014 #
# Version 1 : 18-12-2014 #
# Version 2 : 07-01-2015 #
# Version 3 : 24-10-2016 #
# #
# Input files : modification on october 2016 #
# - Raw bruker files included in user-defined fileName #
# - Preprocessed files (alignment, ...) included in p x n dataframe #
################################################################################################

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please run some auto formatting tool on this file.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could overall be cleaned up into some minimal CRAN package as it has minimal dependencies etc. which would also mkae the Galaxy tool a bit nicer and include all the R functions. That would also make this functionality available to more users or other workflow engines.

Comment on lines +212 to +236
if (fileType=="tsv")
{
FileNames <- colnames(fileName)
n <- length(FileNames)

for (i in 1:ncol(fileName))
{
orderedSpectrum <- cbind(as.numeric(rownames(fileName)),fileName[,i])
orderedSpectrum <- orderedSpectrum[order(orderedSpectrum[,1],decreasing=T), ]

truncatedSpectrum <- orderedSpectrum[orderedSpectrum[,1] < leftBorder & orderedSpectrum[,1] > rightBorder, ]
truncatedSpectrum[,1] <- round(truncatedSpectrum[,1],3)

# Bucketing
spectrum.bucket <- NmrBrucker_bucket(truncatedSpectrum)
ppm <- spectrum.bucket[,1]

# spectrum Concatenation
if (i == 1)
bucketedSpectra <- spectrum.bucket
if (i > 1)
bucketedSpectra <- cbind(bucketedSpectra,spectrum.bucket[,2])
colnames(bucketedSpectra)[i+1] <- colnames(fileName)[i]
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for this - would be cool if this was its own function.

Comment on lines +9 to +29
runExampleL <- FALSE

if(runExampleL) {
##------------------------------
## Example of arguments
##------------------------------
argLs <- list(StudyDir = "Tlse_BPASourisCerveau",
upper = "10.0",
lower = "0.50",
bucket.width = "0.01",
exclusion = "TRUE",
exclusion.zone = list(c(6.5,4.5)),
graph="Overlay")

argLs <- c(argLs,
list(dataMatrixOut = paste(directory,"_NmrBucketing_dataMatrix.tsv",sep=""),
sampleMetadataOut = paste(directory,"_NmrBucketing_sampleMetadata.tsv",sep=""),
variableMetadataOut = paste(directory,"_NmrBucketing_variableMetadata.tsv",sep=""),
graphOut = paste(directory,"_NmrBucketing_graph.pdf",sep=""),
logOut = paste(directory,"_NmrBucketing_log.txt",sep="")))
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
runExampleL <- FALSE
if(runExampleL) {
##------------------------------
## Example of arguments
##------------------------------
argLs <- list(StudyDir = "Tlse_BPASourisCerveau",
upper = "10.0",
lower = "0.50",
bucket.width = "0.01",
exclusion = "TRUE",
exclusion.zone = list(c(6.5,4.5)),
graph="Overlay")
argLs <- c(argLs,
list(dataMatrixOut = paste(directory,"_NmrBucketing_dataMatrix.tsv",sep=""),
sampleMetadataOut = paste(directory,"_NmrBucketing_sampleMetadata.tsv",sep=""),
variableMetadataOut = paste(directory,"_NmrBucketing_variableMetadata.tsv",sep=""),
graphOut = paste(directory,"_NmrBucketing_graph.pdf",sep=""),
logOut = paste(directory,"_NmrBucketing_log.txt",sep="")))
}

This should be covered in a test case ideally.

Comment on lines +41 to +55
# For parseCommandArgs function
library(batch)
# For cumtrapz function
library(pracma)

# R script call
source_local <- function(fname)
{
argv <- commandArgs(trailingOnly = FALSE)
base_dir <- dirname(substring(argv[grep("--file=", argv)], 8))
source(paste(base_dir, fname, sep="/"))
}
#Import the different functions
source_local("NmrBucketing_script.R")
source_local("DrawSpec.R")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be easier to define one function in a file which takes all arguments, then call source on that file in the galaxy wrapper and then call that function. This would remove the entire dependency and command line passing logic. Another options would be to use configfiles.

Comment on lines +57 to +134
##------------------------------
## Errors ?????????????????????
##------------------------------


##------------------------------
## Constants
##------------------------------
topEnvC <- environment()
flagC <- "\n"


##------------------------------
## Script
##------------------------------
if(!runExampleL)
argLs <- parseCommandArgs(evaluate=FALSE)


## Parameters Loading
##-------------------
# Inputs
if (!is.null(argLs[["zipfile"]])){
fileType="zip"
zipfile= argLs[["zipfile"]]
directory=unzip(zipfile, list=F)
directory=paste(getwd(),strsplit(directory[1],"/")[[1]][2],sep="/")
} else if (!is.null(argLs[["tsvfile"]])){
fileType="tsv"
directory <- read.table(argLs[["tsvfile"]],check.names=FALSE,header=TRUE,sep="\t")
}

leftBorder <- argLs[["left_border"]]
rightBorder <- argLs[["right_border"]]
bucketSize <- argLs[["bucket_width"]]
exclusionZones <- argLs[["zone_exclusion_choices.choice"]]

exclusionZonesBorders <- NULL
if (!is.null(argLs$zone_exclusion_left))
{
for(i in which(names(argLs)=="zone_exclusion_left"))
{
exclusionZonesBorders <- c(exclusionZonesBorders,list(c(argLs[[i]],argLs[[i+1]])))
}
}

graphique <- argLs[["graphType"]]

# Outputs
nomGraphe <- argLs[["graphOut"]]
dataMatrixOut <- argLs[["dataMatrixOut"]]
logFile <- argLs[["logOut"]]
if (fileType=="zip")
{
sampleMetadataOut <- argLs[["sampleOut"]]
variableMetadataOut <- argLs[["variableOut"]]
}


## Checking R packages
##--------------------
sink(logFile)
cat("\t PACKAGE INFO \n")
pkgs=c("batch","pracma")
for(pkg in pkgs) {
suppressPackageStartupMessages( stopifnot( library(pkg, quietly=TRUE, logical.return=TRUE, character.only=TRUE)))
cat(pkg,"\t",as.character(packageVersion(pkg)),"\n",sep="")
}
cat("\n")


## Checking arguments
##-------------------
error.stock <- "\n"

if(length(error.stock) > 1)
stop(error.stock)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this whole section could be removed without the command line parsing section and if following the approach recommended above.

data_sample <- outputs[[2]]
data_variable <- outputs[[3]]
ppm <- outputs[[4]]
ppm <- round(ppm,2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this should be a tool parameter?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants