Sirius bindings #2

Merged
merged 59 commits into from
Feb 8, 2024
b7366d9
sirius builder structure
oolonek Jan 25, 2024
173e7a6
parameters options
oolonek Jan 25, 2024
42cd551
tested sirius config
oolonek Jan 25, 2024
ef497b4
to_default edit
oolonek Jan 25, 2024
29848d0
pathes
oolonek Jan 25, 2024
f7ec3a0
up config info
oolonek Jan 25, 2024
dd1c0a5
added --FormulaSearchDB
oolonek Jan 25, 2024
cda1bef
started adding FormulaSearchDB options
mvisani Jan 25, 2024
21bdcc5
Merge branch 'sirius-bindings' of github.com:earth-metabolome-initiat…
mvisani Jan 25, 2024
4561276
imported path
oolonek Jan 25, 2024
ee8ad8d
started adding FormulaSearchDB options
mvisani Jan 25, 2024
a3c2098
Merge branch 'sirius-bindings' of github.com:earth-metabolome-initiat…
mvisani Jan 25, 2024
9fabe03
created versioning of Sirius. This allows for having different parame…
mvisani Jan 26, 2024
06efe6f
added test for running sirius
mvisani Jan 26, 2024
e26d263
launching test with sirius
oolonek Jan 26, 2024
d77d681
formula db options
oolonek Jan 26, 2024
0e7ebc3
up
oolonek Jan 26, 2024
f5dc764
Refactored parameters configuration
LucaCappelletti94 Jan 26, 2024
b27c4a5
Added support for canopus
LucaCappelletti94 Jan 26, 2024
4e99eca
Added examples of grouped configs
LucaCappelletti94 Jan 26, 2024
821051d
Chained canopus
LucaCappelletti94 Jan 26, 2024
1805465
Added contrib
LucaCappelletti94 Jan 26, 2024
d840638
up
oolonek Jan 26, 2024
c9a6a85
added some paramters to Sirius config
mvisani Jan 30, 2024
73b3d13
added some paramters to Sirius config
mvisani Jan 30, 2024
c96a4bd
calling SIRIUS_USERNAME in the env
oolonek Jan 30, 2024
7c6ea3f
formula bindings
oolonek Jan 30, 2024
a1fb3c9
Zodiac bindings
oolonek Jan 30, 2024
b6bde0d
Fingerprint bindings added
oolonek Jan 30, 2024
a762aee
cargo fmt
oolonek Jan 30, 2024
96879c1
ignoring .vscode folder
oolonek Jan 30, 2024
4bd7838
Structure bindings
oolonek Jan 30, 2024
e66ce96
\o/ working Sirius job
oolonek Jan 30, 2024
cbb5be9
adding template env
oolonek Jan 31, 2024
ddef246
advanced in Sirius config parameters
mvisani Jan 31, 2024
f1b3e25
added funtion to set parameters in scr/builder.rs
mvisani Feb 1, 2024
e867902
finished adding parameters into builder
mvisani Feb 2, 2024
5656016
finished adding parameters into builder
mvisani Feb 2, 2024
cb6123b
advanced in config params
mvisani Feb 2, 2024
ac6bcdf
fixed error for removing directory in tests. Started implementation o…
mvisani Feb 5, 2024
8ec9632
added fuzzer to the tests, changed crate from FormulaSearchDB to Sear…
mvisani Feb 5, 2024
61737bd
write-summaries bindings
oolonek Feb 5, 2024
62bf8fb
added documentation for function in builder
mvisani Feb 6, 2024
e8dd1f1
Added flag to force missing docs
LucaCappelletti94 Feb 6, 2024
4f22d94
Now including README as lib documentation
LucaCappelletti94 Feb 6, 2024
3f1a70c
added doc for everything. Probably needs some changes and clarificati…
mvisani Feb 6, 2024
22e0284
added doc for everything. Probably needs some changes and clarificati…
mvisani Feb 6, 2024
d838de6
fixed probelm in tests where pipeline got stuck
mvisani Feb 6, 2024
a762cf5
polished documentation, added 'none' as default search DB
mvisani Feb 7, 2024
655e6b9
fixed typo and added fuzzing explanation to README.md
mvisani Feb 7, 2024
0c33622
fixed typo (again)
mvisani Feb 7, 2024
542492a
Merge branch 'main' into sirius-bindings
mvisani Feb 8, 2024
d431ecc
removed 'bindings/sirius/tests/data/*' from root of gitignore since i…
mvisani Feb 8, 2024
d353bec
Merge branch 'main' into sirius-bindings
mvisani Feb 8, 2024
dc95336
added test sirius for CI
mvisani Feb 8, 2024
1d912dc
fixed typo in CI
mvisani Feb 8, 2024
20435f9
typo fixed
mvisani Feb 8, 2024
a156771
fixed error of wrong install script
mvisani Feb 8, 2024
757acd6
edit .gitignore
mvisani Feb 8, 2024
8 changes: 2 additions & 6 deletions .github/workflows/rust.yml → .github/workflows/clippy.yml
@@ -1,4 +1,4 @@
name: Rust
name: Clippy


on:
@@ -11,7 +11,7 @@ env:
CARGO_TERM_COLOR: always

jobs:
build:
clippy:
runs-on: ubuntu-latest

steps:
@@ -24,10 +24,6 @@ jobs:
with:
toolchain: nightly
override: true
- name: Build
run: cargo build --verbose
# - name: Run tests
# run: cargo test --release --features=std
- name: Run clippy
run: cargo clippy

38 changes: 38 additions & 0 deletions .github/workflows/sirius.yml
@@ -0,0 +1,38 @@
name: Sirius


on:
push:
paths:
- "bindings/sirius/**"
branches: ["main"]
pull_request:
paths:
- "bindings/sirius/**"
branches: ["main"]

env:
CARGO_TERM_COLOR: always

jobs:
sirius:
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v2
- name: Retrieve Sirius install script
run: |
wget https://raw.githubusercontent.com/enpkg/enpkg_full/emikg-adapt/src/install_sirius.sh
chmod +x install_sirius.sh
- name: Install Sirius
run: |
bash ./install_sirius.sh
chmod +x ./sirius/bin
- name: Set up Rust
uses: actions-rs/toolchain@v1
with:
toolchain: nightly
override: true
- name: Run tests
run: cargo test --test test_sirius_panic

9 changes: 5 additions & 4 deletions .gitignore
@@ -14,10 +14,11 @@ Cargo.lock
*.pdb
.env

bindings/sirius/tests/data/*

# Ignore vs code files
# Ignoring .vscode folder
.vscode

# Ignore .DS_Store
.DS_Store
.DS_Store

# Ignore pycache
__pycache__/
2 changes: 1 addition & 1 deletion Cargo.toml
@@ -1,4 +1,4 @@
[workspace]
resolver = "2"

members = ["web/backend", "web/portal", "web/web_common"]
members = [ "bindings/sirius","web/backend", "web/portal", "web/web_common"]
Binary file added bindings/sirius/.DS_Store
Binary file not shown.
2 changes: 2 additions & 0 deletions bindings/sirius/.gitignore
@@ -0,0 +1,2 @@
tests/data/output_sirius
tests/data/output_sirius_default
513 changes: 513 additions & 0 deletions bindings/sirius/CONTRIB.md

Large diffs are not rendered by default.

13 changes: 13 additions & 0 deletions bindings/sirius/Cargo.toml
@@ -0,0 +1,13 @@
[package]
name = "sirius"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
dotenvy = "0.15.7"
arbitrary = { version = "1", optional = true, features = ["derive"] }
# we only require the arbitrary derivable crate when fuzz is enabled
[features]
fuzz = ["dep:arbitrary"]
211 changes: 211 additions & 0 deletions bindings/sirius/README.md
@@ -0,0 +1,211 @@
# Sirius
SIRIUS is a Java-based software framework for the analysis of LC-MS/MS data of metabolites and other "small molecules of biological interest". SIRIUS integrates a collection of tools, including CSI:FingerID (with [COSMIC](https://bio.informatik.uni-jena.de/software/cosmic/)), [ZODIAC](https://bio.informatik.uni-jena.de/software/zodiac/) and [CANOPUS](https://bio.informatik.uni-jena.de/software/canopus/). In particular, both the graphical user interface and the command line version of SIRIUS seamlessly integrate the CSI:FingerID and CANOPUS web services.

For further reading, we recommend the official [Sirius website](https://bio.informatik.uni-jena.de/software/sirius/).

## Installation
Since version 5.7.0, SIRIUS is officially available via conda ([conda-forge](https://conda-forge.org/)) under the package name [sirius-ms](https://anaconda.org/conda-forge/sirius-ms). Native macOS arm64 (Apple Silicon) builds are available only via conda.

You can also install Sirius from its [GitHub repository](https://github.com/boecker-lab/sirius).

# Sirius binding
Here we present a binding for Sirius in [Rust](https://www.rust-lang.org/). This binding is a wrapper around the Sirius command line interface (CLI) and provides a more user-friendly interface for running Sirius. It also makes running Sirius safer by checking parameters through Rust's type system and explicit error handling before the Sirius CLI is invoked.

## Usage
First, you need to have Sirius installed on your system. You also need the following variables in your `.env` file:
```bash
# On macOS the path is typically /Applications/sirius.app/Contents/MacOS/sirius
SIRIUS_PATH=/path/to/sirius_executable
SIRIUS_USERNAME=your_username
SIRIUS_PASSWORD=your_password
```
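
As a rough illustration of how these three variables might be read, here is a minimal sketch. `SiriusEnv` and `load_sirius_env` are hypothetical names, not part of the binding, and the lookup is passed in as a closure so the sketch relies only on the standard library. The crate itself lists `dotenvy` as a dependency for loading the `.env` file.

```rust
use std::path::PathBuf;

/// Hypothetical container for the three settings; not a type from the binding.
#[derive(Debug, PartialEq)]
struct SiriusEnv {
    path: PathBuf,
    username: String,
    password: String,
}

/// Builds the settings from any key lookup and reports the first missing key.
fn load_sirius_env(lookup: impl Fn(&str) -> Option<String>) -> Result<SiriusEnv, String> {
    // Turn a missing variable into a readable error instead of a panic.
    let get = |key: &str| lookup(key).ok_or_else(|| format!("{key} is not set"));
    Ok(SiriusEnv {
        path: PathBuf::from(get("SIRIUS_PATH")?),
        username: get("SIRIUS_USERNAME")?,
        password: get("SIRIUS_PASSWORD")?,
    })
}
```

Calling `load_sirius_env(|key| std::env::var(key).ok())` would read the process environment once the `.env` file has been loaded into it.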

Then you can use the Sirius binding in your Rust project. To do so add this to your `Cargo.toml`:
```toml
[dependencies]
sirius = "0.1"
```
and this to your crate root:
```rust
use sirius::prelude::*;
```

## Examples
Here is an example of running Sirius with the default parameters:
```bash
sirius -i tests/data/input_sirius.mgf --output tests/data/output_sirius_default --maxmz=800.0 formula zodiac fingerprint structure canopus write-summaries
```

The equivalent Rust code is:
```rust
use sirius::prelude::*;
use std::path::Path;
let sirius = SiriusBuilder::<Version5>::default()
.maximal_mz_default().unwrap()
.enable_formula().unwrap()
.enable_zodiac().unwrap()
.enable_fingerprint().unwrap()
.enable_structure().unwrap()
.enable_canopus().unwrap()
.enable_write_summaries().unwrap()
.build();
let input_file_path = Path::new("tests/data/input_sirius.mgf");
let output_file_path = Path::new("tests/data/output_sirius_default");
// Check if the path exists before attempting to remove it
if output_file_path.exists() {
let _ = std::fs::remove_dir_all(output_file_path);
}
sirius.run(input_file_path, output_file_path).unwrap();
```
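
To make the correspondence between the builder calls and the command line more concrete, the following sketch shows one way a list of typed settings could be flattened into CLI arguments. `CoreParam`, `Tool` and `to_args` are hypothetical and only cover the options used in the example above; the binding's internal representation may differ.

```rust
use std::path::Path;

/// Hypothetical subset of global options; the binding's own types may differ.
enum CoreParam {
    MaximalMz(f64),
}

/// Hypothetical subset of the Sirius subtools named in the command above.
enum Tool {
    Formula,
    Zodiac,
    Fingerprint,
    Structure,
    Canopus,
    WriteSummaries,
}

impl Tool {
    /// Name of the subtool as it appears on the command line.
    fn as_arg(&self) -> &'static str {
        match self {
            Tool::Formula => "formula",
            Tool::Zodiac => "zodiac",
            Tool::Fingerprint => "fingerprint",
            Tool::Structure => "structure",
            Tool::Canopus => "canopus",
            Tool::WriteSummaries => "write-summaries",
        }
    }
}

/// Assembles arguments in the order the CLI example uses:
/// input, output, global options, then subtools.
fn to_args(input: &Path, output: &Path, core: &[CoreParam], tools: &[Tool]) -> Vec<String> {
    let mut args = vec![
        "-i".to_string(),
        input.display().to_string(),
        "--output".to_string(),
        output.display().to_string(),
    ];
    for param in core {
        match param {
            // `{:?}` keeps the decimal point, so 800.0 is written as "800.0".
            CoreParam::MaximalMz(mz) => args.push(format!("--maxmz={mz:?}")),
        }
    }
    args.extend(tools.iter().map(|tool| tool.as_arg().to_string()));
    args
}
```

A vector built this way could then be handed to `std::process::Command::new(...)` through its `args` method.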

You can also be more specific and add other parameters. The following example uses the parameters used for the [ENPKG pipeline](https://github.com/enpkg/enpkg_full/blob/c8e649290ee72f000c3385e7669b5da2215abad8/params/user.yml#L60):

```bash
sirius -i tests/data/input_sirius.mgf --output tests/data/output_sirius --maxmz 800 \
config --IsotopeSettings.filter=true --FormulaSearchDB=BIO --Timeout.secondsPerTree=0 \
--FormulaSettings.enforced=H,C,N,O,P --Timeout.secondsPerInstance=0 \
--AdductSettings.detectable='[[M+H]+,[M-H4O2+H]+,[M+Na]+,[M+K]+,[M+H3N+H]+,[M-H2O+H]+]' \
--UseHeuristic.mzToUseHeuristicOnly=650 --AlgorithmProfile=orbitrap --IsotopeMs2Settings=IGNORE \
--MS2MassDeviation.allowedMassDeviation=5.0ppm --NumberOfCandidatesPerIon=1 \
--UseHeuristic.mzToUseHeuristic=300 --FormulaSettings.detectable=B,Cl,Br,Se,S \
--NumberOfCandidates=10 --ZodiacNumberOfConsideredCandidatesAt300Mz=10 \
--ZodiacRunInTwoSteps=true --ZodiacEdgeFilterThresholds.minLocalConnections=10 \
--ZodiacEdgeFilterThresholds.thresholdFilter=0.95 --ZodiacEpochs.burnInPeriod=2000 \
--ZodiacEpochs.numberOfMarkovChains=10 --ZodiacNumberOfConsideredCandidatesAt800Mz=50 \
--ZodiacEpochs.iterations=20000 --AdductSettings.enforced=, \
--AdductSettings.fallback='[[M+H]+,[M+Na]+,[M+K]+]' --FormulaResultThreshold=true \
--InjectElGordoCompounds=true --StructureSearchDB=BIO \
--RecomputeResults=false formula zodiac fingerprint structure canopus write-summaries
```

The equivalent Rust code is:
```rust
use sirius::prelude::*;
use std::path::Path;
let sirius = SiriusBuilder::default()
.maximal_mz(800.0).unwrap()
.isotope_settings_filter(true).unwrap()
.formula_search_db(SearchDB::Bio).unwrap()
.timeout_seconds_per_tree(0).unwrap()
.formula_settings_enforced(AtomVector::from(vec![
Atoms::H,
Atoms::C,
Atoms::N,
Atoms::O,
Atoms::P,
])).unwrap()
.timeout_seconds_per_instance(0).unwrap()
.adduct_settings_detectable(AdductsVector::from(vec![
Adducts::MplusHplus,
Adducts::MplusHminusTwoH2Oplus,
Adducts::MplusNaplus,
Adducts::MplusKplus,
Adducts::MplusH3NplusHplus,
Adducts::MplusHminusH2Oplus,
])).unwrap()
.use_heuristic_mz_to_use_heuristic_only(650).unwrap()
.algorithm_profile(Instruments::Orbitrap).unwrap()
.isotope_ms2_settings(IsotopeMS2Settings::Ignore).unwrap()
.ms2_mass_deviation_allowed_mass_deviation(MassDeviation::Ppm(5.0)).unwrap()
.number_of_candidates_per_ion(1).unwrap()
.use_heuristic_mz_to_use_heuristic(300).unwrap()
.formula_settings_detectable(AtomVector::from(vec![
Atoms::B,
Atoms::Cl,
Atoms::Br,
Atoms::Se,
Atoms::S,
])).unwrap()
.number_of_candidates(10).unwrap()
.zodiac_number_of_considered_candidates_at_300_mz(10).unwrap()
.zodiac_run_in_two_steps(true).unwrap()
.zodiac_edge_filter_thresholds_min_local_connections(10).unwrap()
.zodiac_edge_filter_thresholds_threshold_filter(0.95).unwrap()
.zodiac_epochs_burn_in_period(2000).unwrap()
.zodiac_epochs_number_of_markov_chains(10).unwrap()
.zodiac_number_of_considered_candidates_at_800_mz(50).unwrap()
.zodiac_epochs_iterations(20000).unwrap()
.adduct_settings_enforced_default().unwrap()
.adduct_settings_fallback(AdductsVector::from(vec![
Adducts::MplusHplus,
Adducts::MplusNaplus,
Adducts::MplusKplus,
])).unwrap()
.formula_result_threshold(true).unwrap()
.inject_el_gordo_compounds(true).unwrap()
.structure_search_db(SearchDB::Bio).unwrap()
.recompute_results(false).unwrap()
.enable_formula().unwrap()
.enable_zodiac().unwrap()
.enable_fingerprint().unwrap()
.enable_structure().unwrap()
.enable_canopus().unwrap()
.enable_write_summaries().unwrap()
.build();

let input_file_path = Path::new("tests/data/input_sirius.mgf");
let output_file_path = Path::new("tests/data/output_sirius");
// Check if the path exists before attempting to remove it
if output_file_path.exists() {
let _ = std::fs::remove_dir_all(output_file_path);
}
sirius.run(input_file_path, output_file_path).unwrap();
```
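
The typed values in the example map onto the string syntax of the `config` flags. The sketch below shows how such a mapping could be written with `std::fmt::Display`. All names here (`MassDeviation`, `Adduct`, `adduct_list`, `config_flag`) are illustrative re-definitions covering only what the command above uses, not the binding's actual implementation.

```rust
use std::fmt;

/// Illustrative mass deviation with its unit; other units are omitted.
enum MassDeviation {
    Ppm(f64),
}

impl fmt::Display for MassDeviation {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // `{:?}` keeps the decimal point, so 5.0 is written as "5.0ppm".
            MassDeviation::Ppm(value) => write!(f, "{value:?}ppm"),
        }
    }
}

/// Illustrative subset of adducts, matching the fallback list above.
enum Adduct {
    MPlusH,
    MPlusNa,
    MPlusK,
}

impl fmt::Display for Adduct {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let text = match self {
            Adduct::MPlusH => "[M+H]+",
            Adduct::MPlusNa => "[M+Na]+",
            Adduct::MPlusK => "[M+K]+",
        };
        f.write_str(text)
    }
}

/// Joins adducts with commas and wraps them in the outer brackets the CLI expects.
fn adduct_list(adducts: &[Adduct]) -> String {
    let joined = adducts
        .iter()
        .map(|adduct| adduct.to_string())
        .collect::<Vec<_>>()
        .join(",");
    format!("[{joined}]")
}

/// Formats a single `--Name=value` flag.
fn config_flag(name: &str, value: impl fmt::Display) -> String {
    format!("--{name}={value}")
}
```

Keeping the string syntax inside `Display` implementations means a misspelled unit or adduct becomes a compile error rather than a failed Sirius run.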

## Error cases
This binding also validates the configuration before running the Sirius CLI.

The following example panics because `maximal_mz` is set twice:
```should_panic
use sirius::prelude::*;
use std::path::Path;
let sirius = SiriusBuilder::<Version5>::default()
.maximal_mz_default().unwrap()
.maximal_mz(70.6).unwrap()
.enable_formula().unwrap()
.enable_zodiac().unwrap()
.enable_fingerprint().unwrap()
.enable_structure().unwrap()
.enable_canopus().unwrap()
.enable_write_summaries().unwrap()
.build();
let input_file_path = Path::new("tests/data/input_sirius.mgf");
let output_file_path = Path::new("tests/data/output_sirius_default");
// Check if the path exists before attempting to remove it
if output_file_path.exists() {
let _ = std::fs::remove_dir_all(output_file_path);
}
sirius.run(input_file_path, output_file_path).unwrap();
```

It results in the following error:
```bash
Error: "The core parameter MaximalMz(70.6) cannot be added to the configuration. There is already an existing parameter which is MaximalMz(800.0). You cannot add it twice."
```
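
One way such a duplicate check could work is to compare enum variants while ignoring their payloads. The sketch below is a simplified assumption about the mechanism, with a hypothetical `Param` enum and `Builder` type; it is not the binding's actual code.

```rust
use std::mem::discriminant;

/// Hypothetical parameters; only the variant matters for the duplicate check.
#[derive(Debug, PartialEq)]
enum Param {
    MaximalMz(f64),
    NumberOfCandidates(u32),
}

/// Hypothetical builder that stores each kind of parameter at most once.
#[derive(Debug, Default)]
struct Builder {
    params: Vec<Param>,
}

impl Builder {
    /// Adds a parameter, or returns an error if the same variant is already present.
    fn add(mut self, param: Param) -> Result<Self, String> {
        // `discriminant` compares variants, so MaximalMz(70.6) clashes with MaximalMz(800.0).
        let clash = self
            .params
            .iter()
            .find(|existing| discriminant(*existing) == discriminant(&param));
        if let Some(existing) = clash {
            return Err(format!(
                "The parameter {param:?} cannot be added to the configuration. \
                 There is already an existing parameter which is {existing:?}."
            ));
        }
        self.params.push(param);
        Ok(self)
    }
}
```

Because `add` returns a `Result`, a caller can either propagate the error with `?` or, as in the example above, call `.unwrap()` and let the program panic.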

## Limitations
For now, some *config* parameters are not fully implemented and only their default values are used.

To see the default values, run `sirius config --help`. Below is a non-exhaustive list of the parameters for which only the default values are used:
* **PossibleAdductsSwitches** default is `[M+Na]+:[M+H]+,[M+K]+:[M+H]+,[M+Cl]-:[M-H]-`
* **AdductSettingsEnforced** default is `,`
* **FormulaResultRankingScore** default is `AUTO`
* **IsotopeMS2Settings** default is `IGNORE`
* **NoiseThresholdSettingsBasePeak** default is `NOT_PRECURSOR`
* **Adducts** have no default, and some adducts are probably missing from the enumeration.

We plan to support custom values for these parameters. If you need them now, do not hesitate to open an issue or a pull request.


## Fuzzing
Fuzzing is a technique for finding security vulnerabilities and bugs in software by providing random input to the code. It can be an effective way of uncovering issues that might not be discovered through other testing methods. In our library, we take fuzzing seriously, and we use the [cargo fuzz](https://github.com/rust-fuzz/cargo-fuzz) tool to ensure our code is robust and secure. cargo fuzz automates the process of generating and running randomized test inputs, and it can help identify obscure bugs that would be difficult to detect through traditional testing methods. We make sure that our fuzz targets are continuously updated and run against the latest versions of the library to ensure that any vulnerabilities or bugs are quickly identified and addressed.

You can learn more about fuzzing [here](https://github.com/earth-metabolome-initiative/emi-monorepo/tree/sirius-bindings/bindings/sirius/fuzz).

<!--begin cite-->
# Citing Sirius

Kai Dührkop, Markus Fleischauer, Marcus Ludwig, Alexander A. Aksenov, Alexey V. Melnik, Marvin Meusel, Pieter C. Dorrestein, Juho Rousu, and Sebastian Böcker,
[SIRIUS 4: Turning tandem mass spectra into metabolite structure information.](https://doi.org/10.1038/s41592-019-0344-8)
*Nature Methods* 16, 299–302, 2019.
<!--end cite-->


3 changes: 3 additions & 0 deletions bindings/sirius/dot_env_template
@@ -0,0 +1,3 @@
SIRIUS_PATH="path/to/your/sirius/app"
SIRIUS_USERNAME="sirius_login"
SIRIUS_PASSWORD="sirius_password"
4 changes: 4 additions & 0 deletions bindings/sirius/fuzz/.gitignore
@@ -0,0 +1,4 @@
target
corpus
artifacts
coverage
30 changes: 30 additions & 0 deletions bindings/sirius/fuzz/Cargo.toml
@@ -0,0 +1,30 @@
[package]
name = "sirius-fuzz"
version = "0.0.0"
publish = false
edition = "2021"

[package.metadata]
cargo-fuzz = true

[dependencies]
libfuzzer-sys = "0.4"
arbitrary = { version = "1", features = ["derive"] }
rand = { version = "0.8", features = ["small_rng"] }

[dependencies.sirius]
path = ".."
features = ["fuzz"]

# Prevent this from interfering with workspaces
[workspace]
members = ["."]

[profile.release]
debug = 1

[[bin]]
name = "random"
path = "fuzz_targets/random.rs"
test = false
doc = false
15 changes: 15 additions & 0 deletions bindings/sirius/fuzz/README.md
@@ -0,0 +1,15 @@
# How to fuzz
The fuzzer allows you to test the bindings with random inputs. It uses the `cargo-fuzz` crate to generate random inputs and test the bindings with them.

## Install cargo-fuzz
```bash
cargo install cargo-fuzz
```

## Run the fuzzer
```bash
cargo fuzz run random
```

You can stop the fuzzer at any time by pressing `Ctrl+C`. If a crash occurs, the fuzzer prints the input that caused it.
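
To give a feel for the idea without installing anything, here is a crude stand-in that uses plain random testing with a hand-rolled xorshift generator. It is not coverage-guided like the libFuzzer engine behind `cargo fuzz`, and `accept_options` is a hypothetical target, not a function of the bindings.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
/// The seed must be non-zero.
struct XorShift(u64);

impl XorShift {
    fn next(&mut self) -> u64 {
        self.0 ^= self.0 << 13;
        self.0 ^= self.0 >> 7;
        self.0 ^= self.0 << 17;
        self.0
    }
}

/// Hypothetical code under test: each of four options may be given only once.
fn accept_options(options: &[u8]) -> Result<usize, String> {
    let mut seen = [false; 4];
    for &option in options {
        let slot = usize::from(option % 4);
        if seen[slot] {
            return Err(format!("option {slot} was added twice"));
        }
        seen[slot] = true;
    }
    Ok(options.len())
}

/// Feeds `rounds` random option sequences to the target and counts rejections.
/// A panic inside the target would abort the loop, which is the signal a fuzzer looks for.
fn random_rounds(seed: u64, rounds: usize) -> usize {
    let mut rng = XorShift(seed);
    let mut rejected = 0;
    for _ in 0..rounds {
        // Sequences of zero to five random bytes.
        let len = (rng.next() % 6) as usize;
        let input: Vec<u8> = (0..len).map(|_| (rng.next() & 0xff) as u8).collect();
        match accept_options(&input) {
            Ok(count) => assert!(count <= 4, "more options accepted than exist"),
            Err(_) => rejected += 1,
        }
    }
    rejected
}
```

A fixed seed makes a failing run reproducible, which plays the same role as the crashing input that `cargo fuzz` reports.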
