diff --git a/CHANGELOG.md b/CHANGELOG.md
index 062971847..add0e5f38 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,5 +1,21 @@
# Changelog
+## [v2.0.0-alpha.7](https://github.com/NPLinker/nplinker/tree/v2.0.0-alpha.7) (2024-11-28)
+
+[Full Changelog](https://github.com/NPLinker/nplinker/compare/v2.0.0-alpha.6...v2.0.0-alpha.7)
+
+**Closed issues:**
+
+- Incorrect precursor m/z when loading MGF file from GNPS [\#282](https://github.com/NPLinker/nplinker/issues/282)
+- Use bigscape version in loaders [\#271](https://github.com/NPLinker/nplinker/issues/271)
+
+**Merged pull requests:**
+
+- remove default config file to make all settings explicit [\#287](https://github.com/NPLinker/nplinker/pull/287) ([CunliangGeng](https://github.com/CunliangGeng))
+- add support of mibig v4.0 [\#286](https://github.com/NPLinker/nplinker/pull/286) ([CunliangGeng](https://github.com/CunliangGeng))
+- fix the resolving of genbank and jgi IDs [\#285](https://github.com/NPLinker/nplinker/pull/285) ([CunliangGeng](https://github.com/CunliangGeng))
+- Precursor m/z value fix [\#283](https://github.com/NPLinker/nplinker/pull/283) ([liannette](https://github.com/liannette))
+
## [v2.0.0-alpha.6](https://github.com/NPLinker/nplinker/tree/v2.0.0-alpha.6) (2024-09-17)
[Full Changelog](https://github.com/NPLinker/nplinker/compare/v2.0.0-alpha.5...v2.0.0-alpha.6)
diff --git a/CITATION.cff b/CITATION.cff
index f4d42db61..0029adb18 100644
--- a/CITATION.cff
+++ b/CITATION.cff
@@ -45,7 +45,7 @@ authors:
-
given-names: Marnix
family-names: Medema
-version: "2.0.0-alpha.6"
+version: "2.0.0-alpha.7"
repository-code: "https://github.com/NPLinker/nplinker"
keywords:
- Genome
diff --git a/README.md b/README.md
index 6afc95870..7580e160d 100644
--- a/README.md
+++ b/README.md
@@ -16,7 +16,8 @@
| Citation data consistency | [![cffconvert](https://github.com/NPLinker/nplinker/actions/workflows/cffconvert.yml/badge.svg)](https://github.com/NPLinker/nplinker/actions/workflows/cffconvert.yml) |
-# Natural Products Linker (NPLinker)
+![NPLinker Logo](./docs/images/NPLinker_standard_black.svg)
+
NPLinker is a python framework for data mining microbial natural products by integrating genomics and metabolomics data.
Original paper: [Ranking microbial metabolomic and genomic links in the NPLinker framework using complementary scoring functions](https://doi.org/10.1371/journal.pcbi.1008920).
diff --git a/docs/concepts/config_file.md b/docs/concepts/config_file.md
index d7e8f4315..e2f74ffc6 100644
--- a/docs/concepts/config_file.md
+++ b/docs/concepts/config_file.md
@@ -4,13 +4,9 @@
--8<-- "src/nplinker/data/nplinker.toml"
```
+## Example Configuration
-## Default Configurations
-The default configurations are automatically used by NPLinker if you don't set them in your config file.
-
-```toml
---8<-- "src/nplinker/nplinker_default.toml"
-```
+For a full example of a configuration file, see [Prepare config file](../quickstart.md#3-prepare-config-file) in the Quickstart.
## Config loader
diff --git a/docs/images/NPLinker_icon_black.svg b/docs/images/NPLinker_icon_black.svg
new file mode 100644
index 000000000..b1808e1d2
--- /dev/null
+++ b/docs/images/NPLinker_icon_black.svg
@@ -0,0 +1,25 @@
+
+
\ No newline at end of file
diff --git a/docs/images/NPLinker_icon_white.svg b/docs/images/NPLinker_icon_white.svg
new file mode 100644
index 000000000..3b2e95308
--- /dev/null
+++ b/docs/images/NPLinker_icon_white.svg
@@ -0,0 +1,25 @@
+
+
\ No newline at end of file
diff --git a/docs/images/NPLinker_standard_black.svg b/docs/images/NPLinker_standard_black.svg
new file mode 100644
index 000000000..38cec3178
--- /dev/null
+++ b/docs/images/NPLinker_standard_black.svg
@@ -0,0 +1,71 @@
+
+
\ No newline at end of file
diff --git a/docs/images/NPLinker_standard_white.svg b/docs/images/NPLinker_standard_white.svg
new file mode 100644
index 000000000..2da5590f0
--- /dev/null
+++ b/docs/images/NPLinker_standard_white.svg
@@ -0,0 +1,71 @@
+
+
\ No newline at end of file
diff --git a/docs/index.md b/docs/index.md
index f443c12cc..681f61b13 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,4 +1,5 @@
-# NPLinker
+#
+
NPLinker is a python framework for data mining microbial natural products by integrating genomics and metabolomics data.
diff --git a/docs/quickstart.md b/docs/quickstart.md
index 1d918c83a..de215f038 100644
--- a/docs/quickstart.md
+++ b/docs/quickstart.md
@@ -166,16 +166,27 @@ is recommended to put it in the working directory created in step 2.
The details of all settings can be found at this page [Config File](./concepts/config_file.md).
-To keep it simple, [default settings](./concepts/config_file.md#default-configurations) will be used
-automatically by NPLinker if you don't set them in your `nplinker.toml` config file.
-
-What you need to do is to set the `root_dir` and `mode` in the `nplinker.toml` file.
+NPLinker does not provide default settings, so all required settings must be set explicitly. Here are some example values for the `nplinker.toml` file:
=== "`local` mode"
```toml title="nplinker.toml"
root_dir = "absolute/path/to/working/directory" # (1)!
mode = "local"
- # and other settings you want to override the default settings
+
+ [log]
+ level = "DEBUG"
+ use_console = true
+
+ [mibig]
+ to_use = true
+ version = "3.1"
+
+ [bigscape]
+ version = 1
+ cutoff = "0.30"
+
+ [scoring]
+ methods = ["metcalf"]
```
1. Replace `absolute/path/to/working/directory` with the **absolute** path to the working directory
@@ -187,7 +198,22 @@ What you need to do is to set the `root_dir` and `mode` in the `nplinker.toml` f
root_dir = "absolute/path/to/working/directory" # (1)!
mode = "podp"
podp_id = "podp_id" # (2)!
- # and other settings you want to override the default settings
+
+ [log]
+ level = "DEBUG"
+ use_console = true
+
+ [mibig]
+ to_use = true
+ version = "3.1"
+
+ [bigscape]
+ version = 2
+ cutoff = "0.30"
+ parameters = "--mibig_version 3.1 --include_singletons --gcf_cutoffs 0.30"
+
+ [scoring]
+ methods = ["metcalf"]
```
1. Replace `absolute/path/to/working/directory` with the **absolute** path to the working directory
diff --git a/docs/scripts/extra.js b/docs/scripts/extra.js
new file mode 100644
index 000000000..f0b72b388
--- /dev/null
+++ b/docs/scripts/extra.js
@@ -0,0 +1,21 @@
+document.addEventListener("DOMContentLoaded", function () {
+ const img = document.querySelector(".theme-toggle-image");
+
+ if (!img) return; // Exit if no image is found
+
+ // Function to update the image based on the current theme
+ function updateImage() {
+ const theme = document.body.getAttribute("data-md-color-scheme");
+ img.src = theme === "slate" ? "images/NPLinker_standard_white.svg" : "images/NPLinker_standard_black.svg";
+ }
+
+ // Initial update
+ updateImage();
+
+ // Observe changes to the `data-md-color-scheme` attribute
+ const observer = new MutationObserver(updateImage);
+ observer.observe(document.body, {
+ attributes: true,
+ attributeFilter: ["data-md-color-scheme"], // Monitor changes to the theme attribute
+ });
+});
\ No newline at end of file
diff --git a/mkdocs.yml b/mkdocs.yml
index a3aa6e8f7..5ba372a81 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -35,7 +35,8 @@ theme:
logo: 'material/library-outline'
previous: fontawesome/solid/angle-left
next: fontawesome/solid/angle-right
- favicon: 'favicon.png'
+ favicon: images/NPLinker_icon_black.svg
+ logo: images/NPLinker_icon_white.svg
repo_name: nplinker/nplinker
repo_url: https://github.com/nplinker/nplinker
@@ -47,6 +48,9 @@ extra:
extra_css:
- css/extra.css
+extra_javascript:
+ - scripts/extra.js
+
# https://www.mkdocs.org/user-guide/configuration/#validation
validation:
omitted_files: warn
diff --git a/pyproject.toml b/pyproject.toml
index c627f6ca9..74d050a05 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -6,7 +6,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "nplinker"
-version = "2.0.0-alpha.6"
+version = "2.0.0-alpha.7"
description = "Natural Products Linker"
readme = "README.md"
requires-python = ">=3.9"
diff --git a/src/nplinker/__init__.py b/src/nplinker/__init__.py
index e7efe61e9..c80be90cc 100644
--- a/src/nplinker/__init__.py
+++ b/src/nplinker/__init__.py
@@ -7,7 +7,7 @@
__author__ = "Cunliang Geng"
__email__ = "c.geng@esciencecenter.nl"
-__version__ = "2.0.0-alpha.6"
+__version__ = "2.0.0-alpha.7"
__all__ = ["NPLinker", "setup_logging"]
diff --git a/src/nplinker/config.py b/src/nplinker/config.py
index d1ade8bae..a9491799f 100644
--- a/src/nplinker/config.py
+++ b/src/nplinker/config.py
@@ -1,6 +1,5 @@
from __future__ import annotations
from os import PathLike
-from pathlib import Path
from dynaconf import Dynaconf
from dynaconf import Validator
from nplinker.utils import transform_to_full_path
@@ -25,11 +24,8 @@ def load_config(config_file: str | PathLike) -> Dynaconf:
if not config_file.exists():
raise FileNotFoundError(f"Config file '{config_file}' not found")
- # Locate the default config file
- default_config_file = Path(__file__).resolve().parent / "nplinker_default.toml"
-
# Load config files
- config = Dynaconf(settings_files=[config_file], preload=[default_config_file])
+ config = Dynaconf(settings_files=[config_file])
# Validate configs
config.validators.register(*CONFIG_VALIDATORS)
@@ -61,7 +57,7 @@ def load_config(config_file: str | PathLike) -> Dynaconf:
is_in=["NOTSET", "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"],
),
Validator("log.file", is_type_of=str),
- Validator("log.use_console", is_type_of=bool),
+ Validator("log.use_console", required=True, is_type_of=bool),
# Mibig
Validator("mibig.to_use", required=True, is_type_of=bool),
Validator(
@@ -71,9 +67,9 @@ def load_config(config_file: str | PathLike) -> Dynaconf:
when=Validator("mibig.to_use", eq=True),
),
# BigScape
- Validator("bigscape.parameters", required=True, is_type_of=str),
+ Validator("bigscape.parameters", is_type_of=str),
Validator("bigscape.cutoff", required=True, is_type_of=str),
- Validator("bigscape.version", required=True, is_type_of=int),
+ Validator("bigscape.version", required=True, is_type_of=int, is_in=[1, 2]),
# Scoring
## `scoring.methods` must be a list of strings and must contain at least one of the
## supported scoring methods.
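With the default config file no longer preloaded, every required key has to come from the user's own file. The updated validators now require `log.use_console`, make `bigscape.parameters` optional, and restrict `bigscape.version` to 1 or 2. The sketch below mirrors a subset of these checks in plain Python, so the effect of the change is easier to see. It is not NPLinker's implementation: `check_required_settings` and `_lookup` are hypothetical helpers, a nested dict stands in for the Dynaconf settings object, and the `mibig.version` condition is taken from the template's "REQUIRED-UNDER-CONDITIONS" note because that validator is only partly visible in this diff.

```python
from __future__ import annotations

from typing import Any


def _lookup(settings: dict, dotted_key: str) -> Any:
    """Return the value at a dotted key such as "log.use_console", or None if it is missing."""
    node: Any = settings
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def check_required_settings(settings: dict) -> list[str]:
    """Hypothetical helper mirroring a subset of NPLinker's config validators.

    Returns a list of error messages; an empty list means all checks passed.
    """
    errors: list[str] = []
    # (dotted key, expected type, required?)
    rules = [
        ("log.use_console", bool, True),  # now required
        ("mibig.to_use", bool, True),
        ("bigscape.parameters", str, False),  # now optional
        ("bigscape.cutoff", str, True),  # must be a string such as "0.30"
        ("bigscape.version", int, True),
    ]
    for key, expected_type, required in rules:
        value = _lookup(settings, key)
        if value is None:
            if required:
                errors.append(f"{key} is required")
            continue
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_bool = expected_type is int and isinstance(value, bool)
        if wrong_bool or not isinstance(value, expected_type):
            errors.append(f"{key} must be of type {expected_type.__name__}")

    # bigscape.version is restricted to the 1.x and 2.x series.
    version = _lookup(settings, "bigscape.version")
    if type(version) is int and version not in (1, 2):
        errors.append("bigscape.version must be 1 or 2")

    # mibig.version is only required when MIBiG data is used.
    if _lookup(settings, "mibig.to_use") is True and _lookup(settings, "mibig.version") is None:
        errors.append("mibig.version is required when mibig.to_use is true")
    return errors


# Settings shaped like the quickstart's local-mode example after parsing.
settings = {
    "root_dir": "/abs/path/to/working/directory",
    "mode": "local",
    "log": {"level": "DEBUG", "use_console": True},
    "mibig": {"to_use": True, "version": "3.1"},
    "bigscape": {"version": 1, "cutoff": "0.30"},
    "scoring": {"methods": ["metcalf"]},
}
print(check_required_settings(settings))  # []

# Missing log.use_console, numeric cutoff, unsupported version, no mibig.version.
bad = {
    "log": {"level": "INFO"},
    "mibig": {"to_use": True},
    "bigscape": {"version": 3, "cutoff": 0.30},
}
print(check_required_settings(bad))
```

The second call reports all four problems at once rather than stopping at the first, which is usually more helpful when migrating a config file that previously relied on defaults.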
diff --git a/src/nplinker/data/nplinker.toml b/src/nplinker/data/nplinker.toml
index 6a9f0d8da..f069de474 100644
--- a/src/nplinker/data/nplinker.toml
+++ b/src/nplinker/data/nplinker.toml
@@ -2,76 +2,87 @@
# NPLinker configuration file
#############################
-# The root directory of the NPLinker project. You need to create it first.
-# The value is required and must be a full path.
root_dir = ""
+# [REQUIRED] The value must be a full path.
+# The root directory of the NPLinker project. You need to create it first.
+
+mode = "podp"
+# [REQUIRED] Available values are "podp" and "local".
# The mode for preparing dataset.
-# The available modes are "podp" and "local".
# "podp" mode is for using the PODP platform (https://pairedomicsdata.bioinformatics.nl/) to prepare the dataset.
-# "local" mode is for preparing the dataset locally. So uers do not need to upload their data to the PODP platform.
-# The value is required.
-mode = "podp"
-# The PODP project identifier.
-# The value is required if the mode is "podp".
+# "local" mode is for preparing the dataset locally, so users do not need to upload their data to the PODP platform.
+
podp_id = ""
+# [REQUIRED-UNDER-CONDITIONS] The value is required if the mode is "podp".
+# The PODP project identifier.
+# Example: The identifier is "4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4" for the project
+# https://pairedomicsdata.bioinformatics.nl/projects/4b29ddc3-26d0-40d7-80c5-44fb6631dbf9.4
[log]
-# Log level. The available levels are same as the levels in python package `logging`:
-# "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL".
-# The default value is "INFO".
+# Settings for logging.
+
level = "INFO"
+# [REQUIRED] Available values are "NOTSET", "DEBUG", "INFO", "WARNING", "ERROR", and "CRITICAL".
+# Log level.
+
+use_console = true
+# [REQUIRED] Available values are `true` and `false`.
+# Whether to write log messages to the console.
+
+file = "path/to/logfile"
+# [OPTIONAL]
# The log file to append log messages.
-# The value is optional.
# If not set or use empty string, log messages will not be written to a file.
# The file will be created if it does not exist. Log messages will be appended to the file if it exists.
-file = "path/to/logfile"
-# Whether to write log meesages to console.
-# The default value is true.
-use_console = true
[mibig]
-# Whether to use mibig metadta (json).
-# The default value is true.
+# Settings for MIBiG.
+
to_use = true
-# The version of mibig metadata.
-# Make sure using the same version of mibig in bigscape.
-# The default value is "3.1"
+# [REQUIRED] Available values are `true` and `false`.
+# Whether to use MIBiG annotations/metadata for the analysis.
+
version = "3.1"
+# [REQUIRED-UNDER-CONDITIONS] The value is required if `to_use` is `true`, and it must match the version of MIBiG used in BiG-SCAPE.
+# The version of MIBiG data to use.
+# Check all available versions at https://mibig.secondarymetabolites.org/download.
[bigscape]
-# The parameters to use for running BiG-SCAPE.
-# Version of BiG-SCAPE to run. Make sure to change the parameters property below as well
-# when changing versions.
+# Settings for BiG-SCAPE.
+
version = 1
-# Required BiG-SCAPE parameters.
+# [REQUIRED] Available values are 1 and 2: use 1 for the BiG-SCAPE 1.x series and 2 for the 2.x series.
+# The version of BiG-SCAPE to use.
+
+cutoff = "0.30"
+# [REQUIRED] The value must be a string.
+# Which cutoff to use for the analysis.
+# There might be multiple cutoffs in the BiG-SCAPE output and this value must be one of them.
+
+parameters = "version1_parameters_or_version2_parameters"
+# [REQUIRED-UNDER-CONDITIONS] The value is required if you want NPLinker to run BiG-SCAPE.
+# Parameters for running BiG-SCAPE.
# --------------
# For version 1:
# -------------
-# Required parameters are: `--mix`, `--include_singletons` and `--cutoffs`. NPLinker needs them to run the analysis properly.
-# Do NOT set these parameters: `--inputdir`, `--outputdir`, `--pfam_dir`. NPLinker will automatically configure them.
-# If parameter `--mibig` is set, make sure to set the config `mibig.to_use` to true and `mibig.version` to the version of mibig in BiG-SCAPE.
-# The default value is "--mibig --clans-off --mix --include_singletons --cutoffs 0.30".
+# The parameters MUST contain `--mix`, `--include_singletons` and `--cutoffs`. NPLinker needs them to run the analysis properly.
+# The parameters must NOT contain `--inputdir`, `--outputdir`, `--pfam_dir`. NPLinker will automatically configure them.
+# An example value could be: "--mibig --clans-off --mix --include_singletons --cutoffs 0.30".
# --------------
# For version 2:
# --------------
-# Note that BiG-SCAPE v2 has subcommands. NPLinker requires the `cluster` subcommand and its parameters.
+# BiG-SCAPE v2 has subcommands. NPLinker requires the `cluster` subcommand and its parameters.
# Required parameters of `cluster` subcommand are: `--mibig_version`, `--include_singletons` and `--gcf_cutoffs`.
# DO NOT set these parameters: `--pfam_path`, `--inputdir`, `--outputdir`. NPLinker will automatically configure them.
# BiG-SCPAPE v2 also runs a `--mix` analysis by default, so you don't need to set this parameter here.
-# Example parameters for BiG-SCAPE v2: "--mibig_version 3.1 --include_singletons --gcf_cutoffs 0.30"
-parameters = "--mibig --clans-off --mix --include_singletons --cutoffs 0.30"
-# Which bigscape cutoff to use for NPLinker analysis.
-# There might be multiple cutoffs in bigscape output.
-# Note that this value must be a string.
-# The default value is "0.30".
-cutoff = "0.30"
+# An example value could be: "--mibig_version 3.1 --include_singletons --gcf_cutoffs 0.30"
[scoring]
-# Scoring methods.
-# Valid values are "metcalf" and "rosetta".
-# The default value is "metcalf".
+# Settings for scoring.
methods = ["metcalf"]
+# [REQUIRED] Available values are "metcalf" and "rosetta".
+# Scoring methods to use for the analysis.
\ No newline at end of file
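The template comments for `bigscape.parameters` list which options must and must not appear for each BiG-SCAPE version. A pre-flight check like the following could catch mistakes, such as pasting version 2 parameters while `version = 1`, before a long run. This is a minimal sketch, not part of NPLinker: `check_bigscape_parameters` is a hypothetical helper, it tokenizes the string with the standard library's `shlex.split`, and it compares only option names (not their values) against the lists copied from the comments above.

```python
import shlex

# Option names taken from the comments for `bigscape.parameters` in nplinker.toml.
REQUIRED_OPTIONS = {
    1: {"--mix", "--include_singletons", "--cutoffs"},
    2: {"--mibig_version", "--include_singletons", "--gcf_cutoffs"},
}
# NPLinker configures these itself, so they must not appear in the parameters.
FORBIDDEN_OPTIONS = {
    1: {"--inputdir", "--outputdir", "--pfam_dir"},
    2: {"--pfam_path", "--inputdir", "--outputdir"},
}


def check_bigscape_parameters(parameters: str, version: int) -> list[str]:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    if version not in REQUIRED_OPTIONS:
        raise ValueError(f"unsupported BiG-SCAPE version: {version}")
    # Keep only option names; an "--opt=value" token is reduced to "--opt".
    options = {
        token.split("=", 1)[0] for token in shlex.split(parameters) if token.startswith("--")
    }
    problems = [f"missing {opt}" for opt in sorted(REQUIRED_OPTIONS[version] - options)]
    problems += [f"must not set {opt}" for opt in sorted(FORBIDDEN_OPTIONS[version] & options)]
    return problems


# The example values from the template comments pass for their own versions.
print(check_bigscape_parameters("--mibig --clans-off --mix --include_singletons --cutoffs 0.30", 1))  # []
print(check_bigscape_parameters("--mibig_version 3.1 --include_singletons --gcf_cutoffs 0.30", 2))  # []
# Version 2 parameters combined with `version = 1` are flagged.
print(check_bigscape_parameters("--mibig_version 3.1 --include_singletons --gcf_cutoffs 0.30", 1))
# → ['missing --cutoffs', 'missing --mix']
```

Sorting the set differences keeps the messages in a stable order, which makes the output predictable across runs.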
diff --git a/src/nplinker/genomics/antismash/podp_antismash_downloader.py b/src/nplinker/genomics/antismash/podp_antismash_downloader.py
index 3973d2208..efbbffacd 100644
--- a/src/nplinker/genomics/antismash/podp_antismash_downloader.py
+++ b/src/nplinker/genomics/antismash/podp_antismash_downloader.py
@@ -2,7 +2,6 @@
import json
import logging
import re
-import time
import warnings
from collections.abc import Mapping
from collections.abc import Sequence
@@ -10,8 +9,6 @@
from pathlib import Path
import httpx
from bs4 import BeautifulSoup
-from bs4 import NavigableString
-from bs4 import Tag
from jsonschema import validate
from nplinker.defaults import GENOME_STATUS_FILENAME
from nplinker.genomics.antismash import download_and_extract_antismash_data
@@ -20,7 +17,6 @@
logger = logging.getLogger(__name__)
-NCBI_LOOKUP_URL = "https://www.ncbi.nlm.nih.gov/assembly/?term={}"
JGI_GENOME_LOOKUP_URL = (
"https://img.jgi.doe.gov/cgi-bin/m/main.cgi?section=TaxonDetail&page=taxonDetail&taxon_oid={}"
)
@@ -251,90 +247,49 @@ def get_best_available_genome_id(genome_id_data: Mapping[str, str]) -> str | Non
return best_id
-def _ncbi_genbank_search(genbank_id: str, retry_times: int = 3) -> Tag | NavigableString | None:
- url = NCBI_LOOKUP_URL.format(genbank_id)
- retry = 1
- while retry <= retry_times:
- logger.info(f"Looking up GenBank data for {genbank_id} at {url}")
- resp = httpx.get(url, follow_redirects=True)
- if resp.status_code == httpx.codes.OK:
- # the page should contain a
element with class "assembly_summary_new". retrieving
- # the page seems to fail occasionally in the middle of lengthy sequences of genome
- # lookups, so there might be some throttling going on. this will automatically retry
- # the lookup if the expected content isn't found the first time
- soup = BeautifulSoup(resp.content, "html.parser")
- # find the