Update Edta 2.2.2 - fix path to LTR_FINDER_PARALLEL #52404

Merged
merged 15 commits into from
Dec 16, 2024

Conversation

Juke34
Contributor

@Juke34 Juke34 commented Nov 27, 2024

and fix version of ltr_retriever (>=3.0.0)


Please read the guidelines for Bioconda recipes before opening a pull request (PR).

General instructions

  • If this PR adds or updates a recipe, use "Add" or "Update" appropriately as the first word in its title.
  • New recipes not directly relevant to the biological sciences need to be submitted to the conda-forge channel instead of Bioconda.
  • PRs require reviews prior to being merged. Once your PR is passing tests and ready to be merged, please issue the @BiocondaBot please add label command.
  • Please post questions on Gitter or ping @bioconda/core in a comment.

Instructions for avoiding API, ABI, and CLI breakage issues

Conda is able to record and lock (a.k.a. pin) dependency versions used at build time of other recipes.
This way, one can avoid that expectations of a downstream recipe with regards to API, ABI, or CLI are violated by later changes in the recipe.
If not already present in the meta.yaml, make sure to specify run_exports (see here for the rationale and comprehensive explanation).
Add a run_exports section like this:

build:
  run_exports:
    - ...

with ... being one of:

Case: run_exports statement
semantic versioning: {{ pin_subpackage("myrecipe", max_pin="x") }}
semantic versioning (0.x.x): {{ pin_subpackage("myrecipe", max_pin="x.x") }}
known breakage in minor versions: {{ pin_subpackage("myrecipe", max_pin="x.x") }} (in such a case, please add a note that shortly mentions your evidence for that)
known breakage in patch versions: {{ pin_subpackage("myrecipe", max_pin="x.x.x") }} (in such a case, please add a note that shortly mentions your evidence for that)
calendar versioning: {{ pin_subpackage("myrecipe", max_pin=None) }}

while replacing "myrecipe" with either name if a name|lower variable is defined in your recipe or with the lowercase name of the package in quotes.
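
As a rough illustration of what the max_pin patterns above imply, the following sketch computes an exclusive upper bound from a version and a pattern. It is a simplification under stated assumptions: upper_bound and pin_expression are hypothetical helpers (not conda-build functions), versions are assumed to be purely numeric and dot-separated, and the exact string conda-build renders may differ in detail (for example, it may append a pre-release suffix to the upper bound).

```python
from typing import Optional


def upper_bound(version: str, max_pin: Optional[str]) -> Optional[str]:
    """Return the exclusive upper bound implied by a max_pin pattern.

    Simplified: assumes numeric, dot-separated version components.
    """
    if max_pin is None:
        return None  # no upper bound (the calendar versioning case)
    keep = len(max_pin.split("."))  # "x.x" keeps two components
    parts = [int(p) for p in version.split(".")[:keep]]
    parts[-1] += 1  # bump the last kept component
    return ".".join(str(p) for p in parts)


def pin_expression(version: str, max_pin: Optional[str]) -> str:
    """Build an illustrative constraint such as '>=2.2.2,<3'."""
    upper = upper_bound(version, max_pin)
    return f">={version}" if upper is None else f">={version},<{upper}"


if __name__ == "__main__":
    print(pin_expression("2.2.2", "x"))    # >=2.2.2,<3
    print(pin_expression("0.3.1", "x.x"))  # >=0.3.1,<0.4
```

With max_pin="x", a package built against 2.2.2 would accept any later 2.x release but not 3.x, which matches the semantic versioning row above.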

Bot commands for PR management

Please use the following BiocondaBot commands:

Everyone has access to the following BiocondaBot commands, which can be given in a comment:

@BiocondaBot please update: Merge the master branch into a PR.
@BiocondaBot please add label: Add the please review & merge label.
@BiocondaBot please fetch artifacts: Post links to CI-built packages/containers. You can use this to test packages locally.

Note that the @BiocondaBot please merge command is now deprecated. Please just squash and merge instead.

Also, the bot watches for comments from non-members that include @bioconda/<team> and will automatically re-post them to notify the addressed <team>.

@Juke34
Contributor Author

Juke34 commented Nov 27, 2024

@BiocondaBot please fetch artifacts

@BiocondaBot
Collaborator

No artifacts found on the most recent builds. Either the builds failed, the artifacts have been removed due to age, or the recipe was blacklisted/skipped.

@Juke34 Juke34 changed the title from "Edta222 - fix path to LTR_FINDER_PARALLEL" to "Update Edta 2.2.2 - fix path to LTR_FINDER_PARALLEL" Nov 27, 2024
@oushujun
Contributor

oushujun commented Dec 3, 2024

The LTR_retriever 3.0 recipe has the blast-rmblast conflict. Below is part of the error message from the PR check / Linux Tests:

18:25:06 BIOCONDA INFO (OUT) ClobberWarning: This transaction has incompatible packages due to a shared path.
18:25:06 BIOCONDA INFO (OUT) packages: bioconda/linux-64::blast-2.16.0-hc155240_3, bioconda/linux-64::rmblast-2.14.1-h4565617_0
18:25:06 BIOCONDA INFO (OUT) path: 'bin/blast_formatter'

This is because LTR_retriever 3.0 lists TEsorter as a dependency, and the TEsorter recipe requires blast. Checking the other EDTA dependencies, I found that TIR-Learner 3.0 also requires blast. Replacing blast with rmblast in the TEsorter and TIR-Learner recipes can probably solve the issue, since rmblast provides the same functionality as blast but is compatible with repeatmasker.
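
A minimal sketch of how shared paths like the one reported above could be listed for an existing environment. It assumes that each conda-meta/*.json record in the environment carries a "name" and a "files" list of prefix-relative paths; find_shared_paths is a hypothetical helper, not part of conda or the Bioconda tooling.

```python
import json
from collections import defaultdict
from pathlib import Path


def find_shared_paths(prefix):
    """Map each file path claimed by more than one package to those packages."""
    owners = defaultdict(list)
    # One JSON record per installed package is assumed under <prefix>/conda-meta.
    for record in sorted(Path(prefix, "conda-meta").glob("*.json")):
        meta = json.loads(record.read_text())
        for rel_path in meta.get("files", []):
            owners[rel_path].append(meta.get("name", record.stem))
    # Keep only the paths that two or more packages both install.
    return {path: pkgs for path, pkgs in owners.items() if len(pkgs) > 1}
```

Pointing it at an activated environment (for example the directory stored in the CONDA_PREFIX variable) would show whether blast and rmblast both claim files such as bin/blast_formatter.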

@oushujun
Contributor

oushujun commented Dec 3, 2024

@BiocondaBot please fetch artifacts

@BiocondaBot
Collaborator

Package(s) built are ready for inspection:

Arch Package Zip File / Repodata CI Instructions
noarch edta-2.2.2-hdfd78af_1.tar.bz2 noarch.zip GitHub Actions
You may also use conda to install after downloading and extracting the zip file: conda install -c ./packages <package name>

Docker image(s) built:

Package Tag CI Install with docker
edta 2.2.2--hdfd78af_1 GitHub Actions
Images are in the linux-64 zip file above: gzip -dc images/edta---2.2.2--hdfd78af_1.tar.gz | docker load

@Juke34 Juke34 added the WIP label Dec 3, 2024
@Juke34
Contributor Author

Juke34 commented Dec 3, 2024

@BiocondaBot please fetch artifacts

@BiocondaBot
Collaborator

Package(s) built are ready for inspection:

Arch Package Zip File / Repodata CI Instructions
noarch edta-2.2.2-hdfd78af_1.tar.bz2 noarch.zip GitHub Actions
You may also use conda to install after downloading and extracting the zip file: conda install -c ./packages <package name>

Docker image(s) built:

Package Tag CI Install with docker
edta 2.2.2--hdfd78af_1 GitHub Actions
Images are in the linux-64 zip file above: gzip -dc images/edta---2.2.2--hdfd78af_1.tar.gz | docker load

@Juke34
Contributor Author

Juke34 commented Dec 3, 2024

These latest artifacts fix the path to LTR_FINDER_parallel.

@oushujun
Contributor

oushujun commented Dec 3, 2024

The latest artifact still resolves to ltr_retriever 2.9.4. When I pin the LTR_retriever version, the environment solves:
mamba create -c ./packages/ -n test3 edta 'ltr_retriever>3.0'
but with the blast-rmblast conflicts:

Linking blast-2.16.0-hc155240_3
Linking rmblast-2.14.1-h4565617_0
warning libmamba [rmblast-2.14.1-h4565617_0] The following files were already present in the environment:
- bin/blast_formatter
- bin/blastdb_aliastool
- bin/blastdbcheck
- bin/blastdbcmd
- bin/blastn
- bin/blastp
- bin/blastx
- bin/cleanup-blastdb-volumes.py
- bin/convert2blastmask
- bin/datatool
- bin/deltablast
- bin/dustmasker
- bin/get_species_taxids.sh
- bin/legacy_blast.pl
- bin/makeblastdb
- bin/makembindex
- bin/makeprofiledb
- bin/project_tree_builder
- bin/psiblast
- bin/rpsblast
- bin/rpstblastn
- bin/run_with_lock
- bin/segmasker
- bin/tblastn
- bin/tblastx
- bin/test_pcre
- bin/update_blastdb.pl
- bin/windowmasker
- bin/windowmasker_2.2.22_adapter.py
- lib/ncbi-blast+/libblast_app_util-static.a
- lib/ncbi-blast+/libblast_app_util.a
- lib/ncbi-blast+/libdbapi_driver-static.a
- lib/ncbi-blast+/libdbapi_driver.a
- lib/ncbi-blast+/libncbi_xloader_genbank-static.a
- lib/ncbi-blast+/libncbi_xloader_genbank.a
- lib/ncbi-blast+/libncbi_xreader-static.a
- lib/ncbi-blast+/libncbi_xreader.a
- lib/ncbi-blast+/libncbi_xreader_cache-static.a
- lib/ncbi-blast+/libncbi_xreader_cache.a
- lib/ncbi-blast+/libncbi_xreader_id1-static.a
- lib/ncbi-blast+/libncbi_xreader_id1.a
- lib/ncbi-blast+/libncbi_xreader_id2-static.a
- lib/ncbi-blast+/libncbi_xreader_id2.a

It installs fine, and it seems that repeatmasker 4.1.2.p1 can work without rmblast being correctly linked. Maybe try pinning repeatmasker=4.1.2.p1 to see whether it solves?

Another idea is to update the tesorter and tir-learner recipes to replace blast with rmblast, so that all EDTA dependencies are blast-free.

@Juke34
Contributor Author

Juke34 commented Dec 6, 2024

@oushujun ClobberWarnings are not necessarily problematic or fatal.
Is there any reason why the ltr_retriever recipe has repeatmasker pinned to <4.1.5? Relaxing that pin may help.

I'm more concerned about this error message:

11:12:36 BIOCONDA INFO (ERR) [Dec  3 11:12:36] SERR ERROR conda.core.link:_execute(950): An error occurred while installing package 'bioconda::repeatmasker-4.1.2.p1-pl5321hdfd78af_1'.
11:12:49 BIOCONDA INFO (ERR) [Dec  3 11:12:49] SOUT Rolling back transaction: ...working... done
11:12:49 BIOCONDA INFO (ERR) [Dec  3 11:12:49] SERR
11:12:49 BIOCONDA INFO (ERR) [Dec  3 11:12:49] SERR [Errno 28] No space left on device

@oushujun
Contributor

oushujun commented Dec 6, 2024 via email

@oushujun
Contributor

oushujun commented Dec 6, 2024

I vaguely remember that later versions of repeatmasker (>4.1.5) need to download Repbase or Dfam for classification, which made the package too big to pass the automated tests.

@Juke34
Contributor Author

Juke34 commented Dec 6, 2024

I have updated ltr_retriever and tesorter.
It seems fine now. What do you think? Can you check the artifact?

@oushujun
Contributor

oushujun commented Dec 9, 2024

@BiocondaBot please fetch artifacts

@BiocondaBot
Collaborator

Package(s) built are ready for inspection:

Arch Package Zip File / Repodata CI Instructions
noarch edta-2.2.2-hdfd78af_1.tar.bz2 noarch.zip GitHub Actions
You may also use conda to install after downloading and extracting the zip file: conda install -c ./packages <package name>

Docker image(s) built:

Package Tag CI Install with docker
edta 2.2.2--hdfd78af_1 GitHub Actions
Images are in the linux-64 zip file above: gzip -dc images/edta---2.2.2--hdfd78af_1.tar.gz | docker load

@oushujun
Contributor

oushujun commented Dec 9, 2024

I tested the new LTR_retriever and TEsorter recipes; both work as before, so great!
I then test-installed the artifact:
mamba create -n EDTA2_test -c ./packages edta -c bioconda -c conda-forge
It solved with the following packages:

  • blast 2.16.0 hc155240_3 bioconda/linux-64 141 MB
  • edta 2.2.2 hdfd78af_1 ~/bin/EDTA/test/packages/noarch 41 MB
  • repeatmasker 4.1.7p1 pl5321hdfd78af_1 bioconda/noarch Cached
  • repeatmodeler 1.0.8 0 bioconda/linux-64 97 KB
  • repeatscout 1.0.7 h031d066_0 bioconda/linux-64 46 KB
  • rmblast 2.14.1 h91eb8de_1 bioconda/linux-64 Cached
  • tesorter 1.4.7 pyhdfd78af_1 bioconda/noarch Cached
  • tir-learner 3.0.3 hdfd78af_0 bioconda/noarch Cached

I encountered the same old ClobberError due to installation of both blast and rmblast:

ClobberError: This transaction has incompatible packages due to a shared path.
packages: bioconda/linux-64::blast-2.16.0-hc155240_3, bioconda/linux-64::rmblast-2.14.1-h91eb8de_1
path: 'bin/update_blastdb.pl'
ClobberError: This transaction has incompatible packages due to a shared path.
packages: bioconda/linux-64::blast-2.16.0-hc155240_3, bioconda/linux-64::rmblast-2.14.1-h91eb8de_1
path: 'bin/windowmasker'
ClobberError: This transaction has incompatible packages due to a shared path.
packages: bioconda/linux-64::blast-2.16.0-hc155240_3, bioconda/linux-64::rmblast-2.14.1-h91eb8de_1
path: 'bin/windowmasker_2.2.22_adapter.py'

Installation finished and the test run was OK. Finally! I am not sure if the released recipe will also work the same, but I would love to resolve ClobberError by replacing blast with rmblast in the annosine2 recipe. I think that's the last dependency that uses blast.

Shujun
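
For longer CI or installer logs, a small parser can collect the clobber reports shown above into (package, package, path) records. This is only a sketch: parse_clobbers is a hypothetical helper, and it assumes the "packages:" and "path:" lines keep the layout seen in the excerpts in this thread.

```python
import re

PACKAGES_RE = re.compile(r"packages:\s*(\S+),\s*(\S+)")
PATH_RE = re.compile(r"path:\s*'([^']+)'")


def parse_clobbers(log_text):
    """Return (first_package, second_package, shared_path) tuples from a log."""
    conflicts = []
    packages = None
    for line in log_text.splitlines():
        pkg_match = PACKAGES_RE.search(line)
        if pkg_match:
            packages = pkg_match.groups()  # remember the pair until its path line
            continue
        path_match = PATH_RE.search(line)
        if path_match and packages:
            conflicts.append((packages[0], packages[1], path_match.group(1)))
            packages = None
    return conflicts
```

Grouping the resulting tuples by package pair makes it easy to confirm that every reported conflict involves blast and rmblast.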

@Juke34
Contributor Author

Juke34 commented Dec 9, 2024

I do not see blast in the annosine2 recipe...

@oushujun
Contributor

oushujun commented Dec 9, 2024

You are right, sorry, it's tir-learner.

@mencian
Contributor

mencian commented Dec 9, 2024

  • repeatmodeler 1.0.8 0 bioconda/linux-64 97 KB

@Juke34 @oushujun EDTA is pulling in an ancient version of RepeatModeler (v1.0.8), is this intended? May need to pin RepeatModeler to a more recent version.

@oushujun
Contributor

oushujun commented Dec 9, 2024

@mencian yeah, I noticed that. Generally speaking, EDTA does not need the functionality added in RepeatModeler2, which is the structural search using LTR_retriever; EDTA already does this internally. I am trying to pin it to >2 and am also evaluating the old version's performance.

@mencian
Contributor

mencian commented Dec 9, 2024

I've rebuilt TIR-Learner #52687 to depend on rmblast instead of blast.

@oushujun
Contributor

oushujun commented Dec 9, 2024

@Juke34 I tested the v1 RepeatModeler, it did not work correctly. Please help to pin it to >=2.0. Thanks!

@mencian
Contributor

mencian commented Dec 9, 2024

@BiocondaBot please fetch artifacts

@BiocondaBot
Collaborator

Package(s) built are ready for inspection:

Arch Package Zip File / Repodata CI Instructions
noarch edta-2.2.2-hdfd78af_1.tar.bz2 noarch.zip GitHub Actions
You may also use conda to install after downloading and extracting the zip file: conda install -c ./packages <package name>

Docker image(s) built:

Package Tag CI Install with docker
edta 2.2.2--hdfd78af_1 GitHub Actions
Images are in the linux-64 zip file above: gzip -dc images/edta---2.2.2--hdfd78af_1.tar.gz | docker load

@mencian
Contributor

mencian commented Dec 9, 2024

@oushujun EDTA now installs without clobber warnings and pulls in the latest RepeatModeler, could you please test the built artifact to see if everything works as intended?

@Juke34
Contributor Author

Juke34 commented Dec 10, 2024

This is not related to EDTA, but to be "clean", it is the RMBlast recipe that should be modified so that it installs into a dedicated directory and links its executables under names different from those of blast (e.g. by adding an rm_ prefix). All tools using rmblast should then specifically call the rm_-prefixed executables, to be sure they call the ones from rmblast. The current situation is really bad for reproducibility, because plenty of recipes may end up with rmblast and blast installed together, and we don't know which package a given executable ultimately comes from.
https://www.ncbi.nlm.nih.gov/IEB/ToolBox/CPP_DOC/lxr/source/scripts/projects/rmblastn/README
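
The renaming idea above can be sketched as follows. Everything here is an assumption for illustration: link_with_prefix is a hypothetical helper, the directory layout is made up, and a real recipe would more likely do the equivalent in its build script. Only the rm_ prefix comes from the comment above.

```python
import os
from pathlib import Path


def link_with_prefix(source_bin, target_bin, prefix="rm_"):
    """Expose every executable in source_bin under target_bin with a prefixed name."""
    source_bin, target_bin = Path(source_bin), Path(target_bin)
    target_bin.mkdir(parents=True, exist_ok=True)
    created = []
    for exe in sorted(source_bin.iterdir()):
        if not (exe.is_file() and os.access(exe, os.X_OK)):
            continue  # skip directories and non-executable files
        link = target_bin / f"{prefix}{exe.name}"
        if link.exists() or link.is_symlink():
            link.unlink()  # refresh a stale link from an earlier run
        link.symlink_to(exe.resolve())
        created.append(link.name)
    return created
```

With rmblast installed into its own directory, only the prefixed names would appear on PATH, so a call to rm_blastn could not silently pick up the blast package's blastn.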

@oushujun
Contributor

@Juke34 That's a great point! I am unsure whether it's as easy as modifying the conda recipe, but it seems the conflict has been one of the main causes of our troubles. Can we modify it ourselves, or does the change have to come from NCBI?

@Juke34
Contributor Author

Juke34 commented Dec 10, 2024

@oushujun this is feasible. We just have to modify the RMBlast recipe, but we also have to keep track of and deal with all the tools that use rmblast in their recipes, because their code has to be updated to call the new executable names.
Here is the list:

  • repeatmasker
  • ltr_retriever
  • rmblast (here the recipe, build, and test have to be modified, not the tool)
  • repeatmodeler
  • telr
  • mirnature
  • maker

We should notify the developers of these tools to see whether changing the executable names their software calls would be an issue.

@oushujun
Contributor

@Juke34 I only have control over ltr_retriever, not the remaining tools on the list. That's the part that I think will be difficult. Recipe updates can be done by us, but code updates probably involve the respective developer(s).

@oushujun
Contributor

I tested the latest artifact, and the blast/rmblast conflict went away!

@mencian
Contributor

mencian commented Dec 16, 2024

I'll merge this for now; improving the RMBlast recipe can take place in another PR.

@mencian mencian merged commit 25c0930 into master Dec 16, 2024
5 checks passed
@mencian mencian deleted the EDTA222 branch December 16, 2024 08:53
5 participants