Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: create antismash downloader module and move there inherent code #119

Closed
wants to merge 20 commits into from

Conversation

gcroci2
Copy link
Contributor

@gcroci2 gcroci2 commented Feb 28, 2023

  • I moved popd antismash download and extraction related stuff to nplinker/pairedomics/
  • I moved the code regarding antismash data download and extraction from pairedomics/downloader.py to genomics/antismash/downloader.py. What about using only downloader.py as the module name? It's already in antismash/ folder, so calling it antismash_downloader.py seems redundant to me.
  • I refactored old function download_antismash_data, which now is deprecated, and created a new one download_and_extract_antismash_data, which takes as input a single item id to be used to retrieve Antismash data. In the refactored downloader for PoDP it will be inserted in a for loop to iterate over all the items available. The downloader module will be refactored in a different PR (see issue Refactor Downloader class to have a specific PoDP downloader #121).
  • I moved antismash tests to tests/antismash folder, calling them test_downloader.py and test_loader.py, to not repeat myself. But this is related to the comment above, so let me know if we want to keep antismash_[...].py in the naming or if we want to remove that because everything is already in antismash/ folders.
  • I've run prospector and static typing checks, and reordered imports on the new antismash files
  • I've added a proper docstring to the antismash downloader file

Notes

  • Two tests are failing in the dev branch, and thus also here: tests/test_nplinker.py::test_get_links and pytest tests/test_nplinker.py::test_load_data; it would be handy to fix them before merging this PR, just to be sure that it doesn't introduce any issues. In both cases, the error given is TypeError: unhashable type: 'Strain'. Should I open a new issue about the failing tests? Something similar but different was happening in the same tests (see issue Fix failing tests #96), but should have been already solved.

@gcroci2 gcroci2 linked an issue Feb 28, 2023 that may be closed by this pull request
4 tasks
@gcroci2 gcroci2 self-assigned this Feb 28, 2023
@gcroci2 gcroci2 added the GEN genomics related issues label Feb 28, 2023
@gcroci2 gcroci2 requested a review from CunliangGeng February 28, 2023 15:47
Copy link
Member

@CunliangGeng CunliangGeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The antismash_downloader.py handles with the downloading of antismash data based on PODP input. If the input changes, then it's still not easy to use this downloader. We need to further refactor this part, and let's have a chat about it tomorrow.

@gcroci2 gcroci2 requested a review from CunliangGeng March 3, 2023 16:52
@hechth
Copy link
Collaborator

hechth commented Mar 8, 2023

@gcroci2 did you branch from the most up-to-date version of dev? I introduced the hashing function for strains in #118 I think.

@gcroci2
Copy link
Contributor Author

gcroci2 commented Mar 8, 2023

@gcroci2 did you branch from the most up-to-date version of dev? I introduced the hashing function for strains in #118 I think.

I've just merged dev branch :)

import pytest
from nplinker.genomics.antismash import download_and_extract_antismash_metadata

class TestDownloadAndExtractAntismashData():
Copy link
Contributor Author

@gcroci2 gcroci2 Mar 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I would like to implement a fixture method for use download_root, extract_root, and refseq_assembly_id across the entire class. I tried to look it up online but I don't get how to do it for multiple variables.
  • Apparently, passing in tmp_path creates a default testing dir which is deleted at the end of the test. Is this the best practice though? Would be nice to create temporary folders using tmpdir in a fixture method, and then retain the temp folders across the entire class. How should I implement that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For refseq_assembly_id you can directly use refseq_assembly_id="GCF_004339725.1" as a global variable in the class, then you can directly use the variable in functions.

If it's really necessary to share the folders across class, you can see the example using fixture tmp_path_factory: https://pytest.org/en/latest/how-to/tmp_path.html#the-tmp-path-factory-fixture

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've read the docs there already, but if I try to implement it by creating a temporary folder and yielding it, then the folder is unknown to the other methods of the class.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the tmp_dir be a class attribute? So that the whole class just has one link to that which is created when parsing the class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, this would be the alternative. But I was looking for using the fixture thing. Anyway it was more for my personal curiosity, it's not needed at all in this specific test :)

Copy link
Contributor Author

@gcroci2 gcroci2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment on the top of the PR @CunliangGeng

@CunliangGeng
Copy link
Member

See my comment on the top of the PR @CunliangGeng

@gcroci2 The failed tests have been solved and the latest commit of dev branch passed all tests.
Please give me a signal when the PR is ready to review again.

@gcroci2
Copy link
Contributor Author

gcroci2 commented Mar 9, 2023

See my comment on the top of the PR @CunliangGeng

@gcroci2 The failed tests have been solved and the latest commit of dev branch passed all tests. Please give me a signal when the PR is ready to review again.

I've merged with dev but they're still failing on my local machine, different error now:

=========================================================== short test summary info ===========================================================
FAILED tests/test_nplinker.py::test_load_data - assert 364 == 390
FAILED tests/test_nplinker.py::test_get_links - assert 58 == 101

@gcroci2 gcroci2 requested a review from CunliangGeng March 9, 2023 16:31
@CunliangGeng
Copy link
Member

FAILED tests/test_nplinker.py::test_load_data - assert 364 == 390
FAILED tests/test_nplinker.py::test_get_links - assert 58 == 101

@gcroci2 The branch of this PR passed all tests on my laptop. So there must be something wrong with your data. You could remove all data in ~/nplinker_data folder but keep bigscape data (to save time), and then run the test again.

@hechth
Copy link
Collaborator

hechth commented Mar 10, 2023

FAILED tests/test_nplinker.py::test_load_data - assert 364 == 390
FAILED tests/test_nplinker.py::test_get_links - assert 58 == 101

@gcroci2 The branch of this PR passed all tests on my laptop. So there must be something wrong with your data. You could remove all data in ~/nplinker_data folder but keep bigscape data (to save time), and then run the test again.

Maybe you ran an older version of bigscape? might also be possible :/

Copy link
Member

@CunliangGeng CunliangGeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last commit "xMerge branch 'dev' into 98_add_antismash_downloader_gcroci2
" is wrong and it removed all the latest updates of dev branch.

Please revert this commit, and make sure the file changes of this PR only involves the target files (for this PR they should only be downloader files).

gcroci2 added 2 commits March 15, 2023 12:47
This reverts commit f64a17c, reversing
changes made to 4879181.

revert merge
…croci2""

This reverts commit 0c5b6bd.

revert last merge
@gcroci2
Copy link
Contributor Author

gcroci2 commented Mar 15, 2023

Because of reverting merged commit issues, I've opened a new PR (#127)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GEN genomics related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create antismash_downloader module
3 participants