add class DatasetArranger #215
- Add class `DatasetArranger`
- Add dataset validation functions `validate_gnps`, `validate_antismash` and `validate_bigscape`

- Remove function `podp_run_bigscape`
- Update function `run_bigscape`
remove invalid steps
f7ab6ce to 5400394
Great refactor :D
- Out of curiosity, did you use a tool to create the Markdown diagram shown in PODP mode and local data mode #117?
- I haven't run the local tests myself. Should I?
```python
            "DatasetLoader({}, {}, {})".format(self._root, self.dataset_id, self._remote_loading)
        )

    def __repr__(self):
```
Are you sure we don't want to implement this anymore?
I don't understand which part of the code you're referring to. `loader.py` still needs further refactoring; in this PR I only cleaned it up enough to make the arranger work.
I meant the `__repr__` magic method.
It's still there; I did not remove it. GitHub does not render the changes correctly.
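The diff excerpt in this thread shows a `"DatasetLoader({}, {}, {})".format(...)` string right before `def __repr__(self):`. As a minimal sketch of how such a pair of magic methods is commonly written, the block below assumes a hypothetical, stripped-down `DatasetLoader` holding only the three attributes visible in the diff (`_root`, `dataset_id`, `_remote_loading`), and assumes `__repr__` simply delegates to `__str__`. The real class in `loader.py` has more state and may differ.

```python
class DatasetLoader:
    """Hypothetical stand-in holding only the attributes visible in the diff."""

    def __init__(self, root: str, dataset_id: str, remote_loading: bool) -> None:
        self._root = root
        self.dataset_id = dataset_id
        self._remote_loading = remote_loading

    def __str__(self) -> str:
        # Same format string as in the diff context
        return "DatasetLoader({}, {}, {})".format(
            self._root, self.dataset_id, self._remote_loading
        )

    def __repr__(self) -> str:
        # Assumption: __repr__ reuses the human-readable __str__ output
        return str(self)


loader = DatasetLoader("/data/project", "example_id", False)
print(repr(loader))  # DatasetLoader(/data/project, example_id, False)
```

Delegating `__repr__` to `__str__` keeps the two outputs in sync, at the cost of a `repr` that is not valid Python for reconstructing the object.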
This is a workaround for the issues in `tests/conftest.py`: it copies the example data for each process when multi-process testing is enabled.
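The fixture itself is not shown here, so the following is only a sketch of the idea of giving each test process its own copy of the example data. It assumes the multi-process runs use pytest-xdist, which sets the `PYTEST_XDIST_WORKER` environment variable (e.g. `gw0`) in each worker process; the function name `copy_example_data_for_worker` and the directory layout are hypothetical.

```python
import os
import shutil
from pathlib import Path


def copy_example_data_for_worker(src: Path, base_dir: Path) -> Path:
    """Hypothetical helper: copy example data into a per-process directory."""
    # pytest-xdist sets PYTEST_XDIST_WORKER in each worker process;
    # fall back to "main" when the tests run in a single process.
    worker_id = os.environ.get("PYTEST_XDIST_WORKER", "main")
    dest = base_dir / worker_id / src.name
    if not dest.exists():
        # copytree also creates the missing parent directories of dest
        shutil.copytree(src, dest)
    return dest
```

Because every worker writes only under its own subdirectory, processes cannot clobber each other's files while arranging data.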
I talked about that at the sprint meeting last week ;-) Two ways:
I first used the live editor and then copied the raw code to GitHub, which renders it as a diagram automatically.
Not necessary for this PR.
Nice, thanks! I was looking for something like 2.
This is a big PR implementing the data-arranging pipelines, which enable both the local and PODP modes.
Arranging data means preparing the data in `root_dir`; basically, it covers all steps needed to make the data ready for loading.
The arranging pipelines for the different data types are shown in the diagram in #117.
To keep the data-arranging workflow simple, we use a fixed project directory structure (see #163) with fixed directory and file names (see `globals.py`). To use nplinker, users are required to:
- create `root_dir` manually and use it as the root directory of the nplinker project
- create the config file `nplinker.toml` and put it in `root_dir`
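The two user requirements above can be checked up front before any arranging starts. This is a minimal sketch, not the check nplinker actually performs: the function name `check_project_root` is hypothetical, and it only verifies that `root_dir` is an existing directory containing `nplinker.toml`, without parsing the config.

```python
from pathlib import Path


def check_project_root(root_dir: str) -> Path:
    """Hypothetical check: root_dir exists and contains nplinker.toml."""
    root = Path(root_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"root_dir does not exist or is not a directory: {root}")
    config = root / "nplinker.toml"
    if not config.is_file():
        raise FileNotFoundError(f"nplinker.toml not found in root_dir: {root}")
    # Return the config path so a caller could go on to load it
    return config
```

Failing early with a clear message is cheaper than discovering a missing root directory halfway through a download or a BiG-SCAPE run.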
Major changes
- Added file `arranger.py`, including the class `DatasetArranger` and some validation functions, which implement the data-arranging pipelines
- Cleaned, removed or updated some files to make the arranger work (some may need further refactoring in future PRs):
  - `runbigscape.py`
  - `downloader.py` and its tests, which are replaced by `DatasetArranger`
  - `loader.py` and `nplinker.py`, updated to use `DatasetArranger`
- Added integration tests for the arranger (tests passed):
  - `nplinker_local_mode.toml`, `tests/conftest.py` and `test_nplinker_local.py` to test the local mode
Tests in PODP mode also passed on my local machine. Because running BiG-SCAPE is expensive, those tests will be added to the codebase in later PRs.