Simpler download of databases and more robust COG2KO conversion
Much simpler download of databases
Previously, reCOGnizer relied on the --download-resources and --skip-downloaded parameters to set up its databases. --download-resources instructed reCOGnizer to download the files required for its execution, and --skip-downloaded instructed it to skip files that had already been downloaded, which helped when a single file had been removed by mistake.
Now, reCOGnizer relies on the recognizer_dwnl.timestamp file to check whether the databases have already been downloaded. If the file exists, it skips installation. If the file doesn't exist, reCOGnizer removes all available files and downloads everything again.
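The check described above can be sketched roughly as follows. This is a minimal illustration, not reCOGnizer's actual code: the function name ensure_databases and the download callable are hypothetical, and only the recognizer_dwnl.timestamp file name comes from these notes.

```python
import shutil
from datetime import datetime
from pathlib import Path

TIMESTAMP_NAME = "recognizer_dwnl.timestamp"  # file name taken from the notes above


def ensure_databases(resources_dir, download):
    """Return True if a fresh download was performed, False if it was skipped.

    `download` is a hypothetical callable that fetches every required file
    into `resources_dir`; it stands in for the real download code.
    """
    resources = Path(resources_dir)
    timestamp = resources / TIMESTAMP_NAME
    if timestamp.exists():
        return False  # marker present: assume the databases are installed

    # Marker missing: discard whatever is there and start from scratch.
    if resources.exists():
        for entry in resources.iterdir():
            if entry.is_dir():
                shutil.rmtree(entry)
            else:
                entry.unlink()
    resources.mkdir(parents=True, exist_ok=True)
    download(resources)
    # Write the marker only after the download finished without raising.
    timestamp.write_text(datetime.now().isoformat())
    return True
```

Writing the marker last means an interrupted download leaves no timestamp behind, so the next run starts over instead of trusting a partial set of files.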
COG2KO conversion more reliable
Previously, reCOGnizer built the cog2ko conversion as a collection of all KOs available for each protein mapping to the specific COG.
Now, reCOGnizer uses an approach similar to the cog2ec conversion: it only assigns a KO to a COG when over half of the instances of that COG have that particular KO.
This yields a more reliable COG2KO conversion, while keeping KOs for a considerable number of COGs.
This release also removes the intermediate ssv files written during construction of the cog2ko database.
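The majority rule for COG2KO can be illustrated with a short sketch. It assumes the input is one (COG, KOs) pair per protein and that proteins without any KO still count as instances of their COG; the function name majority_cog2ko and the identifiers in the example are placeholders, not reCOGnizer's own.

```python
from collections import Counter, defaultdict


def majority_cog2ko(protein_annotations):
    """Map each COG to the KOs carried by more than half of its instances.

    `protein_annotations` is an iterable of (cog, kos) pairs, one per protein,
    where `kos` is the collection of KOs annotated for that protein.
    """
    instances = Counter()             # number of proteins seen per COG
    ko_counts = defaultdict(Counter)  # per COG, number of proteins with each KO
    for cog, kos in protein_annotations:
        instances[cog] += 1
        for ko in set(kos):           # count each KO once per protein
            ko_counts[cog][ko] += 1

    cog2ko = {}
    for cog, total in instances.items():
        # Keep a KO only if strictly more than half of the instances have it.
        kept = sorted(ko for ko, n in ko_counts[cog].items() if 2 * n > total)
        if kept:
            cog2ko[cog] = kept
    return cog2ko


# Placeholder identifiers, for illustration only.
example = [
    ("COG_A", ["KO_1"]),
    ("COG_A", ["KO_1"]),
    ("COG_A", ["KO_2"]),
    ("COG_B", ["KO_3"]),
    ("COG_B", ["KO_4"]),
]
print(majority_cog2ko(example))  # {'COG_A': ['KO_1']}
```

In the example, KO_1 appears in two of the three COG_A instances and is kept, while COG_B gets no KO because neither candidate exceeds half of its two instances.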
New parameters --test-run and --output-rpsbproc-columns will usually not be needed
The --test-run parameter had to be implemented as a consequence of the simpler database download. When set, reCOGnizer runs in a non-standard mode, which is required for the tests on GitHub: it moves the cdd.tar.gz file available in the repo and uses it as a valid cdd.tar.gz file.
--output-rpsbproc-columns will output the Superfamilies, Sites and Motifs columns, which are empty for almost all annotations.
Removed some unnecessary files
recognizer.log was produced in the working directory. It only included rpsblast outputs, mainly for error assessment. Users can obtain that information by running reCOGnizer with the --debug parameter and manually running the faulty commands.
taxonomy.rdf was obtained as part of building taxonomy.tsv. Now, reCOGnizer removes it once it is no longer needed.
Some fixes
reCOGnizer was not reporting the download of files when the --quiet flag was set, except when the files had already been downloaded and it removed them.
Also updated regexes to the raw-string format, r'regex'.
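As a brief illustration of the raw-string form: in a regular string literal, a sequence such as \d is an invalid escape that recent Python versions warn about, whereas a raw string passes the backslash to the regex engine unchanged. The pattern and function below are made up for this example and are not taken from reCOGnizer.

```python
import re

# Raw string: the backslash reaches the regex engine untouched.
COG_ID = re.compile(r'COG\d{4}')  # hypothetical pattern, for illustration only


def find_cog_ids(text):
    """Return every substring of `text` that looks like 'COG' plus four digits."""
    return COG_ID.findall(text)
```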