Releases: iquasere/UPIMAPI
UPIMAPI now runs as "upimapi"
Changed the symbolic link from "upimapi.py" to "upimapi".
Also changed TaxIDs database name to "taxids_database.fasta".
1.8.14 Fixes on partitioning taxonomic columns
Fix when merging with previous ID mapping
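Taxonomic columns were being duplicated when merging with the result of a previous ID mapping.
Now, UPIMAPI first produces these columns, and only then merges with the previous result.
Also fixes unpaired columns (columns present in only one of the two results), whose missing values now become NAs.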
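The order of operations described above can be illustrated with a minimal pandas sketch. It assumes the merge is a row-wise concatenation; `add_taxonomic_columns` and the example data are hypothetical and are not taken from UPIMAPI's code.

```python
import pandas as pd

def add_taxonomic_columns(df):
    """Hypothetical stand-in for the step that derives taxonomic columns."""
    out = df.copy()
    out["Taxonomic lineage (SPECIES)"] = out["Organism"]
    return out

# Result of a previous run: the taxonomic column already exists.
previous = pd.DataFrame({
    "Entry": ["P1"],
    "Organism": ["Escherichia coli"],
    "Taxonomic lineage (SPECIES)": ["Escherichia coli"],
})
# Freshly retrieved rows: no taxonomic column yet, but an extra "Length" column.
new = pd.DataFrame({
    "Entry": ["P2"],
    "Organism": ["Bacillus subtilis"],
    "Length": [120],
})

# Produce the taxonomic columns first, and only then combine both results.
combined = pd.concat([previous, add_taxonomic_columns(new)], ignore_index=True)
print(combined)
```

Because both frames carry the same taxonomic column before they are combined, it appears only once in the output. "Length" exists only in the new rows, so it is filled with NaN for the previous ones.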
Fixes on parsing taxonomy
Taxonomy parsing did not account for taxa whose names contain commas. Now, the "Taxonomic lineage" column is parsed correctly.
Also set new default columns, concerning the seven most popular levels of taxonomy - Superkingdom, Phylum, Class, Order, Family, Genus, Species. These are extracted from the "Taxonomic lineage" and the "Organism" columns.
These are all exported as "Taxonomic lineage (taxon level)", as was previously the case in the old version of UniProt's API (the one that worked fine until it was ruined by idiotic development).
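As a rough illustration of why commas inside taxon names matter, the sketch below parses a lineage string without splitting on every comma. It assumes each entry has the form "Name (rank)" and that entries are separated by ", ". The taxon "Foo, bar group" is made up for the example, and the function is not UPIMAPI's actual implementation. It covers only the lineage column, not the "Organism" column.

```python
import re

LEVELS = ["superkingdom", "phylum", "class", "order", "family", "genus", "species"]

# One entry is "Name (rank)" followed by ", " or the end of the string.
# The lazy name group lets a name contain commas of its own.
ENTRY = re.compile(r"(.+?) \(([^()]+)\)(?:, |$)")

def parse_lineage(lineage):
    """Return the seven default levels found in one lineage string."""
    by_rank = {rank: name for name, rank in ENTRY.findall(lineage) if rank in LEVELS}
    return {f"Taxonomic lineage ({level.upper()})": by_rank.get(level) for level in LEVELS}

example = "Bacteria (superkingdom), Foo, bar group (no rank), Escherichia (genus)"
parsed = parse_lineage(example)
print(parsed)
```

A plain `example.split(", ")` would cut "Foo, bar group" in two. Matching whole "Name (rank)" entries avoids that.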
Correct handling of FASTA input when only ID mapping
When a FASTA file was input solely for ID mapping, UPIMAPI was not parsing the file correctly. It was retrieving the sequences alongside the IDs, and trying to parse those entries as "full IDs" was breaking UPIMAPI.
Now, UPIMAPI gets only the names of the sequences, and correctly parses them.
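A minimal sketch of the idea follows: only header lines are read, and sequence lines are skipped. It assumes "full IDs" follow the usual UniProt header layout `db|ACCESSION|ENTRY_NAME`. The function name `fasta_ids` is hypothetical and does not come from UPIMAPI.

```python
def fasta_ids(lines):
    """Collect identifiers from FASTA header lines, ignoring sequence lines."""
    ids = []
    for line in lines:
        if not line.startswith(">"):
            continue  # sequence line, not needed for ID mapping
        fields = line[1:].split()
        if not fields:
            continue  # empty header, nothing to map
        name = fields[0]  # drop the free-text description after the name
        parts = name.split("|")
        # "full ID" such as sp|P12345|NAME_HUMAN: keep the accession in the middle
        ids.append(parts[1] if len(parts) >= 3 else name)
    return ids

records = [
    ">sp|P12345|NAME_HUMAN Some protein description",
    "MKVLAAGIVG",
    ">Q9XYZ1",
    "MSTNPKPQRK",
]
print(fasta_ids(records))
```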
Removed unnecessary user input to check if annotation should be performed
Also removed the unnecessary user prompt that checked if annotation should be performed when the user inputs a FASTA file and specifies "--no-annotation".
Users know what they want, and the default is to perform annotation. This was a leftover from when ID mapping was the main feature of UPIMAPI, and now is removed.
Accesses columns through API
No more need for apt-get packages! UPIMAPI now obtains available columns of the API through the API itself!
Also, it checks for valid and invalid columns, ignoring and reporting on the incorrect columns. A bit of input sanitization.
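The check could look roughly like the sketch below. The list of available columns is hard-coded here as a stand-in for whatever the API returns. The function name and the warning text are assumptions, not UPIMAPI's own.

```python
def check_columns(requested, available):
    """Split requested columns into valid and invalid ones, reporting the latter."""
    available = set(available)
    valid = [col for col in requested if col in available]
    invalid = [col for col in requested if col not in available]
    if invalid:
        print(f"Ignoring invalid columns: {', '.join(invalid)}")
    return valid, invalid

# Stand-in for the column names obtained from the API.
available_columns = ["Entry", "Organism", "Taxonomic lineage"]
valid, invalid = check_columns(["Entry", "Not a column", "Organism"], available_columns)
print(valid)
```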
Deal with dotted IDs
Dotted IDs (e.g. "A1ZAI5.1") are identified by UniProt as valid IDs. However, mapping them will return a 400 error.
IDs are now split by the dot to return truly valid IDs (e.g. "A1ZAI5").
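In Python, the split amounts to keeping whatever comes before the first dot. The helper name below is hypothetical, and dropping repeated IDs afterwards is an added assumption, not something the release notes state.

```python
def strip_version(identifier):
    """Return the part of an ID before the first dot, e.g. 'A1ZAI5.1' -> 'A1ZAI5'."""
    return identifier.split(".")[0]

ids = ["A1ZAI5.1", "A1ZAI5", "P12345.2"]
# dict.fromkeys keeps the first occurrence of each ID and preserves input order.
clean_ids = list(dict.fromkeys(strip_version(i) for i in ids))
print(clean_ids)
```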
Maaaaajor speed improvement
On taxonomy parsing for columns "Taxonomic lineage" and "Taxonomic lineage IDs".
Changed to using pandas methods.
Removed testing artifact
The artifact was limiting ID mapping to only the first 1000 IDs.
Taxonomic lineage columns reestablished
Provides "Taxonomic lineage" and "Taxonomic lineage IDs" for all levels of taxonomy.
If some field of "Taxonomic lineage" or "Taxonomic lineage IDs" is specified, UPIMAPI will retrieve the corresponding column of ID mapping, i.e., "Taxonomic lineage" and "Taxonomic lineage (IDs)", respectively, and parse them to obtain the requested information.
E.g., if "Taxonomic lineage (SPECIES)" information was requested, UPIMAPI will search the "Taxonomic lineage" column for an entry of the form "Species name (species)", and retrieve the relevant information.
Taxonomic information that was not requested (other levels of taxonomy) is discarded.
This follows previous behavior of UPIMAPI (before version 1.8), and closes the adaptations that were necessary because of UniProt's update this year.
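The retrieval of a single level can be sketched with pandas string methods. The sketch assumes lineage entries look like "Name (rank)" separated by ", ", and that taxon names contain no parentheses. `extract_level` and the example rows are hypothetical, not UPIMAPI's actual code.

```python
import re
import pandas as pd

def extract_level(lineage, level):
    """Pull the name tagged with `level` out of each lineage string (NaN if absent)."""
    # A name starts at the beginning of the string or right after the previous "), ".
    pattern = r"(?:^|\), )([^()]+?) \(" + re.escape(level) + r"\)"
    return lineage.str.extract(pattern, expand=False)

df = pd.DataFrame({
    "Entry": ["P1", "P2"],
    "Taxonomic lineage": [
        "Bacteria (superkingdom), Foo, bar group (no rank), Escherichia coli (species)",
        "Archaea (superkingdom)",
    ],
})

requested = ["species"]
for level in requested:
    df[f"Taxonomic lineage ({level.upper()})"] = extract_level(df["Taxonomic lineage"], level)

# Levels that were not requested are discarded along with the raw lineage column.
df = df.drop(columns="Taxonomic lineage")
print(df)
```

Rows whose lineage lacks the requested level get NaN in the new column instead of raising an error.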