Handful of (unnecessary) non-ASCII chars (and other formatting issues) in biomappings #172

Open
matentzn opened this issue Nov 6, 2024 · 0 comments

There are a handful of (IMO unnecessary) non-ASCII characters in biomappings, e.g.:

CHEBI:202852	5-heptadeca-8′Z,11′Z,16-trienylresorcinol	skos:exactMatch	MESH:C511035	5-heptadeca-8'Z,11'Z,16-trienylresorcinol	semapv:LexicalMatching		https://creativecommons.org/publicdomain/zero/1.0/	https://github.com/biomappings/biomappings/blob/4324c1/scripts/import_gilda_mappings.py	0.95

The apostrophe-like mark in the first label (′, versus the plain ' in the MeSH label) is a non-ASCII character, I think for no good reason. There are only a handful of these, and they pose a small risk of data corruption when translating back and forth between formats.
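To see exactly which characters are involved, a small Python sketch like the following can list the non-ASCII characters in a label together with their Unicode names. The helper name `non_ascii_chars` is made up for illustration, and the sample label assumes the mark in the CHEBI label is U+2032 (PRIME), which is how it appears in the row above.

```python
import unicodedata


def non_ascii_chars(text):
    """Return (index, char, Unicode name) for each non-ASCII character in text."""
    return [
        (i, ch, unicodedata.name(ch, "UNKNOWN"))
        for i, ch in enumerate(text)
        if ord(ch) > 127
    ]


# Assumption: the mark in the CHEBI label is U+2032 (PRIME), not an ASCII apostrophe.
label = "5-heptadeca-8\u2032Z,11\u2032Z,16-trienylresorcinol"

for index, char, name in non_ascii_chars(label):
    print(index, repr(char), name)
```

Whether such a character should be replaced with a plain apostrophe is a curation decision; the sketch only reports what is there.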

The full set can easily be determined like this:

pip install tsvalid
tsvalid biomappings.sssom.tsv
tmp/biomappings.sssom.tsv:4551:6: W1: Non ASCII character in column 6 at line number 4551.
tmp/biomappings.sssom.tsv:6559:6: E3: Redundant trailing whitespace in column 6 at line number 6559.
tmp/biomappings.sssom.tsv:8164:6: E3: Redundant trailing whitespace in column 6 at line number 8164.
tmp/biomappings.sssom.tsv:10255:2: W1: Non ASCII character in column 2 at line number 10255.
tmp/biomappings.sssom.tsv:10933:2: W1: Non ASCII character in column 2 at line number 10933.
tmp/biomappings.sssom.tsv:24900:6: W1: Non ASCII character in column 6 at line number 24900.
tmp/biomappings.sssom.tsv:25292:6: E3: Redundant trailing whitespace in column 6 at line number 25292.
tmp/biomappings.sssom.tsv:25455:6: E3: Redundant trailing whitespace in column 6 at line number 25455.
tmp/biomappings.sssom.tsv:25698:6: E3: Redundant trailing whitespace in column 6 at line number 25698.
tmp/biomappings.sssom.tsv:25809:6: E3: Redundant trailing whitespace in column 6 at line number 25809.
tmp/biomappings.sssom.tsv:26360:6: W1: Non ASCII character in column 6 at line number 26360.
tmp/biomappings.sssom.tsv:26558:6: E3: Redundant trailing whitespace in column 6 at line number 26558.
tmp/biomappings.sssom.tsv:27695:6: E3: Redundant trailing whitespace in column 6 at line number 27695.
tmp/biomappings.sssom.tsv:28221:6: E3: Redundant trailing whitespace in column 6 at line number 28221.
tmp/biomappings.sssom.tsv:28339:6: E3: Redundant trailing whitespace in column 6 at line number 28339.
tmp/biomappings.sssom.tsv:28462:6: E3: Redundant trailing whitespace in column 6 at line number 28462.
tmp/biomappings.sssom.tsv:29023:6: E3: Redundant trailing whitespace in column 6 at line number 29023.
tmp/biomappings.sssom.tsv:29509:6: E3: Redundant trailing whitespace in column 6 at line number 29509.
tmp/biomappings.sssom.tsv:29861:6: E3: Redundant trailing whitespace in column 6 at line number 29861.
tmp/biomappings.sssom.tsv:31155:6: E3: Redundant trailing whitespace in column 6 at line number 31155.
tmp/biomappings.sssom.tsv:31198:6: E3: Redundant trailing whitespace in column 6 at line number 31198.
tmp/biomappings.sssom.tsv:31369:6: E3: Redundant trailing whitespace in column 6 at line number 31369.
tmp/biomappings.sssom.tsv:32152:6: E3: Redundant trailing whitespace in column 6 at line number 32152.
tmp/biomappings.sssom.tsv:32406:6: E3: Redundant trailing whitespace in column 6 at line number 32406.
tmp/biomappings.sssom.tsv:34116:6: E3: Redundant trailing whitespace in column 6 at line number 34116.
tmp/biomappings.sssom.tsv:34168:6: E3: Redundant trailing whitespace in column 6 at line number 34168.
tmp/biomappings.sssom.tsv:37980:6: E3: Redundant trailing whitespace in column 6 at line number 37980.
tmp/biomappings.sssom.tsv:39409:6: E3: Redundant trailing whitespace in column 6 at line number 39409.
tmp/biomappings.sssom.tsv:39936:6: W1: Non ASCII character in column 6 at line number 39936.
tmp/biomappings.sssom.tsv:44879:6: W1: Non ASCII character in column 6 at line number 44879.
tmp/biomappings.sssom.tsv:45306:6: W1: Non ASCII character in column 6 at line number 45306.
tmp/biomappings.sssom.tsv:45307:6: W1: Non ASCII character in column 6 at line number 45307.
tmp/biomappings.sssom.tsv:45689:6: W1: Non ASCII character in column 6 at line number 45689.
tmp/biomappings.sssom.tsv:46979:6: E3: Redundant trailing whitespace in column 6 at line number 46979.
tmp/biomappings.sssom.tsv:47835:6: W1: Non ASCII character in column 6 at line number 47835.
tmp/biomappings.sssom.tsv:48631:6: W1: Non ASCII character in column 6 at line number 48631.
tmp/biomappings.sssom.tsv:50335:6: W1: Non ASCII character in column 6 at line number 50335.
tmp/biomappings.sssom.tsv:50686:6: W1: Non ASCII character in column 6 at line number 50686.
tmp/biomappings.sssom.tsv:50943:6: W1: Non ASCII character in column 6 at line number 50943.
tmp/biomappings.sssom.tsv:54443:6: W1: Non ASCII character in column 6 at line number 54443.
tmp/biomappings.sssom.tsv:57979:6: W1: Non ASCII character in column 6 at line number 57979.
tmp/biomappings.sssom.tsv:58500:6: W1: Non ASCII character in column 6 at line number 58500.
tmp/biomappings.sssom.tsv:65909:2: E3: Redundant trailing whitespace in column 2 at line number 65909.
tmp/biomappings.sssom.tsv:65909:6: E3: Redundant trailing whitespace in column 6 at line number 65909.
tmp/biomappings.sssom.tsv:65910:2: E3: Redundant trailing whitespace in column 2 at line number 65910.
tmp/biomappings.sssom.tsv:65910:6: E3: Redundant trailing whitespace in column 6 at line number 65910.
tmp/biomappings.sssom.tsv:65912:2: E3: Redundant trailing whitespace in column 2 at line number 65912.
tmp/biomappings.sssom.tsv:65912:6: E3: Redundant trailing whitespace in column 6 at line number 65912.
tmp/biomappings.sssom.tsv:65936:2: E3: Redundant trailing whitespace in column 2 at line number 65936.
tmp/biomappings.sssom.tsv:65936:6: E3: Redundant trailing whitespace in column 6 at line number 65936.

For example, the label "Gene regulatory network modelling somitogenesis" has an unnecessary trailing space:

tmp/biomappings.sssom.tsv:65912:2: E3: Redundant trailing whitespace in column 2 at line number 65912. 
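
For anyone who wants to reproduce both kinds of findings without installing anything, here is a rough standard-library sketch. It is not how tsvalid is implemented; the function name `scan_rows`, the finding labels, and the 1-based line and column numbering are all assumptions chosen to resemble the report above.

```python
def scan_rows(lines):
    """Flag non-ASCII characters and trailing whitespace in tab-separated cells.

    Returns a list of (line_no, col_no, kind) tuples, both numbers 1-based.
    """
    findings = []
    for line_no, line in enumerate(lines, start=1):
        # Drop only the line terminator so trailing spaces in the last cell survive.
        cells = line.rstrip("\r\n").split("\t")
        for col_no, cell in enumerate(cells, start=1):
            if not cell.isascii():
                findings.append((line_no, col_no, "non-ascii"))
            if cell != cell.rstrip():
                findings.append((line_no, col_no, "trailing-whitespace"))
    return findings


# Hypothetical rows for illustration; the identifiers X:1 and X:2 are made up.
sample = [
    "subject_id\tsubject_label\n",
    "X:1\tGene regulatory network modelling somitogenesis \n",
    "X:2\t8\u2032Z\n",
]

for finding in scan_rows(sample):
    print(finding)
```

To run it on the real file, pass an open file handle, e.g. `scan_rows(open("biomappings.sssom.tsv", encoding="utf-8"))`.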