Required information for ENST00000305799.8 is missing from the Universal Transcript Archive #629

Open
ifokkema opened this issue Jun 24, 2024 · 6 comments

@ifokkema
Collaborator

Describe the bug
When checking ENST00000305799.1:c.22G>A, VV replies:

The transcript ENST00000305799.1 is not in our database. Please check the transcript ID
The following versions of the requested transcript are available in our database: ENST00000305799.8

But when I then try that transcript, I get:

Required information for ENST00000305799.8 is missing from the Universal Transcript Archive
Query gene2transcripts with search term ENST00000305799 for available transcripts

Note that the gene2transcripts output seems to be just fine for ENST00000305799.8.

To Reproduce
Steps to reproduce the behavior:

  1. https://rest.variantvalidator.org/VariantValidator/variantvalidator_ensembl/GRCh38/ENST00000305799.1%3Ac.22G%3EA/all/?content-type=application%2Fjson
  2. https://rest.variantvalidator.org/VariantValidator/variantvalidator_ensembl/GRCh38/ENST00000305799.8%3Ac.22G%3EA/all/?content-type=application%2Fjson
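The two request URLs above can be rebuilt from the variant description with the standard library. This is only a sketch: the helper name `build_url` and its parameter names are hypothetical, and the path layout is simply copied from the links in this report, not taken from the API documentation.

```python
from urllib.parse import quote, urlencode

# Base path copied from the links in this report.
BASE = "https://rest.variantvalidator.org/VariantValidator/variantvalidator_ensembl"


def build_url(variant, build="GRCh38", select="all", trailing_slash=True):
    """Hypothetical helper: percent-encode the variant and append the query string."""
    # safe="" makes quote() encode ":" and ">" (and "/") as well.
    path = f"{BASE}/{build}/{quote(variant, safe='')}/{select}"
    if trailing_slash:
        path += "/"
    return f"{path}?{urlencode({'content-type': 'application/json'})}"


# Transcript versions .1 and .8, as in steps 1 and 2.
urls = [build_url(f"ENST00000305799.{version}:c.22G>A") for version in (1, 8)]
for url in urls:
    print(url)
```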

Expected behavior
When I get a transcript suggested that I can use, I expect it to work.

@ifokkema
Collaborator Author

There's actually a very large list of Ensembl IDs where this happens. First, VV suggested using it, but then when I tried, it didn't work. So far, I haven't been able to generate genomic mappings for Ensembl transcripts at all. I can provide a (long) list if needed.

@Peter-J-Freeman
Collaborator

This could be a database issue. Will come back to this when we update the databases, @ifokkema. We want to make sure it is not a data problem before we mess with the code.

@ifokkema
Collaborator Author

To Reproduce
Steps to reproduce the behavior:

1. https://rest.variantvalidator.org/VariantValidator/variantvalidator_ensembl/GRCh38/ENST00000305799.1%3Ac.22G%3EA/all/?content-type=application%2Fjson

2. https://rest.variantvalidator.org/VariantValidator/variantvalidator_ensembl/GRCh38/ENST00000305799.8%3Ac.22G%3EA/all/?content-type=application%2Fjson

These links no longer work. Error:

{
  "message": "Requested Endpoint not found: See the documentation at https://rest.variantvalidator.org"
}

The trick is to remove the last slash. That's a bit odd, but I remember this happening before. That also disabled the API, if I remember correctly. It's better not to make these kinds of changes in production.
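For clients that still hold the old links, the workaround can be applied automatically. A minimal sketch using only the standard library; `strip_trailing_slash` is a hypothetical helper name, not part of any VariantValidator client.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_trailing_slash(url):
    """Remove trailing slashes from the path, leaving the query string untouched."""
    parts = urlsplit(url)
    # Keep a bare "/" so that a root URL stays valid.
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(parts._replace(path=path))


old = (
    "https://rest.variantvalidator.org/VariantValidator/variantvalidator_ensembl/"
    "GRCh38/ENST00000305799.8%3Ac.22G%3EA/all/?content-type=application%2Fjson"
)
print(strip_trailing_slash(old))
```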

Anyway, the issue itself seems to be fixed. Leaving this open in case you'd like to have this as a reminder that the API links changed.

@Peter-J-Freeman
Collaborator

So it's working, but there was a change, @ifokkema. We do not try to change this sort of thing, so I'm not sure why it happened. Will add some tests to make sure it is avoided.

@leicray
Contributor

leicray commented Nov 16, 2024

Here is some background that might help.

In a URL, the slash character (/) is defined as "Directory separator for resource or folder paths". The question mark character (?) is defined as "Query string separator", but I usually think of it being used as a prefix to indicate the start of a data string (parameters) to be passed to a function on the server. It's a feature used by online retailers in URLs (or URIs for the pedants out there) to allow them to identify the original source of a URL which a potential customer has clicked upon.

Since the slash is used in defining a path, the next part of the URL would be expected to be some resource (an HTML file or an executable) in the sub-directory of the directory defined immediately before the slash. Note that a URL ought not to end in a slash, but browsers are tolerant of this error and automatically strip the offending slash. For example the URL https://variantvalidator.org/ is automatically corrected to https://variantvalidator.org by Chrome, and probably by all other browsers.

Hence, the string "/?" should not appear in a URL, as the "/" is redundant but cannot be removed automatically because it's not at the end of the URL.

Two URLs were provided by @ifokkema as examples, the first of which is:

https://rest.variantvalidator.org/VariantValidator/variantvalidator_ensembl/GRCh38/ENST00000305799.1%3Ac.22G%3EA/all/?content-type=application%2Fjson

If this is submitted as is, it generates an error message as expected:

{
  "message": "Requested Endpoint not found: See the documentation at https://rest.variantvalidator.org"
}

However, removing the invalid slash immediately before the question mark yields the expected output.

@ifokkema
Collaborator Author

(...) I usually think of it being used as a prefix to indicate the start of a data string (parameters) to be passed to a function on the server.

That's exactly what it is, actually. Any sort of data can be sent over the query string. Most APIs actually have all of their input there. The URL is historically meant for the selection of the target of the data, not for providing input. That target "should" then be physically present on the server. Input "should" be in the query string. E.g., /search?build=hg19&chromosome=X&position=123456. But, nowadays, following developments in webserver software and new frameworks, APIs have sometimes moved from there to make the URLs prettier and to simplify input, e.g., /search/hg19/X:123456 is much shorter and no longer requires any of these strings to actually exist on the server. It breaks all old standards, but those aren't really relevant anymore, and this works well. Well, normally.
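To illustrate the two styles, here is a small sketch that extracts the same input from both forms. The host `example.org`, the function names, and the `/search/<build>/<chromosome>:<position>` layout are assumptions made up for the example; they do not describe any real API.

```python
from urllib.parse import parse_qs, urlsplit


def from_query_string(url):
    """Classic style: the path selects the target, the query string carries input."""
    query = parse_qs(urlsplit(url).query)
    # parse_qs returns a list per key; take the first value of each.
    return {key: values[0] for key, values in query.items()}


def from_path(url):
    """Framework style: input is encoded in path segments (hypothetical layout)."""
    _, _, build, locus = urlsplit(url).path.split("/")
    chromosome, position = locus.split(":")
    return {"build": build, "chromosome": chromosome, "position": position}


classic = from_query_string(
    "https://example.org/search?build=hg19&chromosome=X&position=123456"
)
pretty = from_path("https://example.org/search/hg19/X:123456")
print(classic == pretty)
```

In the second form nothing under `/search/` needs to exist on the server; the framework decides how the segments map to input.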

Note that a URL ought not to end in a slash (...)

I disagree; as long as the last part of the URL tree is an actual directory, the URL should end in a slash. E.g., this URL MUST end with a slash: https://api.lovd.nl/swagger/

Hence, the string "/?" should not appear in a URL, as the "/" is redundant but cannot be removed automatically because it's not at the end of the URL.

It can appear there, and it should be able to appear there. However, when needed, it's super easy to remove it automatically since it is the last character of the path.

Note that traditionally, these two notations mean two different things. file?data should be used for a file that is processing data, and directory/?data should be used to have the index file in that directory pick up the data and process it.

Traditionally, whether a URL ended in a slash or not, with physically present files and folders in the URLs, the file system had only one resource that this query could match, as files and folders can not coexist in the same location with the same name.

However, this is not a traditional URL but a framework-processed URL. As such, it processed the URL to find what resource was meant, incorrectly used the last slash as part of the input, and didn't match the URL format with any programmed endpoints. However, just like https://rest.variantvalidator.org/VariantValidator///////variantvalidator_ensembl/GRCh38/ENST00000305799.1%3Ac.22G%3EA/all?content-type=application%2Fjson works as expected, having a slash or not (or 20, for that matter) at the end of the API URL shouldn't matter at all. The framework should know that the slash is irrelevant. That's also why https://api.lovd.nl/v1/checkHGVS/NM_002225.3%3Ac.157C%3ET and https://api.lovd.nl/v1//////checkHGVS/NM_002225.3%3Ac.157C%3ET////// both work. This is a framework (URL-matching) issue, and should be fixed there.
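One way a framework-independent fix could look is a small WSGI middleware that normalises the path before any route matching happens. This is a sketch under assumptions: the names `normalize_path` and `SlashNormalizer` are hypothetical, it is not VariantValidator's or LOVD's actual code, and it deliberately treats `directory/` and `file` as the same resource, which suits an API but not a static file server.

```python
import re


def normalize_path(path):
    """Collapse runs of slashes and drop trailing ones, keeping a bare "/"."""
    collapsed = re.sub(r"/{2,}", "/", path)
    return collapsed.rstrip("/") or "/"


class SlashNormalizer:
    """Hypothetical WSGI middleware: rewrite PATH_INFO before the app sees it."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        # The query string lives in QUERY_STRING and is left untouched.
        environ["PATH_INFO"] = normalize_path(environ.get("PATH_INFO", "/"))
        return self.app(environ, start_response)


print(normalize_path("/v1//////checkHGVS/NM_002225.3:c.157C>T//////"))
```

Wrapping the application once (`app = SlashNormalizer(app)`) would make the slash variants above resolve to the same endpoint, assuming the endpoint routes themselves are declared without a trailing slash.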
