resultSet wrapper object for all responses #68
@Tom-Shorter @mbaudis @sdelatorrep as mentioned in the mail thread, I would like to see an example of the above suggestion. |
I understand this - we're doing something similar for cases where the objects are in a defined scope (e.g. retrieving dataset statistics, where the identifier then is the |
@jrambla @mbaudis @Tom-Shorter @sdelatorrep So, for us, something like the example below would be OK: it gives us the split of information per dataset while retaining a single section for results. However, I'm still not sure whether dividing results into resultSets would be better. What do you think, @Tom-Shorter?

```json
{
  "response": {
    "exists": true,
    "error": {
      "errorCode": "same as HTTP status code",
      "errorMessage": "string"
    },
    "numTotalResults": 100,
    "results": [
      {
        "response": {
          "exists": true,
          "error": {
            "errorCode": "same as HTTP status code",
            "errorMessage": "string"
          },
          "numTotalResults": 0,
          "datasets": [
            {
              "datasetA": {
                "exists": true,
                "count": 50,
                "link": "",
                "contact": "",
                "etc": ""
              }
            },
            {
              "datasetB": {
                "exists": false,
                "count": 0,
                "link": "",
                "contact": "",
                "etc": ""
              }
            }
          ],
          "results": [{}]
        }
      }
    ]
  }
}
```
|
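A small sketch (Python, assuming the response has already been parsed into dicts) of how a client might read the per-dataset summaries from the "datasets" array in the example above. Because each entry is a single-key object, the client first has to merge the entries into one lookup table. The values are trimmed copies of the example, and index_datasets is a hypothetical helper, not anything defined by the spec.

```python
from typing import Any

# Trimmed copy of the "datasets" array above: one single-key object per dataset.
datasets = [
    {"datasetA": {"exists": True, "count": 50}},
    {"datasetB": {"exists": False, "count": 0}},
]


def index_datasets(entries: list[dict[str, dict[str, Any]]]) -> dict[str, dict[str, Any]]:
    """Merge a list of single-key {datasetId: summary} objects into one dict."""
    index: dict[str, dict[str, Any]] = {}
    for entry in entries:
        for dataset_id, summary in entry.items():
            if dataset_id in index:
                raise ValueError(f"duplicate dataset id: {dataset_id}")
            index[dataset_id] = summary
    return index


summaries = index_datasets(datasets)
hits = [ds for ds, s in summaries.items() if s["exists"]]
print(hits)                                          # ['datasetA']
print(sum(s["count"] for s in summaries.values()))  # 50
```

Nothing in this shape says which entries of the single "results" array belong to which dataset, which is where the tagging question comes in.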
Yes, this is one of the options which IMO work; having the per- Notes:
Still, the tagging of the individual response elements for their parent |
I was going to say that this depends entirely on how the unique ID for a result is created. But even if the unique ID is created by concatenating a dataset ID and the result ID, it would not suffice when querying a cohort endpoint, as it would not contain the cohort ID, so tagging would still be needed. I think many users won't use beacon as a way of obtaining data, at least not through the results array; the data handover may be more popular, because that data is in a standard format rather than a beacon result format, which requires more work to parse and may or may not contain the information you actually want. Instead, users might use beacon as a way to locate where they can find suitable data, using the results as an indicator of what data is actually available, and then use the data source's own interface to query the data in a more customised way. No matter how well the beacon spec is designed, a beacon query will not outperform a custom interface for querying data, and the beacon results response is unlikely to be as detailed or as easy to parse as one of the many standards already out there. The below is what I envisioned when I first started talking about
This approach allows you to pull out results from any dataset/cohort you want without having to loop through each result and work out which dataset it comes from by looking at a tag. As mentioned in my original comment, this could be slightly modified to have an object of resultSets identified by their ID; this allows for more precise data lookups, as instead of having to loop through a list of
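To make the lookup argument concrete, here is a hypothetical comparison in Python: a flat results list where each record carries a dataset tag (the field name datasetId is an assumption), versus resultSets held in an object keyed by their ID. The data is made up for illustration.

```python
# Flat results list: every record carries a tag naming its dataset.
tagged_results = [
    {"datasetId": "datasetA", "id": "var1"},
    {"datasetId": "datasetB", "id": "var2"},
    {"datasetId": "datasetA", "id": "var3"},
]


def results_for(dataset_id, results):
    """Tagged approach: scan every record and filter on its tag."""
    return [r for r in results if r["datasetId"] == dataset_id]


# Object of resultSets keyed by their ID: no per-record tag is needed.
result_sets = {
    "datasetA": {"exists": True, "results": [{"id": "var1"}, {"id": "var3"}]},
    "datasetB": {"exists": True, "results": [{"id": "var2"}]},
}

print([r["id"] for r in results_for("datasetA", tagged_results)])  # ['var1', 'var3']
print([r["id"] for r in result_sets["datasetA"]["results"]])       # ['var1', 'var3']
```

With the tagged list every lookup scans all records; with the keyed object it is a single dictionary access, at the cost of having to keep the keys unique.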
Another consideration is the aggregation of common data, such as is seen in g_variants. It does make sense to display this information only once, but if I query two datasets, one 0-based and the other 1-based, then the position of the same variant will differ and the common data will not be common. In the |
Ahem:
This case should not be supported. |
I agree, it shouldn't be supported in an ideal world, but the beacon owner might not have control over the data in the beacon, so they can't standardise it. Does the beacon owner then return all results using the same 0- or 1-based convention (I forget which is the beacon standard), knowing that the variant positions provided would be misleading if the user were to use the data source's own query interface? Or do they return results that remain true to the data, even though this means possibly making more than one |
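A minimal sketch of the normalisation option, assuming a 1-based source numbers the first base as 1 and a 0-based source numbers it as 0, so only the start coordinate of a single-base variant shifts by one. Which convention the beacon itself should emit is left open here, as in the thread; the Variant type and to_zero_based helper are hypothetical, and the positions are made up.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Variant:
    chrom: str
    start: int  # start coordinate exactly as stored by the source dataset
    ref: str
    alt: str


def to_zero_based(v: Variant, source_is_one_based: bool) -> Variant:
    """Return a copy of the variant whose start uses 0-based numbering.

    Assumes only the start of this single-base variant needs shifting.
    """
    return replace(v, start=v.start - 1) if source_is_one_based else v


# The same hypothetical SNV as stored by two datasets with different conventions.
from_dataset_a = Variant("1", 100, "C", "T")  # dataset A stores 0-based starts
from_dataset_b = Variant("1", 101, "C", "T")  # dataset B stores 1-based starts

print(from_dataset_a == from_dataset_b)  # False: naive comparison sees two variants
print(to_zero_based(from_dataset_a, False) == to_zero_based(from_dataset_b, True))  # True
```

The catch raised above still applies: once shifted, the start no longer matches what the data source's own interface reports, so an implementation might choose to keep the original coordinate alongside the normalised one.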
So, in summary, any of these three would be fine for me, with a preference for the latter two:
|
@colinveal @Tom-Shorter @jrambla Can live with any of those, too, with |
^ ditto |
Lovely discussion!
Let’s refine it, or not ;-) , in the meeting.
Jordi
From: Tom Shorter ***@***.***
Sent: Friday, 14 May 2021 10:43
|
Hi @Tom-Shorter! Could you create a PR introducing this change? Many thanks! |
Hi @sdelatorrep, I made a pull request last week: #72. I probably haven't done everything correctly, though, if you haven't been notified of it, so let me know. |
Ah, sorry, I didn't see it. I'll review it asap. Thanks! |
But @Tom-Shorter, this PR is only about filters, isn't it? I don't see anything about |
Oh sorry @sdelatorrep, I'm an idiot. I saw "pull request" and immediately thought of the filters scout group pull request I made. In the beacon call it was agreed that it would be best for someone from your group to add the resultSets into the spec. I'm familiar with the g_variants endpoint responses, but I haven't had to implement the other endpoints yet, so I thought it best that someone familiar with the endpoints should update the spec. We don't mind which properties are added to the |
AFAIK, the winner was option 2 (see all here):

```yaml
response:
  exists: true
  numTotalResults: 100
  resultSets:
    - id: "datasetA"
      type: "dataset"
      exists: true
      resultsCount: 116232
      dataHandover:
      results: []
      # ...
    - id: "datasetB"
      # ...
```

I'll add this to the spec. |
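The same option 2 structure, re-expressed as JSON and read in Python to pull out a per-resultSet summary. The datasetB values and the empty results arrays are placeholders, since the example elides them; summarize is a hypothetical helper, not part of the spec.

```python
import json

# "Option 2" response as JSON. datasetB's values and the empty results
# arrays are placeholders; the example in the thread elides them.
payload = json.loads("""
{
  "response": {
    "exists": true,
    "numTotalResults": 100,
    "resultSets": [
      {"id": "datasetA", "type": "dataset", "exists": true,
       "resultsCount": 116232, "dataHandover": null, "results": []},
      {"id": "datasetB", "type": "dataset", "exists": false,
       "resultsCount": 0, "dataHandover": null, "results": []}
    ]
  }
}
""")


def summarize(response: dict) -> dict[str, tuple[bool, int]]:
    """Map each resultSet id to its (exists, resultsCount) pair."""
    return {
        rs["id"]: (rs["exists"], rs.get("resultsCount", 0))
        for rs in response["resultSets"]
    }


summary = summarize(payload["response"])
hits = [rid for rid, (exists, _) in summary.items() if exists]
print(summary)  # {'datasetA': (True, 116232), 'datasetB': (False, 0)}
print(hits)     # ['datasetA']
```

Because every resultSet carries its own id and counts, a UI can render one row per dataset without inspecting individual results.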
Hi @Tom-Shorter! I've just created the PR implementing this. Would you mind taking a look at it? I couldn't add you as a reviewer, but I've sent you an invitation to the repo, if you want to join :) Thanks! |
@Tom-Shorter @sdelatorrep I've just approved, but with a comment, especially regarding additional parameters for
|
This is solved. |
There are a few problems with the current implementation of results: for g_variants there is a problem with pagination (#26), and for biosamples there is a problem with aggregation/dataset identification (#67). There are a few other problems, but instead of going over them in detail I would like to propose a new resultSet wrapper class, such that all response.results are always a list of resultSet objects.

@mbaudis has also mentioned this concept in #26, #67 and #63 (and possibly elsewhere that I am missing), and his description of the concept from #26 (quoted below) is almost exactly what I was envisioning.
The only change I would make is that results would not be an array of resultSet objects, but instead an object containing resultSet objects, where each resultSet object is identified by a unique key. This change is purely to make life easier for UI implementers, as each resultSet can then be identified without having to look at a property of the resultSet. The unique key will likely be the datasetID or some other such ID. I don't think this is possible to describe within OpenAPI, however, so I am more than happy to have an array of resultSets for simplicity, as it makes little difference to the final outcome.

A resultSet object can be formed from any responseObject, such as a genomicVariantResponse or a cohortResponse, as well as any additional response objects that will be added in the future.

The below is an idea of what properties a resultSet object might have:
- … resultSets is chosen over an object of resultSets, each given a unique key
- … resultSet, such as a datasetID
- … resultSet object/identifier property or from the request that was made to the beacon, but I do not want to assume that is always the case
- … resultSet which details the applied selection parameters would likely be useful
- … response.results
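The property list above is only partly recoverable, so this Python sketch borrows its field names from the "option 2" example in the thread (id, type, exists, resultsCount, results); treat the exact set as an assumption, not the agreed schema. It shows that the array form and the keyed-object form carry the same information and convert into each other as long as ids are unique. On the OpenAPI point, OpenAPI schemas can describe an object keyed by arbitrary strings via additionalProperties, so either shape is expressible; to_keyed and to_array are hypothetical helpers.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ResultSet:
    # camelCase field names mirror the JSON properties from "option 2".
    id: str
    type: str
    exists: bool
    resultsCount: int = 0
    results: list[dict[str, Any]] = field(default_factory=list)


def to_keyed(result_sets: list[ResultSet]) -> dict[str, ResultSet]:
    """Array form -> object keyed by each resultSet's unique id."""
    keyed: dict[str, ResultSet] = {}
    for rs in result_sets:
        if rs.id in keyed:
            raise ValueError(f"resultSet id {rs.id!r} is not unique")
        keyed[rs.id] = rs
    return keyed


def to_array(keyed: dict[str, ResultSet]) -> list[ResultSet]:
    """Keyed object -> array form; each resultSet still carries its id."""
    return list(keyed.values())


sets = [
    ResultSet("datasetA", "dataset", True, 2, [{"id": "var1"}, {"id": "var3"}]),
    ResultSet("datasetB", "dataset", False),
]
keyed = to_keyed(sets)
print(keyed["datasetA"].resultsCount)  # 2
print(to_array(keyed) == sets)         # True
```

Keeping the id inside each resultSet as well as using it as the key is what makes the round trip lossless; the keyed form only adds the uniqueness check.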
The inclusion of this wrapper will also be of benefit when incorporating AAI into beacon v2. @jrambla has proposed a 4-stage access model, as described below:
The current response model does not allow for the above: as far as I am aware, there is no field for dataset exists/count other than within the datasetAlleleResponse object. Adding the resultSet wrapper will provide a logical place for these fields to be inserted, and stage 2, "dataset exists/count", will become "resultSet exists/count". If the user is able to access stages 3 and 4 of the data, then the resultSet object just needs to be expanded with the results property. Each resultSet object can also provide different information for stages 3 and 4 compared to other resultSet objects of the same type, as these can be tailored to each dataset, etc.

This wrapper also solves a possible implementation problem with the aggregation of data from different datasets that should not be aggregated, such as individuals and biosamples. A beacon serving 4 datasets submitted by the same research group could run into a problem where the group's standard naming convention for individuals is "ind{X}", where X is an auto-incrementing integer. There would therefore be four versions of "ind1" within the beacon. If the beacon applies aggregation to these individuals, a single result could be returned that appears to be present in all 4 datasets, but upon further inspection you realise that "ind1" is actually a different individual in each of the datasets and as such shouldn't have been aggregated. This issue should not arise with variants, as standardised naming conventions exist for variants, such as rsIDs and HGVS.
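A sketch of the "ind1" scenario in Python, assuming the beacon backend knows which dataset each record came from (the datasetId field here is hypothetical, and the records are invented to match the scenario). Aggregating on the individual ID alone merges four different people; grouping into one resultSet per dataset keeps them apart.

```python
from collections import defaultdict

# Hypothetical records: four datasets from the same group, each using the
# "ind{X}" convention, so "ind1" is a different person in every dataset.
records = [{"datasetId": f"dataset{n}", "individualId": "ind1"} for n in range(1, 5)]

# Aggregating on the individual ID alone collapses four people into one.
naive = {r["individualId"]: r for r in records}
print(len(naive))  # 1

# One resultSet per dataset: an ID is only interpreted inside its own resultSet.
result_sets: dict[str, list[dict[str, str]]] = defaultdict(list)
for r in records:
    result_sets[r["datasetId"]].append({"id": r["individualId"]})

print(len(result_sets))                            # 4
print(sum(len(v) for v in result_sets.values()))  # 4
```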
This situation is also possible without aggregation, due to the lack of a datasetID within /biosample responses: if a request were sent to /individuals/ind1/biosamples, all responses would look like they come from the same individual, whereas they actually come from 4 individuals.

A lot of thought will need to be put into how this would actually be implemented and the properties to be included in the new object.