Citation
Beacon v2 and Beacon Networks: a "lingua franca" for federated data discovery in biomedical genomics, and beyond.
Jordi Rambla, Michael Baudis, Tim Beck, Lauren A. Fromont, Arcadi Navarro, Manuel Rueda, Gary Saunders, Babita Singh, J. Dylan Spalding, Juha Tornroos, Claudia Vasallo, Colin D. Veal, Anthony J. Brookes. Human Mutation (2022) DOI.
Beacon or beacon?
The uppercase Beacon is used to label the API, framework or protocol and their components, while lowercase beacons are instances of these, i.e. individual resources using the protocol.
The Beacon Framework describes the overall structure of the API +requests, responses, parameters etc. One can implement e.g. a Boolean beacon (cf. the +original protocol) without any use of the model, just by providing a well-formed +JSON response upon a request very similar to the (pre-)v1 allele request.
This example is for a minimal SNV-type variant query:
/beacon/g_variants/?referenceName=refseq:NC_000017.11&start=7577120&referenceBases=G&alternateBases=A
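The query string above can also be assembled programmatically. The following is a minimal Python sketch using only the standard library; the host `beacon.example.org` is a placeholder and `build_variant_query` is a hypothetical helper name, not part of any Beacon tooling.

```python
from urllib.parse import urlencode


def build_variant_query(base_url: str, reference_name: str, start: int,
                        reference_bases: str, alternate_bases: str) -> str:
    """Return a g_variants query URL for a single SNV-type variant (sketch)."""
    params = {
        "referenceName": reference_name,
        "start": start,
        "referenceBases": reference_bases,
        "alternateBases": alternate_bases,
    }
    # safe=":" keeps the "refseq:" prefix readable instead of encoding it as %3A
    return f"{base_url}/g_variants/?{urlencode(params, safe=':')}"


# Hypothetical beacon root URL; the parameters are those of the example query.
url = build_variant_query("https://beacon.example.org/beacon",
                          "refseq:NC_000017.11", 7577120, "G", "A")
print(url)
```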
In the minimal response to the query above, the beacon indicates that its default response is Boolean and that it could interpret the query against the genomicVariant entity, in the context of the same Beacon version.
In principle, one could launch a Beacon instance using the example response document as a template in whatever server environment one has at hand. However, a proper Beacon v2 installation also has to provide informational endpoints (/info, /map, ...) to allow its integration through aggregators.
{
  "meta": {
    "apiVersion": "v2.0.0",
    "beaconId": "org.progenetix.beacon",
    "receivedRequestSummary": {
      "apiVersion": "v2.0.0",
      "pagination": {
        "limit": 2000,
        "skip": 0
      },
      "requestedGranularity": "boolean",
      "requestedSchemas": [
        {
          "entityType": "genomicVariant",
          "schema": "https://progenetix.org/services/schemas/genomicVariant/"
        }
      ],
      "requestParameters": {
        "alternateBases": "A",
        "referenceBases": "G",
        "referenceName": "refseq:NC_000017.11",
        "start": [
          7577120
        ]
      }
    },
    "returnedGranularity": "boolean",
    "returnedSchemas": [
      {
        "entityType": "genomicVariant",
        "schema": "https://progenetix.org/services/schemas/genomicVariant/"
      }
    ]
  },
  "responseSummary": {
    "exists": true
  }
}
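To illustrate the "response as template" idea, the Python sketch below builds the same minimal Boolean document as a dictionary. The helper name, the beacon id `org.example.beacon` and the schema URL are hypothetical placeholders; a real implementation would compute `exists` from its own data and would still need the informational endpoints mentioned above.

```python
import json


def minimal_boolean_response(beacon_id: str, request_params: dict,
                             exists: bool, schema_url: str) -> dict:
    """Build a minimal Boolean genomicVariant response (sketch, hypothetical helper)."""
    schemas = [{"entityType": "genomicVariant", "schema": schema_url}]
    return {
        "meta": {
            "apiVersion": "v2.0.0",
            "beaconId": beacon_id,
            "receivedRequestSummary": {
                "apiVersion": "v2.0.0",
                "pagination": {"limit": 2000, "skip": 0},
                "requestedGranularity": "boolean",
                "requestedSchemas": schemas,
                "requestParameters": request_params,
            },
            "returnedGranularity": "boolean",
            "returnedSchemas": schemas,
        },
        "responseSummary": {"exists": exists},
    }


params = {"referenceName": "refseq:NC_000017.11", "start": [7577120],
          "referenceBases": "G", "alternateBases": "A"}
doc = minimal_boolean_response("org.example.beacon", params, True,
                               "https://beacon.example.org/schemas/genomicVariant/")
print(json.dumps(doc, indent=2))
```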
An implementation of a Beacon must implement the Global Alliance for Genomics and Health (GA4GH) Beacon standard. The v2 standard has been approved by both the Regulatory and Ethics and the Data Security foundational workstreams.
Beacon uses a three-tiered access model: anonymous, registered, and controlled access.
Note that a Beacon may contain datasets (or collections of individuals) whose data are only accessible at specified tiers within the Beacon. This tiered access model allows the owner or controller of a Beacon to determine which responses are returned to whom, depending on the query and on the user making the request, for example to ensure that the response respects the consent under which the data were collected. The ELIXIR Beacon network supports Beacons that respond at different tiers; for example, only Beacons that respond to anonymous queries need to respond to an anonymous request.
As part of deliverable D3.3 of the ELIXIR 2019-21 Beacon Network Implementation Study, a document has been written that describes security best practice for users interested in deploying or running a Beacon, or who govern data hosted within a Beacon, as well as the requirements for adding a Beacon to the ELIXIR Beacon network. As the Beacon standard extends in v2 towards supporting phenotype and range queries, the tiered access model becomes more important to ensure that the Beacon response is appropriate to the underlying data.
Security attributes are part of the Beacon v2 Framework. The file beaconConfiguration.json defines the schema of the JSON file that holds the core aspects of a Beacon instance configuration. Its third section, securityAttributes, defines the security settings of the instance.
Check out the securityAttributes section on the Beacon Documentation website.
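One way a beacon could apply the tiered model is to cap the granularity of a response by the requester's access tier. The sketch below assumes that securityAttributes carries `defaultGranularity` and `securityLevels` entries (check the exact field names and allowed values against the configuration schema); the tier-to-granularity policy and the simplified three-step granularity order are hypothetical local choices, not part of the specification.

```python
from typing import Optional

# Hypothetical securityAttributes content of a beacon configuration.
SECURITY_ATTRIBUTES = {
    "defaultGranularity": "boolean",
    "securityLevels": ["PUBLIC", "REGISTERED", "CONTROLLED"],
}

# Simplified ordering from least to most detailed response (assumption).
GRANULARITY_ORDER = ["boolean", "count", "record"]

# Hypothetical local policy: the most detailed granularity per access tier.
MAX_GRANULARITY_BY_TIER = {
    "PUBLIC": "boolean",     # anonymous access
    "REGISTERED": "count",   # authenticated users
    "CONTROLLED": "record",  # users approved for the dataset
}


def granted_granularity(user_tier: str, requested: Optional[str] = None) -> str:
    """Return the granularity a response may use for this tier and request."""
    if user_tier not in SECURITY_ATTRIBUTES["securityLevels"]:
        raise ValueError(f"unsupported security level: {user_tier}")
    requested = requested or SECURITY_ATTRIBUTES["defaultGranularity"]
    ceiling = MAX_GRANULARITY_BY_TIER[user_tier]
    # Never return more detail than either the request or the tier allows.
    return min(requested, ceiling, key=GRANULARITY_ORDER.index)


print(granted_granularity("PUBLIC", "record"))     # boolean
print(granted_granularity("CONTROLLED", "count"))  # count
```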
As a Beacon is designed to support data discoverability of controlled-access datasets, it is recommended that synthetic or artificial data be used for testing and initial deployment of Beacon instances. Using synthetic data for testing is important because it ensures that the full functionality of a Beacon can be tested and/or demonstrated without risk of exposing data from individuals. Beyond testing or demonstrating a deployment, synthetic data should also be used for development, for example when adding new features. Additionally, these data can be used to demonstrate the access levels and data governance procedures for loading data into a Beacon, to build trust with data controllers or data access committees who may be considering loading data into a Beacon. An example dataset containing chromosome-specific VCF files is hosted at EGA under dataset accession EGAD00001006673. While this dataset requires users to log in to get access, the EGA test user can access it.
Beacon v2.0 does not provide a mechanism to detect which types of genomic variant queries are supported by a given instance.
Beacon was originally designed to handle the "simplest" type of genomic variant query, in which a position, the alternateBases (i.e. the sequence of one or more bases of the variant at that position) and, sometimes optionally, the reference sequence at this position (necessary e.g. for small deletions) are specified.
Beacon v1.1 in principle supported "bracketed" queries and a variantType parameter (following its use in VCF); see the current documentation for details. However, support for and interpretation of these were, and still are (as of 2022-12-13), left to implementers. The same applies to Beacon Range Queries.
However, the Beacon documentation provides information about the use and expected interpretation of variantType values, specifically for copy number variations.
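Since an instance cannot advertise which variant query types it supports, it has to decide per request whether it can handle the parameters it receives. The rough Python sketch below classifies a g_variants request by the shape of its parameters; the category names follow the sequence, range and bracket query types described in the Beacon documentation as we understand them, and the function name and decision rules are illustrative assumptions rather than normative behaviour.

```python
from typing import Any, Mapping


def classify_variant_query(params: Mapping[str, Any]) -> str:
    """Roughly classify a g_variants request by its parameters (sketch)."""
    if "referenceName" not in params:
        return "unsupported"
    start = params.get("start", [])
    end = params.get("end", [])
    # Accept both a single value and a list, since requests may carry either.
    start = start if isinstance(start, list) else [start]
    end = end if isinstance(end, list) else [end]
    if len(start) == 2 and len(end) == 2:
        return "bracket"   # start and end each given as a [min, max] pair
    if len(start) == 1 and len(end) == 1:
        return "range"     # one start and one end position
    if len(start) == 1 and "alternateBases" in params:
        return "sequence"  # the "simplest" allele query
    return "unsupported"


print(classify_variant_query({"referenceName": "refseq:NC_000017.11",
                              "start": [7577120], "referenceBases": "G",
                              "alternateBases": "A"}))  # sequence
```

An implementation could answer "unsupported" requests with an error or an empty result, depending on its own policy.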
Ages are queried as ISO8601 durations such as P65Y (i.e. 65 years) with a comparator (=, <=, >, ...). However, the value needs an indication of what the duration refers to, and resources may provide different ways to indicate this (as then shown in their /filtering_terms endpoint).
We recommend that all Beacon instances that support age queries support at minimum the syntax age:<=P65Y and map such values to the internal data point most relevant for the resource's context (in most cases probably corresponding to "age at diagnosis").
However, other scenarios may be supported as well (e.g. EFO_0005056:<=P1Y6M for an "age at death" scenario).
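The recommended age:<=P65Y syntax splits into a prefix, a comparator and a duration. The Python sketch below handles only the year/month/day part of ISO8601 durations and converts them to days using average year and month lengths; this approximation and the helper names are assumptions for illustration, and an implementation may prefer exact calendar arithmetic.

```python
import operator
import re

# Comparators from the text (=, <=, >, ...); "<" and ">=" assumed analogous.
COMPARATORS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt,
               ">": operator.gt, "=": operator.eq}
# Only year/month/day durations such as P65Y or P1Y6M are handled here.
DURATION_RE = re.compile(r"^P(?:(\d+)Y)?(?:(\d+)M)?(?:(\d+)D)?$")


def duration_to_days(value: str) -> float:
    """Convert e.g. 'P1Y6M' to an approximate number of days."""
    m = DURATION_RE.match(value)
    if not m or value == "P":
        raise ValueError(f"unsupported duration: {value}")
    years, months, days = (int(g) if g else 0 for g in m.groups())
    # Approximation: 365.25 days per year, 30.4375 days per month.
    return years * 365.25 + months * 30.4375 + days


def parse_age_filter(expr: str):
    """Split e.g. 'age:<=P65Y' into (prefix, comparator function, days)."""
    prefix, _, rest = expr.rpartition(":")
    # Check two-character comparators before their one-character prefixes.
    for symbol in ("<=", ">=", "<", ">", "="):
        if rest.startswith(symbol):
            return prefix, COMPARATORS[symbol], duration_to_days(rest[len(symbol):])
    raise ValueError(f"no comparator in: {expr}")


prefix, compare, limit = parse_age_filter("age:<=P65Y")
print(prefix, compare(duration_to_days("P64Y11M"), limit))  # age True
```

The prefix (age or, for instance, EFO_0005056) is what a resource would then map to its own internal data point.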
The Beacon framework currently (v2.0 and earlier) considers genomic variants to be allelic and does not support queries for multiple alleles or "haplotype shorthand expressions" (e.g. C,T).
Workarounds
In case of a specific need for haplotype queries, implementers of a given beacon who control its data content can in principle extend their query model to support shorthand haplotype expressions, as long as they also support the standard format. However, such an approach may be superseded by, or conflict with, future direct protocol support.
An approach in line with the current protocol would be to query for one allelic variant with a record-level genomicVariation response, and then query the retrieved variants individually by their id in combination with the second allele.
As with queries, the Beacon "legacy" variant format does not support haplotype representation but represents each allelic variation separately. The same is true for the VRS-based ("VRSified") variant representation, which for v2.0 corresponds to VRS v1.2. However, draft versions of the VRS standard address (or will address) haplotype and genotype representations, and these will be adopted by Beacon v2.n once they reach a release state.
No (...but). Beacon queries as of v2 always assume a logical AND between query parameters and individual filters, i.e. all conditions have to be met. There is currently no support for Boolean expressions.
However, a logical exception is the use of multiple filters for the same parameter, which a Beacon implementation should treat as a logical OR, since such queries would otherwise fail in most instances. For example, a query using NCIT:C3493 and NCIT:C2926 (mapped against biosample.histological_diagnosis.id) would match both Lung Non-Small Cell Carcinoma (NCIT:C2926) and Lung Squamous Cell Carcinoma (NCIT:C3493), which are exclusive diagnoses; under a strict AND, no biosample would match.
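The AND/OR rule above can be expressed compactly: group the filters by the field they are mapped to, OR the values within each group, and AND across groups. The following is a minimal Python sketch over in-memory biosamples; the flattened field name `histological_diagnosis.id`, the `dataset` field and the sample records are hypothetical.

```python
from collections import defaultdict


def matches(record: dict, filters: list[tuple[str, str]]) -> bool:
    """AND across different fields, OR among values given for the same field."""
    by_field: defaultdict[str, set] = defaultdict(set)
    for field, value in filters:
        by_field[field].add(value)
    # A record lacking a filtered field never matches (record.get returns None).
    return all(record.get(field) in values for field, values in by_field.items())


# Hypothetical biosamples with a flattened diagnosis field.
biosamples = [
    {"id": "bs1", "histological_diagnosis.id": "NCIT:C2926", "dataset": "ds1"},
    {"id": "bs2", "histological_diagnosis.id": "NCIT:C3493", "dataset": "ds2"},
    {"id": "bs3", "dataset": "ds1"},
]
query = [("histological_diagnosis.id", "NCIT:C3493"),
         ("histological_diagnosis.id", "NCIT:C2926")]
print([b["id"] for b in biosamples if matches(b, query)])  # ['bs1', 'bs2']

# Adding a filter on a different field narrows the result (logical AND).
print([b["id"] for b in biosamples
       if matches(b, query + [("dataset", "ds1")])])  # ['bs1']
```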
A filter that does not exist should lead to a "no match" response. There is no dedicated mechanism to distinguish "the filter is understood, but there is no hit for this particular query" from "no idea what this filter value means". However, the /filtering_terms endpoint should list all supported filters, and this can be used to tell the two possibilities apart if needed.
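A client could use the /filtering_terms listing to tell the two cases apart after a "no match" response. This Python sketch assumes the listing has already been fetched and that each term object carries an `id` field; the function name and the made-up filter id `XYZ:123` are hypothetical.

```python
def diagnose_no_match(query_filters: list[str], filtering_terms: list[dict]) -> str:
    """Explain a 'no match' response using a beacon's /filtering_terms listing."""
    supported = {term["id"] for term in filtering_terms}
    unknown = [f for f in query_filters if f not in supported]
    if unknown:
        return "filter(s) not supported: " + ", ".join(unknown)
    return "all filters supported; no records match this query"


# Hypothetical, already-fetched term objects.
terms = [{"id": "NCIT:C3493"}, {"id": "NCIT:C2926"}]
print(diagnose_no_match(["NCIT:C3493"], terms))
print(diagnose_no_match(["NCIT:C3493", "XYZ:123"], terms))
```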
For sparse data (e.g. a value available only for a subset of samples; think of "genetic sex" not being available or disclosed for a subset of individuals), normally only the positive matches are returned. The base of these numbers can be evaluated through discretionary queries (i.e. evaluating the alternative options) or through additional informational responses (e.g. adding the overall observation count of a filter value to the objects in the /filtering_terms response, or aggregate information to a dataset response).
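To illustrate the "base of these numbers" problem, the Python sketch below reports the number of matches together with the number of records that report the attribute at all, the kind of figure that could be added to a /filtering_terms entry or to a dataset response. The field names, the plain female/male values (a real beacon would use ontology terms) and the result keys are hypothetical.

```python
def count_with_base(records: list[dict], field: str, value: str) -> dict:
    """Count matches for field == value, plus how many records report the field."""
    observed = [r for r in records if r.get(field) is not None]
    matched = sum(1 for r in observed if r[field] == value)
    return {"matched": matched, "observed": len(observed), "total": len(records)}


# Hypothetical individuals; i3 has no (disclosed) value for "sex".
individuals = [
    {"id": "i1", "sex": "female"},
    {"id": "i2", "sex": "male"},
    {"id": "i3"},
    {"id": "i4", "sex": "female"},
]
print(count_with_base(individuals, "sex", "female"))
# {'matched': 2, 'observed': 3, 'total': 4}
```

Reporting only `matched` corresponds to the default positive-match behaviour; the extra counts let a client see that the 2 matches come from 3 individuals with a known value, not from all 4.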