import AIRR Data Model into AKC LinkML #28

Open
7 tasks done
schristley opened this issue Feb 16, 2024 · 43 comments · May be fixed by airr-knowledge/ak-schema#18

@schristley
Contributor

schristley commented Feb 16, 2024

AKC will extend and integrate many of the classes/objects in the AIRR Data Model. LinkML has an importer for JSON schema. We want to automate the import/translation process so that we can run it when the AIRR Data Model changes.

  • Automate import/translation of AIRR Standards into AKC LinkML
  • Define mapping from x-airr properties to LinkML. For example, identifier is related to LinkML's identifier
  • Determine what x-airr properties need to be added to AIRR Standards for LinkML. #54
  • enums for arrays, e.g. keywords_study in Study, do not generate a LinkML enum
  • use yaml library for output
  • read AIRR schema from input file vs using schema from installed airr library
  • generate "AIRR versioned" linkml schema (meaning support for multiple versions of the AIRR Data Model that co-exist in the linkml schema, specifically v1.5 and v2.0)
@schristley
Contributor Author

Hi @bcorrie, so when I mentioned I'd like you to work with LinkML, I was really thinking about this. We played with that schema-automator tool, and it seemed like if we process the AIRR schema file a little bit (maybe by extracting each object separately), this might be an easy way to get the AIRR stuff into LinkML.

@schristley schristley moved this to Project Year 2 in AIRR Knowledge Mar 26, 2024
@bcorrie

bcorrie commented Apr 4, 2024

@schristley where is the schema-automator tool? Is it in one of the GitHub repositories? I can't find it.

@schristley
Contributor Author

@bcorrie when I was playing around with it, I was just manually installing it in the ak-schema docker. If I remember correctly, the pip install doesn't install all dependencies; there was one that was missing.

I haven't put it in the ak-schema docker yet because I'm not sure whether it conflicts with the LinkML stuff or not.

@schristley
Contributor Author

@bcorrie here it is; I also needed to do pip install appengine-python-standard

@bcorrie

bcorrie commented Apr 4, 2024

@schristley the above install downgrades the urllib3 version from urllib3-2.2.1 to urllib3-1.26.18.

poetry.lock states that it requires urllib3 = ">=1.21.1,<3" so this should be fine.

Should we add this to the Dockerfile? I have a patch that adds the following to the end of the Dockerfile after the poetry update:

RUN pip install appengine-python-standard
RUN pip install quantulum3[classifier]

@schristley
Contributor Author

schristley commented Apr 22, 2024

@bcorrie Walking through the objects in the AIRR Schema to understand what we can auto-generate into LinkML and what we cannot, here's my assessment of a few things. Can you please review?

  • With the new study design schema, the AIRR Repertoire object and its sub-objects like Subject, SampleProcessing, etc., will be de-normalized (and re-structured) into the CDM. We won't attempt to auto-generate LinkML; instead, we will just perform the mapping as part of the data integration.
  • According to William, Germline and Genotype objects are complete, so we can transform these into LinkML as is.
  • Rearrangements, i.e. receptor chain, can be transformed into LinkML as is.
  • Clone seems to be still undergoing changes to support single-cell.
  • Cell looks mostly complete and can be transformed into LinkML as is?
  • CellExpression is Assay output. CDM still needs to harmonize all of the different assay types.
  • We will hold off on receptor and reactivity until we get further into understanding the integration.
  • MHC Allele and Genotype ???

So based upon this, my thought is to do a quick "bootstrap" conversion of a few of the AIRR objects into LinkML. Germline, Genotype, Rearrangements and (maybe) Cell? That is, we won't worry about automation right now.

However, as part of #44, we will need to consider how to manage AIRR schema changes and come up with an automation mechanism.

@bcorrie

bcorrie commented Apr 22, 2024

@bcorrie Walking through the objects in the AIRR Schema to understand what we can auto-generate into LinkML and what we cannot, here's my assessment of a few things. Can you please review?

I think my fundamental comment is that a mapping of any AIRR field to a LinkML slot definition is probably pretty straightforward. I think in all cases the relationships between the LinkML classes are what is going to be challenging.

For example, Germline and Genotype might be well defined and complete from a field perspective (you could generate a LinkML slot definition for each field), but the relationships between these classes and other classes in the AKC CDM are likely much less well understood (at least, I don't understand them). Germline is presumably linked to the AKC CDM equivalent of DataProcessing, and Genotype is presumably linked to the AKC CDM equivalent of Participant. These links at least in some fashion already exist in the AIRR schema. What other links make sense for these objects in the broader AKC CDM?

  • With the new study design schema, the AIRR Repertoire object and its sub-objects like Subject, SampleProcessing, etc., will be de-normalized (and re-structured) into the CDM. We won't attempt to auto-generate LinkML; instead, we will just perform the mapping as part of the data integration.

Yes, I think so. Like most of the objects below, the field-to-slot mapping between AIRR and AKC is pretty straightforward. It is the relationships that are messy. The question we were discussing is whether there is some sort of automated tool we could use to help with this. The complicated part is going to be mapping the relationships (the de-normalization and re-structuring), and I am not aware of anything that would help with this; if you know of anything, let me know.

The reason this is hard is that these objects are at the core of the AKC CDM AND the relationships in the AIRR Standard do not map particularly well to the AKC CDM. We can think of things like Genotype and Rearrangement as easy to transform into LinkML because their relationships with other things in the AKC CDM are "simple" as we understand them today.

As we consider complex use cases, I would not be surprised if the requirements for complex relationships blow up. That is why I advocate for keeping the relationships in the AKC CDM as simple and basic as possible, anticipating that specific use cases are going to need much more complicated "knowledge graphs" overlaid on top of this simple set of relationships. If we try to capture all relationships for all things in the AKC CDM, we will literally go insane 8-)

[Stuff Deleted]

So based upon this, my thought is to do a quick "bootstrap" conversion of a few of the AIRR objects into LinkML. Germline, Genotype, Rearrangements and (maybe) Cell? That is, we won't worry about automation right now.

As I mentioned, I was thinking of using the AIRRMap python class (https://github.com/sfu-ireceptor/dataloading-mongo/blob/master/dataload/airr_map.py) from the iReceptor data loader combined with the AIRR Spec Flatten tool in the iReceptor sandbox (https://github.com/sfu-ireceptor/sandbox/tree/master/airr-spec-flatten) as a first attempt at this.

I think it should be pretty easy to combine these to traverse any AIRR JSON object (e.g. Subject, Genotype, Rearrangement, ...) and generate a bunch of LinkML slots based on the AIRR definition.

I think for me the question is, are we talking about generating LinkML definitions for these objects, or actually generating LinkML compliant data from the equivalent definitions. I think generating LinkML definitions for slots should be pretty simple.
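A minimal Python sketch of the slot-generating traversal described above, assuming each AIRR object is loaded as a dict shaped like the entries in airr-schema.yaml (`properties` mapping field names to `type` and `description`). The function name and the JSON-to-LinkML type table (for example, JSON `number` mapped onto LinkML `float`) are illustrative assumptions, not the airr2akc implementation.

```python
"""Sketch: generate LinkML slot definitions from one AIRR schema object."""

# Assumed mapping from JSON Schema scalar types to LinkML built-in types.
AIRR_TO_LINKML_TYPE = {
    "string": "string",
    "integer": "integer",
    "number": "float",
    "boolean": "boolean",
}


def airr_object_to_slots(airr_object: dict) -> dict:
    """Return {slot_name: slot_definition} for the top-level fields of an AIRR object."""
    slots = {}
    for field, spec in airr_object.get("properties", {}).items():
        slot = {"name": field}
        if "description" in spec:
            slot["description"] = spec["description"].strip()
        json_type = spec.get("type")
        if json_type in AIRR_TO_LINKML_TYPE:
            slot["range"] = AIRR_TO_LINKML_TYPE[json_type]
        # Arrays, nested objects and $ref (ontology) fields get no range here;
        # they need separate handling (multivalued slots, classes, enums).
        slots[field] = slot
    return slots


# Hand-written excerpt in the assumed shape (not copied from the real schema file).
subject = {
    "type": "object",
    "properties": {
        "subject_id": {"type": "string", "description": "Subject ID assigned by submitter."},
        "synthetic": {"type": "boolean", "description": "TRUE for synthetic libraries."},
        "age_min": {"type": "number", "description": "Specific age or lower boundary of age range."},
        "species": {"$ref": "#/Ontology"},
    },
}

if __name__ == "__main__":
    for name, slot in airr_object_to_slots(subject).items():
        print(name, slot.get("range"))
```

Only slots are produced here; relationships between classes are deliberately left out, in line with the discussion.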

@bcorrie

bcorrie commented Apr 22, 2024

  • We will hold off on receptor and reactivity until we get further into understanding the integration.

I would argue that, like all of the other AIRR schema objects, converting the fields for Receptor and Reactivity is pretty straightforward. I don't see any reason not to convert these like any other object.

As I say above, the reason this is "complicated" is that these objects have complex relationships, not only within the AIRR Standard but across the other repositories as well (IEDB, iRAD). It is the relationships we don't understand; I think we understand the actual field names pretty well.

@schristley
Contributor Author

I think my fundamental comment is that a mapping of any AIRR field to a LinkML slot definition is probably pretty straightforward. I think in all cases the relationships between the LinkML classes are what is going to be challenging.

Yes, I agree, thanks, I should be more clear. Let's not worry about trying to bring the relationships forward, just the slots and the classes (where class is the same as the AIRR JSON object).

@bcorrie

bcorrie commented Apr 22, 2024

  • MHC Allele and Genotype ???

Genotype is essentially done, as stated above, no?

MHCGenotype, I would suggest, is "equally done" in the sense that the fields are defined and understood: they mostly mirror Genotype, except they are simpler. There is only a set of alleles, MHCAllele (rather than alleles, deleted alleles, and undocumented alleles as in Genotype).

Again, the relationships between these objects and other objects in the AKC CDM may be less defined, but mapping the fields I would suggest is pretty straightforward.

@schristley
Contributor Author

As we consider complex use cases, I would not be surprised if the requirements for complex relationships blow up. That is why I advocate for keeping the relationships in the AKC CDM as simple and basic as possible, anticipating that specific use cases are going to need much more complicated "knowledge graphs" overlaid on top of this simple set of relationships. If we try to capture all relationships for all things in the AKC CDM, we will literally go insane 8-)

The point is well taken. With a "data model", we have well-defined relationships, often then translated into a static database design/schema, which then enforces how queries are done. A "knowledge model" needs to be more flexible to handle the complex use cases. LinkML is for data models, so we don't want to overload it and try to make it do too much. So in essence I'm agreeing with you, we'll keep the relationships in the CDM to the simple, basic and "obvious" ones.

@schristley
Contributor Author

So based upon this, my thought is to do a quick "bootstrap" conversion of a few of the AIRR objects into LinkML. Germline, Genotype, Rearrangements and (maybe) Cell? That is, we won't worry about automation right now.

As I mentioned, I was thinking of using the AIRRMap python class (https://github.com/sfu-ireceptor/dataloading-mongo/blob/master/dataload/airr_map.py) from the iReceptor data loader combined with the AIRR Spec Flatten tool in the iReceptor sandbox (https://github.com/sfu-ireceptor/sandbox/tree/master/airr-spec-flatten) as a first attempt at this.

I understand this as doing the actual data integration, versus the schema. I agree that a mapping approach like this should work very well for us.

@bcorrie

bcorrie commented Apr 22, 2024

However, as part of #44, we will need to consider how to manage AIRR schema changes and come up with an automation mechanism.

The approach we have taken with the AIRR Config file and the use of AIRR Flatten should go a fairly long way to making this work. We can essentially configure an iReceptor Turnkey repository or an iReceptor Gateway to support different versions of the AIRR Standard just by changing the AIRR Config file.

There are of course always special cases (e.g. string change to Ontology term) which can't be handled by a mapping, but this has gotten us a long way to making schema changes "relatively painless".

@schristley
Contributor Author

I think for me the question is, are we talking about generating LinkML definitions for these objects, or actually generating LinkML compliant data from the equivalent definitions. I think generating LinkML definitions for slots should be pretty simple.

Just the definitions, the slots and the classes.

I think it should be pretty easy to combine these to traverse any AIRR JSON object (e.g. Subject, Genotype, Rearrangement, ...) and generate a bunch of LinkML slots based on the AIRR definition.

@bcorrie Great! Can I give you the task to take an initial stab at writing this?

@bcorrie

bcorrie commented Apr 22, 2024

I think it should be pretty easy to combine these to traverse any AIRR JSON object (e.g. Subject, Genotype, Rearrangement, ...) and generate a bunch of LinkML slots based on the AIRR definition.

@bcorrie Great! Can I give you the task to take an initial stab at writing this?

Yep, no problem...

@bcorrie

bcorrie commented May 3, 2024

Initial version implemented.

Code is here: https://github.com/airr-knowledge/ak-schema/tree/airr-export/src/scripts/airr2akc

Initial exported schemas (a subset, although a pretty decent subset):

https://github.com/airr-knowledge/ak-schema/tree/airr-export/src/ak_schema/schema/airr

I essentially reused https://github.com/sfu-ireceptor/sandbox/tree/master/airr-spec-flatten and mostly just changed the output generation. I disabled recursion as well so it doesn't process Objects within Objects.

It isn't handling arrays correctly (although not sure how we do that in LinkML).
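LinkML expresses list-valued fields with `multivalued: true` on the slot. A minimal sketch of how an AIRR `type: array` field might be mapped, assuming the item type lives under `items` as in JSON Schema. The `enum_name` parameter and the `KeywordsStudy` name are hypothetical, standing in for whatever enum the generator creates for fields such as keywords_study.

```python
"""Sketch: map an AIRR array field onto a LinkML multivalued slot."""
from typing import Optional

SCALAR_RANGES = {"string": "string", "integer": "integer", "number": "float", "boolean": "boolean"}


def array_field_to_slot(field: str, spec: dict, enum_name: Optional[str] = None) -> dict:
    """Return a LinkML slot dict for a JSON Schema array field."""
    slot = {"name": field, "multivalued": True}
    items = spec.get("items", {})
    if "enum" in items and enum_name is not None:
        # Controlled-vocabulary items: point the range at a generated enum.
        slot["range"] = enum_name
    elif items.get("type") in SCALAR_RANGES:
        # Plain scalar items: reuse the scalar type as the range.
        slot["range"] = SCALAR_RANGES[items["type"]]
    return slot


# Assumed shape of an array-of-enum field (values abbreviated for illustration).
keywords_study = {"type": "array", "items": {"type": "string", "enum": ["contains_ig", "contains_tr"]}}

if __name__ == "__main__":
    print(array_field_to_slot("keywords_study", keywords_study, enum_name="KeywordsStudy"))
```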

It also needs a mapping step, so that we have control when a field's attributes (e.g. name, type, range) should differ from the defaults that would be generated from the AIRR spec. I should be able to reuse the iReceptor data loader's AIRR Map capability to implement that pretty easily.
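One way such a mapping step could look, as a sketch: a hypothetical overrides table keyed by AIRR field name, whose entries replace the generated slot attributes. This is not the iReceptor AIRRMap API, just an illustration of the idea; a `name` override also renames the slot key.

```python
"""Sketch: apply per-field overrides to generated LinkML slots."""
import copy


def apply_overrides(slots: dict, overrides: dict) -> dict:
    """Return a new slot dict with {field: {attribute: value}} overrides applied.

    The input `slots` is not modified. If an override changes "name",
    the slot is re-keyed under the new name.
    """
    result = {}
    for field, slot in slots.items():
        new_slot = copy.deepcopy(slot)
        new_slot.update(overrides.get(field, {}))
        result[new_slot["name"]] = new_slot
    return result


generated = {
    "age_min": {"name": "age_min", "range": "float"},
    "sex": {"name": "sex", "range": "Sex"},
}
# Hypothetical override: rename age_min and change its range.
overrides = {"age_min": {"name": "minimum_age", "range": "decimal"}}

if __name__ == "__main__":
    print(apply_overrides(generated, overrides))
```

Keeping the overrides in a separate table means the generator can be re-run against a new AIRR version without losing the hand-made adjustments.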

@bcorrie

bcorrie commented May 3, 2024

It looks like we should be able to use this to generate the enums for fields as well. You will notice that in the export I have created LinkML fields that capture either the AIRR ontology root node (for ontologies) or the enum values (for controlled vocabulary fields).

See the Ontology and Enum fields in the subject export: https://github.com/airr-knowledge/ak-schema/blob/airr-export/src/ak_schema/schema/airr/ak_airr_subject.yaml

@bcorrie

bcorrie commented May 6, 2024

I have changed the code so you can ask it to generate either the LinkML slots or the LinkML enums for the AIRR Schema object of choice. For ontology terms it just outputs the expected root node of the enum; we would still need a way to generate all of the child nodes for that enum.

For example, for Subject it generates this:

Species:
  name: Species
  permissible_values:
    Gnathostomata:
      text: Gnathostomata
      meaning: NCBITAXON:7776
Sex:
  name: Sex
  permissible_values:
    male:
      text: male
    female:
      text: female
    pooled:
      text: pooled
    hermaphrodite:
      text: hermaphrodite
    intersex:
      text: intersex
    null:
      text: null
AgeUnit:
  name: AgeUnit
  permissible_values:
    time unit:
      text: time unit
      meaning: UO:0000003
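A minimal sketch of how enums like these could be derived, assuming controlled-vocabulary fields carry a JSON Schema `enum` list and ontology fields carry `x-airr: {format: ontology, ontology: {top_node: {id, label}}}`. The CamelCase naming rule is inferred from the example output (age_unit becomes AgeUnit). Note that a bare `null` in a YAML enum list loads as Python `None`, so the sketch keeps it as the string "null".

```python
"""Sketch: derive LinkML enum definitions from an AIRR schema object."""


def enum_name(field: str) -> str:
    """age_unit -> AgeUnit (naming rule inferred from the example output)."""
    return "".join(part.capitalize() for part in field.split("_"))


def airr_object_to_enums(airr_object: dict) -> dict:
    enums = {}
    for field, spec in airr_object.get("properties", {}).items():
        x_airr = spec.get("x-airr", {})
        name = enum_name(field)
        if x_airr.get("format") == "ontology":
            # Ontology field: only the root (top) node is emitted.
            top = x_airr["ontology"]["top_node"]
            values = {top["label"]: {"text": top["label"], "meaning": top["id"]}}
        elif "enum" in spec:
            # Controlled vocabulary: a YAML null in the list arrives as None.
            texts = ["null" if v is None else str(v) for v in spec["enum"]]
            values = {t: {"text": t} for t in texts}
        else:
            continue
        enums[name] = {"name": name, "permissible_values": values}
    return enums


subject = {
    "properties": {
        "subject_id": {"type": "string"},
        "species": {"x-airr": {"format": "ontology",
                               "ontology": {"top_node": {"id": "NCBITAXON:7776", "label": "Gnathostomata"}}}},
        "sex": {"type": "string", "enum": ["male", "female", "pooled", "hermaphrodite", "intersex", None],
                "x-airr": {"format": "controlled_vocabulary"}},
        "age_unit": {"x-airr": {"format": "ontology",
                                "ontology": {"top_node": {"id": "UO:0000003", "label": "time unit"}}}},
    }
}

if __name__ == "__main__":
    for name, enum in airr_object_to_enums(subject).items():
        print(name, list(enum["permissible_values"]))
```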

@bcorrie

bcorrie commented May 6, 2024

If you ask for LinkML slots, it generates this - referring to the correct Enums above in the range attribute.

subject_id:
  name: subject_id
  description: Subject ID assigned by submitter, unique within study. If possible, a persistent subject ID linked to an INSDC or similar repository study should be used.
  range: string
synthetic:
  name: synthetic
  description: TRUE for libraries in which the diversity has been synthetically generated (e.g. phage display)
  range: boolean
species:
  name: species
  description: Binomial designation of subject's species
  range: Species
sex:
  name: sex
  description: Biological sex of subject
  range: Sex
age_min:
  name: age_min
  description: Specific age or lower boundary of age range.
  range: number
age_max:
  name: age_max
  description: Upper boundary of age range or equal to age_min for specific age. This field should only be null if age_min is null.
  range: number
age_unit:
  name: age_unit
  description: Unit of age range
  range: AgeUnit
age_event:
  name: age_event
  description: Event in the study schedule to which `Age` refers. For NCBI BioSample this MUST be `sampling`. For other implementations submitters need to be aware that there is currently no mechanism to encode to potential delta between `Age event` and `Sample collection time`, hence the chosen events should be in temporal proximity.
  range: string

[Rest of the slots deleted]

@bcorrie

bcorrie commented May 6, 2024

Files generated for most (all?) AIRR schema objects of importance to AKC here:

https://github.com/airr-knowledge/ak-schema/tree/airr-export/src/ak_schema/schema/airr

Note some enum files are empty because there are no enums/ontologies in that particular class.

@bcorrie

bcorrie commented Jul 22, 2024

@schristley I think the following x-airr attributes are relevant:

  • identifier: maps to the LinkML identifier concept
  • nullable: whether a field can be null or not; not sure if LinkML has this
  • required: whether a field is required or not
  • format: somehow this should map to objects in the LinkML definition (the enum list or something)

Is there anything else that we need to worry about? My conversion tool now takes these into account and generates them as part of the AIRR field LinkML specification.
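A sketch of how those x-airr attributes might be translated per field. Assumptions: required fields are taken from the object-level `required` list (JSON Schema style); LinkML `identifier` is only set when the field is also required, because LinkML treats an identifier slot as required; and since LinkML does not appear to have a direct `nullable` slot attribute, the value is kept as a slot annotation. LinkML also allows only one identifier slot per class, which this sketch does not check.

```python
"""Sketch: translate x-airr field attributes into LinkML slot attributes."""


def x_airr_to_slot_attributes(field: str, spec: dict, required_fields: list) -> dict:
    x_airr = spec.get("x-airr", {})
    attrs = {}
    is_required = field in required_fields
    if is_required:
        attrs["required"] = True
    # A LinkML identifier slot is implicitly required, so only map
    # x-airr identifier when the AIRR field is also required.
    if x_airr.get("identifier") and is_required:
        attrs["identifier"] = True
    # No direct LinkML equivalent assumed; keep nullable as an annotation.
    if "nullable" in x_airr:
        attrs["annotations"] = {"nullable": x_airr["nullable"]}
    return attrs


required = ["subject_id"]
subject_id = {"type": "string", "x-airr": {"identifier": True, "nullable": False}}
age_min = {"type": "number", "x-airr": {"nullable": True}}

if __name__ == "__main__":
    print(x_airr_to_slot_attributes("subject_id", subject_id, required))
    print(x_airr_to_slot_attributes("age_min", age_min, required))
```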

@bcorrie

bcorrie commented Jul 22, 2024

I have created a separate issue for figuring out what, if anything from our LinkML experience should be moved back into the AIRR Standard in #54

@bcorrie

bcorrie commented Jul 22, 2024

I think we can mark this as Done?

@bcorrie

bcorrie commented Aug 9, 2024

@schristley any objections to closing this issue?

@schristley
Contributor Author

@schristley any objections to closing this issue?

Lonneke is going to continue work on this.

@bcorrie

bcorrie commented Oct 22, 2024

@LonnekeScheffer I am doing a refactor/cleanup of the code. I reused code from an iReceptor tool we had, and some of the old, no longer used code had not yet been deleted. I should have this done by the end of the week...

@bcorrie

bcorrie commented Oct 22, 2024

@LonnekeScheffer clean up done on airr-export branch (https://github.com/airr-knowledge/ak-schema/tree/airr-export)

@LonnekeScheffer

@bcorrie would it make sense for me to start working on (a subbranch of) your airr-export branch in that case?

@LonnekeScheffer

Hi both,

I've been playing around a bit with the airr2akc.py script, for now branching off of Brian's airr-export branch.

Just wanted to check with you guys: I see that the generated files under "src/ak_schema/schema/airr" that are stored on this (and master) branch are not the same as the files generated by airr2akc.py when running Makefile.AIRR; I have for now presumed the files in the folder are an 'old format' whereas the script generates the 'new format'.

I also noticed a few bugs in the script output:

  • For enums, the indentation is incorrect. See how 'Sex' is not indented at the right level, placing it on the same level as 'enums'
  • Furthermore, 'permissible values' have only keys but no values, making this an invalid YAML file, which will probably cause issues later on
id: https://github.com/airr-knowledge/ak-schema
name: ak-schema

enums:
  Species:
    name: Species
    reachable_from:
      source_nodes:
        - NCBITAXON:7776
      include_self: true
      relationship_types:
        - rdfs:subClassOf
Sex:
  name: Sex
  permissible_values:
    male:
    female:
    pooled:
    hermaphrodite:
    intersex:
    null:
  AgeUnit:
    name: AgeUnit
    reachable_from:
      source_nodes:
        - UO:0000003
      include_self: true
      relationship_types:
        - rdfs:subClassOf

I understand the first bug must have been introduced at the moment it was decided to add the new 'enums' base level. While it can be fixed easily, I do think the current code with its explicit hardcoded print statements with fixed spaces inside highly nested loops/if statements is prone to such errors in the future.

In immuneML we used yaml input/output a lot. You can make nested dictionaries/lists, and export them directly using yaml.dump(). No need to keep track of the number of spaces, only of how to nest the dictionaries and lists. This furthermore ensures the output file is valid YAML, which I think will greatly improve the readability and maintainability of the code. If it's ok with you, I can rewrite (or make a second version of) this script according to these suggestions. I think it'll also help me generally with 'getting into' the project and familiarizing myself with this code.
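A minimal sketch of that approach, assuming PyYAML is installed: build the schema as nested dicts and let `yaml.dump` handle indentation, with `sort_keys=False` to keep insertion order. PyYAML also quotes string keys that would otherwise be read back as YAML null, so the `null` vocabulary value survives a round trip as the string "null". The schema id and name come from the output above; the enum contents are abbreviated.

```python
"""Sketch: emit LinkML enums with yaml.dump instead of hand-indented prints."""
import yaml


def build_schema(enums: dict) -> dict:
    return {
        "id": "https://github.com/airr-knowledge/ak-schema",
        "name": "ak-schema",
        "enums": enums,
    }


enums = {
    "Species": {
        "name": "Species",
        "reachable_from": {
            "source_nodes": ["NCBITAXON:7776"],
            "include_self": True,
            "relationship_types": ["rdfs:subClassOf"],
        },
    },
    "Sex": {
        "name": "Sex",
        # Give each permissible value an explicit mapping rather than an empty value.
        "permissible_values": {
            v: {"text": v} for v in ["male", "female", "pooled", "hermaphrodite", "intersex", "null"]
        },
    },
}

text = yaml.dump(build_schema(enums), sort_keys=False, default_flow_style=False)

if __name__ == "__main__":
    print(text)
```

Because nesting comes from the dict structure, every enum lands under `enums` automatically, which removes the class of indentation bug described above.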

@schristley
Contributor Author

In immuneML we used yaml input/output a lot. You can make nested dictionaries/lists, and export them directly using yaml.dump(). No need to keep track of the number of spaces, only of how to nest the dictionaries and lists. This furthermore ensures the output file is valid YAML, which I think will greatly improve the readability and maintainability of the code. If it's ok with you, I can rewrite (or make a second version of) this script according to these suggestions. I think it'll also help me generally with 'getting into' the project and familiarizing myself with this code.

I agree, that sounds like a good place to start.

@schristley
Contributor Author

I've been playing around a bit with the airr2akc.py script, for now branching off of Brian's airr-export branch.

@bcorrie Are you using this branch for your Repertoire conversion work?

@LonnekeScheffer that's fine for now, though I had already merged this branch to main, so I'd prefer if you branch off of main. Shall I merge Brian's recent changes and then you can switch over?

  • Furthermore, 'permissible values' have only keys but no values, making this an invalid YAML file, which will probably cause issues later on

Yeah, that seemed odd to me when I saw linkml yaml with that.

@LonnekeScheffer

Would be great if you could merge it into main then, Scott! I just want to make sure I'm working with the latest version of airr2akc.py before I rewrite it, and that I'm not getting in @bcorrie's way.

@bcorrie

bcorrie commented Oct 24, 2024

@LonnekeScheffer a couple of comments:

  1. This code was a bit of an experimental hack to see how hard this would be - so don't be surprised if there are glitches (like the invalid yaml for the permissible values) - feel free to fix anything that seems broken, the code has not really been tested in terms of confirming it actually generates valid YAML files... 8-)
  2. The yaml output was custom generated as you say. I agree completely, using something like yaml.dump() would make way more sense than what is being done now. This is what I am doing in the akc_convert code but I hadn't gotten there with this code yet.
  3. I believe you are correct: the YAML spec generation (this code) is ahead of the actual generated files in "src/ak_schema/schema/airr". The schema files need to be re-generated once the code is further developed...

@bcorrie

bcorrie commented Oct 24, 2024

I just want to make sure I'm working with the latest version of airr2akc.py before I rewrite it, and that I'm not getting in @bcorrie's way.

No problem. Once Scott merges to master, you can go ahead and create a branch and the code is all yours. I am going to be working on akc_convert, so we shouldn't conflict, but I will continue to do that on the airr_export branch.

@schristley
Contributor Author

@LonnekeScheffer ok, merged, you should be good to go! I've added some TODO items in the first comment so you can think more strategically with planning code changes.

@LonnekeScheffer

Thanks Scott! Finished the "writing output with yaml library" part. Where can I find AIRR schema input files?

@schristley
Contributor Author

Thanks Scott! Finished the "writing output with yaml library" part. Where can I find AIRR schema input files?

It's in the airr-standards repository; here is the v1.5 schema

@bcorrie

bcorrie commented Oct 26, 2024

  • read AIRR schema from input file vs using schema from installed airr library

FYI, the use of the AIRR Schema from the installed library was intentional in that it is simple to change generation by running a different docker container. Presumably, when dealing with the AIRR schema, there are benefits to using the AIRR Library. I have no strong objections to removing the tight coupling between the generated schema and the installed AIRR Python library version (basically, providing the schema file as input), but there may be some benefits that are lost in doing so (not that I can think of any huge ones off the top of my head).

@schristley
Contributor Author

  • read AIRR schema from input file vs using schema from installed airr library

FYI, the use of the AIRR Schema from the installed library was intentional in that it is simple to change generation by running a different docker container.

Yes, I understand, but I don't really want to deal with multiple docker images, and this will explode out as we deal with schema changes/versioning from the other repositories (if I need an image for each version combination, plus the complexity to run them all then merge the results into a single linkml schema). So I prefer that we are able to generate everything from a single docker. We will use git submodule to bring in both the 1.5 and 2.0 versions.

@LonnekeScheffer are you familiar with git submodule, or worked with a repository that used them? There's an extra step after doing git clone and a couple of tricks to remember, but it will give us finer control of the airr-standards version.
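A sketch of reading the AIRR schema from files rather than from the installed airr library, so a single container can load several versions side by side. The submodule paths below are hypothetical placeholders; the real layout depends on how the submodules are set up. Assumes PyYAML is installed.

```python
"""Sketch: load several AIRR schema versions from files (e.g. git submodules)."""
import yaml

# Hypothetical submodule paths; adjust to the actual checkout layout.
SCHEMA_PATHS = {
    "1.5": "airr-standards-v1.5/specs/airr-schema.yaml",
    "2.0": "airr-standards-v2.0/specs/airr-schema.yaml",
}


def load_airr_schema(path: str) -> dict:
    """Parse one AIRR schema YAML file into a dict."""
    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle)


def load_versions(paths: dict) -> dict:
    """Return {version: parsed schema} for every configured schema file."""
    return {version: load_airr_schema(path) for version, path in paths.items()}
```

The version keys can then feed whatever naming scheme lets the v1.5 and v2.0 definitions co-exist in one LinkML schema.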

@bcorrie

bcorrie commented Oct 28, 2024

So I prefer that we are able to generate everything from a single docker. We will use git submodule to bring in both the 1.5 and 2.0 versions.

No worries, just wanted to point out the rationale. On the iReceptor side, we only have one standard to deal with, so this isn't a big issue. We have one container per AIRR Standard release...

@LonnekeScheffer

I haven't used git submodule before, but I'll read up on it!

So having an "AIRR versioned" LinkML schema, would this mean one can specify different AIRR .yaml input files (for v1.5 or v2.0), but it will always generate the same LinkML output YAML? Or would there be anything different about the LinkML YAMLs for different AIRR versions?

@schristley
Contributor Author

I haven't used git submodule before, but I'll read up on it!

Ok, that's good. I'll do the initial setup then...

So having an "AIRR versioned" LinkML schema, would this mean one can specify different AIRR .yaml input files (for v1.5 or v2.0), but it will always generate the same LinkML output YAML? Or would there be anything different about the LinkML YAMLs for different AIRR versions?

The LinkML output would be different, to reflect the differences in the AIRR schema versions. By "AIRR versioned" LinkML schema, I mean the ability for (say) the v1.5 Repertoire and the v2.0 Repertoire definitions to co-exist in the schema. LinkML has been discussing namespace support, which would be a perfect use case for this, I think.

With both versions available, we could perform various inferences and algorithms based upon the changes/differences. This goes towards Aim 2.6 of the grant.

Though I'm saying "AIRR schema", technically the AIRR schema is written in OpenAPI3, which is a superset of JSON schema. So even though we are currently using it for AIRR, we will also use it for the repositories that provide an OpenAPI3 service, which currently is all repositories except IEDB.

@LonnekeScheffer

LonnekeScheffer commented Nov 12, 2024

Deleted my previous comment because most of it was raising an issue I have since resolved.

Current status of airr2akc conversion:

  • The updated script is available on https://github.com/airr-knowledge/ak-schema/tree/airr2akc_refactor
  • For now, the script aims to always produce a valid LinkML output. If the input yaml file contains missing or contradicting info, the issue is reported, and some reasonable decision is made. This should help debug the schema yamls
  • 'bugs' in the airr input yaml files have been reported: bugs in airr-schema.yaml / airr-schema-openapi3.yaml airr-community/airr-standards#813
  • A bit ugly, but functional: the AIRR version is concatenated to the classes as a prefix -> allowing multiple versions of airr to exist
  • The class is concatenated to the slot name -> this was necessary to allow the same slot to be used across multiple classes when things like 'description', 'identifier', 'required' and 'nullable' differ across classes
    -> the airr2akc script automatically reports if any 'unexpected'/'buggy' input is provided, but in principle should still produce valid LinkML output (occasionally with some missing information)
  • Field 'identifier' (True/False) is not used in LinkML for now, as its interpretation differs between the AIRR schema and LinkML: in LinkML, identifier=T slots must always be required, while in AIRR they need not be; and in LinkML, only one slot per class may have identifier=T, while the rest must have identifier=F.
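A sketch of the versioned naming scheme described in the list above. The prefix format (AIRR_v1_5_) is an assumption, since the exact string the script emits isn't shown in the thread. In this sketch the slot names are prefixed with the versioned class name, so the same AIRR field from v1.5 and v2.0 ends up as two distinct slots; the actual script's convention may differ.

```python
"""Sketch: version-prefixed classes and class-prefixed slots so versions co-exist."""


def version_prefix(version: str) -> str:
    """'1.5' -> 'AIRR_v1_5' (hypothetical format)."""
    return "AIRR_v" + version.replace(".", "_")


def versioned_class(version: str, class_name: str, slots: dict):
    """Return ({class_name: class_def}, {slot_name: slot_def}) for one AIRR object."""
    cls = f"{version_prefix(version)}_{class_name}"
    new_slots = {}
    for field, slot in slots.items():
        # Prefix each slot with its (versioned) class so per-class attributes can differ.
        slot_name = f"{cls}_{field}"
        new_slots[slot_name] = {**slot, "name": slot_name}
    classes = {cls: {"name": cls, "slots": list(new_slots)}}
    return classes, new_slots


subject_slots = {"subject_id": {"name": "subject_id", "range": "string"}}

if __name__ == "__main__":
    for version in ("1.5", "2.0"):
        print(versioned_class(version, "Subject", subject_slots))
```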

@LonnekeScheffer LonnekeScheffer linked a pull request Nov 12, 2024 that will close this issue
7 tasks
@schristley schristley linked a pull request Nov 24, 2024 that will close this issue
7 tasks
Projects
Status: Project Year 2

3 participants