AllenInstitute/AllenInstituteTaxonomy

Allen Institute Taxonomy (AIT)

To distribute Allen Institute Taxonomies (AIT), we define an anndata .h5ad file with a formalized schema. The file encapsulates the essential components of a taxonomy required for downstream analysis, such as cell type mapping.

Overview

One major challenge in creating a cell type taxonomy schema is defining terms such as "taxonomy", "dataset", "annotation", "metadata", and "data". It is becoming increasingly important to separate the data from the other components, and to compartmentalize all components, so that users do not need to download, open, or upload huge and unwieldy files.

[Figure: Taxonomy overview]

That said, many use cases still need the option of including all of the information listed above in a single h5ad file: for use with CELLxGENE, scrattch.mapping, and other analysis tools, and for ease of sharing in a single file format.

Related efforts

Several competing schemas have been created for packaging taxonomies, data sets, and associated metadata and annotations. This document aims to align three such schemas and propose a way of integrating them into the Allen Institute Taxonomies (AIT) .h5ad file format presented as part of this GitHub repository. The three standards are:

  1. AIT (described herein)
  2. Cell Annotation Schema (CAS): this schema is becoming more widely used across the cell typing field because it is largely compatible with the CZ CELLxGENE schema. It is also compatible with the Cell Annotation Platform (CAP) and with Taxonomy Development Tools (TDT). CAS has both a general schema and a BICAN-associated schema, both of which are considered herein. CAS can be embedded in the header (uns) of an AIT/scrattch.taxonomy file, where it functions as a store of extended information about an annotation, including ontology term mappings and evidence for the annotation (from annotation transfer and marker expression).
  3. Brain Knowledge Platform (BKP): I could not find a public description of this schema, but it is the data model used for the Jupyter Notebooks associated with the Allen Brain Cell (ABC) Atlas. More generally, any data sets to be included in ABC Atlas, MapMyCells, or other related BKP resources will eventually need to conform to this format.
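
As a rough illustration of what embedding CAS in the header (uns) could look like, the sketch below serializes a CAS-style record to a JSON string and stores it under a key in a plain dict that stands in for the `uns` slot of an AnnData object. Everything specific here is an assumption made for illustration: the key name `"cas"`, the field names in the record, the cell label, the gene names, and the placeholder ontology ID are not taken from the AIT or CAS specifications.

```python
import json

# Hypothetical CAS-style record. Field names and values are illustrative
# placeholders, not copied from the CAS specification.
cas_record = {
    "labelsets": [{"name": "subclass"}],
    "annotations": [
        {
            "labelset": "subclass",
            "cell_label": "example_type_1",
            "cell_ontology_term_id": "CL:0000000",  # placeholder ID
            "marker_gene_evidence": ["GeneA", "GeneB"],  # placeholder genes
        }
    ],
}

# A plain dict stands in for the `uns` slot of an AnnData object.
uns = {}

# Store the record as a JSON string; "cas" is an assumed key name.
uns["cas"] = json.dumps(cas_record)

# Reading it back recovers the nested structure unchanged.
restored = json.loads(uns["cas"])
labels = [a["cell_label"] for a in restored["annotations"]]
print(labels)
```

Storing the record as a single JSON string is one possible design choice: it keeps arbitrarily nested annotation data in one header entry, at the cost of requiring a parse step before use.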
