diff --git a/.github/workflows/fmudataio-documention.yml b/.github/workflows/fmudataio-documention.yml index 96044d97c..cc1ab92bf 100644 --- a/.github/workflows/fmudataio-documention.yml +++ b/.github/workflows/fmudataio-documention.yml @@ -17,16 +17,18 @@ jobs: - name: Set up Python uses: actions/setup-python@v5 with: - python-version: "3.10" + python-version: "3.11" - - name: Install and build docs + - name: Install fmu-dataio run: | - pip install pip -U && pip install wheel -U + pip install -U pip pip install .[docs] - pip install xtgeo - pip install git+https://github.com/equinor/fmu-config - sh examples/run_examples.sh - sphinx-build -b html docs build/docs/html + + - name: Generate examples + run: sh examples/run_examples.sh + + - name: Build documentation + run: sphinx-build -b html docs build/docs/html - name: Update GitHub pages if: github.repository_owner == 'equinor' && github.ref == 'refs/heads/main' diff --git a/.gitignore b/.gitignore index eacbc4c46..a4d2673c2 100644 --- a/.gitignore +++ b/.gitignore @@ -50,7 +50,8 @@ coverage.xml # Sphinx documentation docs/_build/ -docs/apiref/ +docs/src/apiref/ +docs/src/datamodel/model # PyBuilder target/ diff --git a/.readthedocs.yml b/.readthedocs.yml index 16896c970..858eb9df9 100644 --- a/.readthedocs.yml +++ b/.readthedocs.yml @@ -3,7 +3,7 @@ version: 2 build: os: "ubuntu-22.04" tools: - python: "3.10" + python: "3.11" jobs: post_install: - bash examples/run_examples.sh diff --git a/docs/contributing.rst b/docs/contributing.rst deleted file mode 100644 index 6889b7c46..000000000 --- a/docs/contributing.rst +++ /dev/null @@ -1,2 +0,0 @@ -.. include:: ../CONTRIBUTING.md - :parser: myst_parser.sphinx_ diff --git a/docs/datamodel.rst b/docs/datamodel.rst deleted file mode 100644 index bce872622..000000000 --- a/docs/datamodel.rst +++ /dev/null @@ -1,707 +0,0 @@ -The FMU results data model -########################## - -This section describes the data model used for FMU results when exporting with -fmu-dataio. For the time being, the data model is hosted as part of fmu-datio. - -The data model described herein is new and shiny, and experimental in many aspects. -Any feedback on this is greatly appreciated. The most effective feedback is to apply -the data model, then use the resulting metadata. - -The FMU data model is described using a `JSON Schema `__ which -contains rules and definitions for all attributes in the data model. This means, in -practice, that outgoing metadata from FMU needs to comply with the schema. If data is -uploaded to e.g. Sumo, validation will be done on the incoming data to ensure consistency. - - -About the data model -==================== - -Why is it made? ---------------- - -FMU is a mighty system developed by and for the subsurface community in Equinor, to -make reservoir modeling more efficient, less error-prone and more repeatable with higher quality, -mainly through automation of cross-disciplinary workflows. It combines off-the-shelf software -with in-house components such as the ERT orchestrator. - -FMU is defined more and more by the data it produces, and direct and indirect dependencies on -output from FMU is increasing. When FMU results started to be regularly transferred to cloud -storage for direct consumption from 2017/2018 and outwards, the need for stable metadata on -outgoing data became immiment. 
Local development on Johan Sverdrup was initiated to cater
-for the digital ecosystem evolving in and around that particular project, and the need for
-generalizing became apparent with the development of Sumo, Webviz and other initiatives.
-
-The purpose of the data model is to cater for the existing dependencies, as well as to enable
-more direct usage of FMU results in different contexts. The secondary objective of this
-data model is to create a normalization layer between the components that create data
-and the components that use those data. The data model is also designed to be adaptable
-to sources of data other than FMU.
-
-Scope of this data model
-------------------------
-This data model covers data produced by FMU workflows. This includes data generated by
-direct runs of model templates, data produced by pre-processing workflows, data produced
-in individual realizations or hooked workflows, and data produced by post-processing workflows.
-
-.. note::
-   An example of a pre-processing workflow is a set of jobs modifying selected input data
-   for later use in the FMU workflows and/or for comparison with other results in a QC context.
-
-.. note::
-   An example of a post-processing workflow is a script that aggregates results across many
-   realizations and/or iterations of an FMU case.
-
-This data model covers data that, in the FMU context, can be linked to a specific case.
-
-Note that e.g. ERT and other components will, and should, have their own data models to
-cater for their needs. It is not the intention of this data model to cover all aspects
-of data in the FMU context. The scope is primarily data going *out* of FMU to be used elsewhere.
-
-
-A denormalized data model
--------------------------
-
-The data model used for FMU results is a denormalized data model, at least to a certain
-point. This means that the static data will be repeated many times. Example: Each exported data object contains
-basic information about the FMU case it belongs to, such as a unique ID for this case,
-its name, the user that made it, which model template was used, etc. This information
-is stored in *every* exported .yml file. This may seem counterintuitive, and differs
-from a relational database (where this information would typically be stored once, and
-referred to when needed).
-
-There are a few reasons for choosing a denormalized data model:
-
-First, the components needed for creating a relational database containing these data do not exist, and would
-be extremely difficult to implement quickly. Also, the nature of data in an FMU context is very distributed,
-with data spread across a large number of files and folders (currently).
-
-Second, a denormalized data model enables us to utilize search engine technologies
-for indexing. This is not efficient for a normalized data model. The penalty for
-duplicating metadata across many individual files is repaid in speed and ease-of-use.
-
-.. note::
-   The data model is only denormalized *to a certain point*. Most likely, it is better
-   described as a hybrid. Example: The concept of a *case* is used in the FMU context. In the
-   outgoing metadata for FMU results, some information about the current case is included.
-   However, *details* about the case are out of scope. For this, a consumer would have to
-   refer to the owner of the *case* definition. In FMU contexts, this will be the workflow
-   manager (ERT).
-
-
-Standardized vs anarchy
------------------------
-
-Creating a data model for FMU results brings with it a degree of standardization.
In essence, this
-represents the next evolution of the existing FMU standard. We haven't called it "FMU standard 2.0"
-because although this would resonate with many people, many would find it revolting. But,
-sure, if you are so inclined you are allowed to think of it this way. The FMU standard 1.0
-is centered on folder structure and file names - a prerequisite for standardization in
-the good old days when files were files, folders were folders, and data could be consumed
-by double-clicking. Or, by traversing the mounted file system.
-
-With the transition to a cloud-native state come numerous opportunities - but also great
-responsibilities. Some of them are visible in the data model, and the data model is in itself
-a testament to the most important of them: We need to get our data straight.
-
-There are many challenges. Aligning with everyone and everything is one. We probably won't
-succeed with that in the first iteration(s). Materializing metadata effectively, and without
-hassle, during FMU runs (meaning that *everything* must be *fully automated*) is another. This
-is what fmu-dataio solves. But finding the balance between *retaining flexibility* and
-*enforcing a standard* is perhaps the trickiest of all.
-
-This data model has been designed with the great flexibility of FMU in mind. If you are
-a geologist on an asset using FMU for something important, you need to be able to export
-any data from *your* workflow and *use that data* without having to wait for someone else
-to rebuild something. For FMU, one glove certainly does not fit all, and this has been
-taken into account. While the data model and the associated validation will set some requirements
-that you need to follow, you are still free to do more or less what you want.
-
-We do, however, STRONGLY ENCOURAGE you not to invent too many private wheels. The risk
-is that your data cannot be used by others.
-
-The materialized metadata has a nested structure which can be represented by Python
-dictionaries, yaml or json formats. The root level contains only key attributes,
-most of which are nested sub-dictionaries.
-
-
-Relations to other data models
-------------------------------
-
-The data model for FMU results is designed with generalization in mind. While in practice
-this data model covers data produced by, or in direct relation to, an FMU workflow - in
-*theory* it relates more to *subsurface predictive modeling* in general than to FMU specifically.
-
-In Equinor, FMU is the primary system for creating, maintaining and using 3D predictive
-numerical models for the subsurface. Therefore, FMU is the main use case for this data model.
-
-There are plenty of other data models in play in the complex world of subsurface predictive modeling.
-Each software applies its own data model, and in FMU this encompasses multiple different systems.
-
-Similarly, there are other data models in the larger scope where FMU workflows represent
-one out of many providers/consumers of data. A significant motivation for defining this
-data model is to ensure consistency towards other systems and enable stable conditions for integration.
-
-fmu-dataio has three important roles in this context:
-
-* Be a translating layer between the data models of individual software packages and the FMU results data model.
-* Enable fully-automated materialization of metadata during FMU runs (hundreds of thousands of files being made) -* Abstract the FMU results data model through Python methods and functions, allowing them to be embedded into other systems - helping maintain a centralized definition of this data model. - - -The parent/child principle --------------------------- - -In the FMU results data model, the traditional hierarchy of an FMU setup is not continued. -An individual file produced by an FMU workflow and exported to disk can be seen in -relations to a hiearchy looking something like this: case > iteration > realization > file - -Many reading this will instinctively disagree with this definition, and significant confusion -arises from trying to have meaningful discussions around this. There is no -unified definition of this hierarchy (despite many *claiming to have* such a definition). - -In the FMU results data model, this hiearchy is flattened down to two levels: -The Parent (*case*) and children to that parent (*files*). From this, it follows that the -most fundamental definition in this context is a *case*. To a large degree, this definition -belongs to the ERT workflow manager in the FMU context. For now, however, the case definitions -are extracted by-proxy from the file structure and from arguments passed to fmu-dataio. - -Significant confusion can *also* arise from discussing the definition of a case, and the -validity of this hiearchy, of course. But consensus (albeit probably local minima) is that -this serves the needs. - -Each file produced *in relations to* an FMU case (meaning *before*, *during* or *after*) is tagged -with information about the case - signalling that *this entity* belongs to *this case*. It is not -the intention of the FMU results data model to maintain *all* information about a case, and -in the future it is expected that ERT will serve case information beyond the basics. - -.. note:: - - **Dot-annotation** - we like it and use it. This is what it means: - - The metadata structure is a dictionary-like structure, e.g. - - .. code-block:: json - - { - "myfirstkey": { - "mykey": "myvalue", - "anotherkey": "anothervalue" - } - } - - Annotating tracks along a dictionary can be tricky. With dot-annotation, we can refer to ```mykey``` in the example above as ``myfirstkey.mykey``. This will be a pointer to ``myvalue`` in this case. You will see dot annotation in the explanations of the various metadata blocks below: Now you know what it means! - -Weaknesses ----------- - -**uniqueness** -The data model currently has challenges wrt ensuring uniqueness. Uniqueness is a challenge -in this context, as a centralized data model cannot (and should not!) dictate in detail nor -define in detail which data an FMU user should be able to export from local workflows. - -**understanding validation errors** -When validating against the current schema, understanding the reasons for non-validation -can be tricky. The root cause of this is the use of conditional logic in the schemas - -a functionality JSON Schema is not designed for. See `Logical rules `__. - - -Logical rules -------------- - -The schema contains some logical rules which are applied during validation. These are -rules of type "if this, then that". They are, however, not explicitly written (nor readable) -as such directly. This type of logic is implemented in the schema by explicitly generating -subschemas that A) are only valid for specific conditions, and B) contain requirements for -that specific situation. 
In this manner, one can assure that if a specific condition is -met, the associated requirements for that condition is used. - -Example: - - .. code-block:: json - - "oneOf": [ - { - "$comment": "Conditional schema A - 'if class == case make myproperty required'", - "required": [ - "myproperty" - ], - "properties": { - "class": { - "enum": ["case"] - }, - "myproperty": { - "type": "string", - "example": "sometext" - } - } - }, - { - "$comment": "Conditional schema B - 'if class != case do NOT make myproperty required'", - "properties": { - "myproperty": { - "type": "string", - "example": "sometext" - }, - } - ] - - -For metadata describing a ``case``, requirements are different compared to metadata describing data objects. - -For selected contents, a content-specific block under **data** is required. This is implemented for -"fluid_contact", "field_outline" and "seismic". - - -The metadata structure -====================== - -Full schema ------------ - -.. toggle:: - - .. literalinclude:: ../schema/definitions/0.8.0/schema/fmu_results.json - :language: js - -For the average user, there is no need to deep-dive into the schema itself. The purpose -of fmu-dataio is to broker between the different other data models used in FMU, and the -definitions of FMU results. E.g. RMS has its data model, Eclipse has its data model, ERT -has its data model, and so on. - -What you need to know is that for every data object exported out of FMU with the intention -of using in other contexts a metadata instance pertaining to this definition will also be -created. - -Outgoing metadata for an individual data object (file) in the FMU context will contain -the relevant root attributes and blocks described further down this document. Not all -data objects will contain all attributes and blocks - this depends on the data type, the -context it is exported in and the data available. - -Example: Data produced by pre- or post-processes will contain information about the ``case`` but -not about ``realization`` implicitly meaning that they belong to a specific -case but not any specific realizations. - -.. note:: - - The ``case`` object is a bit special: It represents the parent object, and records - information about the case only. It follows the same patterns as for individual data objects - but will not contain the ``data`` block which is mandatory for data objects. - - -Root attributes ---------------- - -At the root level of the metadata, a few single-value attributes are used. These are -attributes saying something fundamental about these data: - - -* **$schema**: A reference to the schema which this metadata should be valid against. -* **version**: The version of the FMU results data model being used. -* **source**: The source of these data. Will always say "fmu" for FMU results. -* **class**: The fundamental type of data. Valid classes: - * case - * surface - * table - * cpgrid - * cpgrid_property - * polygons - * cube - * well - * points - - -Blocks ------------ - -The bulk of the metadata is gathered in specific blocks. *Blocks* are sub-dictionaries -containing a specific part of the metadata. Not all blocks are present in all materialized metadata, -and not all block sub-attributes are applied in all contexts. - - -fmu -~~~ - -The ``fmu`` block contains all attributes specific to FMU. The idea is that the FMU results -data model can be applied to data from *other* sources - in which the fmu-specific stuff -may not make sense or be applicable. 
Within the fmu-block, there are more blocks: - - -**fmu.model**: The ``fmu.model`` block contains information about the model used. - -.. note:: - Synonyms for "model" in this context are "template", "setup", etc. The term "model" - is ultra-generic but was chosen before e.g. "template" as the latter deviates from - daily communications and is, if possible, even more generic than "model". - -**fmu.workflow**: The ``fmu.workflow`` block refers to specific subworkflows within the large -FMU workflow being ran. This has not (yet?) been standardized, mainly due to the lack -of programmatic access to the workflows being run in important software within FMU. -One sub-attribute has been defined and is used: -**fmu.workflow.reference**: A string referring to which workflow this data object was exported by. - -.. note:: A key usage of ``fmu.workflow.reference`` is related to ensuring uniqueness of data objects. - -**Example of uniqueness challenge** -During an hypothetical FMU workflow, a surface representing a specific horizon in -depth is exported multiple times during the run for QC purposes. E.g. a representation -of *Volantis Gp. Top* is first exported at the start of the workflow, then 2-3 times during -depth conversion to record changes, then at the start of structural modeling, then 4-5 -times during structural modeling to record changes, then extracted from multiple grids. - -The end result is 10+ versions of *Volantis Gp. Top* which are identical except from -which workflow they were produced by. - -**fmu.case**: The ``fmu.case`` block contains information about the case from which this data -object was exported. ``fmu.case`` has the following subattributes, and more may arrive: - -* **fmu.case.name**: [string] The name of the case -* **fmu.case.uuid**: [uuid] The unique identifier of this case. Currently made by fmu.dataio. Future made by ERT? - -* **fmu.case.user**: A block holding information about the user. - - * **fmu.case.user.id**: [string] A user identity reference. - -* **fmu.case.description**: [list of strings] (a free-text description of this case) (optional) - -.. note:: If an FMU data object is exported outside the case context, this block will not be present. - -**fmu.iteration**: The ``fmu.iteration`` block contains information about the iteration this data object belongs to. The ``fmu.iteration`` -has the following defined sub-attributes: - -* **fmu.iteration.id**: [int] The internal ID of the iteration, typically represented by an integer. -* **fmu.iteration.uuid**: [uuid] The universally unique identifier for this iteration. It is a hash of ``fmu.case.uuid`` and ``fmu.iteration.id``. -* **fmu.iteration.name**: [string] The name of the iteration. This is typically reflecting the folder name on scratch. In ERT, custom names for iterations are supported, e.g. "pred". For this reason, if logic is implied, the name can be risky to trust - even if it often contains the ID, e.g. "iter-0" -* **fmu.iteration.restart_from**: [uuid] The intention with this attribute is to flag when a iteration is a restart fromm another iteration. - -**fmu.realization**: The ``fmu.realization`` block contains information about the realization this data object belongs to, with the following sub-attributes: - -* **fmu.realization.id**: The internal ID of the realization, typically represented by an integer. -* **fmu.realization.uuid**: The universally unique identifier for this realization. It is a hash of ``fmu.case.uuid`` and ``fmu.iteration.uuid`` and ``fmu.realization.id``. 
-* **fmu.realization.name**: The name of the realization. This is typically reflecting the folder name on scratch. Custom names for realizations are not supported by ERT, but we still recommend to use ``fmu.realization.id`` for all usage except purely visual appearance. -* **fmu.realization.parameters**: These are the parameters used in this realization. It is a direct pass of ``parameters.txt`` and will contain key:value pairs representing the design parameters. - -**fmu.jobs**: Directly pass "jobs.json". Temporarily deactivated in fmu-dataio pending further alignment with ERT. - -.. note:: - The blocks within the ``fmu`` section signal by their presence which context a data object is exported under. Example: If an - aggregated object contains ``fmu.case`` and ``fmu.iteration``, but not ``fmu.realization``, it can be assumed that this object belongs - to this ``case`` and ``iteration`` but not to any specific ``realization``. - - -file -~~~~ - -The ``file`` block contains references to this data object as a file on a disk. A filename -in this context can be actual, or abstract. Particularly the ``relative_path`` is, and will -most likely remain, an important identifier for individual file objects within an FMU -case - irrespective of the existance of an actual file system. For this reason, the -``relative_path`` - as well as the ``checksum_md5`` will be generated even if a file is -not saved to disk. The ``absolute_path`` will only be generated in the case of actually -creating a file on disk and is not required under this schema. - -* **file.relative_path**: [path] The path of a file relative to the case root. -* **file.absolute_path**: [path] The absolute path of a file, e.g. /scratch/field/user/case/etc -* **file.checksum_md5**: [string] A valid MD5 checksum of the file. - -data -~~~~ - -The ``data`` block contains information about the data contains in this object. - -* **data.content**: [string] The content of these data. Examples are "depth", "porosity", etc. - -* **data.name**: [string] This is the identifying name of this data object. For surfaces, this is typically the horizon name or similar. Shall be compliant with the stratigraphic column if applicable. -* **data.stratigraphic**: [bool] True if this is defined in the stratigraphic column. -* **data.offset**: If a specific horizon is represented with an offset, e.g. "2 m below Top Volantis". - -.. note:: If data object represents an interval, the data.top and data.base attributes can be used. - -* **data.top**: - - * **data.top.name**: *As data.name* - * **data.top.stratigraphic**: *As data.stratigraphic* - * **data.top.offset**: *As data.offset* - -* **data.base**: - - * **data.base.name**: *As data.name* - * **data.base.stratigraphic**: *As data.stratigraphic* - * **data.base.offset**: *As data.offset* - -* **data.stratigraphic_alias**: [list] A list of strings representing stratigraphic aliases for this *data.name*. E.g. the top of the uppermost member of a formation will be alias to the top of the formation. -* **data.alias**: [list] Other known-as names for *data.name*. Typically names used within specific software, e.g. RMS and others. - -* **data.tagname**: [string] An identifier for this/these data object(s). Similar to the second part of the generated filename in disk-oriented FMU data standard. - -* **data.properties**: A list of dictionary objects, where each object describes a property contained by this data object. - - * **data.properties..name**: [string] The name of this property. 
- * **data.properties..attribute**: [string] The attribute. - * **data.properties..is_discrete**: [bool] Flag if this property is is_discrete. - * **data.properties..calculation**: [string] A reference to a calculation performed to derive this property. - -.. note:: The ``data.properties`` concept is experimental. Use cases include surfaces containing multiple properties/attributes, grids with parameters, etc. - -* **data.format**: [string] A reference to a known file format. -* **data.layout**: [string] A reference to the layout of the data object. Examples: "regular", "cornerpoint", "structured" -* **data.unit**: [string] A reference to a known unit. Examples. "m" -* **data.vertical_domain**: [string] A reference to a known vertical domain. Examples: "depth", "time" -* **data.depth_reference**: [string] A reference to a known depth reference. Examples: "msl", "seabed" - -* **data.grid_model**: A block containing information pertaining to grid model content. - - * **data.grid_model.name**: [string] A name reference to this data. - -* **data.spec**: A block containing the specs for this object, if applicable. - - * **data.spec.ncol**: [int] Number of columns - * **data.spec.nrow**: [int] Number of rows - * **data.spec.nlay**: [int] Number of layers - * **data.spec.xori**: [float] Origin X coordinate - * **data.spec.yori**: [float] Origin Y coordinate - * **data.spec.xinc**: [float] X increment - * **data.spec.yinc**: [float] Y increment - * **data.spec.yflip**: [int] Y flip flag (from IRAP Binary) - * **data.spec.rotation**: [float] Rotation (degrees) - * **data.spec.undef**: [float] Number representing the Null value - -* **data.bbox**: A block containing the bounding box for this data, if applicable - - * **data.bbox.xmin**: [float] Minimum X coordinate - * **data.bbox.xmax**: [float] Maximum X coordinate - * **data.bbox.ymin**: [float] Minimum Y coordinate - * **data.bbox.ymax**: [float] Maximum Y coordinate - * **data.bbox.zmin**: [float] Minimum Z coordinate - * **data.bbox.zmax**: [float] Maximum Z coordinate - - -* **data.time**: A block containing lists of objects describing timestamp information for this data object, if applicable. - - * **data.time.value**: [datetime] A datetime representation - * **data.time.label**: [string] A label corresponding to the timestamp - -.. note:: **data.time** items can be repeated to include many time stamps - -* **data.is_prediction**: [bool] True if this is a prediction -* **data.is_observation**: [bool] True if this is an observation -* **data.description**: [list] A list of strings, freetext description of this data, if applicable. - -Conditional attributes of the data block: - -* **data.fluid_contact**: A block describing a fluid contact. Shall be present if "data.content" == "fluid_contact" - - * **data.fluid_contact.contact**: [string] A known type of contact. Examples: "owc", "fwl" - * **data.fluid_contact.truncated**: [bool] If True, this is a representation of a contact surface which is truncated to stratigraphy. - -* **data.field_outline**: A block describing a field outline. Shall be present if "data.content" == "field_outline" - - * **data.field_outline.contact**: The fluid contact used to define the field outline. - -* **data.seismic**: A block describing seismic data. Shall be present if "data.content" == "seismic" - - * **data.seismic.attribute**: [string] A known seismic attribute. - * **data.seismic.zrange**: [float] The z-range applied. - * **data.seismic.filter_size**: [float] The filter size applied. 
- * **data.seismic.scaling_factor**: [float] The scaling factor applied. - - -display -~~~~~~~ - -The ``display`` block contains information related to how this data object should/could be displayed. -As a general rule, the consumer of data is responsible for figuring out how a specific data object shall -be displayed. However, we use this block to communicate preferences from the data producers perspective. - -We also maintain this block due to legacy reasons. No data filtering logic should be placed on the ``display`` block. - -* **display.name**: A display-friendly version of ``data.name``. -* **display.subtitle**: A display-friendly subtitle. - -* **display.line**: (Experimental) A block containing display information for line objects. - - * **display.line.show**: [bool] Show a line - * **display.line.color**: [string] A reference to a known color. - -* **display.points**: (Experimental) A block containing display information for point(s) objects. - - * **display.points.show**: [bool] Show points. - * **display.points.color**: [string] A reference to a known color. - -* **display.contours**: (Experimental) A block containing display information for contours. - - * **display.contours.show**: [bool] Show contours. - * **display.contours.color**: [string] A reference to a known color. - -* **display.fill**: (Experimental) A block containing display information for fill. - - * **display.fill.show**: [bool] Show fill. - * **display.fill.color**: [string] A reference to a known color. - * **display.fill.colormap**: [string] A reference to a known color map. - * **display.fill.display_min**: [float] The value to use as minimum value when displaying. - * **display.fill.display_max**: [float] The value to use as maximum value when displaying. - - - -access -~~~~~~ - -The ``access`` block contains information related to acces control for this data object. - -* **asset**: A block containing information about the owner asset of these data. - - * **access.asset.name**: [string] A string referring to a known asset name. - -* **access.ssdl**: A block containing information related to SSDL. Note that this is kept due to legacy. - - * **access.ssdl.access_level**: [string] The SSDL access level (internal/asset) - * **access.ssdl.rep_include**: [bool] Flag if this data is to be shown in REP or not. - - We fully acknowledge that horrible pattern of putting application-specific information into a data model like this. However - for legacy reasons this is kept until better options exists. - - -masterdata -~~~~~~~~~~ - -The ``masterdata`` block contains information related to masterdata. Currently, smda holds the masterdata. - -* **masterdata.smda**: Block containing SMDA-related attributes. - - * **masterdata.smda.country**: [list] A list of strings referring to countries known to SMDA. First item is primary. - * **masterdata.smda.discovery**: [list] A list of strings referring to discoveries known to SMDA. First item is primary. - * **masterdata.smda.field**: [list] A list of strings referring to fields known to SMDA. First item is primary. 
- -* **masterdata.smda.coordinate_system**: Reference to coordinate system known to SMDA - - * **masterdata.smda.coordinate_system.identifier**: [string] Identifier known to SMDA - * **masterdata.smda.coordinate_system.uuid**: [uuid] A UUID known to SMDA - -* **masterdata.smda.stratigraphic_column**: Reference to stratigraphic column known to SMDA - - * **masterdata.smda.stratigraphic_column.identifier**: [string] Identifier known to SMDA - * **masterdata.smda.stratigraphic_column.uuid**: [uuid] A UUID known to SMDA - - -tracklog -~~~~~~~~ - -The tracklog block contains a record of events recorded on these data. This is experimental for now. -The tracklog is a list of *tracklog_events* with the following definition: - -* **tracklog.**: An event. - * **tracklog..datetime**: [datetime] Timestamp of the event - * **tracklog..user**: [string] Identification of user associated with the event - * **tracklog..event**: [string] String representing the event - - -.. note:: - The "tracklog" concept is included but considered heavily experimental for now. The concept of - data lineage goes far beyond this, and this should not be read as the full lineage of these data. - -Validation of data -================== - -When fmu-dataio exports data from FMU workflows, it produces a pair of data + metadata. The two are -considered one entity. Data consumers who wish to validate the correct match of data and metadata can -do so by verifying recreation of ``file.checksum_md5`` on the data object only. Metadata is not considered -when generating the checksum. - -This checksum is the string representation of the hash created using RSA's ``MD5`` algorithm. This hash -was created from the _file_ that fmu-dataio exported. In most cases, this is the same file that are -provided to consumer. However, there are some exceptions: - -- Seismic data may be transformed to other formats when stored out of FMU context and the checksum may -be invalid. - -Changes and revisions -===================== - -The only constant is change, as we know, and in the case of the FMU results data model - definitely so. -The learning component here is huge, and there will be iterations. This poses a challenge, given that -there are existing dependencies on top of this data model already, and more are arriving. - -To handle this, two important concepts has been introduced. - -1) **Versioning**. The current version of the FMU metadata is 0.8.0. This version is likely to remain for a while. (We have not yet figured out how to best deal with versioning. Have good ideas? Bring them!) -2) **Contractual attributes**. Within the FMU ecosystem, we need to retain the ability to do rapid changes to the data model. As we are in early days, unknowns will become knowns and unknown unknowns will become known unknowns. However, from the outside perspective some stability is required. Therefore, we have labelled some key attributes as *contractual*. They are listed at the top of the schema. This is not to say that they will never change - but they should not change erratically, and when we need to change them, this needs to be subject to alignment. 
- - -Contractual attributes ----------------------- - -The following attributes are contractual: - -* class -* source -* version -* tracklog -* data.format -* data.name -* data.stratigraphic -* data.alias -* data.stratigraphic_alias -* data.offset -* data.content -* data.vertical_domain -* data.grid_model -* data.bbox -* data.is_prediction -* data.is_observation -* data.seismic.attribute -* access -* masterdata -* fmu.model -* fmu.workflow -* fmu.case -* fmu.iteration -* fmu.realization.name -* fmu.realization.id -* fmu.realization.uuid -* fmu.aggregation.operation -* fmu.aggregation.realization_ids -* file.relative_path -* file.checksum_md5 - - -Metadata example -================ - -Expand below to see a full example of valid metadata for surface exported from FMU. - -.. toggle:: - - .. literalinclude:: ../schema/definitions/0.8.0/examples/surface_depth.yml - :language: yaml - -| - -You will find more examples in `fmu-dataio github repository `__. - - -FAQ -=== - -We won't claim that these questions are really very *frequently* asked, but these are some -key questions you may have along the way. - -**My existing FMU workflow does not produce any metadata. Now I am told that it has to. What do I do?** -First step: Start using fmu-dataio in your workflow. You will get a lot for free using it, amongst -other things, metadata will start to appear from your workflow. To get started with fmu-dataio, -see `the overview section `__. - -**This data model is not what I would have chosen. How can I change it?** -The FMU community (almost always) builds what the FMU community wants. The first step -would be to define what you are unhappy with, preferably formulated as an issue in the -`fmu-dataio github repository `__. -(If your comments are Equinor internal, please reach out to either Per Olav (peesv) or Jan (jriv).) - -**This data model allows me to create a smashing data visualisation component, but I fear that it -is so immature that it will not be stable - will it change all the time?** -Yes, and no. It is definitely experimental and these are early days. Therefore, changes -will occur as learning is happening. Part of that learning comes from development of -components utilizing the data model, so your feedback may contribute to evolving this -data model. However, you should not expact erratic changes. The concept of Contractual attributes -are introduced for this exact purpose. We have also chosen to version the metadata - partly to -clearly separate from previous versions, but also for allowing smooth evolution going forward. -We don't yet know *exactly* how this will be done in practice, but perhaps you will tell us! \ No newline at end of file diff --git a/docs/datastructure.rst b/docs/datastructure.rst deleted file mode 100644 index 6d87e3b33..000000000 --- a/docs/datastructure.rst +++ /dev/null @@ -1,223 +0,0 @@ -.. Do not modifly this file manuely, docs/gen.py -Meta export datastructure -========================= - - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.Access - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.Aggregation - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.Asset - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.ClassMeta - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.CoordinateSystem - :model-show-json: false - -.. 
autopydantic_model:: fmu.dataio.datastructure.meta.meta.CountryItem - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.DiscoveryItem - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.FMU - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.FMUCase - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.FMUCaseClassMeta - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.FMUDataClassMeta - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.FMUModel - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.FieldItem - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.File - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.Iteration - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.Masterdata - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.Parameters - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.Realization - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.RealizationJobListing - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.RealizationJobs - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.Root - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.Smda - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.Ssdl - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.SsdlAccess - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.StratigraphicColumn - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.TracklogEvent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.User - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.meta.Workflow - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.AnyContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.BoundingBox - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.Content - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.DepthContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.FMUTimeObject - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.FaultLinesContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.FieldOutline - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.FieldOutlineContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.FieldRegion - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.FieldRegionContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.FluidContact - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.FluidContactContent - :model-show-json: false - -.. 
autopydantic_model:: fmu.dataio.datastructure.meta.content.GridModel - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.InplaceVolumesContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.KPProductContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.Layer - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.LiftCurvesContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.PVTContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.ParametersContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.PinchoutContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.PropertyContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.RFTContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.RegionsContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.RelpermContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.Seismic - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.SeismicContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.SubcropContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.ThicknessContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.Time - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.TimeContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.TimeSeriesContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.TransmissibilitiesContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.VelocityContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.VolumesContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.VolumetricsContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.content.WellPicksContent - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.specification.CPGridPropertySpecification - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.specification.CPGridSpecification - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.specification.CubeSpecification - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.specification.PolygonsSpecification - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.specification.Shape - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.specification.SurfaceSpecification - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.specification.TableSpecification - :model-show-json: false - -.. autopydantic_model:: fmu.dataio.datastructure.meta.specification.FaultRoomSurfaceSpecification - :model-show-json: false - -.. 
autopydantic_model:: fmu.dataio.datastructure.meta.specification.WellPointsDictionaryCaseSpecification - :model-show-json: false diff --git a/docs/ext/pydantic_autosummary/__init__.py b/docs/ext/pydantic_autosummary/__init__.py new file mode 100644 index 000000000..089f4ff5a --- /dev/null +++ b/docs/ext/pydantic_autosummary/__init__.py @@ -0,0 +1,921 @@ +"""Extension that adds an autosummary:: directive. + +The directive can be used to generate function/method/attribute/etc. summary +lists, similar to those output eg. by Epydoc and other API doc generation tools. + +An :autolink: role is also provided. + +autosummary directive +--------------------- + +The autosummary directive has the form:: + + .. autosummary:: + :nosignatures: + :toctree: generated/ + + module.function_1 + module.function_2 + ... + +and it generates an output table (containing signatures, optionally) + + ======================== ============================================= + module.function_1(args) Summary line from the docstring of function_1 + module.function_2(args) Summary line from the docstring + ... + ======================== ============================================= + +If the :toctree: option is specified, files matching the function names +are inserted to the toctree with the given prefix: + + generated/module.function_1 + generated/module.function_2 + ... + +Note: The file names contain the module:: or currentmodule:: prefixes. + +.. seealso:: autosummary_generate.py + + +autolink role +------------- + +The autolink role functions as ``:obj:`` when the name referred can be +resolved to a Python object, and otherwise it becomes simple emphasis. +This can be used as the default role to make links 'smart'. +""" + +from __future__ import annotations + +import functools +import inspect +import operator +import os +import posixpath +import re +import sys +from inspect import Parameter +from os import path +from types import ModuleType +from typing import TYPE_CHECKING, Any, ClassVar, cast + +import sphinx +from docutils import nodes +from docutils.parsers.rst import directives +from docutils.parsers.rst.states import RSTStateMachine, Struct, state_classes +from docutils.statemachine import StringList +from sphinx import addnodes +from sphinx.config import Config +from sphinx.environment import BuildEnvironment +from sphinx.ext.autodoc import INSTANCEATTR, Documenter +from sphinx.ext.autodoc.directive import DocumenterBridge, Options +from sphinx.ext.autodoc.importer import import_module +from sphinx.ext.autodoc.mock import mock +from sphinx.locale import __ +from sphinx.project import Project +from sphinx.pycode import ModuleAnalyzer, PycodeError +from sphinx.registry import SphinxComponentRegistry +from sphinx.util import logging, rst +from sphinx.util.docutils import ( + NullReporter, + SphinxDirective, + SphinxRole, + new_document, + switch_source_input, +) +from sphinx.util.inspect import getmro, signature_from_str +from sphinx.util.matching import Matcher + +if TYPE_CHECKING: + from collections.abc import Sequence + + from docutils.nodes import Node, system_message + from sphinx.application import Sphinx + from sphinx.extension import Extension + from sphinx.util.typing import ExtensionMetadata, OptionSpec + from sphinx.writers.html import HTML5Translator + +logger = logging.getLogger(__name__) + + +periods_re = re.compile(r"\.(?:\s+)") +literal_re = re.compile(r"::\s*$") + +WELL_KNOWN_ABBREVIATIONS = ("et al.", "e.g.", "i.e.") + + +# -- autosummary_toc node 
------------------------------------------------------ + + +class autosummary_toc(nodes.comment): + pass + + +def autosummary_toc_visit_html(self: nodes.NodeVisitor, node: autosummary_toc) -> None: + """Hide autosummary toctree list in HTML output.""" + raise nodes.SkipNode + + +def autosummary_noop(self: nodes.NodeVisitor, node: Node) -> None: + pass + + +# -- autosummary_table node ---------------------------------------------------- + + +class autosummary_table(nodes.comment): + pass + + +def autosummary_table_visit_html( + self: HTML5Translator, node: autosummary_table +) -> None: + """Make the first column of the table non-breaking.""" + try: + table = cast(nodes.table, node[0]) + tgroup = cast(nodes.tgroup, table[0]) + tbody = cast(nodes.tbody, tgroup[-1]) + rows = cast(list[nodes.row], tbody) + for row in rows: + col1_entry = cast(nodes.entry, row[0]) + par = cast(nodes.paragraph, col1_entry[0]) + for j, subnode in enumerate(list(par)): + if isinstance(subnode, nodes.Text): + new_text = subnode.astext().replace(" ", "\u00a0") + par[j] = nodes.Text(new_text) + except IndexError: + pass + + +# -- autodoc integration ------------------------------------------------------- + + +class FakeApplication: + def __init__(self) -> None: + self.doctreedir = None + self.events = None + self.extensions: dict[str, Extension] = {} + self.srcdir = None + self.config = Config() + self.project = Project("", {}) + self.registry = SphinxComponentRegistry() + + +class FakeDirective(DocumenterBridge): + def __init__(self) -> None: + settings = Struct(tab_width=8) + document = Struct(settings=settings) + app = FakeApplication() + app.config.add("autodoc_class_signature", "mixed", "env", ()) + env = BuildEnvironment(app) # type: ignore[arg-type] + state = Struct(document=document) + super().__init__(env, None, Options(), 0, state) + + +def get_documenter(app: Sphinx, obj: Any, parent: Any) -> type[Documenter]: + """Get an autodoc.Documenter class suitable for documenting the given + object. + + *obj* is the Python object to be documented, and *parent* is an + another Python object (e.g. a module or a class) to which *obj* + belongs to. + """ + from sphinx.ext.autodoc import DataDocumenter, ModuleDocumenter + + if inspect.ismodule(obj): + # ModuleDocumenter.can_document_member always returns False + return ModuleDocumenter + + # Construct a fake documenter for *parent* + if parent is not None: + parent_doc_cls = get_documenter(app, parent, None) + else: + parent_doc_cls = ModuleDocumenter + + if hasattr(parent, "__name__"): + parent_doc = parent_doc_cls(FakeDirective(), parent.__name__) + else: + parent_doc = parent_doc_cls(FakeDirective(), "") + + # Get the correct documenter class for *obj* + classes = [ + cls + for cls in app.registry.documenters.values() + if cls.can_document_member(obj, "", False, parent_doc) + ] + if classes: + classes.sort(key=lambda cls: cls.priority) + return classes[-1] + return DataDocumenter + + +# -- .. autosummary:: ---------------------------------------------------------- + + +class Autosummary(SphinxDirective): + """ + Pretty table containing short signatures and summaries of functions etc. + + autosummary can also optionally generate a hidden toctree:: node. 
+ """ + + required_arguments = 0 + optional_arguments = 0 + final_argument_whitespace = False + has_content = True + option_spec: ClassVar[OptionSpec] = { + "caption": directives.unchanged_required, + "toctree": directives.unchanged, + "nosignatures": directives.flag, + "recursive": directives.flag, + "template": directives.unchanged, + } + + def run(self) -> list[Node]: + self.bridge = DocumenterBridge( + self.env, self.state.document.reporter, Options(), self.lineno, self.state + ) + + names = [ + x.strip().split()[0] + for x in self.content + if x.strip() and re.search(r"^[~a-zA-Z_]", x.strip()[0]) + ] + items = self.get_items(names) + nodes = self.get_table(items) + + if "toctree" in self.options: + dirname = posixpath.dirname(self.env.docname) + + tree_prefix = self.options["toctree"].strip() + docnames = [] + excluded = Matcher(self.config.exclude_patterns) + filename_map = self.config.autosummary_filename_map + for _name, _sig, _summary, real_name in items: + real_name = filename_map.get(real_name, real_name) + docname = posixpath.join(tree_prefix, real_name) + docname = posixpath.normpath(posixpath.join(dirname, docname)) + if docname not in self.env.found_docs: + if excluded(self.env.doc2path(docname, False)): + msg = __( + "autosummary references excluded document %r. Ignored." + ) + else: + msg = __( + "autosummary: stub file not found %r. " + "Check your autosummary_generate setting." + ) + + logger.warning(msg, real_name, location=self.get_location()) + continue + + docnames.append(docname) + + if docnames: + tocnode = addnodes.toctree() + tocnode["includefiles"] = docnames + tocnode["entries"] = [(None, docn) for docn in docnames] + tocnode["maxdepth"] = -1 + tocnode["glob"] = None + tocnode["caption"] = self.options.get("caption") + + nodes.append(autosummary_toc("", "", tocnode)) + + if "toctree" not in self.options and "caption" in self.options: + logger.warning( + __("A captioned autosummary requires :toctree: option. ignored."), + location=nodes[-1], + ) + + return nodes + + def import_by_name( + self, + name: str, + prefixes: list[str | None], + ) -> tuple[str, Any, Any, str]: + with mock(self.config.autosummary_mock_imports): + try: + return import_by_name(name, prefixes) + except ImportExceptionGroup as exc: + # check existence of instance attribute + try: + return import_ivar_by_name(name, prefixes) + except ImportError as exc2: + if exc2.__cause__: + errors: list[BaseException] = [*exc.exceptions, exc2.__cause__] + else: + errors = [*exc.exceptions, exc2] + + raise ImportExceptionGroup(exc.args[0], errors) from None + + def create_documenter( + self, app: Sphinx, obj: Any, parent: Any, full_name: str + ) -> Documenter: + """Get an autodoc.Documenter class suitable for documenting the given + object. + + Wraps get_documenter and is meant as a hook for extensions. + """ + doccls = get_documenter(app, obj, parent) + return doccls(self.bridge, full_name) + + def get_items(self, names: list[str]) -> list[tuple[str, str, str, str]]: + """Try to import the given names, and return a list of + ``[(name, signature, summary_string, real_name), ...]``. 
+ """ + prefixes = get_import_prefixes_from_env(self.env) + + items: list[tuple[str, str, str, str]] = [] + + max_item_chars = 50 + + for name in names: + display_name = name + if name.startswith("~"): + name = name[1:] + display_name = name.split(".")[-1] + + try: + real_name, obj, parent, modname = self.import_by_name( + name, prefixes=prefixes + ) + except ImportExceptionGroup as exc: + errors = list({f"* {type(e).__name__}: {e}" for e in exc.exceptions}) + logger.warning( + __("autosummary: failed to import %s.\nPossible hints:\n%s"), + name, + "\n".join(errors), + location=self.get_location(), + ) + continue + + self.bridge.result = StringList() # initialize for each documenter + full_name = real_name + if not isinstance(obj, ModuleType): + # give explicitly separated module name, so that members + # of inner classes can be documented + full_name = modname + "::" + full_name[len(modname) + 1 :] + # NB. using full_name here is important, since Documenters + # handle module prefixes slightly differently + documenter = self.create_documenter(self.env.app, obj, parent, full_name) + if not documenter.parse_name(): + logger.warning( + __("failed to parse name %s"), + real_name, + location=self.get_location(), + ) + items.append((display_name, "", "", real_name)) + continue + if not documenter.import_object(): + logger.warning( + __("failed to import object %s"), + real_name, + location=self.get_location(), + ) + items.append((display_name, "", "", real_name)) + continue + + # try to also get a source code analyzer for attribute docs + try: + documenter.analyzer = ModuleAnalyzer.for_module( + documenter.get_real_modname() + ) + # parse right now, to get PycodeErrors on parsing (results will + # be cached anyway) + documenter.analyzer.find_attr_docs() + except PycodeError as err: + logger.debug("[autodoc] module analyzer failed: %s", err) + # no source file -- e.g. for builtin and C modules + documenter.analyzer = None + + # -- Grab the signature + + try: + sig = documenter.format_signature(show_annotation=False) + except TypeError: + # the documenter does not support ``show_annotation`` option + sig = documenter.format_signature() + + if not sig: + sig = "" + else: + max_chars = max(10, max_item_chars - len(display_name)) + sig = mangle_signature(sig, max_chars=max_chars) + + # -- Grab the summary + + # bodge for ModuleDocumenter + documenter._extra_indent = "" # type: ignore[attr-defined] + + documenter.add_content(None) + summary = extract_summary(self.bridge.result.data[:], self.state.document) + + items.append((display_name, sig, summary, real_name)) + + return items + + def get_table(self, items: list[tuple[str, str, str, str]]) -> list[Node]: + """Generate a proper list of table nodes for autosummary:: directive. + + *items* is a list produced by :meth:`get_items`. 
+ """ + table_spec = addnodes.tabular_col_spec() + table_spec["spec"] = r"\X{1}{2}\X{1}{2}" + + table = autosummary_table("") + real_table = nodes.table("", classes=["autosummary longtable"]) + table.append(real_table) + group = nodes.tgroup("", cols=2) + real_table.append(group) + group.append(nodes.colspec("", colwidth=10)) + group.append(nodes.colspec("", colwidth=90)) + body = nodes.tbody("") + group.append(body) + + def append_row(*column_texts: str) -> None: + row = nodes.row("") + source, line = self.state_machine.get_source_and_line() + for text in column_texts: + node = nodes.paragraph("") + vl = StringList() + vl.append(text, "%s:%d:" % (source, line)) + with switch_source_input(self.state, vl): + self.state.nested_parse(vl, 0, node) + try: + if isinstance(node[0], nodes.paragraph): + node = node[0] + except IndexError: + pass + row.append(nodes.entry("", node)) + body.append(row) + + for name, sig, summary, real_name in items: + qualifier = "obj" + if "nosignatures" not in self.options: + col1 = f":py:{qualifier}:`{name} <{real_name}>`\\ {rst.escape(sig)}" + else: + col1 = f":py:{qualifier}:`{name} <{real_name}>`" + col2 = summary + append_row(col1, col2) + + return [table_spec, table] + + +def strip_arg_typehint(s: str) -> str: + """Strip a type hint from argument definition.""" + return s.split(":")[0].strip() + + +def _cleanup_signature(s: str) -> str: + """Clean up signature using inspect.signautre() for mangle_signature()""" + try: + sig = signature_from_str(s) + parameters = list(sig.parameters.values()) + for i, param in enumerate(parameters): + if param.annotation is not Parameter.empty: + # Remove typehints + param = param.replace(annotation=Parameter.empty) + if param.default is not Parameter.empty: + # Replace default value by "None" + param = param.replace(default=None) + parameters[i] = param + sig = sig.replace(parameters=parameters, return_annotation=Parameter.empty) + return str(sig) + except Exception: + # Return the original signature string if failed to clean (ex. parsing error) + return s + + +def mangle_signature(sig: str, max_chars: int = 30) -> str: + """Reformat a function signature to a more compact form.""" + s = _cleanup_signature(sig) + + # Strip return type annotation + s = re.sub(r"\)\s*->\s.*$", ")", s) + + # Remove parenthesis + s = re.sub(r"^\((.*)\)$", r"\1", s).strip() + + # Strip literals (which can contain things that confuse the code below) + s = re.sub(r"\\\\", "", s) # escaped backslash (maybe inside string) + s = re.sub(r"\\'", "", s) # escaped single quote + s = re.sub(r'\\"', "", s) # escaped double quote + s = re.sub(r"'[^']*'", "", s) # string literal (w/ single quote) + s = re.sub(r'"[^"]*"', "", s) # string literal (w/ double quote) + + # Strip complex objects (maybe default value of arguments) + while re.search( + r"\([^)]*\)", s + ): # contents of parenthesis (ex. NamedTuple(attr=...)) + s = re.sub(r"\([^)]*\)", "", s) + while re.search(r"<[^>]*>", s): # contents of angle brackets (ex. ) + s = re.sub(r"<[^>]*>", "", s) + while re.search(r"{[^}]*}", s): # contents of curly brackets (ex. 
dict) + s = re.sub(r"{[^}]*}", "", s) + + # Parse the signature to arguments + options + args: list[str] = [] + opts: list[str] = [] + + opt_re = re.compile(r"^(.*, |)([a-zA-Z0-9_*]+)\s*=\s*") + while s: + m = opt_re.search(s) + if not m: + # The rest are arguments + args = s.split(", ") + break + + opts.insert(0, m.group(2)) + s = m.group(1)[:-2] + + # Strip typehints + for i, arg in enumerate(args): + args[i] = strip_arg_typehint(arg) + + for i, opt in enumerate(opts): + opts[i] = strip_arg_typehint(opt) + + # Produce a more compact signature + sig = limited_join(", ", args, max_chars=max_chars - 2) + if opts: + if not sig: + sig = "[%s]" % limited_join(", ", opts, max_chars=max_chars - 4) + elif len(sig) < max_chars - 4 - 2 - 3: + sig += "[, %s]" % limited_join( + ", ", opts, max_chars=max_chars - len(sig) - 4 - 2 + ) + + return "(%s)" % sig + + +def extract_summary(doc: list[str], document: Any) -> str: + """Extract summary from docstring.""" + + def parse(doc: list[str], settings: Any) -> nodes.document: + state_machine = RSTStateMachine(state_classes, "Body") + node = new_document("", settings) + node.reporter = NullReporter() + state_machine.run(doc, node) + + return node + + # Skip a blank lines at the top + while doc and not doc[0].strip(): + doc.pop(0) + + # If there's a blank line, then we can assume the first sentence / + # paragraph has ended, so anything after shouldn't be part of the + # summary + for i, piece in enumerate(doc): + if not piece.strip(): + doc = doc[:i] + break + + if doc == []: + return "" + + # parse the docstring + node = parse(doc, document.settings) + if isinstance(node[0], nodes.section): + # document starts with a section heading, so use that. + summary = node[0].astext().strip() + elif not isinstance(node[0], nodes.paragraph): + # document starts with non-paragraph: pick up the first line + summary = doc[0].strip() + else: + # Try to find the "first sentence", which may span multiple lines + sentences = periods_re.split(" ".join(doc)) + if len(sentences) == 1: + summary = sentences[0].strip() + else: + summary = "" + for i in range(len(sentences)): + summary = ". ".join(sentences[: i + 1]).rstrip(".") + "." + node[:] = [] + node = parse(doc, document.settings) + if summary.endswith(WELL_KNOWN_ABBREVIATIONS): + pass + elif not any(node.findall(nodes.system_message)): + # considered as that splitting by period + # does not break inline markups + break + + # strip literal notation mark ``::`` from tail of summary + return literal_re.sub(".", summary) + + +def limited_join( + sep: str, items: list[str], max_chars: int = 30, overflow_marker: str = "..." +) -> str: + """Join a number of strings into one, limiting the length to *max_chars*. + + If the string overflows this limit, replace the last fitting item by + *overflow_marker*. + + Returns: joined_string + """ + full_str = sep.join(items) + if len(full_str) < max_chars: + return full_str + + n_chars = 0 + n_items = 0 + for item in items: + n_chars += len(item) + len(sep) + if n_chars < max_chars - len(overflow_marker): + n_items += 1 + else: + break + + return sep.join([*list(items[:n_items]), overflow_marker]) + + +# -- Importing items ----------------------------------------------------------- + + +class ImportExceptionGroup(Exception): + """Exceptions raised during importing the target objects. + + It contains an error messages and a list of exceptions as its arguments. 
+ """ + + def __init__( + self, message: str | None, exceptions: Sequence[BaseException] + ) -> None: + super().__init__(message) + self.exceptions = list(exceptions) + + +def get_import_prefixes_from_env(env: BuildEnvironment) -> list[str | None]: + """ + Obtain current Python import prefixes (for `import_by_name`) + from ``document.env`` + """ + prefixes: list[str | None] = [None] + + currmodule = env.ref_context.get("py:module") + if currmodule: + prefixes.insert(0, currmodule) + + currclass = env.ref_context.get("py:class") + if currclass: + if currmodule: + prefixes.insert(0, currmodule + "." + currclass) + else: + prefixes.insert(0, currclass) + + return prefixes + + +def import_by_name( + name: str, + prefixes: Sequence[str | None] = (None,), +) -> tuple[str, Any, Any, str]: + """Import a Python object that has the given *name*, under one of the + *prefixes*. The first name that succeeds is used. + """ + tried = [] + errors: list[ImportExceptionGroup] = [] + for prefix in prefixes: + try: + prefixed_name = f"{prefix}.{name}" if prefix else name + obj, parent, modname = _import_by_name( + prefixed_name, grouped_exception=True + ) + return prefixed_name, obj, parent, modname + except ImportError: + tried.append(prefixed_name) + except ImportExceptionGroup as exc: + tried.append(prefixed_name) + errors.append(exc) + + exceptions: list[BaseException] = functools.reduce( + operator.iadd, (e.exceptions for e in errors), [] + ) + raise ImportExceptionGroup("no module named %s" % " or ".join(tried), exceptions) + + +def _import_by_name(name: str, grouped_exception: bool = True) -> tuple[Any, Any, str]: + """Import a Python object given its full name.""" + errors: list[BaseException] = [] + + try: + name_parts = name.split(".") + + # try first interpret `name` as MODNAME.OBJ + modname = ".".join(name_parts[:-1]) + if modname: + try: + mod = import_module(modname) + return getattr(mod, name_parts[-1]), mod, modname + except (ImportError, IndexError, AttributeError) as exc: + errors.append(exc.__cause__ or exc) + + # ... then as MODNAME, MODNAME.OBJ1, MODNAME.OBJ1.OBJ2, ... + last_j = 0 + modname = "" + for j in reversed(range(1, len(name_parts) + 1)): + last_j = j + modname = ".".join(name_parts[:j]) + try: + import_module(modname) + except ImportError as exc: + errors.append(exc.__cause__ or exc) + + if modname in sys.modules: + break + + if last_j < len(name_parts): + parent = None + obj = sys.modules[modname] + for obj_name in name_parts[last_j:]: + parent = obj + obj = getattr(obj, obj_name) + return obj, parent, modname + return sys.modules[modname], None, modname + except (ValueError, ImportError, AttributeError, KeyError) as exc: + errors.append(exc) + if grouped_exception: + raise ImportExceptionGroup("", errors) from None # NoQA: EM101 + raise ImportError(*exc.args) from exc + + +def import_ivar_by_name( + name: str, prefixes: Sequence[str | None] = (None,), grouped_exception: bool = True +) -> tuple[str, Any, Any, str]: + """Import an instance variable that has the given *name*, under one of the + *prefixes*. The first name that succeeds is used. 
+ """ + try: + name, attr = name.rsplit(".", 1) + real_name, obj, parent, modname = import_by_name(name, prefixes) + + # Get ancestors of the object (class.__mro__ includes the class itself as + # the first entry) + candidate_objects = getmro(obj) + if len(candidate_objects) == 0: + candidate_objects = (obj,) + + for candidate_obj in candidate_objects: + analyzer = ModuleAnalyzer.for_module( + getattr(candidate_obj, "__module__", modname) + ) + analyzer.analyze() + # check for presence in `annotations` to include dataclass attributes + found_attrs = set() + found_attrs |= {attr for (qualname, attr) in analyzer.attr_docs} + found_attrs |= {attr for (qualname, attr) in analyzer.annotations} + if attr in found_attrs: + return real_name + "." + attr, INSTANCEATTR, obj, modname + except (ImportError, ValueError, PycodeError) as exc: + raise ImportError from exc + except ImportExceptionGroup: + raise # pass through it as is + + raise ImportError + + +# -- :autolink: (smart default role) ------------------------------------------- + + +class AutoLink(SphinxRole): + """Smart linking role. + + Expands to ':obj:`text`' if `text` is an object that can be imported; + otherwise expands to '*text*'. + """ + + def run(self) -> tuple[list[Node], list[system_message]]: + pyobj_role = self.env.get_domain("py").role("obj") + assert pyobj_role is not None + objects, errors = pyobj_role( + "obj", + self.rawtext, + self.text, + self.lineno, + self.inliner, + self.options, + self.content, + ) + if errors: + return objects, errors + + assert len(objects) == 1 + pending_xref = cast(addnodes.pending_xref, objects[0]) + try: + # try to import object by name + prefixes = get_import_prefixes_from_env(self.env) + import_by_name(pending_xref["reftarget"], prefixes) + except ImportExceptionGroup: + literal = cast(nodes.literal, pending_xref[0]) + objects[0] = nodes.emphasis( + self.rawtext, literal.astext(), classes=literal["classes"] + ) + + return objects, errors + + +def get_rst_suffix(app: Sphinx) -> str | None: + def get_supported_format(suffix: str) -> tuple[str, ...]: + parser_class = app.registry.get_source_parsers().get(suffix.removeprefix(".")) + if parser_class is None: + return ("restructuredtext",) + return parser_class.supported + + suffix = None + for suffix in app.config.source_suffix: + if "restructuredtext" in get_supported_format(suffix): + return suffix + + return None + + +def process_generate_options(app: Sphinx) -> None: + genfiles = app.config.autosummary_generate + + if genfiles is True: + env = app.builder.env + genfiles = [ + env.doc2path(x, base=False) + for x in env.found_docs + if os.path.isfile(env.doc2path(x)) + ] + elif genfiles is False: + pass + else: + ext = list(app.config.source_suffix) + genfiles = [ + genfile + (ext[0] if not genfile.endswith(tuple(ext)) else "") + for genfile in genfiles + ] + + for entry in genfiles[:]: + if not path.isfile(path.join(app.srcdir, entry)): + logger.warning(__("autosummary_generate: file not found: %s"), entry) + genfiles.remove(entry) + + if not genfiles: + return + + suffix = get_rst_suffix(app) + if suffix is None: + logger.warning( + __( + "autosummary generates .rst files internally. " + "But your source_suffix does not contain .rst. Skipped." 
+ ) + ) + return + + # ----------- pydantic_autosummary change + from .generate import generate_autosummary_docs + # ----------/ pydantic_autosummary change + + imported_members = app.config.autosummary_imported_members + with mock(app.config.autosummary_mock_imports): + generate_autosummary_docs( + genfiles, + suffix=suffix, + base_path=app.srcdir, + app=app, + imported_members=imported_members, + overwrite=app.config.autosummary_generate_overwrite, + encoding=app.config.source_encoding, + ) + + +def setup(app: Sphinx) -> ExtensionMetadata: + # I need autodoc + app.setup_extension("sphinx.ext.autodoc") + app.add_node( + autosummary_toc, + html=(autosummary_toc_visit_html, autosummary_noop), + latex=(autosummary_noop, autosummary_noop), + text=(autosummary_noop, autosummary_noop), + man=(autosummary_noop, autosummary_noop), + texinfo=(autosummary_noop, autosummary_noop), + ) + app.add_node( + autosummary_table, + html=(autosummary_table_visit_html, autosummary_noop), + latex=(autosummary_noop, autosummary_noop), + text=(autosummary_noop, autosummary_noop), + man=(autosummary_noop, autosummary_noop), + texinfo=(autosummary_noop, autosummary_noop), + ) + app.add_directive("autosummary", Autosummary) + app.add_role("autolink", AutoLink()) + app.connect("builder-inited", process_generate_options) + app.add_config_value("autosummary_context", {}, "env") + app.add_config_value("autosummary_filename_map", {}, "html") + app.add_config_value("autosummary_generate", True, "env", {bool, list}) + app.add_config_value("autosummary_generate_overwrite", True, "") + app.add_config_value( + "autosummary_mock_imports", lambda config: config.autodoc_mock_imports, "env" + ) + app.add_config_value("autosummary_imported_members", [], "", bool) + app.add_config_value("autosummary_ignore_module_all", True, "env", bool) + + return {"version": sphinx.__display_version__, "parallel_read_safe": True} diff --git a/docs/ext/pydantic_autosummary/generate.py b/docs/ext/pydantic_autosummary/generate.py new file mode 100644 index 000000000..d2669a10b --- /dev/null +++ b/docs/ext/pydantic_autosummary/generate.py @@ -0,0 +1,880 @@ +"""Generates reST source files for autosummary. + +Usable as a library or script to generate automatic RST source files for +items referred to in autosummary:: directives. + +Each generated RST file contains a single auto*:: directive which +extracts the docstring of the referred item. + +Example Makefile rule:: + + generate: + sphinx-autogen -o source/generated source/*.rst +""" + +from __future__ import annotations + +import argparse +import importlib +import inspect +import locale +import os +import pkgutil +import pydoc +import re +import sys +from os import path +from typing import TYPE_CHECKING, Any, NamedTuple + +import sphinx.locale +from jinja2 import TemplateNotFound +from jinja2.sandbox import SandboxedEnvironment +from sphinx import __display_version__ +from sphinx.builders import Builder +from sphinx.config import Config +from sphinx.ext.autodoc.importer import import_module +from sphinx.locale import __ +from sphinx.pycode import ModuleAnalyzer, PycodeError +from sphinx.registry import SphinxComponentRegistry +from sphinx.util import logging, rst +from sphinx.util.inspect import getall, safe_getattr +from sphinx.util.osutil import ensuredir +from sphinx.util.template import SphinxTemplateLoader + +# ----------- pydantic_autosummary change +from . 
import ( + ImportExceptionGroup, + get_documenter, + import_by_name, + import_ivar_by_name, +) +from .pydantic import set_pydantic_model_fields + +# ----------/ pydantic_autosummary change + +if TYPE_CHECKING: + from collections.abc import Sequence, Set + from gettext import NullTranslations + + from sphinx.application import Sphinx + from sphinx.ext.autodoc import Documenter + +logger = logging.getLogger(__name__) + + +class DummyApplication: + """Dummy Application class for sphinx-autogen command.""" + + def __init__(self, translator: NullTranslations) -> None: + self.config = Config() + self.registry = SphinxComponentRegistry() + self.messagelog: list[str] = [] + self.srcdir = "/" + self.translator = translator + self.verbosity = 0 + self._warncount = 0 + self.warningiserror = False + + self.config.add("autosummary_context", {}, "env", ()) + self.config.add("autosummary_filename_map", {}, "env", ()) + self.config.add("autosummary_ignore_module_all", True, "env", bool) + + def emit_firstresult(self, *args: Any) -> None: + pass + + +class AutosummaryEntry(NamedTuple): + name: str + path: str | None + template: str + recursive: bool + + +def setup_documenters(app: Any) -> None: + from sphinx.ext.autodoc import ( + AttributeDocumenter, + ClassDocumenter, + DataDocumenter, + DecoratorDocumenter, + ExceptionDocumenter, + FunctionDocumenter, + MethodDocumenter, + ModuleDocumenter, + PropertyDocumenter, + ) + + documenters: list[type[Documenter]] = [ + ModuleDocumenter, + ClassDocumenter, + ExceptionDocumenter, + DataDocumenter, + FunctionDocumenter, + MethodDocumenter, + AttributeDocumenter, + DecoratorDocumenter, + PropertyDocumenter, + ] + for documenter in documenters: + app.registry.add_documenter(documenter.objtype, documenter) + + +def _underline(title: str, line: str = "=") -> str: + if "\n" in title: + msg = "Can only underline single lines" + raise ValueError(msg) + return title + "\n" + line * len(title) + + +class AutosummaryRenderer: + """A helper class for rendering.""" + + def __init__(self, app: Sphinx) -> None: + if isinstance(app, Builder): + msg = "Expected a Sphinx application object!" + raise ValueError(msg) + + # ----------- pydantic_autosummary change + system_templates_path = [os.path.join(os.path.dirname(__file__), "templates")] + # ----------/ pydantic_autosummary change + loader = SphinxTemplateLoader( + app.srcdir, app.config.templates_path, system_templates_path + ) + + self.env = SandboxedEnvironment(loader=loader) + self.env.filters["escape"] = rst.escape + self.env.filters["e"] = rst.escape + self.env.filters["underline"] = _underline + + if app.translator: + self.env.add_extension("jinja2.ext.i18n") + # ``install_gettext_translations`` is injected by the + # ``jinja2.ext.i18n`` extension + self.env.install_gettext_translations(app.translator) # type: ignore[attr-defined] + + def render(self, template_name: str, context: dict) -> str: + """Render a template file.""" + try: + template = self.env.get_template(template_name) + except TemplateNotFound: + try: + # objtype is given as template_name + template = self.env.get_template("autosummary/%s.rst" % template_name) + except TemplateNotFound: + # fallback to base.rst + template = self.env.get_template("autosummary/base.rst") + + return template.render(context) + + +def _split_full_qualified_name(name: str) -> tuple[str | None, str]: + """Split full qualified name to a pair of modname and qualname. + + A qualname is an abbreviation for "Qualified name" introduced at PEP-3155 + (https://peps.python.org/pep-3155/). 
It is a dotted path name + from the module top-level. + + A "full" qualified name means a string containing both module name and + qualified name. + + .. note:: This function actually imports the module to check its existence. + Therefore you need to mock 3rd party modules if needed before + calling this function. + """ + parts = name.split(".") + for i, _part in enumerate(parts, 1): + try: + modname = ".".join(parts[:i]) + importlib.import_module(modname) + except ImportError: + if parts[: i - 1]: + return ".".join(parts[: i - 1]), ".".join(parts[i - 1 :]) + return None, ".".join(parts) + except IndexError: + pass + + return name, "" + + +# -- Generating output --------------------------------------------------------- + + +class ModuleScanner: + def __init__(self, app: Any, obj: Any) -> None: + self.app = app + self.object = obj + + def get_object_type(self, name: str, value: Any) -> str: + return get_documenter(self.app, value, self.object).objtype + + def is_skipped(self, name: str, value: Any, objtype: str) -> bool: + try: + return self.app.emit_firstresult( + "autodoc-skip-member", objtype, name, value, False, {} + ) + except Exception as exc: + # ----------- pydantic_autosummary change + logger.warning( + __( + "pydantic_autosummary: failed to determine %r to be documented, " + "the following exception was raised:\n%s" + ), + name, + exc, + type="autosummary", + ) + # ----------/ pydantic_autosummary change + return False + + def scan(self, imported_members: bool) -> list[str]: + members = [] + try: + analyzer = ModuleAnalyzer.for_module(self.object.__name__) + attr_docs = analyzer.find_attr_docs() + except PycodeError: + attr_docs = {} + + for name in members_of(self.object, self.app.config): + try: + value = safe_getattr(self.object, name) + except AttributeError: + value = None + + objtype = self.get_object_type(name, value) + if self.is_skipped(name, value, objtype): + continue + + try: + if ("", name) in attr_docs: + imported = False + elif inspect.ismodule(value): # NoQA: SIM114 + imported = True + elif safe_getattr(value, "__module__") != self.object.__name__: + imported = True + else: + imported = False + except AttributeError: + imported = False + + respect_module_all = not self.app.config.autosummary_ignore_module_all + if ( + # list all members up + imported_members + # list not-imported members + or imported is False + # list members that have __all__ set + or (respect_module_all and "__all__" in dir(self.object)) + ): + members.append(name) + + return members + + +def members_of(obj: Any, conf: Config) -> Sequence[str]: + """Get the members of ``obj``, possibly ignoring the ``__all__`` module attribute + + Follows the ``conf.autosummary_ignore_module_all`` setting. 
+ """ + return dir(obj) if conf.autosummary_ignore_module_all else (getall(obj) or dir(obj)) + + +def generate_autosummary_content( + name: str, + obj: Any, + parent: Any, + template: AutosummaryRenderer, + template_name: str, + imported_members: bool, + app: Any, + recursive: bool, + context: dict, + modname: str | None = None, + qualname: str | None = None, +) -> str: + doc = get_documenter(app, obj, parent) + + ns: dict[str, Any] = {} + ns.update(context) + + if doc.objtype == "module": + scanner = ModuleScanner(app, obj) + ns["members"] = scanner.scan(imported_members) + + respect_module_all = not app.config.autosummary_ignore_module_all + imported_members = imported_members or ( + "__all__" in dir(obj) and respect_module_all + ) + + ns["functions"], ns["all_functions"] = _get_members( + doc, app, obj, {"function"}, imported=imported_members + ) + ns["classes"], ns["all_classes"] = _get_members( + doc, app, obj, {"class"}, imported=imported_members + ) + + # ----------- pydantic_autosummary change + ns["pydantic_models"], ns["all_pydantic_models"] = _get_members( + doc, app, obj, {"pydantic_model"}, imported=imported_members + ) + ns["pydantic_settings"], ns["all_pydantic_settings"] = _get_members( + doc, app, obj, {"pydantic_settings"}, imported=imported_members + ) + # ----------/ pydantic_autosummary change + + ns["exceptions"], ns["all_exceptions"] = _get_members( + doc, app, obj, {"exception"}, imported=imported_members + ) + ns["attributes"], ns["all_attributes"] = _get_module_attrs(name, ns["members"]) + + ispackage = hasattr(obj, "__path__") + if ispackage and recursive: + # Use members that are not modules as skip list, because it would then mean + # that module was overwritten in the package namespace + skip = ( + ns["all_functions"] + + ns["all_classes"] + + ns["all_exceptions"] + + ns["all_attributes"] + ) + + # If respect_module_all and module has a __all__ attribute, first get + # modules that were explicitly imported. Next, find the rest with the + # get_modules method, but only put in "public" modules that are in the + # __all__ list + # + # Otherwise, use get_modules method normally + if respect_module_all and "__all__" in dir(obj): + imported_modules, all_imported_modules = _get_members( + doc, app, obj, {"module"}, imported=True + ) + skip += all_imported_modules + imported_modules = [ + name + "." + modname for modname in imported_modules + ] + all_imported_modules = [ + name + "." 
+ modname for modname in all_imported_modules + ] + public_members = getall(obj) + else: + imported_modules, all_imported_modules = [], [] + public_members = None + + modules, all_modules = _get_modules( + obj, skip=skip, name=name, public_members=public_members + ) + ns["modules"] = imported_modules + modules + ns["all_modules"] = all_imported_modules + all_modules + elif doc.objtype == "class": + ns["members"] = dir(obj) + ns["inherited_members"] = set(dir(obj)) - set(obj.__dict__.keys()) + ns["methods"], ns["all_methods"] = _get_members( + doc, app, obj, {"method"}, include_public={"__init__"} + ) + ns["attributes"], ns["all_attributes"] = _get_members( + doc, app, obj, {"attribute", "property"} + ) + + if modname is None or qualname is None: + modname, qualname = _split_full_qualified_name(name) + + if doc.objtype in ("method", "attribute", "property"): + ns["class"] = qualname.rsplit(".", 1)[0] + + # ----------- pydantic_autosummary change + if doc.objtype == "pydantic_model": + set_pydantic_model_fields(ns, obj) + # ----------/ pydantic_autosummary change + + shortname = qualname if doc.objtype == "class" else qualname.rsplit(".", 1)[-1] + + ns["fullname"] = name + ns["module"] = modname + ns["objname"] = qualname + ns["name"] = shortname + + ns["objtype"] = doc.objtype + ns["underline"] = len(name) * "=" + + return template.render(template_name or doc.objtype, ns) + + +def _skip_member(app: Sphinx, obj: Any, name: str, objtype: str) -> bool: + try: + return app.emit_firstresult( + "autodoc-skip-member", objtype, name, obj, False, {} + ) + except Exception as exc: + # ----------- pydantic_autosummary change + logger.warning( + __( + "pydantic_autosummary: failed to determine %r to be documented, " + "the following exception was raised:\n%s" + ), + name, + exc, + type="autosummary", + ) + # ----------/ pydantic_autosummary change + return False + + +def _get_class_members(obj: Any) -> dict[str, Any]: + members = sphinx.ext.autodoc.get_class_members(obj, None, safe_getattr) + return {name: member.object for name, member in members.items()} + + +def _get_module_members(app: Sphinx, obj: Any) -> dict[str, Any]: + members = {} + for name in members_of(obj, app.config): + try: + members[name] = safe_getattr(obj, name) + except AttributeError: + continue + return members + + +def _get_all_members(doc: type[Documenter], app: Sphinx, obj: Any) -> dict[str, Any]: + if doc.objtype == "module": + return _get_module_members(app, obj) + if doc.objtype == "class": + return _get_class_members(obj) + return {} + + +def _get_members( + doc: type[Documenter], + app: Sphinx, + obj: Any, + types: set[str], + *, + include_public: Set[str] = frozenset(), + imported: bool = True, +) -> tuple[list[str], list[str]]: + items: list[str] = [] + public: list[str] = [] + + all_members = _get_all_members(doc, app, obj) + for name, value in all_members.items(): + documenter = get_documenter(app, value, obj) + if documenter.objtype in types and ( + imported or getattr(value, "__module__", None) == obj.__name__ + ): + skipped = _skip_member(app, value, name, documenter.objtype) + if skipped is True: + pass + elif skipped is False: + # show the member forcedly + items.append(name) + public.append(name) + else: + items.append(name) + if name in include_public or not name.startswith("_"): + # considers member as public + public.append(name) + return public, items + + +def _get_module_attrs(name: str, members: Any) -> tuple[list[str], list[str]]: + """Find module attributes with docstrings.""" + attrs, public = [], [] + 
try: + analyzer = ModuleAnalyzer.for_module(name) + attr_docs = analyzer.find_attr_docs() + for namespace, attr_name in attr_docs: + if namespace == "" and attr_name in members: + attrs.append(attr_name) + if not attr_name.startswith("_"): + public.append(attr_name) + except PycodeError: + pass # give up if ModuleAnalyzer fails to parse code + return public, attrs + + +def _get_modules( + obj: Any, + *, + skip: Sequence[str], + name: str, + public_members: Sequence[str] | None = None, +) -> tuple[list[str], list[str]]: + items: list[str] = [] + public: list[str] = [] + for _, modname, _ispkg in pkgutil.iter_modules(obj.__path__): + if modname in skip: + # module was overwritten in __init__.py, so not accessible + continue + fullname = name + "." + modname + try: + module = import_module(fullname) + if module and hasattr(module, "__sphinx_mock__"): + continue + except ImportError: + pass + + items.append(fullname) + if public_members is not None: + if modname in public_members: + public.append(fullname) + else: + if not modname.startswith("_"): + public.append(fullname) + return public, items + + +def generate_autosummary_docs( + sources: list[str], + output_dir: str | os.PathLike[str] | None = None, + suffix: str = ".rst", + base_path: str | os.PathLike[str] | None = None, + imported_members: bool = False, + app: Any = None, + overwrite: bool = True, + encoding: str = "utf-8", +) -> None: + showed_sources = sorted(sources) + if len(showed_sources) > 20: + showed_sources = showed_sources[:10] + ["..."] + showed_sources[-10:] + + # ----------- pydantic_autosummary change + logger.info( + __("[pydantic_autosummary] generating autosummary for: %s") + % ", ".join(showed_sources) + ) + if output_dir: + logger.info(__("[pydantic_autosummary] writing to %s") % output_dir) + # ----------/ pydantic_autosummary change + + if base_path is not None: + sources = [os.path.join(base_path, filename) for filename in sources] + + template = AutosummaryRenderer(app) + + # read + items = find_autosummary_in_files(sources) + + # keep track of new files + new_files = [] + + filename_map = app.config.autosummary_filename_map if app else {} + + # write + for entry in sorted(set(items), key=str): + if entry.path is None: + # The corresponding autosummary:: directive did not have + # a :toctree: option + continue + + path = output_dir or os.path.abspath(entry.path) + ensuredir(path) + + try: + name, obj, parent, modname = import_by_name(entry.name) + qualname = name.replace(modname + ".", "") + except ImportExceptionGroup as exc: + try: + # try to import as an instance attribute + name, obj, parent, modname = import_ivar_by_name(entry.name) + qualname = name.replace(modname + ".", "") + except ImportError as exc2: + if exc2.__cause__: + exceptions: list[BaseException] = [*exc.exceptions, exc2.__cause__] + else: + exceptions = [*exc.exceptions, exc2] + + errors = list({f"* {type(e).__name__}: {e}" for e in exceptions}) + # ----------- pydantic_autosummary change + logger.warning( + __( + "[pydantic_autosummary] failed to import %s.\n" + "Possible hints:\n%s" + ), + entry.name, + "\n".join(errors), + ) + # ----------/ pydantic_autosummary change + continue + + context: dict[str, Any] = {} + if app: + context.update(app.config.autosummary_context) + + content = generate_autosummary_content( + name, + obj, + parent, + template, + entry.template, + imported_members, + app, + entry.recursive, + context, + modname, + qualname, + ) + + filename = os.path.join(path, filename_map.get(name, name) + suffix) + if 
os.path.isfile(filename): + with open(filename, encoding=encoding) as f: + old_content = f.read() + + if content == old_content: + continue + if overwrite: # content has changed + with open(filename, "w", encoding=encoding) as f: + f.write(content) + new_files.append(filename) + else: + with open(filename, "w", encoding=encoding) as f: + f.write(content) + new_files.append(filename) + + # descend recursively to new files + if new_files: + generate_autosummary_docs( + new_files, + output_dir=output_dir, + suffix=suffix, + base_path=base_path, + imported_members=imported_members, + app=app, + overwrite=overwrite, + ) + + +# -- Finding documented entries in files --------------------------------------- + + +def find_autosummary_in_files(filenames: list[str]) -> list[AutosummaryEntry]: + """Find out what items are documented in source/*.rst. + + See `find_autosummary_in_lines`. + """ + documented: list[AutosummaryEntry] = [] + for filename in filenames: + with open(filename, encoding="utf-8", errors="ignore") as f: + lines = f.read().splitlines() + documented.extend(find_autosummary_in_lines(lines, filename=filename)) + return documented + + +def find_autosummary_in_docstring( + name: str, + filename: str | None = None, +) -> list[AutosummaryEntry]: + """Find out what items are documented in the given object's docstring. + + See `find_autosummary_in_lines`. + """ + try: + real_name, obj, parent, modname = import_by_name(name) + lines = pydoc.getdoc(obj).splitlines() + return find_autosummary_in_lines(lines, module=name, filename=filename) + except AttributeError: + pass + except ImportExceptionGroup as exc: + errors = "\n".join({f"* {type(e).__name__}: {e}" for e in exc.exceptions}) + logger.warning(f"Failed to import {name}.\nPossible hints:\n{errors}") # NoQA: G004 + except SystemExit: + logger.warning( + "Failed to import '%s'; the module executes module level " + "statement and it might call sys.exit().", + name, + ) + return [] + + +def find_autosummary_in_lines( + lines: list[str], + module: str | None = None, + filename: str | None = None, +) -> list[AutosummaryEntry]: + """Find out what items appear in autosummary:: directives in the + given lines. + + Returns a list of (name, toctree, template) where *name* is a name + of an object and *toctree* the :toctree: path of the corresponding + autosummary directive (relative to the root of the file name), and + *template* the value of the :template: option. *toctree* and + *template* ``None`` if the directive does not have the + corresponding options set. 
+ """ + autosummary_re = re.compile(r"^(\s*)\.\.\s+autosummary::\s*") + automodule_re = re.compile(r"^\s*\.\.\s+automodule::\s*([A-Za-z0-9_.]+)\s*$") + module_re = re.compile(r"^\s*\.\.\s+(current)?module::\s*([a-zA-Z0-9_.]+)\s*$") + autosummary_item_re = re.compile(r"^\s+(~?[_a-zA-Z][a-zA-Z0-9_.]*)\s*.*?") + recursive_arg_re = re.compile(r"^\s+:recursive:\s*$") + toctree_arg_re = re.compile(r"^\s+:toctree:\s*(.*?)\s*$") + template_arg_re = re.compile(r"^\s+:template:\s*(.*?)\s*$") + + documented: list[AutosummaryEntry] = [] + + recursive = False + toctree: str | None = None + template = "" + current_module = module + in_autosummary = False + base_indent = "" + + for line in lines: + if in_autosummary: + m = recursive_arg_re.match(line) + if m: + recursive = True + continue + + m = toctree_arg_re.match(line) + if m: + toctree = m.group(1) + if filename: + toctree = os.path.join(os.path.dirname(filename), toctree) + continue + + m = template_arg_re.match(line) + if m: + template = m.group(1).strip() + continue + + if line.strip().startswith(":"): + continue # skip options + + m = autosummary_item_re.match(line) + if m: + name = m.group(1).strip() + if name.startswith("~"): + name = name[1:] + if current_module and not name.startswith(current_module + "."): + name = f"{name}" + documented.append(AutosummaryEntry(name, toctree, template, recursive)) + continue + + if not line.strip() or line.startswith(base_indent + " "): + continue + + in_autosummary = False + + m = autosummary_re.match(line) + if m: + in_autosummary = True + base_indent = m.group(1) + recursive = False + toctree = None + template = "" + continue + + m = automodule_re.search(line) + if m: + current_module = m.group(1).strip() + # recurse into the automodule docstring + documented.extend( + find_autosummary_in_docstring(current_module, filename=filename) + ) + continue + + m = module_re.match(line) + if m: + current_module = m.group(2) + continue + + return documented + + +def get_parser() -> argparse.ArgumentParser: + parser = argparse.ArgumentParser( + usage="%(prog)s [OPTIONS] ...", + epilog=__("For more information, visit ."), + description=__(""" +Generate ReStructuredText using autosummary directives. + +sphinx-autogen is a frontend to sphinx.ext.autosummary.generate. It generates +the reStructuredText files from the autosummary directives contained in the +given input files. 
+ +The format of the autosummary directive is documented in the +``sphinx.ext.autosummary`` Python module and can be read using:: + + pydoc sphinx.ext.autosummary +"""), + ) + + parser.add_argument( + "--version", + action="version", + dest="show_version", + version="%%(prog)s %s" % __display_version__, + ) + + parser.add_argument( + "source_file", nargs="+", help=__("source files to generate rST files for") + ) + + parser.add_argument( + "-o", + "--output-dir", + action="store", + dest="output_dir", + help=__("directory to place all output in"), + ) + parser.add_argument( + "-s", + "--suffix", + action="store", + dest="suffix", + default="rst", + help=__("default suffix for files (default: " "%(default)s)"), + ) + parser.add_argument( + "-t", + "--templates", + action="store", + dest="templates", + default=None, + help=__("custom template directory (default: " "%(default)s)"), + ) + parser.add_argument( + "-i", + "--imported-members", + action="store_true", + dest="imported_members", + default=False, + help=__("document imported members (default: " "%(default)s)"), + ) + parser.add_argument( + "-a", + "--respect-module-all", + action="store_true", + dest="respect_module_all", + default=False, + help=__( + "document exactly the members in module __all__ attribute. " + "(default: %(default)s)" + ), + ) + + return parser + + +def main(argv: Sequence[str] = (), /) -> None: + locale.setlocale(locale.LC_ALL, "") + sphinx.locale.init_console() + + app = DummyApplication(sphinx.locale.get_translator()) + logging.setup(app, sys.stdout, sys.stderr) # type: ignore[arg-type] + setup_documenters(app) + args = get_parser().parse_args(argv or sys.argv[1:]) + + if args.templates: + app.config.templates_path.append(path.abspath(args.templates)) + app.config.autosummary_ignore_module_all = ( # type: ignore[attr-defined] + not args.respect_module_all + ) + + generate_autosummary_docs( + args.source_file, + args.output_dir, + "." + args.suffix, + imported_members=args.imported_members, + app=app, + ) + + +if __name__ == "__main__": + main(sys.argv[1:]) diff --git a/docs/ext/pydantic_autosummary/pydantic.py b/docs/ext/pydantic_autosummary/pydantic.py new file mode 100644 index 000000000..f84e607f4 --- /dev/null +++ b/docs/ext/pydantic_autosummary/pydantic.py @@ -0,0 +1,64 @@ +from __future__ import annotations + +from enum import Enum +from typing import Any, Final, get_args, get_origin + +_DATAIO_METADATA_PACKAGE: Final = "fmu.dataio.datastructure.meta" + + +def _is_dataio(annotation: Any) -> bool: + if isinstance(annotation, str): + return annotation.startswith(_DATAIO_METADATA_PACKAGE) + return annotation.__module__.startswith(_DATAIO_METADATA_PACKAGE) + + +def _is_enum_or_enum_member(annotation: Any) -> bool: + return (isinstance(annotation, type) and issubclass(annotation, Enum)) or ( + isinstance(type(annotation), type) and issubclass(type(annotation), Enum) + ) + + +def _format_annotation(annotation: Any) -> str: + return f"{annotation.__module__}.{annotation.__qualname__}" + + +def _resolve_pydantic_field_annotations(annotation: Any) -> list[str]: + """Returns a list of Pydantic submodels used as fields for a given Pydantic + model.""" + # Enums aren't Pydantic but found in the same place + if _is_enum_or_enum_member(annotation): + return [] + + # Get the unsubscripted version of a type: for a typing object of the + # form: X[Y, Z, ...], return X. 
+ origin = get_origin(annotation) + + annotations: list[str] = [] + if _is_dataio(annotation): + annotations.append(_format_annotation(annotation)) + if origin and _is_dataio(origin): + annotations.append(_format_annotation(origin)) + + # Get type arguments with all substitutions performed: for a typing object of the + # form: X[Y, Z, ...], return (Y, Z, ...). + for arg in get_args(annotation): + if _is_enum_or_enum_member(arg): + continue + if _is_dataio(arg): + annotations.append(_format_annotation(arg)) + # TODO: recurse into arg for things that might be nested more deeply, i.e. + # Optional[Union[List[...]]] + + return annotations + + +def set_pydantic_model_fields(ns: dict[str, Any], obj: Any) -> None: + ns["obj"] = obj + ns["is_int"] = issubclass(obj, int) + ns["is_str"] = issubclass(obj, str) + + annotations = [] + for field in obj.model_fields.values(): + annotations += _resolve_pydantic_field_annotations(field.annotation) + + ns["model_fields"] = list(set(annotations)) diff --git a/docs/ext/pydantic_autosummary/templates/autosummary/base.rst b/docs/ext/pydantic_autosummary/templates/autosummary/base.rst new file mode 100644 index 000000000..b7556ebf7 --- /dev/null +++ b/docs/ext/pydantic_autosummary/templates/autosummary/base.rst @@ -0,0 +1,5 @@ +{{ fullname | escape | underline}} + +.. currentmodule:: {{ module }} + +.. auto{{ objtype }}:: {{ objname }} diff --git a/docs/ext/pydantic_autosummary/templates/autosummary/class.rst b/docs/ext/pydantic_autosummary/templates/autosummary/class.rst new file mode 100644 index 000000000..0f7d6f32e --- /dev/null +++ b/docs/ext/pydantic_autosummary/templates/autosummary/class.rst @@ -0,0 +1,29 @@ +{{ fullname | escape | underline}} + +.. currentmodule:: {{ module }} + +.. autoclass:: {{ objname }} + + {% block methods %} + .. automethod:: __init__ + + {% if methods %} + .. rubric:: {{ _('Methods') }} + + .. autosummary:: + {% for item in methods %} + ~{{ name }}.{{ item }} + {%- endfor %} + {% endif %} + {% endblock %} + + {% block attributes %} + {% if attributes %} + .. rubric:: {{ _('Attributes') }} + + .. autosummary:: + {% for item in attributes %} + ~{{ name }}.{{ item }} + {%- endfor %} + {% endif %} + {% endblock %} diff --git a/docs/ext/pydantic_autosummary/templates/autosummary/module.rst b/docs/ext/pydantic_autosummary/templates/autosummary/module.rst new file mode 100644 index 000000000..e74c012f4 --- /dev/null +++ b/docs/ext/pydantic_autosummary/templates/autosummary/module.rst @@ -0,0 +1,60 @@ +{{ fullname | escape | underline}} + +.. automodule:: {{ fullname }} + + {% block attributes %} + {% if attributes %} + .. rubric:: {{ _('Module Attributes') }} + + .. autosummary:: + {% for item in attributes %} + {{ item }} + {%- endfor %} + {% endif %} + {% endblock %} + + {% block functions %} + {% if functions %} + .. rubric:: {{ _('Functions') }} + + .. autosummary:: + {% for item in functions %} + {{ item }} + {%- endfor %} + {% endif %} + {% endblock %} + + {% block classes %} + {% if classes %} + .. rubric:: {{ _('Classes') }} + + .. autosummary:: + {% for item in classes %} + {{ item }} + {%- endfor %} + {% endif %} + {% endblock %} + + {% block exceptions %} + {% if exceptions %} + .. rubric:: {{ _('Exceptions') }} + + .. autosummary:: + {% for item in exceptions %} + {{ item }} + {%- endfor %} + {% endif %} + {% endblock %} + +{% block modules %} +{% if modules %} +.. rubric:: Modules + +.. 
autosummary:: + :toctree: + :recursive: +{% for item in modules %} + {{ item }} +{%- endfor %} +{% endif %} +{% endblock %} diff --git a/docs/ext/pydantic_autosummary/templates/autosummary/pydantic_model.rst b/docs/ext/pydantic_autosummary/templates/autosummary/pydantic_model.rst new file mode 100644 index 000000000..dc62c4976 --- /dev/null +++ b/docs/ext/pydantic_autosummary/templates/autosummary/pydantic_model.rst @@ -0,0 +1,52 @@ +{{ name | escape | underline}} + +.. currentmodule:: {{ module }} + +.. autosummary:: + :toctree: {{ name }} + :recursive: + + .. toctree:: + :maxdepth: -1 + +{% block model_fields %} +{% for field in model_fields %} + ~{{ field }} +{% endfor %} +{% endblock %} + +.. autopydantic_model:: {{ objname }} + :members: + :inherited-members: BaseModel + :model-show-config-summary: False + :model-show-json: False + :model-show-validator-members: False + :model-show-validator-summary: False + :field-list-validators: False + :special-members: __call__, __add__, __mul__ + + {% block methods %} + {% if methods %} + .. rubric:: {{ _('Methods') }} + + .. autosummary:: + :nosignatures: + {% for item in methods %} + {%- if not item.startswith('_') %} + ~{{ name }}.{{ item }} + {%- endif -%} + {%- endfor %} + {% endif %} + {% endblock %} + + {% block attributes %} + {% if attributes %} + .. rubric:: {{ _('Attributes') }} + + .. autosummary:: + {% for item in attributes %} + ~{{ name }}.{{ item }} + {%- endfor %} + {% endif %} + {% endblock %} + diff --git a/docs/gen-datastructure.py b/docs/gen-datastructure.py deleted file mode 100644 index 9586e7108..000000000 --- a/docs/gen-datastructure.py +++ /dev/null @@ -1,32 +0,0 @@ -# Tool for building the datastructure.rst file. -# Ex. usage: python3 docs/gen-datastructure.py > docs/datastructure.rst - -from __future__ import annotations - -import inspect - -from fmu.dataio.datastructure.meta import content, meta, specification -from pydantic import BaseModel, RootModel - - -def pydantic_members(m): - for name, obj in inspect.getmembers(m): - if ( - inspect.isclass(obj) - and issubclass(obj, (RootModel, BaseModel)) - and obj.__module__.startswith("fmu.dataio.datastructure") - ): - yield obj.__module__, name - - -if __name__ == "__main__": - print( - """.. Do not modifly this file manuely, see: docs/gen-datastructure.py -Meta export datastructures -==========================\n\n""" - ) - - settings = "\n".join((" :model-show-json: false",)) - for module in (meta, content, specification): - for module_path, name in pydantic_members(module): - print(f".. autopydantic_model:: {module_path}.{name}\n{settings}\n") diff --git a/docs/conf.py b/docs/src/conf.py similarity index 85% rename from docs/conf.py rename to docs/src/conf.py index 2bfc14784..d4a1210b9 100755 --- a/docs/conf.py +++ b/docs/src/conf.py @@ -4,17 +4,17 @@ # pylint: skip-file import os import sys - -cwd = os.getcwd() -project_root = os.path.dirname(cwd) + "/src/fmu" -sys.path.insert(0, project_root) -print(sys.path) +from pathlib import Path from datetime import date import fmu.dataio import fmu.dataio.dataio +sys.path.insert(0, os.path.abspath("../../src/fmu")) +sys.path.insert(1, os.path.abspath("../ext")) + + # -- General configuration --------------------------------------------- # The full version, including alpha/beta/rc tags. 
@@ -22,24 +22,35 @@ extensions = [ "myst_parser", - "sphinxcontrib.apidoc", - "sphinx.ext.viewcode", - "sphinx.ext.napoleon", - "sphinx.ext.autosummary", - "sphinx.ext.mathjax", + "pydantic_autosummary", "sphinx.ext.autodoc", + "sphinx.ext.mathjax", + "sphinx.ext.napoleon", + "sphinx.ext.viewcode", "sphinx_autodoc_typehints", - "sphinxcontrib.autodoc_pydantic", "sphinx_togglebutton", + "sphinxcontrib.apidoc", + "sphinxcontrib.autodoc_pydantic", ] autosummary_generate = True +autosummary_imported_members = True +add_module_names = False togglebutton_hint = "Expand" -apidoc_module_dir = "../src/fmu/dataio" +apidoc_module_dir = "../../src/fmu/dataio" apidoc_output_dir = "apiref" -apidoc_excluded_paths = ["tests"] +apidoc_excluded_paths = [ + "case", + "datastructure", + "hook_implementations", + "providers", + "scripts", + "tests", + "types", + "version", +] apidoc_separate_modules = True apidoc_module_first = True apidoc_extra_args = ["-H", "API reference for fmu.dataio"] @@ -59,22 +70,23 @@ current_year = date.today().year copyright = f"Equinor {current_year} (fmu-dataio release {release})" - # Sort members by input order in classes autodoc_member_order = "bysource" autodoc_default_flags = ["members", "show_inheritance"] # Mocking ert module -autodoc_mock_imports = ["ert"] +autodoc_mock_imports = ["ert", "pydantic"] exclude_patterns = ["_build"] pygments_style = "sphinx" html_theme = "sphinx_rtd_theme" - html_theme_options = { "style_nav_header_background": "#C0C0C0", + "navigation_depth": -1, + "collapse_navigation": False, + "titles_only": True, } diff --git a/docs/src/contributing.rst b/docs/src/contributing.rst new file mode 100644 index 000000000..7a661a306 --- /dev/null +++ b/docs/src/contributing.rst @@ -0,0 +1,2 @@ +.. include:: ../../CONTRIBUTING.md + :parser: myst_parser.sphinx_ diff --git a/docs/src/datamodel/index.rst b/docs/src/datamodel/index.rst new file mode 100644 index 000000000..f91b3db0f --- /dev/null +++ b/docs/src/datamodel/index.rst @@ -0,0 +1,385 @@ +The FMU results data model +########################## + +This section describes the data model used for FMU results when exporting with +fmu-dataio. For the time being, the data model is hosted as part of fmu-dataio. + +The data model described herein is new and shiny, and experimental in many aspects. +Any feedback on this is greatly appreciated. The most effective feedback is to apply +the data model, then use the resulting metadata. + +The FMU data model is described using a `Pydantic `__ model +which programmatically generates a `JSON Schema `__. + +This schema contains rules and definitions for all attributes in the data model. This +means, in practice, that outgoing metadata from FMU needs to comply with the schema. +If data is uploaded to e.g. Sumo, validation will be done on the incoming data to ensure +consistency. + + +Data model documentation +======================== + +There are two closely related data models represented here: metadata generated from +an FMU realization and metadata generated on a case level. The structure and +documentation of these two models can be inspected from here. + +.. autosummary:: + :toctree: model/ + :recursive: + + + .. toctree:: + :maxdepth: -1 + + fmu.dataio.datastructure.meta.meta.FMUDataClassMeta + fmu.dataio.datastructure.meta.meta.FMUCaseClassMeta + + +About the data model +==================== + +Why is it made? 
+---------------
+
+FMU is a mighty system developed by and for the subsurface community in Equinor, to
+make reservoir modeling more efficient, less error-prone and more repeatable with higher quality,
+mainly through automation of cross-disciplinary workflows. It combines off-the-shelf software
+with in-house components such as the ERT orchestrator.
+
+FMU is defined more and more by the data it produces, and direct and indirect dependencies on
+output from FMU are increasing. When FMU results started to be regularly transferred to cloud
+storage for direct consumption from 2017/2018 and onwards, the need for stable metadata on
+outgoing data became imminent. Local development on Johan Sverdrup was initiated to cater
+for the digital ecosystem evolving in and around that particular project, and the need for
+generalizing became apparent with the development of Sumo, Webviz and other initiatives.
+
+The purpose of the data model is to cater for the existing dependencies, as well as to enable
+more direct usage of FMU results in different contexts. The secondary objective of this
+data model is to create a normalization layer between the components that create data
+and the components that use those data. The data model is designed to also be adaptable
+to other sources of data than FMU.
+
+Scope of this data model
+------------------------
+
+This data model covers data produced by FMU workflows. This includes data generated by
+direct runs of model templates, data produced by pre-processing workflows, data produced
+in individual realizations or hooked workflows, and data produced by post-processing workflows.
+
+.. note::
+    An example of a pre-processing workflow is a set of jobs modifying selected input data
+    for later use in the FMU workflows and/or for comparison with other results in a QC context.
+
+.. note::
+    An example of a post-processing workflow is a script that aggregates results across many
+    realizations and/or iterations of an FMU case.
+
+This data model covers data that, in the FMU context, can be linked to a specific case.
+
+Note that e.g. ERT and other components will, and should, have their own data models to
+cater for their needs. It is not the intention of this data model to cover all aspects
+of data in the FMU context. The scope is primarily data going *out* of FMU to be used elsewhere.
+
+
+A denormalized data model
+-------------------------
+
+The data model used for FMU results is a denormalized data model, at least to a certain
+point. This means that static data will be repeated many times. Example: Each exported data object contains
+basic information about the FMU case it belongs to, such as a unique ID for this case,
+its name, the user that made it, which model template was used, etc. This information
+is stored in *every* exported .yml file. This may seem counterintuitive, and differs
+from a relational database (where this information would typically be stored once, and
+referred to when needed).
+
+There are a few reasons for choosing a denormalized data model:
+
+First, the components for creating a relational database containing these data do not exist,
+and would be difficult to implement quickly. Also, the nature of data in an FMU context is very
+distributed, with data spread across many files and folders (currently).
+
+Second, a denormalized data model enables us to utilize search engine technologies
+for indexing. This is not efficient for a normalized data model. The penalty for
+duplicating metadata across many individual files is returned in speed and ease-of-use.
+
+.. note::
+    The data model is only denormalized *to a certain point*. Most likely, it is better
+    described as a hybrid. Example: The concept of a *case* is used in the FMU context. In the
+    outgoing metadata for FMU results, some information about the current case is included.
+    However, *details* about the case are out of scope. For this, a consumer would have to
+    refer to the owner of the *case* definition. In FMU contexts, this will be the workflow
+    manager (ERT).
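+
+To make this concrete, below is a hypothetical, heavily abbreviated sketch of metadata for two
+different objects exported from the same case. The field names, values and file names are
+illustrative only and do not reproduce the actual schema; the point is that the same basic case
+information is repeated in each exported metadata file rather than stored once and referenced.
+
+.. code-block:: yaml
+
+    # .topvolantis.gri.yml (illustrative)
+    fmu:
+      case:
+        uuid: 11111111-2222-3333-4444-555555555555   # unique ID for the case
+        name: MyCase
+        user:
+          id: someuser
+
+    # .inplace_volumes.csv.yml (illustrative) -- same case block repeated
+    fmu:
+      case:
+        uuid: 11111111-2222-3333-4444-555555555555
+        name: MyCase
+        user:
+          id: someuser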
+
+
+Standardized vs anarchy
+-----------------------
+
+Creating a data model for FMU results brings with it some standard. In essence, this
+represents the next evolution of the existing FMU standard. We haven't called it "FMU standard 2.0"
+because although this would resonate with many people, many would find it revolting. But,
+sure, if you are so inclined you are allowed to think of it this way. The FMU standard 1.0
+is centered around folder structure and file names - a prerequisite for standardization in
+the good old days when files were files, folders were folders, and data could be consumed
+by double-clicking. Or, by traversing the mounted file system.
+
+With the transition to a cloud-native state come numerous opportunities - but also great
+responsibilities. Some of them are visible in the data model, and the data model is in itself
+a testament to the most important of them: We need to get our data straight.
+
+There are many challenges. Aligning with everyone and everything is one. We probably won't
+succeed with that in the first iteration(s). Materializing metadata effectively, and without
+hassle, during FMU runs (meaning that *everything* must be *fully automated*) is another. This
+is what fmu-dataio solves. But finding the balance between *retaining flexibility* and
+*enforcing a standard* is perhaps the trickiest of all.
+
+This data model has been designed with the great flexibility of FMU in mind. If you are
+a geologist on an asset using FMU for something important, you need to be able to export
+any data from *your* workflow and *use that data* without having to wait for someone else
+to rebuild something. For FMU, one glove certainly does not fit all, and this has been
+taken into account. While the data model and the associated validation will set some requirements
+that you need to follow, you are still free to do more or less what you want.
+
+We do, however, STRONGLY ENCOURAGE you to not invent too many private wheels. The risk
+is that your data cannot be used by others.
+
+The materialized metadata has a nested structure which can be represented by Python
+dictionaries, YAML or JSON formats. The root level only contains key attributes, of which
+most are nested sub-dictionaries.
+
+
+Relations to other data models
+------------------------------
+
+The data model for FMU results is designed with generalization in mind. While in practice
+this data model covers data produced by, or in direct relation to, an FMU workflow - in
+*theory* it relates more to *subsurface predictive modeling* in general than to FMU specifically.
+
+In Equinor, FMU is the primary system for creating, maintaining and using 3D predictive
+numerical models for the subsurface. Therefore, FMU is the main use case for this data model.
+
+There are plenty of other data models in play in the complex world of subsurface predictive modeling.
+
+
+Relations to other data models
+------------------------------
+
+The data model for FMU results is designed with generalization in mind. While in practice
+this data model covers data produced by, or in direct relation to, an FMU workflow, in
+*theory* it relates more to *subsurface predictive modeling* in general than to FMU specifically.
+
+In Equinor, FMU is the primary system for creating, maintaining and using 3D predictive
+numerical models for the subsurface. Therefore, FMU is the main use case for this data model.
+
+There are plenty of other data models in play in the complex world of subsurface predictive modeling.
+Each software applies its own data model, and in FMU this encompasses multiple different systems.
+
+Similarly, there are other data models in the larger scope where FMU workflows represent
+one out of many providers/consumers of data. A significant motivation for defining this
+data model is to ensure consistency with other systems and enable stable conditions for integration.
+
+fmu-dataio has three important roles in this context:
+
+* Be a translating layer between the data models of individual software packages and the FMU results data model.
+* Enable fully automated materialization of metadata during FMU runs (hundreds of thousands of files being made).
+* Abstract the FMU results data model through Python methods and functions, allowing them to be embedded into other systems and helping maintain a centralized definition of this data model.
+
+
+The parent/child principle
+--------------------------
+
+In the FMU results data model, the traditional hierarchy of an FMU setup is not continued.
+An individual file produced by an FMU workflow and exported to disk can be seen in
+relation to a hierarchy looking something like this: case > iteration > realization > file
+
+Many reading this will instinctively disagree with this definition, and significant confusion
+arises from trying to have meaningful discussions around this. There is no
+unified definition of this hierarchy (despite many *claiming to have* such a definition).
+
+In the FMU results data model, this hierarchy is flattened down to two levels:
+The parent (*case*) and the children of that parent (*files*). From this, it follows that the
+most fundamental definition in this context is a *case*. To a large degree, this definition
+belongs to the ERT workflow manager in the FMU context. For now, however, the case definitions
+are extracted by proxy from the file structure and from arguments passed to fmu-dataio.
+
+Significant confusion can *also* arise from discussing the definition of a case, and the
+validity of this hierarchy, of course. But the consensus (albeit probably a local minimum) is
+that this serves the needs.
+
+Each file produced *in relation to* an FMU case (meaning *before*, *during* or *after*) is tagged
+with information about the case - signalling that *this entity* belongs to *this case*. It is not
+the intention of the FMU results data model to maintain *all* information about a case, and
+in the future it is expected that ERT will serve case information beyond the basics.
+
+.. note::
+
+    **Dot-annotation** - we like it and use it. This is what it means:
+
+    The metadata structure is a dictionary-like structure, e.g.
+
+    .. code-block:: json
+
+        {
+            "myfirstkey": {
+                "mykey": "myvalue",
+                "anotherkey": "anothervalue"
+            }
+        }
+
+    Referring to paths within a nested dictionary can be tricky. With dot-annotation, we can refer
+    to ``mykey`` in the example above as ``myfirstkey.mykey``. This will be a pointer to ``myvalue``
+    in this case. You will see dot-annotation in the explanations of the various metadata blocks
+    below: Now you know what it means!
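+For readers who want to resolve such dotted paths programmatically, a minimal helper could
+look like the sketch below. This is plain Python for illustration; no fmu-dataio
+functionality is assumed.
+
+.. code-block:: python
+
+    from functools import reduce
+
+    def get_by_dot_path(metadata: dict, dotted: str):
+        """Resolve e.g. 'myfirstkey.mykey' against a nested dictionary."""
+        return reduce(lambda node, key: node[key], dotted.split("."), metadata)
+
+    example = {"myfirstkey": {"mykey": "myvalue", "anotherkey": "anothervalue"}}
+    assert get_by_dot_path(example, "myfirstkey.mykey") == "myvalue"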
+
+Weaknesses
+----------
+
+**uniqueness**
+The data model currently has challenges with respect to ensuring uniqueness. Uniqueness is a
+challenge in this context, as a centralized data model cannot (and should not!) dictate or define
+in detail which data an FMU user should be able to export from local workflows.
+
+**understanding validation errors**
+When validating against the current schema, understanding the reasons for non-validation
+can be tricky. The root cause of this is the use of conditional logic in the schemas -
+a functionality JSON Schema is not designed for. See `Logical rules`_.
+
+
+Logical rules
+-------------
+
+The schema contains some logical rules which are applied during validation. These are
+rules of the type "if this, then that". They are, however, not explicitly written (nor readable)
+as such directly. This type of logic is implemented in the schema by explicitly generating
+subschemas that A) are only valid for specific conditions, and B) contain the requirements for
+that specific situation. In this manner, one can assure that if a specific condition is
+met, the associated requirements for that condition are used.
+
+Example:
+
+.. code-block:: json
+
+    "oneOf": [
+      {
+        "$comment": "Conditional schema A - 'if class == case make myproperty required'",
+        "required": [
+          "myproperty"
+        ],
+        "properties": {
+          "class": {
+            "enum": ["case"]
+          },
+          "myproperty": {
+            "type": "string",
+            "example": "sometext"
+          }
+        }
+      },
+      {
+        "$comment": "Conditional schema B - 'if class != case do NOT make myproperty required'",
+        "properties": {
+          "myproperty": {
+            "type": "string",
+            "example": "sometext"
+          }
+        }
+      }
+    ]
+
+
+For metadata describing a ``case``, requirements are different compared to metadata describing data objects.
+
+For selected contents, a content-specific block under **data** is required. This is implemented for
+"fluid_contact", "field_outline" and "seismic".
+
+
+Validation of data
+==================
+
+When fmu-dataio exports data from FMU workflows, it produces a pair of data + metadata. The two are
+considered one entity. Data consumers who wish to validate the correct match of data and metadata can
+do so by verifying that ``file.checksum_md5`` can be recreated from the data object only. Metadata is
+not considered when generating the checksum.
+
+This checksum is the string representation of the hash created using RSA's ``MD5`` algorithm. This hash
+is created from the *file* that fmu-dataio exported. In most cases, this is the same file that is
+provided to the consumer. However, there are some exceptions:
+
+- Seismic data may be transformed to other formats when stored outside the FMU context, and the
+  checksum may then be invalid.
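+A minimal sketch of such a consistency check is shown below, assuming the metadata has already
+been read into a Python dictionary (for example as shown earlier). The function name is
+illustrative; fmu-dataio itself does not provide it.
+
+.. code-block:: python
+
+    import hashlib
+    from pathlib import Path
+
+    def checksum_matches(data_file: Path, metadata: dict) -> bool:
+        """Recreate the MD5 checksum of the data file and compare with file.checksum_md5."""
+        digest = hashlib.md5(data_file.read_bytes()).hexdigest()
+        return digest == metadata["file"]["checksum_md5"]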
+
+
+Changes and revisions
+=====================
+
+The only constant is change, as we know, and in the case of the FMU results data model - definitely so.
+The learning component here is huge, and there will be iterations. This poses a challenge, given that
+there are existing dependencies on top of this data model already, and more are arriving.
+
+To handle this, two important concepts have been introduced.
+
+1) **Versioning**. The current version of the FMU metadata is 0.8.0. This version is likely to remain for a while. (We have not yet figured out how to best deal with versioning. Have good ideas? Bring them!)
+2) **Contractual attributes**. Within the FMU ecosystem, we need to retain the ability to make rapid changes to the data model. As we are in early days, unknowns will become knowns and unknown unknowns will become known unknowns. However, from the outside perspective some stability is required. Therefore, we have labelled some key attributes as *contractual*. They are listed at the top of the schema. This is not to say that they will never change - but they should not change erratically, and when we need to change them, this needs to be subject to alignment.
+
+
+Contractual attributes
+----------------------
+
+The following attributes are contractual:
+
+* class
+* source
+* version
+* tracklog
+* data.format
+* data.name
+* data.stratigraphic
+* data.alias
+* data.stratigraphic_alias
+* data.offset
+* data.content
+* data.vertical_domain
+* data.grid_model
+* data.bbox
+* data.is_prediction
+* data.is_observation
+* data.seismic.attribute
+* access
+* masterdata
+* fmu.model
+* fmu.workflow
+* fmu.case
+* fmu.iteration
+* fmu.realization.name
+* fmu.realization.id
+* fmu.realization.uuid
+* fmu.aggregation.operation
+* fmu.aggregation.realization_ids
+* file.relative_path
+* file.checksum_md5
+
+
+Metadata example
+================
+
+Expand below to see a full example of valid metadata for a surface exported from FMU.
+
+.. toggle::
+
+    .. literalinclude:: ../../schema/definitions/0.8.0/examples/surface_depth.yml
+       :language: yaml
+
+|
+
+You will find more examples in the `fmu-dataio github repository `__.
+
+
+FAQ
+===
+
+We won't claim that these questions are really very *frequently* asked, but these are some
+key questions you may have along the way.
+
+**My existing FMU workflow does not produce any metadata. Now I am told that it has to. What do I do?**
+First step: Start using fmu-dataio in your workflow. You will get a lot for free by using it; among
+other things, metadata will start to appear from your workflow. To get started with fmu-dataio,
+see `the overview section `__.
+
+**This data model is not what I would have chosen. How can I change it?**
+The FMU community (almost always) builds what the FMU community wants. The first step
+would be to define what you are unhappy with, preferably formulated as an issue in the
+`fmu-dataio github repository `__.
+(If your comments are Equinor internal, please reach out to either Per Olav (peesv) or Jan (jriv).)
+
+**This data model allows me to create a smashing data visualisation component, but I fear that it
+is so immature that it will not be stable - will it change all the time?**
+Yes, and no. It is definitely experimental and these are early days. Therefore, changes
+will occur as learning happens. Part of that learning comes from development of
+components utilizing the data model, so your feedback may contribute to evolving this
+data model. However, you should not expect erratic changes. The concept of contractual attributes
+is introduced for this exact purpose. We have also chosen to version the metadata - partly to
+clearly separate it from previous versions, but also to allow smooth evolution going forward.
+We don't yet know *exactly* how this will be done in practice, but perhaps you will tell us!
+
diff --git a/docs/example_surface.yml b/docs/src/example_surface.yml similarity index 100% rename from docs/example_surface.yml rename to docs/src/example_surface.yml diff --git a/docs/example_surface_v070.yml b/docs/src/example_surface_v070.yml similarity index 100% rename from docs/example_surface_v070.yml rename to docs/src/example_surface_v070.yml diff --git a/docs/examples.rst b/docs/src/examples.rst similarity index 70% rename from docs/examples.rst rename to docs/src/examples.rst index 6dcc692e0..098263370 100644 --- a/docs/examples.rst +++ b/docs/src/examples.rst @@ -17,7 +17,7 @@ This is a snippet of the ``global_variables.yml`` file which holds the static me .. toggle:: - .. literalinclude:: ../examples/s/d/nn/xcase/realization-0/iter-0/fmuconfig/output/global_variables.yml + ..
literalinclude:: ../../examples/s/d/nn/xcase/realization-0/iter-0/fmuconfig/output/global_variables.yml :language: yaml | @@ -28,14 +28,14 @@ Exporting fault polygons Python script ~~~~~~~~~~~~~ -.. literalinclude:: ../examples/s/d/nn/xcase/realization-0/iter-0/rms/bin/export_faultpolygons.py +.. literalinclude:: ../../examples/s/d/nn/xcase/realization-0/iter-0/rms/bin/export_faultpolygons.py :language: python Press + to see generated YAML file. .. toggle:: - .. literalinclude:: ../examples/s/d/nn/xcase/realization-0/iter-0/share/results/polygons/.volantis_gp_top--faultlines.pol.yml + .. literalinclude:: ../../examples/s/d/nn/xcase/realization-0/iter-0/share/results/polygons/.volantis_gp_top--faultlines.pol.yml :language: yaml | @@ -46,7 +46,7 @@ Exporting average maps from grid properties Python script ~~~~~~~~~~~~~ -.. literalinclude:: ../examples/s/d/nn/xcase/realization-0/iter-0/rms/bin/export_propmaps.py +.. literalinclude:: ../../examples/s/d/nn/xcase/realization-0/iter-0/rms/bin/export_propmaps.py :language: python @@ -54,7 +54,7 @@ Press + to see generated YAML file for metadata. .. toggle:: - .. literalinclude:: ../examples/s/d/nn/xcase/realization-0/iter-0/share/results/maps/.therys--average_porosity.gri.yml + .. literalinclude:: ../../examples/s/d/nn/xcase/realization-0/iter-0/share/results/maps/.therys--average_porosity.gri.yml :language: yaml | @@ -65,7 +65,7 @@ Exporting 3D grids with properties Python script ~~~~~~~~~~~~~ -.. literalinclude:: ../examples/s/d/nn/xcase/realization-0/iter-0/any/bin/export_grid3d.py +.. literalinclude:: ../../examples/s/d/nn/xcase/realization-0/iter-0/any/bin/export_grid3d.py :language: python Press + to see generated YAML files for metadata. @@ -73,12 +73,12 @@ Press + to see generated YAML files for metadata. .. toggle:: - .. literalinclude:: ../examples/s/d/nn/xcase/realization-0/iter-0/share/results/grids/.geogrid.roff.yml + .. literalinclude:: ../../examples/s/d/nn/xcase/realization-0/iter-0/share/results/grids/.geogrid.roff.yml :language: yaml .. toggle:: - .. literalinclude:: ../examples/s/d/nn/xcase/realization-0/iter-0/share/results/grids/.facies.roff.yml + .. literalinclude:: ../../examples/s/d/nn/xcase/realization-0/iter-0/share/results/grids/.geogrid--facies.roff.yml :language: yaml | @@ -89,12 +89,12 @@ Exporting volume tables RMS or file Python script ~~~~~~~~~~~~~ -.. literalinclude:: ../examples/s/d/nn/xcase/realization-0/iter-0/any/bin/export_volumetables.py +.. literalinclude:: ../../examples/s/d/nn/xcase/realization-0/iter-0/any/bin/export_volumetables.py :language: python .. toggle:: - .. literalinclude:: ../examples/s/d/nn/xcase/realization-0/iter-0/share/results/tables/.geogrid--volumes.csv.yml + .. literalinclude:: ../../examples/s/d/nn/xcase/realization-0/iter-0/share/results/tables/.geogrid--volumes.csv.yml :language: yaml | @@ -108,12 +108,12 @@ The FaultRoom plugin for RMS produces special json files that e.g. can be viewed Python script ~~~~~~~~~~~~~ -.. literalinclude:: ../examples/s/d/nn/xcase/realization-0/iter-0/rms/bin/export_faultroom_surfaces.py +.. literalinclude:: ../../examples/s/d/nn/xcase/realization-0/iter-0/rms/bin/export_faultroom_surfaces.py :language: python .. toggle:: - .. literalinclude:: ../examples/s/d/nn/xcase/realization-0/iter-0/share/results/maps/volantis_gp_top--faultroom_d1433e1.json + .. 
literalinclude:: ../../examples/s/d/nn/xcase/realization-0/iter-0/share/results/maps/volantis_gp_top--faultroom_d1433e1.json :language: yaml | @@ -146,7 +146,7 @@ robust by centralizing the definitions and handling of metadata. Python script ~~~~~~~~~~~~~ -.. literalinclude:: ../examples/s/d/nn/_project/aggregate_surfaces.py +.. literalinclude:: ../../examples/s/d/nn/_project/aggregate_surfaces.py :language: python | diff --git a/docs/index.rst b/docs/src/index.rst similarity index 98% rename from docs/index.rst rename to docs/src/index.rst index 531169e09..2d957bca4 100644 --- a/docs/index.rst +++ b/docs/src/index.rst @@ -41,7 +41,4 @@ post-processing services, new and improved cloud-only version of Webviz and much preparations examples apiref/modules - datamodel - datastructure - - + datamodel/index diff --git a/docs/installation.rst b/docs/src/installation.rst similarity index 100% rename from docs/installation.rst rename to docs/src/installation.rst diff --git a/docs/overview.rst b/docs/src/overview.rst similarity index 99% rename from docs/overview.rst rename to docs/src/overview.rst index c6e5ca4e2..29af7ca0b 100644 --- a/docs/overview.rst +++ b/docs/src/overview.rst @@ -65,4 +65,4 @@ is stored in *every* exported .yml file. This may seem counter-intuitive and dif from a relational database (where this information would typically be stored once, and referred to when needed). -The FMU results data model is further documented `here <./datamodel.html>`__ +The FMU results data model is further documented `here <./datamodel/index.html>`__. diff --git a/docs/preparations.rst b/docs/src/preparations.rst similarity index 100% rename from docs/preparations.rst rename to docs/src/preparations.rst diff --git a/pyproject.toml b/pyproject.toml index 3325599c4..53bb8e7b4 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -71,7 +71,7 @@ docs = [ "sphinx-autodoc-typehints<1.23", "sphinx-rtd-theme", "sphinx-togglebutton", - "Sphinx<7", + "Sphinx", "sphinxcontrib-apidoc", "urllib3<1.27", ] diff --git a/src/fmu/dataio/datastructure/__init__.py b/src/fmu/dataio/datastructure/__init__.py new file mode 100644 index 000000000..e69de29bb diff --git a/src/fmu/dataio/datastructure/_internal/internal.py b/src/fmu/dataio/datastructure/_internal/internal.py index a10c96f7e..d0e568d7d 100644 --- a/src/fmu/dataio/datastructure/_internal/internal.py +++ b/src/fmu/dataio/datastructure/_internal/internal.py @@ -12,8 +12,6 @@ from textwrap import dedent from typing import List, Literal, Optional, Union -from fmu.dataio._definitions import SCHEMA, SOURCE, VERSION -from fmu.dataio.datastructure.meta import meta from pydantic import ( AnyHttpUrl, BaseModel, @@ -22,6 +20,9 @@ model_validator, ) +from fmu.dataio._definitions import SCHEMA, SOURCE, VERSION +from fmu.dataio.datastructure.meta import meta + def property_warn() -> None: warnings.warn( diff --git a/src/fmu/dataio/datastructure/configuration/global_configuration.py b/src/fmu/dataio/datastructure/configuration/global_configuration.py index 141bcc03d..319303a6b 100644 --- a/src/fmu/dataio/datastructure/configuration/global_configuration.py +++ b/src/fmu/dataio/datastructure/configuration/global_configuration.py @@ -9,7 +9,6 @@ import warnings from typing import Any, Dict, List, Optional -from fmu.dataio.datastructure.meta import enums, meta from pydantic import ( BaseModel, Field, @@ -19,6 +18,8 @@ model_validator, ) +from fmu.dataio.datastructure.meta import enums, meta + def validation_error_warning(err: ValidationError) -> None: """