pydantic v2 (#9)
* output the models in pydantic v2

* add type and template info to models

* move json->python config to yaml

* test yaml file

* update excel read and write to use versions

* shade metadata page even if no metadata
gblackadder authored Dec 6, 2024
1 parent 07d1802 commit ff07496
Showing 30 changed files with 626 additions and 255 deletions.
10 changes: 6 additions & 4 deletions README.md
@@ -97,19 +97,21 @@ microdata_metadata.study_desc.title_statement.idno = "project_idno"

 ## Updating Schemas

-First create a branch from the main branch.
+First create a branch from the main branch. Branch names should follow the pattern 'schema/\<your user name\>/\<short description of change\>'.

 Then make the change you want to the json schema in the schemas folder.

-Then in pyproject.toml update the version number, changing either the major, minor or patch number as appropriate.
+Then in pyproject.toml update the version number, changing either the major, minor or patch number as appropriate given the conventions below.

+After, update the version number of the **specific schema you updated** in the json_to_python_config.yaml file to match the version number in pyproject.toml.
+
 Next update the pydantic schemas so that they match the latest json schemas by running

-`python pydantic_schemas/generators/generate_pydantic_schemas.py`
+python pydantic_schemas/generators/generate_pydantic_schemas.py

 Finally update the Excel sheets by running

-`python -m pydantic_schemas.generators.generate_excel_files`
+python -m pydantic_schemas.generators.generate_excel_files

 ## Versioning conventions for schemas
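
The version bump that the README hunk asks for can be sketched in a few lines. This is only an illustration that assumes ordinary semantic-versioning reset rules; the repository's own conventions section is cut off in this view, and `bump_version` is a hypothetical helper that is not part of the commit.

```python
def bump_version(version: str, part: str) -> str:
    """Return `version` with one part incremented and the lower parts reset.

    Assumes a plain "MAJOR.MINOR.PATCH" string such as "0.1.0".
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown version part: {part!r}")


if __name__ == "__main__":
    # The same new string would go into pyproject.toml and into the entry
    # for the edited schema in json_to_python_config.yaml.
    print(bump_version("0.1.0", "minor"))
```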

Binary file modified excel_sheets/Document_metadata.xlsx
Binary file not shown.
Binary file modified excel_sheets/Geospatial_metadata.xlsx
Binary file not shown.
Binary file modified excel_sheets/Image_metadata.xlsx
Binary file not shown.
Binary file modified excel_sheets/Indicator_metadata.xlsx
Binary file not shown.
Binary file modified excel_sheets/Indicators_db_metadata.xlsx
Binary file not shown.
Binary file modified excel_sheets/Microdata_metadata.xlsx
Binary file not shown.
Binary file modified excel_sheets/Resource_metadata.xlsx
Binary file not shown.
Binary file modified excel_sheets/Script_metadata.xlsx
Binary file not shown.
Binary file modified excel_sheets/Table_metadata.xlsx
Binary file not shown.
Binary file modified excel_sheets/Video_metadata.xlsx
Binary file not shown.
59 changes: 59 additions & 0 deletions json_to_python_config.yaml
@@ -0,0 +1,59 @@
+document:
+  version: 0.1.0
+  json_file: document-schema.json
+  python_file: document_schema.py
+  model_name: ScriptSchemaDraft
+
+geospatial:
+  version: 0.1.0
+  json_file: geospatial-schema.json
+  python_file: geospatial_schema.py
+  model_name: GeospatialSchema
+
+image:
+  version: 0.1.0
+  json_file: image-schema.json
+  python_file: image_schema.py
+  model_name: ImageDataTypeSchema
+
+microdata:
+  version: 0.1.0
+  json_file: microdata-schema.json
+  python_file: microdata_schema.py
+  model_name: DdiSchema
+
+resource:
+  version: 0.1.0
+  json_file: resource-schema.json
+  python_file: resource_schema.py
+  model_name: Model
+
+script:
+  version: 0.1.0
+  json_file: script-schema.json
+  python_file: script_schema.py
+  model_name: ResearchProjectSchemaDraft
+
+table:
+  version: 0.1.0
+  json_file: table-schema.json
+  python_file: table_schema.py
+  model_name: Model
+
+indicators_db:
+  version: 0.1.0
+  json_file: timeseries-db-schema.json
+  python_file: indicators_db_schema.py
+  model_name: TimeseriesDatabaseSchema
+
+indicator:
+  version: 0.1.0
+  json_file: timeseries-schema.json
+  python_file: indicator_schema.py
+  model_name: TimeseriesSchema
+
+video:
+  version: 0.1.0
+  json_file: video-schema.json
+  python_file: video_schema.py
+  model_name: Model
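
Because the generator reads four keys from every section of this file, a small sanity check before generation may be useful. The sketch below is an assumption-laden illustration: `REQUIRED_KEYS` and `check_config` are hypothetical names, and the dict literal stands in for what `yaml.safe_load` would return for two of the sections above.

```python
REQUIRED_KEYS = {"version", "json_file", "python_file", "model_name"}


def check_config(config: dict) -> list[str]:
    """Return human-readable problems found in the parsed config mapping."""
    problems = []
    for section, details in config.items():
        missing = REQUIRED_KEYS - details.keys()
        if missing:
            problems.append(f"{section}: missing {sorted(missing)}")
            continue
        if not details["json_file"].endswith(".json"):
            problems.append(f"{section}: json_file should end with .json")
        if not details["python_file"].endswith(".py"):
            problems.append(f"{section}: python_file should end with .py")
    return problems


# Stand-in for yaml.safe_load(open("json_to_python_config.yaml")).
example_config = {
    "document": {
        "version": "0.1.0",
        "json_file": "document-schema.json",
        "python_file": "document_schema.py",
        "model_name": "ScriptSchemaDraft",
    },
    "video": {
        "version": "0.1.0",
        "json_file": "video-schema.json",
        "python_file": "video_schema.py",
        "model_name": "Model",
    },
}

if __name__ == "__main__":
    print(check_config(example_config))
```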
17 changes: 10 additions & 7 deletions pydantic_schemas/document_schema.py
@@ -6,7 +6,7 @@
 from enum import Enum
 from typing import Any, Dict, List, Optional

-from pydantic import Extra, Field
+from pydantic import ConfigDict, Field

 from .utils.schema_base_model import SchemaBaseModel

@@ -23,9 +23,9 @@ class MetadataInformation(SchemaBaseModel):
     Document description
     """

-    class Config:
-        extra = Extra.forbid
-
+    model_config = ConfigDict(
+        extra="forbid",
+    )
     title: Optional[str] = Field(None, description="Document title", title="Document title")
     idno: Optional[str] = Field(None, title="Unique ID number for the document")
     producers: Optional[List[Producer]] = Field(None, description="List of producers", title="Producers")
@@ -299,9 +299,9 @@ class DocumentDescription(SchemaBaseModel):
     Document Description
     """

-    class Config:
-        extra = Extra.forbid
-
+    model_config = ConfigDict(
+        extra="forbid",
+    )
     title_statement: TitleStatement = Field(..., description="Study title")
     authors: Optional[List[Author]] = Field(None, description="Authors", title="Authors")
     editors: Optional[List[Editor]] = Field(None, description="Editors", title="Editors")
@@ -540,6 +540,9 @@ class ScriptSchemaDraft(SchemaBaseModel):
     Schema for Document data type
     """

+    __metadata_type__ = "document"
+    __metadata_type_version__ = "0.1.0"
+
     idno: Optional[str] = Field(None, description="Project unique identifier", title="Project unique identifier")
     metadata_information: Optional[MetadataInformation] = Field(
         None, description="Document description", title="Document metadata information"
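
The two class attributes added in the last hunk let calling code identify a model by metadata type and version without instantiating it. The sketch below shows one way such a lookup might work. It uses plain Python classes as stand-ins for the generated pydantic models, and `describe` and `find_model` are hypothetical helpers, not functions from this repository.

```python
class SchemaBaseModel:
    """Hypothetical stand-in for the project's pydantic base class."""


class ScriptSchemaDraft(SchemaBaseModel):
    __metadata_type__ = "document"
    __metadata_type_version__ = "0.1.0"


class GeospatialSchema(SchemaBaseModel):
    __metadata_type__ = "geospatial"
    __metadata_type_version__ = "0.1.0"


def describe(model_cls: type) -> str:
    """Format the type and version tags carried by a model class."""
    return f"{model_cls.__metadata_type__}@{model_cls.__metadata_type_version__}"


def find_model(models: list, metadata_type: str) -> type:
    """Return the first class whose __metadata_type__ matches."""
    for model_cls in models:
        if getattr(model_cls, "__metadata_type__", None) == metadata_type:
            return model_cls
    raise KeyError(metadata_type)


if __name__ == "__main__":
    found = find_model([ScriptSchemaDraft, GeospatialSchema], "geospatial")
    print(describe(found))
```

Names that both start and end with double underscores are not name-mangled, so the attributes are readable from outside the class exactly as written.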
4 changes: 2 additions & 2 deletions pydantic_schemas/generators/generate_excel_files.py
@@ -30,8 +30,8 @@ def compare_excel_files(file1, file2):
         for row in ws1.iter_rows():
             for cell in row:
                 cell_address = cell.coordinate
-                if sheet_name == "metadata" and cell_address == "C1":
-                    continue  # Skip comparison for cell C1 in 'metadata' sheet which only contains the versioning number
+                # if sheet_name == "metadata" and cell_address == "C1":
+                #     continue  # Skip comparison for cell C1 in 'metadata' sheet which only contains the versioning number

                 differences = []
                 if ws1[cell_address].value != ws2[cell_address].value:
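
The effect of commenting out the C1 skip can be illustrated with a cell-by-cell comparison over plain nested lists. This is a simplified sketch, not the repository's function: the lists stand in for openpyxl worksheets, both grids are assumed to have the same shape, and `cell_name` and `diff_grids` are hypothetical names.

```python
from string import ascii_uppercase


def cell_name(row_idx: int, col_idx: int) -> str:
    """Spreadsheet-style coordinate for 0-based indices (columns A..Z only)."""
    return f"{ascii_uppercase[col_idx]}{row_idx + 1}"


def diff_grids(grid1, grid2, skip=frozenset()):
    """List (coordinate, old, new) for every differing cell not in `skip`."""
    differences = []
    for r, (row1, row2) in enumerate(zip(grid1, grid2)):
        for c, (value1, value2) in enumerate(zip(row1, row2)):
            name = cell_name(r, c)
            if name in skip:
                continue
            if value1 != value2:
                differences.append((name, value1, value2))
    return differences


old_sheet = [["type", "document", "0.1.0"], ["title", "idno", "notes"]]
new_sheet = [["type", "document", "0.2.0"], ["title", "id", "notes"]]

if __name__ == "__main__":
    # With the skip in place only B2 is reported; without it, a version
    # change in C1 also counts as a difference.
    print(diff_grids(old_sheet, new_sheet, skip={"C1"}))
    print(diff_grids(old_sheet, new_sheet))
```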
49 changes: 32 additions & 17 deletions pydantic_schemas/generators/generate_pydantic_schemas.py
@@ -1,32 +1,33 @@
+# import importlib.metadata
 import os
+import re
 from subprocess import run

+import yaml
+
 SCHEMA_DIR = "schemas"
 OUTPUT_DIR = os.path.join("pydantic_schemas")
 PYTHON_VERSION = "3.11"
 BASE_CLASS = ".utils.schema_base_model.SchemaBaseModel"

-INPUTS_TO_OUTPUTS = {
-    "document-schema.json": "document_schema.py",
-    "geospatial-schema.json": "geospatial_schema.py",
-    "image-schema.json": "image_schema.py",
-    "microdata-schema.json": "microdata_schema.py",
-    "resource-schema.json": "resource_schema.py",
-    "script-schema.json": "script_schema.py",
-    "table-schema.json": "table_schema.py",
-    "timeseries-db-schema.json": "indicators_db_schema.py",
-    "timeseries-schema.json": "indicator_schema.py",
-    "video-schema.json": "video_schema.py",
-}
+# __version__ = importlib.metadata.version("metadataschemas")

-
 if not os.path.exists(OUTPUT_DIR):
     os.makedirs(OUTPUT_DIR)

-for input_file, output_file in INPUTS_TO_OUTPUTS.items():
-    print(f"Generating pydantic schema for {input_file}")
-    input_path = os.path.join(SCHEMA_DIR, input_file)
-    output_path = os.path.join(OUTPUT_DIR, output_file).replace("-", "_")
+with open("json_to_python_config.yaml", "r") as file:
+    data = yaml.safe_load(file)
+
+# for json_file, (python_file, metadata_type, schema_class_name) in INPUTS_TO_OUTPUTS.items():
+for section, details in data.items():
+    json_file = details["json_file"]
+    python_file = details["python_file"]
+    model_name = details["model_name"]
+    version = details["version"]
+
+    print(f"Generating pydantic schema for {json_file}")
+    input_path = os.path.join(SCHEMA_DIR, json_file)
+    output_path = os.path.join(OUTPUT_DIR, python_file).replace("-", "_")
     run(
         [
             "datamodel-codegen",
@@ -44,7 +45,21 @@
             "--disable-timestamp",
             "--base-class",
             BASE_CLASS,
+            "--output-model-type",
+            "pydantic_v2.BaseModel",
             "--output",
             output_path,
         ]
     )
+
+    with open(output_path, "r") as file:
+        content = file.read()
+
+    updated_content = re.sub(
+        f'class {model_name}\(SchemaBaseModel\):\n(    """\n.*\n    """)',  #
+        lambda match: f"""class {model_name}(SchemaBaseModel):\n{match.group(1)}\n    __metadata_type__ = "{section}"\n    __metadata_type_version__ = "{version}" """,
+        content,
+    )
+
+    with open(output_path, "w") as file:
+        file.write(updated_content)
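
The post-processing step in the hunk above inserts two class attributes directly after the docstring of the top-level model. The sketch below is a variation on that idea rather than the commit's exact code. It uses a raw f-string so that `\(` is not treated as a string escape, `re.escape` for the class name, and a non-greedy match so that multi-line docstrings are handled. `inject_metadata` is a hypothetical name, and the four-space indentation of generated class bodies is an assumption.

```python
import re


def inject_metadata(source: str, model_name: str, section: str, version: str) -> str:
    """Insert metadata class attributes after the docstring of `model_name`."""
    pattern = re.compile(
        rf'(class {re.escape(model_name)}\(SchemaBaseModel\):\n    """\n.*?\n    """\n)',
        flags=re.DOTALL,
    )
    addition = (
        f'\n    __metadata_type__ = "{section}"\n'
        f'    __metadata_type_version__ = "{version}"\n'
    )
    # A callable replacement avoids backslash processing in the new text.
    return pattern.sub(lambda match: match.group(1) + addition, source, count=1)


generated = (
    "class Model(SchemaBaseModel):\n"
    '    """\n'
    "    Schema for Video data type\n"
    '    """\n'
    "\n"
    "    idno: str\n"
)

if __name__ == "__main__":
    print(inject_metadata(generated, "Model", "video", "0.1.0"))
```

If the class is not found, the source is returned unchanged, so a caller that needs a hard failure would have to compare the result with the input.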
13 changes: 8 additions & 5 deletions pydantic_schemas/geospatial_schema.py
@@ -6,7 +6,7 @@
 from enum import Enum
 from typing import Any, Dict, List, Optional

-from pydantic import Extra, Field, confloat
+from pydantic import ConfigDict, Field, RootModel, confloat

 from .utils.schema_base_model import SchemaBaseModel

@@ -23,9 +23,9 @@ class MetadataInformation(SchemaBaseModel):
     Document description
     """

-    class Config:
-        extra = Extra.forbid
-
+    model_config = ConfigDict(
+        extra="forbid",
+    )
     title: Optional[str] = Field(None, description="Document title", title="Document title")
     idno: Optional[str] = Field(None, title="Unique ID number for the document")
     producers: Optional[List[Producer]] = Field(None, description="List of producers", title="Producers")
@@ -1478,6 +1478,9 @@ class GeospatialSchema(SchemaBaseModel):
     Geospatial draft schema
     """

+    __metadata_type__ = "geospatial"
+    __metadata_type_version__ = "0.1.0"
+
     idno: Optional[str] = Field(None, description="Project unique identifier", title="Project unique identifier")
     metadata_information: Optional[MetadataInformation] = Field(
         None, description="Document description", title="Document metadata information"
@@ -1512,4 +1515,4 @@ class Locale(SchemaBaseModel):
     )


-OperationMetadata.update_forward_refs()
+OperationMetadata.model_rebuild()