Add taxonomy qna.yaml parsing API
The parse method validates a qna.yaml file with yamllint and jsonschema.
It returns an object holding the parsed YAML.

Signed-off-by: BJ Hargrave <[email protected]>
bjhargrave committed Jul 12, 2024
1 parent efab9b1 commit 0e1d7ff
Showing 21 changed files with 1,034 additions and 105 deletions.
6 changes: 6 additions & 0 deletions .github/dependabot.yml
@@ -13,3 +13,9 @@ updates:
     directory: "/.github/workflows"
     schedule:
       interval: "daily"
+
+  # Maintain dependencies for Python code
+  - package-ecosystem: "pip"
+    directory: "/"
+    schedule:
+      interval: "daily"
2 changes: 1 addition & 1 deletion .github/workflows/lint.yml
@@ -48,7 +48,7 @@ jobs:
       tox -e jsonschema
   - name: "ruff"
     commands: |
-      tox -e ruff -- check
+      tox -e ruffcheck
   - name: "pylint"
     commands: |
       echo "::add-matcher::.github/workflows/matchers/pylint.json"
5 changes: 3 additions & 2 deletions README.md
@@ -1,5 +1,6 @@
 # Taxonomy Schema

-This Python package defines the JSON schema for the InstructLab [Taxonomy](https://github.com/instructlab/taxonomy) YAML.
+This Python package defines the JSON schema and a parser for the InstructLab [Taxonomy](https://github.com/instructlab/taxonomy) YAML.

-Consumers of this schema can `pip install instructlab-schema`, and access the schema files using `importlib.resources` on the `instructlab.schema` package.
+Consumers of this schema can `pip install instructlab-schema`, and use the `instructlab.schema.taxonomy.TaxonomyParser` class to parse `qna.yaml` taxonomy files.
+Schema files can be directly accessed using the `instructlab.schema.schema_base()` method to get access to the base of the schema resources.
24 changes: 22 additions & 2 deletions pyproject.toml
@@ -1,7 +1,7 @@
 # SPDX-License-Identifier: Apache-2.0

 [build-system]
-requires = ["setuptools>=64", "setuptools_scm>=8"]
+requires = ["setuptools>=70.1.0", "setuptools_scm>=8"]
 build-backend = "setuptools.build_meta"

 [project]
@@ -21,7 +21,15 @@ classifiers = [
     "Programming Language :: Python :: 3.11",
     "Programming Language :: Python :: 3.12",
 ]
-dynamic = ["dependencies", "optional-dependencies", "version"]
+dependencies = [
+    "typing_extensions",
+    "jsonschema>=4.22.0",
+    "PyYAML>=6.0.0",
+    # The below library should NOT be imported into any python files
+    # We only use the command via subprocess
+    "yamllint>=1.35.1",
+]
+dynamic = ["version"]

 [project.urls]
 homepage = "https://instructlab.ai"
@@ -40,6 +48,7 @@ exclude = ["^src/instructlab/schema/_version\\.py$"]
 target-version = "py310"
 src = ["src", "tests"]
 extend-exclude = ["src/instructlab/schema/_version.py"]
+line-length = 180

 [tool.ruff.lint]
 select = [
@@ -53,11 +62,22 @@ select = [
     "TID", # flake8-tidy-imports
 ]

+[tool.ruff.lint.flake8-tidy-imports.banned-api]
+"yamllint".msg = "yamllint is for use as a command via subprocess."
+
 [tool.pylint.main]
 py-version = "3.10"
 source-roots = ["src", "tests"]
 ignore = ["_version.py"]

+[tool.pylint.design]
+max-branches = 20
+max-line-length = 180
+min-public-methods = 1
+
+[tool.pylint.format]
+max-args = 8
+
 [tool.pylint."messages control"]
 disable = [
     "missing-class-docstring",
Expand Down
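
Note on the `yamllint` dependency: the comment in `dependencies` and the `banned-api` rule both say yamllint is meant to be run as a command, not imported. A minimal sketch of what that could look like, assuming yamllint's `-f parsable` output format (`path:line:column: [level] message`). The names `LintIssue`, `parse_yamllint_output`, and `run_yamllint` are hypothetical and are not taken from this commit.

```python
import re
import subprocess
from typing import NamedTuple


class LintIssue(NamedTuple):
    path: str
    line: int
    column: int
    level: str
    message: str


# Assumed shape of one line of `yamllint -f parsable` output.
_ISSUE_RE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<column>\d+): \[(?P<level>\w+)\] (?P<message>.*)$"
)


def parse_yamllint_output(output: str) -> list[LintIssue]:
    """Turn parsable yamllint output into structured issues, skipping unrecognized lines."""
    issues = []
    for text in output.splitlines():
        match = _ISSUE_RE.match(text)
        if match:
            issues.append(
                LintIssue(match["path"], int(match["line"]), int(match["column"]), match["level"], match["message"])
            )
    return issues


def run_yamllint(path: str) -> list[LintIssue]:
    """Run the yamllint command as a child process; the library itself is never imported.

    Raises FileNotFoundError if the yamllint command is not on PATH.
    """
    result = subprocess.run(
        ["yamllint", "-f", "parsable", path],
        capture_output=True,
        text=True,
        check=False,  # a non-zero exit status just means issues were reported
    )
    return parse_yamllint_output(result.stdout)


# Parsing can be exercised without yamllint installed:
sample = "qna.yaml:3:1: [error] trailing spaces (trailing-spaces)"
issues = parse_yamllint_output(sample)
print(issues[0].line, issues[0].level)  # 3 error
```

Keeping the parsing separate from the subprocess call means the output handling can be tested without the command being installed.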
50 changes: 0 additions & 50 deletions scripts/ruff.sh

This file was deleted.

21 changes: 13 additions & 8 deletions src/instructlab/schema/__init__.py
@@ -3,14 +3,20 @@
 """InstructLab Taxonomy Schema"""

 # Standard
-from importlib import resources
+import importlib.resources
+from importlib.abc import Traversable

-try:
-    from importlib.resources.abc import Traversable  # type: ignore[import-not-found]
-except ImportError:  # python<3.11
-    from importlib.abc import Traversable
+__all__ = ["schema_base", "schema_versions"]

-__all__ = ["schema_versions"]
+
+def schema_base() -> Traversable:
+    """Return the schema base.
+
+    Returns:
+        Traversable: The base for the schema versions.
+    """
+    base = importlib.resources.files(__name__)
+    return base


 def schema_versions() -> list[Traversable]:
@@ -19,9 +25,8 @@ def schema_versions() -> list[Traversable]:
     Returns:
         list[Traversable]: A sorted list of schema versions.
     """
-    schema_base = resources.files(__package__)
     versions = sorted(
-        (v for v in schema_base.iterdir() if v.name[0] == "v" and v.name[1:].isdigit()),
+        (v for v in schema_base().iterdir() if v.name[0] == "v" and v.name[1:].isdigit()),
         key=lambda k: int(k.name[1:]),
     )
     return versions
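
Note on the sort key in `schema_versions()`: sorting on `int(k.name[1:])` matters once there are ten or more schema versions, since a plain lexical sort would place `v10` before `v2`. A minimal sketch of the same filter-and-sort logic, run against a throwaway directory instead of the installed package. The `sorted_versions` helper and the directory names are hypothetical, for illustration only.

```python
import tempfile
from pathlib import Path


def sorted_versions(base: Path) -> list[Path]:
    """Apply the same filter and numeric sort as schema_versions(), on a plain directory."""
    return sorted(
        (v for v in base.iterdir() if v.name[0] == "v" and v.name[1:].isdigit()),
        key=lambda k: int(k.name[1:]),
    )


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    # Hypothetical layout: three version directories plus two entries the filter must skip.
    for name in ("v1", "v2", "v10", "vnext", "__pycache__"):
        (base / name).mkdir()
    version_names = [v.name for v in sorted_versions(base)]

print(version_names)  # ['v1', 'v2', 'v10']
```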