_get_unique_name takes really long with large swagger definition file #2286

Open
FJEANNOT opened this issue Jan 28, 2025 · 0 comments

Profiling shows a very long execution time for the _get_unique_name method of the ModelResolver class.
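For anyone who wants to reproduce the measurement, here is a minimal sketch of how such a profile can be collected with the standard library. `profile_call` and `slow_stand_in` are hypothetical names used only for illustration; in practice the profiled callable would be the `generate(...)` call shown further down.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, **kwargs):
    """Run func under cProfile and return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(15)
    return result, buffer.getvalue()


def slow_stand_in(n):
    # Hypothetical stand-in for the real generate(...) call.
    return sum(i * i for i in range(n))


result, report = profile_call(slow_stand_in, 1000)
print(report)
```

Sorting by cumulative time is what makes a helper such as _get_unique_name stand out when it is called many times from different places.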

To Reproduce

Example schema:
We use a large swagger file created on the fly, typically a merged JSON Schema of all the schemas in this folder: https://github.com/crossplane-contrib/provider-upjet-azure/blob/main/package/crds, with some modifications. For instance, we build the property name from the k8s apiGroup in this folder, then append the kind one more time (e.g. io.upbound.azure.analysisservices.Server becomes io.upbound.azure.analysisservices.Server.Server) to make sure datamodel-code-generator outputs the resource in a dedicated file.
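To make the renaming concrete, here is a minimal sketch of the key transformation described above. The helper names `dedicated_module_key` and `rename_definitions` are hypothetical and not part of our actual script; the sketch assumes the schemas are held in a dict keyed by their dotted name.

```python
def dedicated_module_key(qualified_name: str) -> str:
    """Append the kind (last dotted segment) one more time."""
    kind = qualified_name.rsplit(".", 1)[-1]
    return f"{qualified_name}.{kind}"


def rename_definitions(schemas: dict) -> dict:
    """Return a new mapping whose keys carry the duplicated kind."""
    return {dedicated_module_key(name): schema for name, schema in schemas.items()}


renamed = rename_definitions(
    {"io.upbound.azure.analysisservices.Server": {"type": "object"}}
)
print(list(renamed))
```

The extra trailing segment is what pushes each resource into its own module in the generated output.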

Used commandline:

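    import json

    from datamodel_code_generator import DataModelType, InputFileType, generate
    from datamodel_code_generator.format import PythonVersion

    # `swagger` and `output_dir` are defined earlier in our script.
    generate(
        json.dumps(swagger),  # our very large merged schema definition
        collapse_root_models=True,
        disable_timestamp=True,
        field_constraints=True,
        input_file_type=InputFileType.JsonSchema,
        output=output_dir,
        output_model_type=DataModelType.PydanticV2BaseModel,
        target_python_version=PythonVersion.PY_312,
        use_annotated=True,
        use_field_description=True,
        use_schema_description=True,
        use_subclass_enum=True,
        custom_formatters=[],
    )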

Looking at the function, it seems to iterate over every 'object' property of the entire definition to avoid duplicate names. We tried adding a title to every object and passing the use_title_as_name option, but _get_unique_name is still called by get_class_name.

To my understanding, the function iterates over every object reference to come up with a unique name, so that no two classes in the output share a name. But since we render the classes into dedicated files, each containing only a few classes of the complete output, would it be possible to limit the reference list to the references present in the output file that will contain the class?
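To illustrate the cost I am describing, here is a simplified sketch. It is not the library's actual code: the numeric-suffix scheme, `unique_name_by_scan`, and `NameRegistry` are assumptions made up for this example. The first function rescans every known reference on each call, which makes the whole run roughly quadratic in the number of references; the second keeps a set of used names so each lookup is constant time on average.

```python
from collections import defaultdict


def unique_name_by_scan(name, references):
    """Rescan every known reference for each candidate name."""
    candidate, suffix = name, 0
    while any(ref["name"] == candidate for ref in references):
        suffix += 1
        candidate = f"{name}{suffix}"
    return candidate


class NameRegistry:
    """Track used names in a set instead of rescanning all references."""

    def __init__(self):
        self._used = set()
        self._next_suffix = defaultdict(int)

    def unique_name(self, name):
        candidate = name
        while candidate in self._used:
            # Resume from the last suffix handed out for this base name.
            self._next_suffix[name] += 1
            candidate = f"{name}{self._next_suffix[name]}"
        self._used.add(candidate)
        return candidate


references = [{"name": "Server"}, {"name": "Server1"}]
scanned = unique_name_by_scan("Server", references)

registry = NameRegistry()
names = [registry.unique_name("Server") for _ in range(3)]
print(scanned, names)
```

Keeping one such registry per output module, rather than one for the whole definition, would correspond to the scoping suggested above.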

Also, is it possible to skip _get_unique_name entirely when use_title_as_name is set?

Additional context
As mentioned above, the swagger definition comes from the Kubernetes ecosystem.

The complete swagger file we are using is quite big. We combine the swagger.json file provided by our Kubernetes server with a large list of CustomResourceDefinition OpenAPISchemas. I can provide the complete script if needed.

That being said, the script still takes really long just with a smaller (but still large) swagger file, like the combination of all the CustomResourceDefinitions linked above.
