Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Auto Generation of Component Configuration from metadata.yaml #27003

Closed
wants to merge 5 commits into from

Conversation

kevinslin
Copy link
Contributor

Description:

This is a WORK IN PROGRESS and NOT READY for code review. Bringing it up as a PR to get feedback on design decisions

This is an initial change to have mdatagen also generate component configuration.
This works as described in https://docs.google.com/document/d/15SMsJAdE1BHAmwf8cLXBvsHDrezXOHniaRZafjO-5e4/edit

We are using go-jsonschema for doing yaml->jsonschema->gocode + validation. This library currently has many limitations (but is also the closest in terms of fulfilling the design goals).

This PR is a work in progress but I'm bringing it up to get feedback on a few design decisions that came up during development

Link to tracking Issue: #24189

Design Decisions

The following are design decisions that I'm looking for feedback on. I have listed a recommended approach for each choice. If there is anything I've missed (design wise), feel free to also bring it up in the review.

on pointer fields

go-jsonschema currently generates pointers for any property that is not marked as required

  • eg: generated config for fileexporter, note the pointer fields for Compression and FlushInterval
type Config struct {
	// Compression Codec used to export telemetry data. Supported compression
	// algorithms: `zstd`.
	Compression *ConfigCompression `json:"compression,omitempty" yaml:"compression,omitempty" mapstructure:"compression,omitempty"`

	// FlushInterval corresponds to the JSON schema field "flush_interval".
	FlushInterval *time.Duration `json:"flush_interval,omitempty" yaml:"flush_interval,omitempty" mapstructure:"flush_interval,omitempty"`

	// Format defines the data format of encoded telemetry data
	// Options:
	// - json [default]: OTLP json bytes.
	// - proto: OTLP binary protobuf bytes.
	Format ConfigFormat `json:"format,omitempty" yaml:"format,omitempty" mapstructure:"format,omitempty"`

	// Path of the file to write to. Path is relative to the current directory.
	Path string `json:"path" yaml:"path" mapstructure:"path"`

	// Rotation corresponds to the JSON schema field "rotation".
	Rotation *ConfigRotation `json:"rotation,omitempty" yaml:"rotation,omitempty" mapstructure:"rotation,omitempty"`
}

While this is arguably more correct, it goes against the convention used by component authors in the otel-collector (there is almost no presence of pointers in the Config in the vast majority of components). The builtin config unmarshaling logic actually explicitly converts any nil pointers to their equivalent zero values

This makes it easy to run into segmentation issues due to nil pointer dereferencing and also makes writing tests very verbose)

Recommendation: modify go-jsonschema so that the default is to not create pointers of non-required fields. Add a custom property, canBeNil, that can be set to true in the yaml if the component author wants to explicitly make a field a pointer. A required property can not have canBeNil set to true

  • the following would not be generated as a pointer
  properties:
    flush_interval:
      type: string
      format: duration
  • would be generated as a pointer
  properties:
    flush_interval:
      type: string
      format: duration
      canBeNil: true

on unmarshaling and validation

go-jsonschema combines validation & unmarshaling code by generating a json unmarshaling function with the following signature: func (j *Config) UnmarshalJSON(b []byte) error for each field that needs a validation. example here

  • because the collector components have separate functions for unmarshaling and validation, we currently wrap the generated unmarshaling code in a generated Validation call. Example here
  • this means the generated component validation will re-marshal the struct into json and unmarshal it again to trigger the generated validatino
func (cfg *Config)Validate() error {
    b, err := json.Marshal(cfg)
    if err != nil {
            return err
    }
    var config Config
    if err := json.Unmarshal(b, &config); err != nil {
            return err
    }
    return nili
}
  • note that there is still the usecase for adding a custom validator because
    • a) go-jsonschem currently doesn't support all validations by jsonschema
    • b) authors of components might implement custom validations that fall outside of jsonschema
  • for both these cases, we will generate a ValidateHelper function instead
func (cfg *Config)ValidateHelper() error {
    b, err := json.Marshal(cfg)
    if err != nil {
            return err
    }
    var config Config
    if err := json.Unmarshal(b, &config); err != nil {
            return err
    }
    return nil
}
  • custom Validation (and Unmarshaling) logic is written in the file config_custom.go
  • custom Validation will call the autogenerated ValidateHelper function
  • example here
  • the ValidateHelper function is autogenerated when
    • there is a jsonschema validation that the library currently doesn't support
    • when the config has an extra property customValidation: true

Recommendation: Move forward with this design. The alternative at this point is to either implement a non-json marshal/unmarshal based validation or not do validation at all. Since this is done once at startup and relatively efficient, proposing to keep current implementation

on go-jsonschema fork

The go-jsonschema library currently lacks many of the features that we require. Currently, I have forked some of the subpackages inside of an internal thirdparty package. The fork has the following changes

  • support date format that resolves to time.Duration type
  • enable reading in a struct instead of a file when doing the go code generation
  • keep track if there is a validation in the jsonschema that go-jsonschema does not support

I also plan on making the following changes

  • update the pointer logic to never generate a pointer unless canBeNil property is set to true
  • support the customValidation flag
  • other custom features that come up

Since some of these changes are not jsonschema compatible, it is unlikely to be accepted upstream.
In terms of the fork, I can either fork the package in a separate repo or continue as I have currently done, copying over the relevant files into mdatagen

Recommendation: Since the fork is 3 files, recommend just keeping it inside the otel-collector-contrib repo.

on default values

go-jsonschem has the ability to generate default values but these are currently only realized in the UnmarshalJSON logic. Example here

This means that the defaults only get set in the Validate code. i'm currently ignoring the generated default values since UnmarshalJSON is only execute inside of Validate

Some choices:

  • a) we could mutate the Config so that if no value is set, we can set a default in Validate - not a fan of this as Validate should not mutate the configuration. plus, since the default behavior is to replace pointers with zero values during unmarshaling, it will be hard to determine whether a default needs to be set
  • b) we could update the code generation logic to also generate a custom Unmarshal function if default values need to be set
    • this is the right way to do it but will expand the scope of this project
  • c) we can do nothing for now and leave it as future work to generate custom unmarshaling logic to generate default valuest

Recommendation: Do nothing right now to limit the scope of the upcoming changes. In the future, implement b (generating a custom unmarhsal function to set default values). At that point, we'll also have a better sense of the ergonomics of this process

@dmitryax
Copy link
Member

Amazing summary! Thank you, Kevin.

on unmarshaling and validation

I'm a bit concerned about splitting the validation logic and introducing the ValidateHelper. It seems like introducing too much complexity.

on go-jsonschema fork
on default values

If we are bringing the fork to our repo, do we need to stick to the json-schema requirements and behavior? Maybe we can change it to generate the code that we expect including the Validate and CreateDefaultConfig?

@kevinslin
Copy link
Contributor Author

If we are bringing the fork to our repo, do we need to stick to the json-schema requirements and behavior? Maybe we can change it to generate the code that we expect including the Validate and CreateDefaultConfig?

we don't - the nice thing with sticking to the current logic for validation is that it gets us 80/20 of the way there and gives us an incremental way to move forward

@mx-psi
Copy link
Member

mx-psi commented Sep 20, 2023

This is a WORK IN PROGRESS and NOT READY for code review. Bringing it up as a PR to get feedback on design decisions

I am going to mark this as draft then, feel free to mark it as ready whenever that is the case

@mx-psi mx-psi marked this pull request as draft September 20, 2023 08:28
@mx-psi mx-psi self-requested a review September 20, 2023 08:28
@mx-psi
Copy link
Member

mx-psi commented Sep 20, 2023

On the fork front: have we reported the issues upstream? Do we know if changes like these would be accepted?

@kevinslin
Copy link
Contributor Author

kevinslin commented Sep 20, 2023 via email

@@ -7,3 +7,41 @@ status:
distributions: [core, contrib, observiq, splunk, sumo, aws]
codeowners:
active: [atingchen]
config:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would be good to pick another exporter where there is a heavy reuse of shared config objects, like the HTTPSettings. How would it be handled in the schema?

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions
Copy link
Contributor

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Oct 25, 2023
@yurishkuro
Copy link
Member

As an alternative to go-jsonschema I think it makes sense to look at Protobuf IDL as the source of schema. There are multiple validation packages available, e.g. https://github.com/bufbuild/protovalidate. It's also pretty simple to write a protoc-plugin to help with generation of the structs with the necessary struct tags, e.g. marking certain fields as sensitive (not to be printed).

@yurishkuro
Copy link
Member

Also, there are even more validation packages going against struct tags (rather than tying to protobuf), so a solution that allows to specify the tags in the IDL would be all that's needed. Here's a pretty simple (to reproduce) library that allows specifying struct tags: https://github.com/favadi/protoc-go-inject-tag

@ptodev
Copy link
Contributor

ptodev commented Aug 13, 2024

Hi, everyone! I opened a PR which aims to build upon this one. If you are interested in this project, please feel free to take a look.

My PR does differ from this one in a few notable ways:

  • I am trying to avoid having go-jsonschema code inside the Collector repo. For now I'm using a fork of go-jsonschema under my own GitHub username.
  • I am avoiding pointers in the generated structs by setting a default value in the schema. When there's a default value, go-jsonschema doesn't generate pointers.

Thank you all for your contributions! And in particular I'd like to give a huge thank you to @kevinslin for establishing a nice base on which others can build upon, and for gathering so much feedback!

@kevinslin
Copy link
Contributor Author

kevinslin commented Aug 13, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants