Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

YAML 1.2 support #116

Open
kislyuk opened this issue Dec 27, 2017 · 13 comments
Open

YAML 1.2 support #116

kislyuk opened this issue Dec 27, 2017 · 13 comments

Comments

@kislyuk
Copy link

kislyuk commented Dec 27, 2017

I maintain https://github.com/kislyuk/yq, and as seen in kislyuk/yq#10, the lack of YAML 1.2 support in this library is a hindrance to parsing YAML in a consistent "least surprise" manner. Recognizing the time constraints that the maintainers face, is there any roadmap or guidance on how YAML 1.2 support may be introduced into PyYAML (either from ruamel.yaml or otherwise)? Any implementation guidance for potential contributors who might work on a PR?

@Athanasius
Copy link

Having just started a project which is also my first use of YAML, and reading the official documentation, I've been wondering why "%YAML 1.2" at the start of the file was throwing errors. I take it there's still no support for this ?

@kislyuk
Copy link
Author

kislyuk commented Dec 5, 2019

Any updates on this issue?

@perlpunk
Copy link
Member

perlpunk commented Apr 29, 2020

I have some ideas on how to implement it.
I think the syntax is less important; most 1.2 syntax rules are already supported.

  1. The resolving of booleans and numbers is easy, it's just a bit of work and needs good testing.

I created a temporary repo with test data for that: yaml-test-schema
It might get part of the yaml/yaml-test-suite at some point.

Apart from that we need:
2. Add a yaml_version option
3. Accept %YAML 1.2 and change the resolving rules depending on the %YAML directive

I could start with creating the resolvers/constructors, so that they can already be loaded explicitly.
Point 2 and especially 3 depend on if @ingydotnet wants that automatic behaviour.
Point 3 could be complicated because I would have to store a complete new set of resolvers and constructors somewhere.

In YAML::PP I implemented it like this. I added an option for the yaml version that can take the following values:

  • 1.1 - Only support 1.1, ignore %YAML directive
  • 1.2 - Only support 1.2, ignore %YAML directive
  • [1.1, 1.2] - Support both and default to 1.1
  • [1.2, 1.1] - Support both and default to 1.2

@ingydotnet
Copy link
Member

@perlpunk what would be the default if no yaml_version is specified?

I think it needs to be 1.1, with a deprecation period.
The deprecation would look for things that change between 1.1 and 1.2 (like yes for true) and warn to either switch to 1.2 style or use yaml_version.
We'll want to do this same style deprecation for 1.3.

I'd like the default to eventually be the most recent version.

I like this plan to move to 1.2 support overall and suggest we move towards making it happen.

@perlpunk
Copy link
Member

@perlpunk what would be the default if no yaml_version is specified?

1.1 for now

I think it needs to be 1.1, with a deprecation period.
The deprecation would look for things that change between 1.1 and 1.2 (like yes for true) and warn to either switch to 1.2 style or use yaml_version.

This could be problematic, and a lot of work. What should we do if we encounter 010, which is octal 8 in 1.1 and simply 10 in 1.2?
We would have to double check all input with both sets of regexes for each schema.

I like this plan to move to 1.2 support overall and suggest we move towards making it happen.

I experimented already a bit (with implementing point 1) and can try to push my branch the next days. It includes a lot of moving functions around, but isn't complicated.

@dHannasch
Copy link

I have some ideas on how to implement it.

Apart from that we need:
2. Add a yaml_version option
3. Accept %YAML 1.2 and change the resolving rules depending on the %YAML directive

I could start with creating the resolvers/constructors, so that they can already be loaded explicitly.
Point 2 and especially 3 depend on if @ingydotnet wants that automatic behaviour.
Point 3 could be complicated because I would have to store a complete new set of resolvers and constructors somewhere.

In YAML::PP I implemented it like this. I added an option for the yaml version that can take the following values:

  • 1.1 - Only support 1.1, ignore %YAML directive
  • 1.2 - Only support 1.2, ignore %YAML directive
  • [1.1, 1.2] - Support both and default to 1.1
  • [1.2, 1.1] - Support both and default to 1.2

When you're talking about adding a yaml_version option, you're talking about adding it as a kwarg to the likes of safe_load(), right? https://github.com/yaml/pyyaml/blob/master/lib3/yaml/__init__.py#L154

As far as how to implement it, couldn't we just take the existing YAML 1.2 parser from ruyaml/ruamel.yaml? That seems easier than translating the Perl version.

Indeed, in the short term we could probably enable the easiest case (adding a yaml_version kwarg) in five minutes by adding ruyaml as a dependency and just kicking it over to the 1.2 parser in that case. And then in the long term re-integrate them back into a single codebase.

@perlpunk
Copy link
Member

perlpunk commented Feb 4, 2021

@dHannasch I think we should differentiate between the parser and the loader/resolver.

Regarding the parser, both pyyaml and ruamel diverge from the spec and do not fully support YAML 1.1 or 1.2.
But the support is "good enough" for most cases, and 1.1/1.2 don't differ that much.
We also should think about libyaml support. I'm not sure if ruamel can actually use libyaml as a backend parser. That is important.

Regarding ruamel as a dependency: if users want to process YAML 1.2, they can already use ruamel.
But projects like ansible are still using pyyaml, and I don't know all of the reasons, as I don't know too much about ruamel.

Regarding the loader/resolver: Implementing the rules is not that difficult, as I said. If we use the existing test data, it will simply be a lot to write.

So I am not sure where you would actually use ruamel code and how much sense that would make.

When you're talking about adding a yaml_version option, you're talking about adding it as a kwarg to the likes of safe_load(), right?

Something like that, yes.

@dHannasch
Copy link

I made a branch that allows safe_load to accept YAML 1.2: #489
It's not nearly as big and complicated as it looks. I ran into some other issues in the process, and opened separate pull requests for those. Once those are subtracted out, it's pretty simple, since it's really kind of a cheat, just importing an external library. But it allowed adding a YAML 1.2 test, and can allow adding further YAML 1.2 tests.

@perlpunk
Copy link
Member

perlpunk commented Apr 10, 2021

#555 Support for the YAML 1.2 Core and JSON schemas

@vallamost
Copy link

vallamost commented Jul 26, 2022

It's been 5 years...is it going to be decided if YAML 1.2 will be supported by PyYAML? If not, should PyYAML be consider deprecated as a YAML parser? YAML 1.2 released in 2009. PyYAML at the very least could reference other parsers to use instead of choking itself out with the safe_loader() when it finds a YAML tag.

I had to go on a time traveling scavenger hunt to understand the current reason why PyYAML is still throwing the error of could not determine a constructor for the tag '!reference' on a modern Yaml file. yaml.safe_loader() is not safe if it cannot parse YAML 1.2.

Seems like 1.2 support should of been in PyYAML 6.0 as seen here - #555 but it was passed up. Maybe it'll turn up in 7.0?

In either case this is what got me past the error:

yaml.add_multi_constructor('tag:yaml.org,2002:python/name', lambda loader, suffix, node: None, Loader=yaml.SafeLoader)
yaml.add_multi_constructor('!', lambda loader, suffix, node: None, Loader=yaml.SafeLoader)
yaml.add_multi_constructor('tag:', lambda loader, suffix, node: None, Loader=yaml.SafeLoader)
tmp_project_ci_dict = yaml.safe_load(f)

@perlpunk
Copy link
Member

I had to go on a time traveling scavenger hunt to understand the current reason why PyYAML is still throwing the error of could not determine a constructor for the tag '!reference' on a modern Yaml file. yaml.safe_loader() is not safe if it cannot parse YAML 1.2.

Whether a YAML file contains a tag !reference or not, has nothing to do with YAML 1.1 / 1.2.
It is a custom tag, which you add a custom constructor function for. You can do this today with pyyaml.

Please don't mix different issues here. thanks.

@fxmarty
Copy link

fxmarty commented Jan 18, 2023

Having "on" transformed into True is... interesting design choice.

@perlpunk
Copy link
Member

If you need 1.2 support and haven't read it yet, please checkout my other comment here about a new library on top of PyYAML for YAML 1.2 Core Schema support

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

7 participants