WIP Support for the YAML 1.2 Core and JSON schemas #512
Superseded by #555
This is a draft and subject to discussion.
See also #486
Thanks to @SUSE for another hackweek! I had four days of work time dedicated to an open source project of my choice. https://hackweek.suse.com/20/projects/yaml-1-dot-2-schema-support-for-pyyaml
This PR depends on #483
Introduction
For a quick overview of the schema changes between YAML 1.1 and 1.2, look here: https://perlpunk.github.io/yaml-test-schema/schemas.html
While the syntax was also changed in YAML 1.2, this pull request is about the schema changes.
As an example, in 1.1, `Y`, `yes`, `NO`, `on` etc. are resolved as booleans. This sounds convenient, but also means that all these 22 different strings must be quoted if they are not meant as booleans. A very common obstacle is the country code for Norway, `NO` ("Norway Problem"). In YAML 1.2 this was improved by reducing the list of boolean representations.
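To make the difference concrete, here is a minimal sketch that compares the boolean strings of the two specs. The two sets are written out from the YAML 1.1 bool type and the YAML 1.2 Core Schema; the helper name `resolves_as_bool` is made up for illustration and is not part of PyYAML.

```python
# Strings resolved as booleans by the YAML 1.1 bool type (assumed list, per the 1.1 spec).
YAML_1_1_BOOLS = frozenset(
    "y Y yes Yes YES n N no No NO "
    "true True TRUE false False FALSE "
    "on On ON off Off OFF".split()
)

# Strings resolved as booleans by the YAML 1.2 Core Schema.
YAML_1_2_CORE_BOOLS = frozenset("true True TRUE false False FALSE".split())


def resolves_as_bool(value, bools):
    """Return True if the unquoted plain scalar `value` would become a boolean."""
    return value in bools


print(len(YAML_1_1_BOOLS))  # 22
for scalar in ("NO", "on", "true"):
    print(
        scalar,
        resolves_as_bool(scalar, YAML_1_1_BOOLS),
        resolves_as_bool(scalar, YAML_1_2_CORE_BOOLS),
    )
```

Under these assumptions only `true` stays a boolean in both versions, while `NO` and `on` become plain strings in 1.2.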
Also other types have been improved. The 1.1 regular expression for float allows `.` and `._` as floats, although there isn't a single digit in these strings.
While the 1.2 Core Schema, the recommended default for 1.2, still allows a few variations (`true`, `True` and `TRUE`, etc.), the 1.2 JSON Schema is there to match JSON behaviour regarding types, so it allows only `true` and `false`.
Current State
PyYAML implements the 1.1 types (with a few changes like leaving out the single character booleans `y`, `Y` etc.), and it was never updated to support one of the 1.2 Schemas.
Problem
Besides the above-mentioned problems with the 1.1 types, more and more libraries are being created or updated for YAML 1.2, probably also thanks to the relatively new YAML Test Suite. PyYAML should be able to read and write YAML files used or produced by those libraries.
This PR
The PyYAML SafeLoader, which is currently the most recommended Loader if you don't need special behaviour, implements YAML 1.1 types.
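A short illustration of what this means in practice, using only `yaml.safe_load` from the existing PyYAML API. The sample document is made up for this sketch.

```python
import yaml

# safe_load uses SafeLoader, which resolves plain scalars with the 1.1 rules.
data = yaml.safe_load("country: NO\nanswer: yes\nflag: y\n")

# 'NO' and 'yes' become booleans; the single character 'y' stays a string,
# because PyYAML leaves out the single character booleans.
print(data)  # {'country': False, 'answer': True, 'flag': 'y'}
```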
I added CoreLoader, CoreResolver, CoreConstructor, CoreRepresenter, CoreDumper, and the same for JSONLoader etc.
The suggestion is to recommend CoreLoader and JSONLoader for trying out, while the other mentioned classes might be subject to change or removal in a later release. This way we have time to figure out a better API, while users can already use the new top-level CoreLoader.
One problem is that PyYAML's callbacks are class based, and while I was able to make the code a bit more compact via a dictionary of types/callbacks, there are still method calls which must be in a certain class.
The `!!merge <<` key for example needs special handling.
This makes it tedious to add custom Loaders. Turning the class-based approach into an instance-based one is on our wishlist.
One example use case we have in mind is that you want to use the 1.2 CoreLoader, but on top of that you want it to recognize timestamps and merge keys.
Or you want a very basic loader that should treat everything as a string except booleans and null.
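For comparison, the second use case can be approximated today with the existing class-based API. The following is only a sketch: it does not use the classes added in this PR, the class name `StringsBoolNullLoader` is hypothetical, and the two regular expressions are assumed from the 1.2 Core Schema. It relies on `add_implicit_resolver`, which stores its resolver table on the class.

```python
import re
import yaml


class StringsBoolNullLoader(yaml.SafeLoader):
    """Hypothetical loader: every plain scalar is a string unless it is a bool or null."""


# Start from an empty resolver table instead of the inherited 1.1 rules.
StringsBoolNullLoader.yaml_implicit_resolvers = {}

# Register only the two types we want, keyed by the possible first characters.
StringsBoolNullLoader.add_implicit_resolver(
    "tag:yaml.org,2002:bool",
    re.compile(r"^(?:true|True|TRUE|false|False|FALSE)$"),
    list("tTfF"),
)
StringsBoolNullLoader.add_implicit_resolver(
    "tag:yaml.org,2002:null",
    re.compile(r"^(?:~|null|Null|NULL|)$"),
    ["~", "n", "N", ""],
)

doc = yaml.load(
    "a: NO\nb: true\nc: 42\nd: ~\ne: 1.5\n",
    Loader=StringsBoolNullLoader,
)
print(doc)  # {'a': 'NO', 'b': True, 'c': '42', 'd': None, 'e': '1.5'}
```

Because the resolver table lives on the class, every variation needs its own subclass, which is the kind of overhead an instance-based approach would avoid.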
Example
edit: some of the tests are failing, but this is unrelated and caused by an issue with GitHub Actions caching.