Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support overriding anchors #394

Open
wants to merge 3 commits into
base: release/6.0
Choose a base branch
from
Open

Support overriding anchors #394

wants to merge 3 commits into from

Conversation

perlpunk
Copy link
Member

@perlpunk perlpunk commented Apr 9, 2020

In YAML 1.1 and 1.2, anchors can be reused.
https://yaml.org/spec/1.2/spec.html#id2786196
https://yaml.org/spec/1.1/#id863390

See also #100 and #334

This allows:

- &anchor A
- *anchor
- &anchor B
- *anchor
# [A, A, B, B]

@kitterma
Copy link

This change also resolves a long time bug in the pyyaml package in Debian:

https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=515634

Thanks.

@perlpunk perlpunk requested a review from ingydotnet May 21, 2020 11:47
@perlpunk
Copy link
Member Author

Some background on this:

There is a discussion that repeated anchors should be forbidden in the next version of YAML (AFAIK mainly because it makes roundtripping with keeping anchor names more complicated).
So @ingydotnet wants to wait until the decision is made if this feature will be removed or not.

I think it's a valid point to say, we shouldn't start supporting something that will be forbidden in the next YAML version anyway.
But IMHO the reason for not allowing this (complicated roundtripping of anchor names) is not really valid, especially if the next YAML version still considers anchor names as throwaway content by default.

What would probably help the discussion is to see some real world use cases where this is useful.
I haven't need this feature myself so far, but I can imagine use cases.

@zokradonh
Copy link

I have currently the following nice-to-have use case (docker-compose.yml):

version: 3

services:
  &service myfancycontainer:
    environment:
    - SERVICENAME: *service 

  &service anotherfancycontainer:
    environment:
    - SERVICENAME: *service 

This is a simple docker-compose sample that forwards the service names to the containers as environment variables. Of course it could be implemented like this:

version: 3

services:
  &service  myfancycontainer:
    environment:
    - SERVICENAME: *service 

  &service2 anotherfancycontainer:
    environment:
    - SERVICENAME: *service2 

But it is cleaner with overriding anchor names. Even more cleaner if you have in mind that this compose file is in constant growth and more services are added over time by the administrator.

@YAMLcase
Copy link

YAMLcase commented Aug 6, 2020

@perlpunk I'd be interested in reading through this discussions if it happens to be linkable. Where I work we're making heavy use of anchors, going so far as to use this branch for our CI/CD testing because reusing anchors makes DRY go even further (py-test and especially tavern-ci). Thanks.

@perlpunk
Copy link
Member Author

perlpunk commented Aug 6, 2020

@YAMLcase actually the public process of RFCs has just started these days.
This is the related RFC:
yaml/yaml-spec#65
Note that the Pull request for the RFC itself is not where discussion happens, but once the RFC PR is merged, we can create issues discussing each RFC.
You can of course comment there anyway if you want.

@cbrinker
Copy link

I needed this fixed now so I have monkey patched the change into my current library. Not an ideal situation but maybe others would like this as a work around as we wait:

import yaml

# A monkey patch to fix an anchor bug in the pyaml parser details here: https://github.com/yaml/pyyaml/pull/394
def compose_node_fix_394(self, parent, index):
    if self.check_event(yaml.events.AliasEvent):
        event = self.get_event()
        anchor = event.anchor
        if anchor not in self.anchors:
            raise ComposerError(None, None, "found undefined alias %r"
                    % anchor.encode('utf-8'), event.start_mark)
        return self.anchors[anchor]
    event = self.peek_event()
    anchor = event.anchor
    # REMOVED BUG FROM HERE
    self.descend_resolver(parent, index)
    if self.check_event(yaml.events.ScalarEvent):
        node = self.compose_scalar_node(anchor)
    elif self.check_event(yaml.events.SequenceStartEvent):
        node = self.compose_sequence_node(anchor)
    elif self.check_event(yaml.events.MappingStartEvent):
        node = self.compose_mapping_node(anchor)
    self.ascend_resolver()
    return node

yaml.composer.Composer.compose_node = compose_node_fix_394 # Monkey patch in the fixed yaml parser

# Demo it working:
if(yaml.load("""- &foo bar
- *foo
- &foo baz
- *foo""", Loader=yaml.Loader) == ["bar","bar","baz","baz"]):
    print("Hooray, bug is patched!")

@cbrinker
Copy link

@perlpunk we are using yaml anchors/aliases quite a bit to cut down on tons of repeated stanzas in our prometheus configuration files. I'd really like to not have to uniquely identify all of these aliases!

...
- <<: *scrape-template
  job_name: job_1
  sd_configs:
  - &common-values
    project: some-project
    filter: (afilter)
    port: 1111
  - <<: &common-values
    zone: zone-a
  - <<: &common-values
    zone: zone-b
  - <<: &common-values
    zone: zone-c
- <<: *scrape-template
  job_name: job_2
  sd_configs:
  - &common-values
    project: some-project
    filter: (bfilter)
    port: 2222
  - <<: &common-values
    zone: zone-a
  - <<: &common-values
    zone: zone-b
  - <<: &common-values
    zone: zone-c
...

@dgarozzo
Copy link

What's the plan for this? Do I need to switch to ruamel.yaml instead?

@zokradonh
Copy link

There is a discussion that repeated anchors should be forbidden in the next version of YAML (AFAIK mainly because it makes roundtripping with keeping anchor names more complicated).
So @ingydotnet wants to wait until the decision is made if this feature will be removed or not.

Does recent closing of yaml/yaml-spec#65 mean that repeated anchors will not be forbidden? So this pull request can be merged?

@RafalSkolasinski
Copy link

RafalSkolasinski commented Jul 29, 2021

+1 on this - this seems to start to cause the problems with certain k8s manifests, especially knative eventing 0.24

Do we have a timeline when it could land in release?

@cowboy
Copy link

cowboy commented Aug 30, 2021

Note that in addition to 1.2 (mentioned in the OP), per the YAML 1.1 spec, "anchors need not be unique" https://yaml.org/spec/1.1/#id863390. It would be great to get this PR merged in.

@perlpunk
Copy link
Member Author

rebased and retargeted to release/6.0

Copy link
Member

@ingydotnet ingydotnet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the spec team to discuss the best move here.

@perlpunk
Copy link
Member Author

We had a meeting and discussed this. I added an option reuse_anchors that defaults to False, like requested.

@RafalSkolasinski
Copy link

Any update on this?

@perlpunk
Copy link
Member Author

It is considered for the next release 6.1, I believe

@ingydotnet
Copy link
Member

This is planned to be a non-default option in PyYAML 6.1.

See https://github.com/yaml/pyyaml/projects/9

No ETA for when this will be released.

@RafalSkolasinski
Copy link

If this will default to false does it mean that parser will keep failing if yaml contains duplicated anchors like here unless it is explicitly enabled?
ansible-collections/kubernetes.core#189

@qwertystop
Copy link

Note that in addition to 1.2 (mentioned in the OP), per the YAML 1.1 spec, "anchors need not be unique" https://yaml.org/spec/1.1/#id863390. It would be great to get this PR merged in.

Not even that, they were legal even in 1.0 with almost exactly the same wording: https://yaml.org/spec/1.0/#id2558599. This has been valid for as long as YAML has existed (not counting any pre-1.0 of which I am unaware and for which a spec has not been preserved on yaml.org). In that context, making it default to invalid, even if it can be enabled, seems an odd choice.

@perlpunk
Copy link
Member Author

perlpunk commented Oct 25, 2021

I agree that it should just be allowed. But adding an option for it was the compromise between allowing it and not changing anything at all.
And because there are plans to restructure the API for loading and dumping, this even has to wait until that is decided (because load doesn't have any options at all yet)...

@fernandezcuesta
Copy link

wondering if this is gonna be merged?

@nitzmahone
Copy link
Member

nitzmahone commented Aug 30, 2023

I definitely want this in v.next, but especially if that ends up being a major release (which I assume it will be if we include 1.2 support and some form of declarative Loader/Dumper config to light it up ala #700), I don't see why we wouldn't just turn it on by default... Document it as a "sorta breaking change that actually brings us into better spec compliance that probably nobody will hit on purpose", and maybe don't even provide a knob to shut it off (or if we do, implement it in the Loader config, not on any of the top-level APIs).

Thoughts?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.