Schema rename multiple fields #828

Closed

Conversation

@LWprogramming (Collaborator) commented May 18, 2020

Tests for rename_schema for #810. Don't look at this yet since I'll be adding comments explaining some of the trickier parts of what I've written so far and discussing other edge cases for which I haven't written tests yet.

I'll probably split this into a number of smaller PRs as this grows (since this is already a pretty decent-sized PR), but for now I wanted to have things all in one place so I can keep track of things.

Should stay unchanged because we don't rename scalars
To have them fit a consistent format: first the specific details,
followed by whether it's a one-to-many rename or a suppression, and
finally, if it's a one-to-many rename, whether or not it includes the
original field. If we have both a one-to-many rename and a suppression
in the same rename, the one-to-many rename comes first (e.g. in
test_one_to_many_rename_and_suppress).
@LWprogramming (Collaborator, Author)

Possibly non-obvious decisions about the changes that are already here

  • All existing schema renaming tests have a 1-1 renaming, so we simply change the string to a list containing that string: A->B becomes A->[B].

  • If we suppress every field in the schema, you can't query anything. To avoid suppressing every field, I've added a field to the tests and to the non_null_schema input. Since test_merge_schemas uses this input string, I also updated that test.

  • The tests are named as follows: test_detail_renametype.

    • Detail gives some more detailed information about what's being tested, e.g. specifying that a particular test renames an enum.

    • Rename type describes whether the test is suppressing a field or performing a one-to-many renaming (or both).

  • reverse_name_map should still work the same way as before even if the renaming is not 1-1. In particular, reverse_name_map contains an entry A->B iff rename_schema's renamings parameter maps B to a list containing A, A and B are different strings, and there are no naming conflicts (see the sketch after this list).

    • This implies reverse_name_map will never map a string to itself.

    • reverse_name_map isn't affected by field suppression. If we suppress A by mapping it to the empty list, then there is no way to write a query with a field that gets renamed to A via rename_query. So, there's no need for this information to exist in reverse_name_map (it's unrepresentable anyways).

    • I considered an alternate representation of reverse_name_map which includes the mapping A->A for all such A where either A is not renamed or renamings includes a mapping A -> a list of strings that includes A.

      • This would also implicitly specify which fields are suppressed, because they'd be the only fields from the original schema that nothing maps to in reverse_name_map.

      • Although this has the nice property of including more information than the expected implementation of reverse_name_map, I decided not to choose it: it would require building a dictionary proportional in size to an arbitrarily large schema even for small renamings, and I don't see an obvious benefit from this alternate implementation right now. I've written this here mainly to serve as a reference in case there is some reason to revise what we need from reverse_name_map.

  • For migrations as described in "Support renaming one schema field into multiple new schema fields" (#810), a one-to-many renaming would include the original name in the list (e.g. renaming A -> [A, B, C]). However, from a programming perspective, so far I haven't seen anything that would make it harder to have a one-to-many renaming that doesn't include the original (e.g. renaming A -> [B, C]), so I've tested for both cases.
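
To make the renamings/reverse_name_map relationship above concrete, here is a minimal Python sketch. It is not the library's implementation: build_reverse_name_map is a hypothetical helper, the type names are made up, and naming-conflict checks are omitted. It only illustrates the semantics described in these bullets (an empty list means suppression, and identity renames never appear in reverse_name_map).

from typing import Dict, List


def build_reverse_name_map(renamings: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each new name back to its original name, skipping identity renames."""
    reverse_name_map = {}
    for original_name, new_names in renamings.items():
        for new_name in new_names:  # an empty list (a suppression) contributes nothing
            if new_name != original_name:
                reverse_name_map[new_name] = original_name
    return reverse_name_map


renamings = {
    "Human": ["Human", "NewHuman"],  # one-to-many rename that keeps the original name
    "Droid": ["Android"],            # ordinary 1-1 rename
    "Dog": [],                       # suppression
}
assert build_reverse_name_map(renamings) == {"NewHuman": "Human", "Android": "Droid"}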

@LWprogramming (Collaborator, Author)

Non-obvious decisions that I haven't made yet and would like feedback on

  • How to safeguard against suppressing every field in the schema? Such a renaming might not even require a large number of renamings-- see the bullet point on cascading suppression.

  • Cascading suppression: if we suppress all the schema fields that a particular type has, should we also suppress that type? This may lead to a cascade of suppressions, for instance:

type A {
    Listb: [B]
}

type B {
    c: C
}

enum C {
    YES
    NO
    MAYBE
}

Suppressing C also means suppressing B, because all of B's fields would then be suppressed, which in turn suppresses A (see the sketch below). The same question applies to union types.
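
Here is a rough, self-contained sketch of how that cascade could be computed for the example above (plain dicts standing in for the schema, not the real schema machinery); it only exists to make the chain C -> B -> A explicit.

# field_types maps each object type to the types its fields refer to;
# the enum C has no field dependencies of its own.
field_types = {
    "A": ["B"],
    "B": ["C"],
}

suppressed = {"C"}  # the user only asked to suppress C
changed = True
while changed:
    changed = False
    for type_name, dependencies in field_types.items():
        # A type whose every field type is suppressed is left with no fields.
        if type_name not in suppressed and all(dep in suppressed for dep in dependencies):
            suppressed.add(type_name)
            changed = True

print(sorted(suppressed))  # ['A', 'B', 'C'] -- suppressing C cascades to B and then A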

  • If we suppress schema fields belonging to a union type, what do we do beyond simply removing the suppressed fields? Suppose we have a schema where A, B, and C are defined:
union AOrBOrC = A | B | C

If we suppress A, we're left with:

union AOrBOrC = B | C

This has two drawbacks: the name may be misleading because it no longer matches the member types, and it may reveal information about the data types, which suppression is supposed to prevent. Is this union-name issue only an artifact of the test input, whose unions are named after the types they could be, or is there something more fundamental we should be concerned about here? If the former, we may be able to just leave the union name as-is.

One final note: if we also suppressed B in this example, union AOrBOrC = C would be misleading but still valid GraphQL according to the GraphQL spec, even though there's only one type left!

  • One-to-many renamings get complicated when other types depend on the schema fields being renamed.

For instance, if we start with this enum as part of the schema:

enum Height {
  TALL
  SHORT
}

If we then rename Height -> [Height, NewHeight], this renamed schema would seem to work:

enum Height {
  TALL
  SHORT
}

enum NewHeight {
  TALL
  SHORT
}

But if we also have a type that uses Height, then it's unclear how to proceed:

type Droid {
  height: Height
}

Neither of the following two options is good, because now the schema has name conflicts:
1.

type Droid {
  height: Height
}
type Droid {
  height: NewHeight
}

2.

type Droid {
  height: Height
  height: NewHeight
}

A plausible third option exists, involving unions:
3.

type Droid {
  height: HeightOrNewHeight
}
union HeightOrNewHeight = Height | NewHeight

However, this also fails if Height was a union of types rather than an enum to begin with, because "you can't create a union type out of ... other unions". I don't have a great solution to this at the moment.

The same problem happens with non-enum fields.

  • Interfaces also present a similar but different problem: if A implements B originally, and interface B gets renamed to [B, C], then we end up with A implements B & C. Although this is valid GraphQL, it may be misleading, especially if it's not obvious that B and C refer to the same interface.

@LWprogramming (Collaborator, Author)

Note: still no need to look at the code since I'll almost certainly split this into smaller PRs-- before diving into implementations or even writing out a bunch of tests that I'm not sure are correct, I wanted to iron out some uncertain and tricky parts conceptually in these two previous comments.

@pmantica1 (Contributor)

Issue 1:
I don't have an issue with cascading suppression.

Issue 2:
Some thoughts:

  1. We auto-generate union names during schema generation using a certain template, but users can manually generate their own schema and don't need to follow the template that we use for union names. Since the original union name doesn't need to follow a template, it is not clear that a new autogenerated name would be any better than the original one.
  2. If we renamed the union, then we could be implicitly breaking existing queries and I am not sure that's something we want to do.
  3. By changing the type of a field from GraphQLUnionType to GraphQLObjectType, we could also be implicitly breaking queries. So even though the one-type union situation is not great, I don't think that we should automatically change the field types.

So I think it's best to simply omit the object type from the union type and do nothing more.

Two ideas I'm considering to improve on this would be to:

  1. Add another function that allows one to change the type of fields, to deal with the one-type union.
  2. Return a dictionary of suggested renamings.

I would prioritize getting a basic version of schema renaming working before focusing on improvements like these, though. For now, simply omitting the object type seems good enough to me.

Issue 3:
When we remap a type to multiple types, we want to avoid breaking any existing queries. I think we could do that by using the new type in all existing fields except unions. So we would have:

type Droid {
   height: NewHeight
}

For unions, we need to have both the old and new type as part of the union definition (the old type to avoid breaking any existing queries, and the new type to allow users to start referring solely to the new type in queries).

@obi1kenobi (Contributor)

As you correctly point out, the operations happening here are complex and difficult to think about. Thank you for writing up all the edge cases you are considering, it was very useful.

I'd like to make an observation here that might simplify things. We have two use cases that we are trying to cover here:

  • migrations that change names
  • permanent schema renaming, where we simply don't like what the underlying schema calls things and want to assign them new names

The latter only requires 1-1 renamings, which are conceptually simple. The former is solvable by 1-many renaming, but that's not the only viable solution and doesn't have to be the full story here. So I'd propose a hybrid approach for supporting migrations: allow 1-many renaming of fields, and use an alternative approach for types of all kinds: scalars, enums, interfaces, objects, unions, etc.

That means I'm suggesting we reduce scope and only allow renaming types to 0 or 1 names, raising NotImplementedError() or another error otherwise. This is because 1-many renaming of types is both extremely difficult to do well and will result in a massively bloated, user-unfriendly schema. Instead, we can handle migration-style renames of such schema elements by simply building the reverse_name_map component for them manually, and accepting schema-incompatible queries during the migration period by rewriting them into schema-compatible ones.

Also, I'd recommend we play it safe and don't implicitly cascade type hiding, since on a large schema (ours stretch into hundreds of MB!) it's going to be impractical to hand-review the changes resulting from a given operation. Instead, my preference would be to make the user explicitly state all the changes they'd like to perform, and have the code complain by raising errors if we end up in an illegal state, such as:

  • type has no fields
  • union has no types
  • type is being hidden but fields of that type exist, etc.

Importantly, such complaints must be clear ("input X said to do Y which caused illegal state Z") and actionable ("here are the actions that would resolve the problem") for the user.
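
As a sketch of what such a clear-and-actionable complaint could look like (the exception class and function names here are hypothetical, not the real API), the error message names the input that caused the problem, the illegal state it produced, and a concrete way out:

class CascadingSuppressionError(Exception):
    """Raised when a requested suppression would leave the schema in an illegal state."""


def check_type_still_has_fields(type_name, remaining_fields, suppressed_types):
    # remaining_fields: the fields of type_name that survive the requested suppressions.
    if remaining_fields:
        return
    raise CascadingSuppressionError(
        f"Suppressing {sorted(suppressed_types)} removes every field of type "
        f"'{type_name}', leaving it with no fields, which is illegal in GraphQL. "
        f"To fix this, either add '{type_name}' to the types to suppress, or keep "
        f"at least one of its fields' types in the schema."
    )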

@LWprogramming (Collaborator, Author)

Thanks for the feedback! A few follow up questions:

simply building the reverse_name_map component for them manually, and accepting schema-incompatible queries during the migration period by rewriting them

  • If we do this for types, enums, etc. then instead of 1-many renamings for fields could we treat fields the same way we do everything else? Then we wouldn't have a mix of what happens to different parts of the schema. We could say that 1-1 and 1-0 is always done in rename_schema while 1-many is handled by being more permissive with queries than the schema suggests and renaming the query accordingly.

  • Does this mean that someone writing a query would see the schema after it was modified with 1-1 and 1-0 renamings and that we'd modify reverse_name_map somewhere in rename_query? Unsure where this extra processing should be happening.

such complaints must be clear ... and actionable

  • Makes sense-- what would be the actionable thing to do here though? Suppose we have this schema:
type A {
    Listb: [B]
}

type B {
    c: C
}

enum C {
    YES
    NO
    MAYBE
}

and C gets suppressed. This should raise an error, but what should the suggested action be?

  • If we suggest the user remove C from things to suppress, that kind of defeats the point of suppressing a field

  • If we suggest the user also suppress B, then they'd re-run the schema rename and get another error telling them to suppress A. That'd be annoying to deal with, especially if there's a long chain of suppression required

  • If we tell the user everything that would eventually need to get suppressed, we might end up showing them a massive, probably hard-to-reason-about error message (same problem with the long chain of suppression)

@obi1kenobi (Contributor)

Great points! My thoughts inline:

  • If we do this for types, enums, etc. then instead of 1-many renamings for fields could we treat fields the same way we do everything else? Then we wouldn't have a mix of what happens to different parts of the schema. We could say that 1-1 and 1-0 is always done in rename_schema while 1-many is handled by being more permissive with queries than the schema suggests and renaming the query accordingly.

This is definitely a viable approach. I would argue it's slightly less desirable, though, because it doesn't help users be self-sufficient. Let me take a step back and explain what I mean.

GraphQL has a built-in directive called @deprecated which takes an optional string reason parameter, and may be applied to field definitions or enum values. This is something that common UIs natively support: for example, GraphiQL will automatically give you a squiggly yellow underline if you use a deprecated field (together with a pop-over showing you the listed deprecation reason), and will avoid suggesting deprecated fields as a typeahead suggestion for as long as alternative valid suggestions exist. This is a very pleasant workflow: queries with deprecated fields remain valid in the UI, and the user is given specific and actionable advice like "field X is deprecated, please use the new field Y instead". Building on this, one could also imagine a world where we have a tool that is aware of "X field was replaced by the equivalent Y field" deprecations, and offers to rewrite your queries to avoid deprecations — using a nice sanity-check invariant that both the input and output queries must validate against the schema, but the output query has only a subset of the deprecation warnings of the input.
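
As a small illustration of the directive being described (a sketch assuming graphql-core 3's build_schema; the schema below is made up for the example), @deprecated with a reason attaches machine-readable migration advice to a field:

from graphql import build_schema

schema = build_schema(
    """
    type Human {
      name: String @deprecated(reason: "Use fullName instead.")
      fullName: String
    }

    type Query {
      Human: Human
    }
    """
)

# Tools like GraphiQL read this reason to warn users and suggest the replacement field.
name_field = schema.type_map["Human"].fields["name"]
print(name_field.deprecation_reason)  # Use fullName instead.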

Deprecating a type in GraphQL, however, is semantically strange. A type definition by itself doesn't do anything except for "special" types like the root query or mutation type: if no fields of that type are reachable, you can't do anything with it anyway — hence the special-casing of the root query/mutation types, since those are reachable by default. Correspondingly, @deprecated isn't allowed to be present on object type/interface/union/enum definitions.

Let's refer to having both old and new copies in the schema as a "soft" migration (both queries valid, one deprecated), and let's call only having the new copy in the schema + a fall-back path that accepts queries under the old schema a "hard" migration.

Because of the @deprecated directive and associated tooling's support for it, supporting soft migrations for field definitions and enum values has a much higher payoff value than supporting soft migrations for types of all kinds. Coincidentally, soft migrations are also much easier to pull off for field definitions and enum values than they are for types, as we discussed previously.

On the basis of that, my recommendation was to support soft migrations for field definitions and enum values (because it's very good and not that hard), and only support hard migrations for object types/interfaces/unions/enums (because soft migrations are less good here anyway, and much harder to pull off).

  • Does this mean that someone writing a query would see the schema after it was modified with 1-1 and 1-0 renamings and that we'd modify reverse_name_map somewhere in rename_query? Unsure where this extra processing should be happening.

After applying all the renaming, the generated schema would be used to write queries against, yes. It may be used through a tool like GraphiQL directly, or simply used by the compiler to compile a user's query against that schema.

I wouldn't necessarily worry about rename_query and the hard migration's fallback path for type renaming. For that, we may want to add an extra preprocessing step (i.e. a separate function) that the user has to explicitly opt into (i.e. call), since we are technically accepting illegal queries and attempting on a best-effort basis to make them legal. That makes it tricky, because it's more difficult to differentiate between the buggy "illegal query should have become legal, but didn't" case and the works-as-intended "illegal query was illegal in a non-fixable way, so it remained illegal as expected" case.

such complaints must be clear ... and actionable

  • Makes sense-- what would be the actionable thing to do here though? Suppose we have this schema:
type A {
    Listb: [B]
}

type B {
    c: C
}

enum C {
    YES
    NO
    MAYBE
}

and C gets suppressed. This should raise an error, but what should the suggested action be?

  • If we suggest the user remove C from things to suppress, that kind of defeats the point of suppressing a field
  • If we suggest the user also suppress B, then they'd re-run the schema rename and get another error telling them to suppress A. That'd be annoying to deal with, especially if there's a long chain of suppression required
  • If we tell the user everything that would eventually need to get suppressed, we might end up showing them a massive, probably hard-to-reason-about error message (same problem with the long chain of suppression)

I'd pick the middle option. As you point out, the first one is counter to the user's expressed intent, and giving that message as feedback is akin to a "don't do that", which is unhelpful. The middle option is a nice balance of "reasonable to implement" and "works most of the time", because most real-world schemas wouldn't have these cascading situations, and if they do, they are unlikely to have many of them. Saving the user from needing to iterate 2-3 times, on a relative basis given our development bandwidth, is not worth the comparatively large amount of effort it would take to implement a system that would find and appropriately communicate the correct sequence of steps to take and the reasons for them to the user.

As described in the discussion of PR #828, we actually don't need to
support 1-many renaming for types, only for fields. That eliminates the
need for a number of tests that were previously here. This commit isn't
the end of the story though, since I plan to add in the 1-many argument
for rename_schema after this.
@LWprogramming (Collaborator, Author)

Sounds good to me-- I'll aim to get the type hard-migration/suppression part of rename_schema done first and put that out in a separate PR whose code can be reviewed.

Just a few lingering issues:

  • Since fields only exist in their local scope (so a type foo and a type bar can both have a field called name), I think it'd make more sense to create a separate parameter for field/enum value renamings that maps a schema type T (string) to a dictionary mapping T's fields (strings) to lists of strings. For instance:
{
  "Human": {"name": [], "id": ["id", "id1"]}
}

suppresses name and 1-many renames id. Conveniently, the existing renaming code doesn't do anything to fields and enum values, which is exactly the gap this extra parameter would fill.

  • If we have a type foo with fields A and B, is it legal to suppress A and one-to-many rename B to A and B? On one hand, both the expected beginning and end states are technically legal-- but if you try to rename B before suppressing A, the intermediate state would not be legal.

@obi1kenobi (Contributor)

Good points. Open to adopting the API you suggest, mapping types to dicts of fields with their own remappings — I'd just suggest that we treat that info as overriding the default of "change nothing", so that we don't have to have lots of entries that simply communicate "don't change this field".

To avoid describing the illegal intermediate state in the spec, perhaps we could specify the order of operations more fully — maybe something like "field suppressions happen before field renames, either 1-1 or 1-many". So long as the implementation's start-to-finish behavior is indistinguishable from the spec, whether the implementation actually follows that order shouldn't matter.
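
A tiny illustration (plain dicts with hypothetical field definitions, not the real implementation) of why that ordering sidesteps the illegal intermediate state from the earlier example: suppressing A first frees up the name before B is renamed onto it.

fields = {"A": "original A definition", "B": "original B definition"}
field_renamings = {"A": [], "B": ["A", "B"]}  # suppress A; 1-many rename B -> [A, B]

# Step 1: apply every suppression (any field whose renaming list is empty).
fields = {
    name: definition
    for name, definition in fields.items()
    if field_renamings.get(name, [name]) != []
}

# Step 2: apply 1-1 and 1-many renames to whatever remains (conflict checks omitted).
renamed_fields = {
    new_name: definition
    for name, definition in fields.items()
    for new_name in field_renamings.get(name, [name])
}

print(renamed_fields)  # {'A': 'original B definition', 'B': 'original B definition'}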

@obi1kenobi changed the base branch from master to main on June 24, 2020.
@LWprogramming (Collaborator, Author)

Most of the work here has been merged; closing this for the same reason as #834, to clean up.
