Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Algebraic types II #466

Open
bedeho opened this issue Jun 16, 2020 · 1 comment
Open

Algebraic types II #466

bedeho opened this issue Jun 16, 2020 · 1 comment
Labels

Comments

@bedeho
Copy link
Member

bedeho commented Jun 16, 2020

Background

As already explained in the background section here

https://github.com/Joystream/joystream/issues/554

We have a problem domain which has algebraic types all over the place, and it would be a big benefit if they could be reflected neatly in our query infrastructure. As explained in that first post, doing it deeply (approach 4), is ideal, however a later post points out

https://github.com/Joystream/joystream/issues/554#issuecomment-640254649

The resulting GraphQL API must generate OpenCRUD like capabilities that tie into matching these algebraic types, and this is a non-trivial task.

This issue attempts to clarify how this can be done.

Proposal

Input Schema

Algebraic Types

The new algebraic types respect the GraphQL standard, specifically they are informally defined as

  • union <name> = T_1 | ... | T_N where name is a GraphQL NamedType, and T_i are algebraic types, and referred to as cases of the type. This algebraic type is called an algebraic union.
  • type <name> @variant { f_1: T_1 ... f_N: T_N } where name is a GraphQL NamedType, f_i is a GraphQL Name, T_i is a non-ID GraphQL Type or an algebraic union. This algebraic type is called an algebraic variant, and such a type without any algebraic union members is called called a flat variant.

An algebraic union can be a member field of one or more normal @entity types, and can use non-null requirement. An algebraic variant cannot be a member of an @entity type.

Here is an example

type Miserable @variant {
  hates: String!
}

type HappyPoor @variant {
  isMale: boolean
}

union Poor = HappyPoor | Miserable

type MiddleClass @variant {
  father: Poor
  mother: Poor
}

type Rich @variant {
  bank: EntityC
}

union Status = Poor | MiddleClass | Rich

type EntityA @entity {
  id: ID!
  status*: Status!
}

type EntityB @entity {
  id: ID!
  status_b: Status!
  status_b: Status
}

type EntityC @entity {
  id: ID!
}

Relationships

When an algebraic union is a member in an entity type, there are cross-entity constraints around how entity member fields, or list thereof, because they model relationship semantics. This is also the case for normal entities.

In order to validate these requirements, just proceed as if every entity has any field in occurring in a member algebraic type as a top level field. If the resulting set of entities have valid relationship references, then the original usage of entity member fields in algebraic types is valid.

Concepts

An algebraic type can be represented as a labelled tree as follows

  • tree(union type <name> = T_1 | ... | T_N) = T[UNION||name, (ε, tree(T_1)), ..., (ε, tree(T_N))]
  • tree(type <name> @variant { f_1: T_1 ... f_N: T_N }) =
    • T[VARIANT||name, (f_{g_1}, tree(T_{g_1})), ... , (f_{g_M}, tree(T_{g_M}))] where g_i are distinct indexes of one or more algebraic member fields.
    • Node[VARIANT||name] when all fields f_i are non-algebraic.

where

  • for labelled non-empty trees c_1,...,c_n, and string labels l_1,...,l_n, T[x, (l_1, c_1), ..., (l_N, c_N)] is the tree with the root labelled with x and the roots of c_1,...,c_n as children, each with an edge labelled l_1,...,l_n.
  • Node[name] is a labelled node without any children.
  • || is a string concatenation operator.
  • ε is the empty string.

Any node in such a tree that corresponds to a algebraic union type, is called a union node, an any node corresponding to a variant type is called a variant node.

Given such a tree we can define the idea of coherent union, which is a subset of union nodes in the tree such that if you mark all edges from a node in the set all the way to the root, then this marked tree should have no union node which has more than one marked edge with a child. From this it should be clear that for any two nodes in such a set, the way that their paths to the root avoid violating this constraint is that the join up in some variant node because it has at least two union members. Here is the set of all coherent unions for Status above

  • {Status}
  • {Status/Poor}
  • {Status/MiddleClass.father}
  • {Status/MiddleClass.mother}
  • {Status/MiddleClass.father, Status/MiddleClass.mother}
  • {Status/Rich}

where each node is represented by its path from the root. Computing the set of of all such matches is trivially done recursively by coherent_unions(ε, tree(Status))

  • coherent_unions(s, T[UNION||name, (ε, T_1), ..., (ε, T_N)]) = {s||name} U coherent_unions(s||name/,T_1) U ... U coherent_unions(s||name/,T_N)
  • coherent_unions(s, T[VARIANT||name, (f_1, T_1), ... , (f_N, T_N)]) = coherent_unions(s||name||.||f_1,T_1) U ... U coherent_unions(s||name||.||f_N,T_N) U all_combined_coherent_unions(s, name, (f_1, T_1), ... , (f_N, T_N))
  • coherent_unions(Node[VARIANT||name]) = Ø

where all_combined_coherent_unions(s, name, (f_1, T_1), ... , (f_N, T_N)) will for be the union of

  • for each non-empty subset of inputs T_{g_1},...,T_{g_M} where M>1 do the next step
  • compute C_{g_i} = coherent_unions(s||name||.||f_{g_i},T_{g_i})
  • return flattened version of C_{g_1} X ... X C_{g_M}, i.e. where tuples are turned into sets.

GraphQL API

The key goal of the generated API for entities that have one or more algebraic member types is to allow for safe, expressive and practical queries that are sensitive to the algebraic structure. The key observation in the resulting API is that such queries each correspond to the concept of a coherent union, defined prior, as follows.

  1. Pick an entity type which has algebraic union member fields f_1: T_1,...,f_N:T_N. Notice that we allow for the entity type to have multiple union member fields.
  2. Compute the set of coherent unions C_i for each union type T_i.
  3. For each subset of fields f_{g_1}, ..., f_{g_M} do the next steps.
  4. For each (c_1,...,c_M) in C_{g_1}x...xC_{g_M} do the next steps. Recall that c_i is a set of union nodes.
  5. For each (n_1,...,n_M) in c_1,...,c_M do the next steps.
  6. For each (v_1,...,v_M) where v_i is a child node of n_i, generate the following query:
  • accepts where and order-by OpenCrud input types for each v_i if type is suitable (i.e. has comparable fields), and also pagination inputs.
  • returns a type which is the result of taking the initial entity type, taking each v_i and replacing n_i in the type tree to which it corresponds.

Database

Here the idea is very simple, simply take the table for any entity type and fully flatten, in the natural way, the type tree of any algebraic type. Union case indicators should be encoded as database level enumerated types. There should also

The TypeORM embedded entities approach may be a natural way to do this at the query node level, as it allows for the same class for an algebraic type to be reused across multiple fields in an entity, or multiple entities, as that is possible.

The most important thing is to capture as many constraints as possible at both the ORM and database level. The former will give the mapping author an ergonomic and safe interface to work with. It should indeed be possible to offer them a fully statically type safe interface for working with a given generated schema. The latter will protect against inadvertent representational corruption, for example when writing migration or initialisation code in the future.

@dzhelezov
Copy link
Contributor

dzhelezov commented Jun 29, 2020

Supporting variant types requires different design trade-offs in order to address both Postgres and GraphQL limitations.

GraphQL input types

The main limitation at the GraphQL side is the lack of union input types: graphql/graphql-js#1196
At the database (e.g. Postgres) level the fundamental limitation is the "flat" structure of the tables, that is, the set of columns = entity properties is fixed by the table schema.

There are no mutations, and the API should offer a set of self-describing queries. A typical query returns a list of entities (more precisely, a specified set of fields) filtered by the where clause.

A query per variant type

The approach described above suggests reflecting the type system by defining a lot of separate queries for each combination of the fields the fully flattened entity may have. In the example above, the flattened version ofEntityA.status have fields hates, isMale, bank, father, mother, coming from the different realizations of the status field. The main advantage of having a separate query for each possible realization is the "code completion" feature provided for each input type, tailored to each variant of the union type. The drawback is that the number of queries explodes if there are entities with multiple fields of algebraic type and/or if the input types form a complex tree as in the example above.

A bag of input fields approach

An alternative approach is to combine all the possible input fields into single 'JUMBO-INPUT' type:

EnitityAWhereInput {
  hates_eq: String,
  isMale_eq: boolean,
  bank: String,
  father_hates_eq: String,
  father_isMale_eq: String, 
  mother_hates_eq: String,
  mother_isMale_eq: String,
  ...
}

Many combinations of the input fields are impossible, but that would simply return an empty list.
The benefit of this approach is a conceptual and implementation simplicity (both at the Warthog-middleware and TypeORM level). The drawback is that the API consumer has to have an out-of-band knowledge of which input fields are actually compatible.

JSON type in InputTypes

A more flexible approach is to support free JSON in the where input. More concretely, in the example above, EntityAWhereInput will have, among others, a JSON field status_json. Such an input field accepts arbitrary JSON object, with the scalar attribute filtered according to OpenCRUD spec. For example, the query for selecting EntityA where the status is MiddleClass and the father is HappyPoor is going to be

query {
	accounts(
           where: { 
               status_json: { 
                  father: { isMale_eq: true }
               }
           }
        ) {
           id
           status {
                __typename 
                ... on MiddleClass {
                     father {
                         ... on HappyPoor {
                             isMale
                         }
                    }
                    mother {          
                        ... on Miserable {
                            hates
                        }
                   }
               }
            } 
        }
}

We adopt the latter option (JSON input type)

Implementation details

Under the hood, the union types are stored as jsonb type columns, [natively supported by PostgresSQL] (https://www.postgresql.org/docs/current/functions-json.html). This enables a rather straightforward translation from the JSON field of the WhereInput into the json-specific SQL syntax of Postgres (this part is implemented in warthog).

The TypeScript types and classes are created using the GraphQL union type integration provided by the type-grapqhl library. This gives type-safety for the entity classes, which can later be exported for independent external use (as done by the indexer mappers).

Current limitations

The following constraints must hold, otherwise a validation error will be thrown.

  • A @variant must be an ObjectType (scalars, unions, and interfaces are not allowed).
  • A variant type field must be either of
    • scalar type
    • union type
      No entity references are allowed.
  • Entity types cannot have fields of variant type (but union types are allowed).
  • A union type must be composed exclusively out of @variant types.

@bedeho bedeho transferred this issue from Joystream/joystream Dec 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants