Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generically derived schema reference names can clash #56

Open
brprice opened this issue Aug 10, 2022 · 3 comments
Open

Generically derived schema reference names can clash #56

brprice opened this issue Aug 10, 2022 · 3 comments

Comments

@brprice
Copy link

brprice commented Aug 10, 2022

With schemas for polymorphic types, we can have names of schemas clashing, leading to incorrect generation (i.e. inconsistent with aeson's generic ToJSON). This can happen if one has a type data T_A and also data T a & data A: the generated schema names for T_A and T A are identical!

As a complete example, consider the program

{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE TypeApplications #-}

module Main where

import qualified Data.ByteString.Lazy.Char8 as B
import Data.OpenApi
import Data.OpenApi.Declare
import Data.OpenApi.Internal.Utils
import Data.Proxy
import Data.Typeable
import GHC.Generics

data A = A {x :: T_B, y :: T B}
 deriving Generic

data B = B

data T a = T {foo :: ()}
  deriving Generic

data T_B = T_B {bar :: Bool, quux :: Int}
  deriving Generic

instance ToSchema A
instance Typeable a => ToSchema (T a)
instance ToSchema T_B

main :: IO ()
main = B.putStrLn $ encodePretty $  execDeclare (declareSchemaRef $ Proxy @A) mempty

which has the output

{
    "A": {
        "properties": {
            "x": {
                "$ref": "#/components/schemas/T_B"
            },
            "y": {
                "$ref": "#/components/schemas/T_B"
            }
        },
        "required": [
            "x",
            "y"
        ],
        "type": "object"
    },
    "T_B": {
        "properties": {
            "bar": {
                "type": "boolean"
            },
            "quux": {
                "maximum": 9223372036854775807,
                "minimum": -9223372036854775808,
                "type": "integer"
            }
        },
        "required": [
            "bar",
            "quux"
        ],
        "type": "object"
    }
}

Notice that the two T_B schemas have overwritten each other -- a serialised value of type A will not match this schema.

Since types cannot contain the character -, one possible solution would be to change the serialisation introduced in #19 to use unspace ' ' = '-', rather than unspace ' ' = '_'.

@brprice brprice changed the title Genericly derived schema reference names can clash Generically derived schema reference names can clash Aug 10, 2022
@stevladimir
Copy link
Contributor

We observe similar issue with monomorphic types having identical names, but defined in different packages. I can assume that the same may happen inside single package if two modules define types with identical names.
IIUC all these problems arise from that only type names are used for constructing the final name.
An obvious solution to cover all these cases is to use package_name + module_name + type_name_itself for final type name construction. Generic representation has access to all these bits, so technically this is easily doable. In that case T A and T_A would end up as pkg_T_pkg_A and pkg_T_A which are distinct (omitted modules for shortness). Similarly, in our case that would be pkg1_Foo and pkg2_Foo.
The downside of this approach is that names become too long and ugly(?) and in 95% of cases we don't need this redundancy. As a solution to this one could use (package, module, type_name) (I believe something like that, i.e. unique type identifier, exists in TH, Cabal or other core packages, which we can re-use or use as implementation reference) as a key internally and smarter rendering of that tree, which would choose minimal combination of the 3 pieces above guaranteeing uniqueness. E.g. type_name if there are no collisions (what we have now), pkg_name+type_name (if 2 packages provide the same type) etc. Basically, the shorter one which is enough to uniquely identify the type across the whole API.
The question is how significant are required changes and how this will impact performance, maintainability etc

@brprice
Copy link
Author

brprice commented Oct 6, 2022

The downside of this approach is that names become too long and ugly(?) and in 95% of cases we don't need this redundancy.

Agree

As a solution to this one could use (package, module, type_name) (I believe something like that, i.e. unique type identifier, exists in TH, Cabal or other core packages, which we can re-use or use as implementation reference) as a key internally and smarter rendering of that tree, which would choose minimal combination of the 3 pieces above guaranteeing uniqueness.

However, I am worried about unintended consequences here. It seems like this approach would make it possible that adding a new type to your schema (/ adding a new route to your API) could change the names of pre-existing types. To me this would be very unexpected.

One (fairly manual) way around this would be to make the programmer choose how much disambiguation is required on a type-by-type basis (with some encoding such that a new definition can always be made distinct from any previous choices). Probably a deriving via approach à la deriving-aeson could work for this.

@stevladimir
Copy link
Contributor

It seems like this approach would make it possible that adding a new type to your schema (/ adding a new route to your API) could change the names of pre-existing types. To me this would be very unexpected.

Yeah, that's a downside. However, it is a relatively rare case which will occur only when changes introduce/remove name collisions and will affect only paths using relevant types.

Also the behavior could be controlled with a flag, so end-user could choose what is more preferable: API names stability or readability at cost of potential name shift. Conceptually,

data RenderNameOption
  = AlwaysFullName
  -- ^ Always use package+module+type
  | Compact
  -- ^ Like current behavior, but when required use package/module or their combination to prevent name clashes

One (fairly manual) way around this would be to make the programmer choose how much disambiguation is required on a type-by-type basis (with some encoding such that a new definition can always be made distinct from any previous choices). Probably a deriving via approach à la deriving-aeson could work for this.

Yeah, it's too "manual" but will work. Another drawback is that one needs tests to timely catch potential collisions.

As another option I can also imagine compile-time check for name conflicts (though not sure it is doable, especially, for T a vs T_a case), but another practical problem common for approach you've mentioned and compile-time check:

  1. Some library one does not control:
module Foo1

data Foo
  1. Another library
module Foo2

data Foo
  1. My module
import Foo1 qualified
import Foo2 qualified

type API = Get Foo1.Foo :<|> Get Foo2.Foo

You can alter neither Foo1 nor Foo2, hence cannot affect name generation. The solution exists: newtype one of the conflicting names, but that's not ideal.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants