MongoDB - Support for user-defined schemas #45130
Replies: 2 comments 3 replies
-
The main problem with MongoDB is historical data. As your applications evolve, your needs change, and so does your schema. In MongoDB every attribute is actually stored on disk, unlike relational databases, which just keep a null as a pointer. So choosing not to migrate millions of old documents has an impact on disk space and memory usage. Also, in a multi-tenant system some datasets can be empty. Multiple teams can write documents and provide only certain data, for historical reasons, because of legacy systems, and so on. So for us it would be important to at least have the possibility to edit the schema. We used other systems before we switched to Airbyte Cloud, and we then had to switch to self-hosted due to budget changes.
-
Stumbled upon this issue while trying to sync Mongo to Postgres. At the moment we have to cast types on columns that end up as jsonb instead of their actual type. Being able to cast them in Airbyte directly sounds like a great idea.
-
Airbyte is a schema-aware application, as are most of the destinations we sync to (e.g. data warehouses with a fixed set of columns). For Airbyte's MongoDB source, we have 2 methods of discovering the schema:

1. Schema-enforced syncs, where the schema is discovered by sampling documents in the collection.
2. Syncs where each record carries only the `_id` and the data in a `data` blob.

Option one (schema-enforced syncs) works well for MongoDB collections with similar-ish properties on all objects. If you have a collection with differently shaped objects, sampling will likely miss some properties.
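To illustrate why sampling can miss properties, here is a minimal sketch in plain Python. It is not Airbyte's actual discovery code: `infer_schema`, the sample size, and the `legacy_code` field are all hypothetical names chosen for the example.

```python
import random


def infer_schema(documents, sample_size, seed=0):
    """Union the field names (and Python type names) seen in a random sample.

    Hypothetical helper for illustration; not Airbyte's implementation.
    """
    rng = random.Random(seed)
    sample = rng.sample(documents, min(sample_size, len(documents)))
    schema = {}
    for doc in sample:
        for field, value in doc.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


# Hypothetical collection: 999 documents share one shape, one has an extra field.
collection = [{"_id": i, "name": f"user{i}"} for i in range(999)]
collection.append({"_id": 999, "name": "user999", "legacy_code": "A-17"})

# Scanning every document always finds the rare field.
full = infer_schema(collection, sample_size=len(collection))

# A sample of 10 includes the odd document only about 1% of the time,
# so the rare field is usually absent from the inferred schema.
partial = infer_schema(collection, sample_size=10)

print(sorted(full))
print(sorted(partial))
```

The rarer a property is in the collection, the larger the sample has to be before it reliably shows up in the discovered schema.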
A third option was presented in #42862, which would be to allow users with varying objects in the collection to provide a schema they want to use for the sync (e.g. via a JSONSchema document).
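As a rough sketch of what such a user-provided schema could look like, the snippet below declares a small JSONSchema document and projects differently shaped documents onto it. Everything here is an assumption for illustration: the property names, the `project` helper, and the choice to drop undeclared fields and fill missing ones with `None` are not taken from #42862 or from Airbyte.

```python
import json

# Hypothetical JSONSchema document a user might supply for one collection.
USER_SCHEMA = json.loads("""
{
  "type": "object",
  "properties": {
    "_id": {"type": "string"},
    "name": {"type": "string"},
    "age": {"type": ["integer", "null"]},
    "legacy_code": {"type": ["string", "null"]}
  }
}
""")


def project(document, schema):
    """Keep only the declared properties; fill missing ones with None.

    Hypothetical helper: undeclared fields are silently dropped.
    """
    return {field: document.get(field) for field in schema["properties"]}


# Differently shaped documents, as legacy systems or multiple teams might write.
docs = [
    {"_id": "a1", "name": "Ada", "age": 36},
    {"_id": "b2", "name": "Bob", "legacy_code": "A-17", "extra": True},
]

rows = [project(d, USER_SCHEMA) for d in docs]
for row in rows:
    print(row)
```

With an explicit schema, every record comes out with the same set of columns regardless of which documents a sample happened to contain.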
Are you interested in Airbyte's MongoDB source providing this new "describe your schema explicitly" option for schema-aware syncs?