-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: adds ADR 12. Clickhouse vs dbt #112
Merged
Merged
Changes from all commits
Commits
Show all changes
7 commits
Select commit
Hold shift + click to select a range
80409ad
docs: adds DBT concept documentation
pomegranited 5b5e01d
fix: addresses review comments
pomegranited a058d53
fix: use :ref: for internal links
pomegranited 18cc1da
docs: adds ADR 12. Clickhouse vs dbt
pomegranited d03cde9
Merge branch 'main' into jill/adr-dbt
pomegranited 9664708
docs: address PR review
pomegranited f4d0d05
docs: adds ADR 13. Clickhouse experimental and beta features (#113)
pomegranited File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,7 +1,7 @@ | ||
.. _xapi-concepts: | ||
|
||
xAPI Concepts | ||
************* | ||
xAPI | ||
**** | ||
|
||
Introduction | ||
################### | ||
|
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
12. Clickhouse vs dbt | ||
##################### | ||
|
||
Status | ||
****** | ||
|
||
Accepted | ||
|
||
Context | ||
******* | ||
|
||
Most of the use cases for Aspects require complex transformations to be performed on the data stored in | ||
:ref:`clickhouse` before it can be displayed in :ref:`superset`. While :ref:`xapi-concepts` is a good format for | ||
communicating learning-related events as they happen, it's not the best format for understanding the effect of these | ||
events over time. In order to analyse and draw conclusions for what these events signify over time, Aspects needs | ||
efficient views of the data that are designed specifically for the queries being performed. | ||
|
||
:ref:`clickhouse` provides some features for transforming complex data, but creating efficient views and managing schema | ||
changes over time can be difficult. Plus, some of these features (like named collections) are not available on all | ||
hosted :ref:`clickhouse` services. | ||
|
||
This ADR describes the division of responsibilities in Aspects between the :ref:`clickhouse` database and :ref:`dbt`, a | ||
data transformation tool. | ||
|
||
Decision | ||
******** | ||
|
||
**Store in Clickhouse, transform with dbt** | ||
|
||
Raw event and event sink data tables should continue to be created using :ref:`clickhouse-migrations`. | ||
|
||
Data transformations on this raw data should be made with :ref:`dbt`, including: | ||
|
||
* materialized views | ||
* partitions | ||
* dictionaries (pending support, see `tutor-contrib-aspects#565`_) | ||
* fields extracted from event JSON | ||
|
||
.. note:: | ||
|
||
We use dbt to manage the database schema that performs the transformations, but the transformations themselves happen | ||
in the ClickHouse process. As data is inserted it is immediately transformed and stored in the various query-efficient | ||
tables. | ||
|
||
Consequences | ||
************ | ||
|
||
#. Contribute upstream to `dbt-clickhouse`_ where support for required features is missing, where possible. | ||
#. Move transformations made by the "query" and "dataset" Aspects Superset assets to `aspects-dbt`_. | ||
#. Move dictionaries and partitions originally created using :ref:`clickhouse-migrations` to `aspects-dbt`_. | ||
#. Squash remaining alembic migrations. | ||
|
||
Rejected Alternatives | ||
********************* | ||
|
||
**Use native Clickhouse transforms instead of dbt** | ||
|
||
This option was rejected for maintainability reasons: :ref:`dbt` was designed to manage data transformations with its | ||
package and test framework, and so is more modular and reusable, and better suited to Aspects' long-term goals. | ||
|
||
References | ||
********** | ||
|
||
* `tutor-contrib-aspects#546`_ Only recreate materialized views when necessary | ||
* `tutor-contrib-aspects#565`_ Add dictionary support to dbt-clickhouse | ||
|
||
.. _aspects-dbt: https://github.com/openedx/aspects-dbt | ||
.. _dbt-clickhouse: https://github.com/ClickHouse/dbt-clickhouse | ||
.. _tutor-contrib-aspects#546: https://github.com/openedx/tutor-contrib-aspects/issues/546 | ||
.. _tutor-contrib-aspects#565: https://github.com/openedx/tutor-contrib-aspects/issues/565 |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,55 @@ | ||
13. Clickhouse experimental and beta features | ||
############################################# | ||
|
||
Status | ||
****** | ||
|
||
Accepted | ||
|
||
Context | ||
******* | ||
|
||
When the Aspects project started, Clickhouse was working on a JSON column type and expected to have it out of | ||
"experimental" status by Sep 2023. However, Clickhouse revealed that the feature development is stuck in limbo, and may | ||
never be completed. | ||
|
||
Clickhouse documentation warns that `experimental features`_ may never become accepted into the general feature list, | ||
may be removed at any time, and are not supported by the Clickhouse development team. Additionally, "experimental" | ||
features may not be available on all Clickhouse hosting providers. | ||
|
||
By contrast, `beta features`_ are on track to becoming generally-available features, and so though they may change, they | ||
are fully supported by the Clickhouse team, and will eventually be available on all Clickhouse hosting providers. | ||
|
||
Decision | ||
******** | ||
|
||
**Avoid experimental features** | ||
|
||
Find alternative solutions and avoid enabling experimental features in Clickhouse. | ||
|
||
**Use beta features with caution** | ||
|
||
Use Clickhouse `beta features`_ only if no alternative solution can be found. | ||
|
||
Plan a feature flag to disable functionality that depends on beta features, to avoid breaking deployments where | ||
Clickhouse beta features are not available. | ||
|
||
Consequences | ||
************ | ||
|
||
#. Remove JSON field types from Clickhouse table schemas, and remove references to the flag that enables them | ||
(``allow_experimental_object_type``). | ||
|
||
Rejected Alternatives | ||
********************* | ||
|
||
None | ||
|
||
References | ||
********** | ||
|
||
* `tutor-contrib-aspects#376`_ Feat: Remove JSON column from xapi_events_all | ||
|
||
.. _beta features: https://clickhouse.com/docs/en/beta-and-experimental-features#beta-features | ||
.. _experimental features: https://clickhouse.com/docs/en/beta-and-experimental-features#experimental-features | ||
.. _tutor-contrib-aspects#376: https://github.com/openedx/tutor-contrib-aspects/issues/376 |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This probably deserves a "where possible", there are several things we still have to do in Superset virtual datasets because we have to use filters in specific ways for performance reasons, but I think this is still the long term goal as ClickHouse's predicate pushdown capabilities get overhauled.