Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docs: adds ADR 12. Clickhouse vs dbt #112

Merged
merged 7 commits into from
Jan 25, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions docs/concepts/xapi_concepts.rst
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
.. _xapi-concepts:

xAPI Concepts
*************
xAPI
****

Introduction
###################
Expand Down
4 changes: 2 additions & 2 deletions docs/decisions/0007_clickhouse_migrations.rst
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
.. _clickhouse-migrations:

7. Alembic migrations for Aspects
#################################
7. Alembic migrations
#####################

Status
******
Expand Down
70 changes: 70 additions & 0 deletions docs/decisions/0012_clickhouse_dbt.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
12. Clickhouse vs dbt
#####################

Status
******

Accepted

Context
*******

Most of the use cases for Aspects require complex transformations to be performed on the data stored in
:ref:`clickhouse` before it can be displayed in :ref:`superset`. While :ref:`xapi-concepts` is a good format for
communicating learning-related events as they happen, it's not the best format for understanding the effect of these
events over time. In order to analyse and draw conclusions for what these events signify over time, Aspects needs
efficient views of the data that are designed specifically for the queries being performed.

:ref:`clickhouse` provides some features for transforming complex data, but creating efficient views and managing schema
changes over time can be difficult. Plus, some of these features (like named collections) are not available on all
hosted :ref:`clickhouse` services.

This ADR describes the division of responsibilities in Aspects between the :ref:`clickhouse` database and :ref:`dbt`, a
data transformation tool.

Decision
********

**Store in Clickhouse, transform with dbt**

Raw event and event sink data tables should continue to be created using :ref:`clickhouse-migrations`.

Data transformations on this raw data should be made with :ref:`dbt`, including:

* materialized views
* partitions
* dictionaries (pending support, see `tutor-contrib-aspects#565`_)
* fields extracted from event JSON

.. note::

We use dbt to manage the database schema that performs the transformations, but the transformations themselves happen
in the ClickHouse process. As data is inserted it is immediately transformed and stored in the various query-efficient
tables.

Consequences
************

#. Contribute upstream to `dbt-clickhouse`_ where support for required features is missing, where possible.
#. Move transformations made by the "query" and "dataset" Aspects Superset assets to `aspects-dbt`_.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably deserves a "where possible", there are several things we still have to do in Superset virtual datasets because we have to use filters in specific ways for performance reasons, but I think this is still the long term goal as ClickHouse's predicate pushdown capabilities get overhauled.

#. Move dictionaries and partitions originally created using :ref:`clickhouse-migrations` to `aspects-dbt`_.
#. Squash remaining alembic migrations.

Rejected Alternatives
*********************

**Use native Clickhouse transforms instead of dbt**

This option was rejected for maintainability reasons: :ref:`dbt` was designed to manage data transformations with its
package and test framework, and so is more modular and reusable, and better suited to Aspects' long-term goals.

References
**********

* `tutor-contrib-aspects#546`_ Only recreate materialized views when necessary
* `tutor-contrib-aspects#565`_ Add dictionary support to dbt-clickhouse

.. _aspects-dbt: https://github.com/openedx/aspects-dbt
.. _dbt-clickhouse: https://github.com/ClickHouse/dbt-clickhouse
.. _tutor-contrib-aspects#546: https://github.com/openedx/tutor-contrib-aspects/issues/546
.. _tutor-contrib-aspects#565: https://github.com/openedx/tutor-contrib-aspects/issues/565
55 changes: 55 additions & 0 deletions docs/decisions/0013_clickhouse_experimental.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,55 @@
13. Clickhouse experimental and beta features
#############################################

Status
******

Accepted

Context
*******

When the Aspects project started, Clickhouse was working on a JSON column type and expected to have it out of
"experimental" status by Sep 2023. However, Clickhouse revealed that the feature development is stuck in limbo, and may
never be completed.

Clickhouse documentation warns that `experimental features`_ may never become accepted into the general feature list,
may be removed at any time, and are not supported by the Clickhouse development team. Additionally, "experimental"
features may not be available on all Clickhouse hosting providers.

By contrast, `beta features`_ are on track to becoming generally-available features, and so though they may change, they
are fully supported by the Clickhouse team, and will eventually be available on all Clickhouse hosting providers.

Decision
********

**Avoid experimental features**

Find alternative solutions and avoid enabling experimental features in Clickhouse.

**Use beta features with caution**

Use Clickhouse `beta features`_ only if no alternative solution can be found.

Plan a feature flag to disable functionality that depends on beta features, to avoid breaking deployments where
Clickhouse beta features are not available.

Consequences
************

#. Remove JSON field types from Clickhouse table schemas, and remove references to the flag that enables them
(``allow_experimental_object_type``).

Rejected Alternatives
*********************

None

References
**********

* `tutor-contrib-aspects#376`_ Feat: Remove JSON column from xapi_events_all

.. _beta features: https://clickhouse.com/docs/en/beta-and-experimental-features#beta-features
.. _experimental features: https://clickhouse.com/docs/en/beta-and-experimental-features#experimental-features
.. _tutor-contrib-aspects#376: https://github.com/openedx/tutor-contrib-aspects/issues/376