feat(data-warehouse): Override external data table fields #20997

Merged
merged 13 commits into from
Mar 20, 2024


Gilbert09
Member

Problem

  • We want to be able to change the data structure of external data tables without modifying the underlying data, so that we can do some table modeling and clean-up, such as parsing Unix timestamps into a DateTime type.

Changes

  • Allows the HogQL fields of external data tables to be overridden in code
  • Adds functionality to "hide" database fields in HogQL:
    • This removes the field from autocomplete, and
    • excludes the field from the results of *
    • But the field can still be queried directly if needed; we just don't expose its name
  • Also allows expression fields to be returned by * queries (effectively reverting feat(query-engine): Remove expression fields from asterisk expansion #19239)
  • Adds some overrides for the stripe_customer table as an example
    • This parses Stripe's created field (a Unix timestamp) into created_at (a ClickHouse DateTime type)
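A minimal sketch of the behavior described above, assuming hypothetical `DatabaseField`, `ExpressionField`, and `Table` classes (simplified stand-ins, not PostHog's actual HogQL classes): hidden fields are skipped by `*` expansion and autocomplete but can still be resolved by name, while expression fields are included in `*`. The Python lambda only imitates what a HogQL expression would compute.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Union


@dataclass
class DatabaseField:
    # Name of the underlying column in the external data table.
    name: str
    # Hidden fields are left out of `*` expansion and autocomplete.
    hidden: bool = False


@dataclass
class ExpressionField:
    # Python stand-in for a HogQL expression; here it reads the raw row directly.
    expr: Callable[[dict], object]
    hidden: bool = False


Field = Union[DatabaseField, ExpressionField]


@dataclass
class Table:
    fields: Dict[str, Field] = field(default_factory=dict)

    def asterisk_fields(self) -> List[str]:
        # `*` returns every non-hidden field, expression fields included.
        return [key for key, f in self.fields.items() if not f.hidden]

    def autocomplete(self) -> List[str]:
        # Autocomplete exposes the same non-hidden names.
        return self.asterisk_fields()

    def resolve(self, key: str, row: dict) -> object:
        # Any field, hidden or not, can still be queried directly by name.
        f = self.fields[key]
        if isinstance(f, DatabaseField):
            return row[f.name]
        return f.expr(row)


# Hypothetical override for stripe_customer: hide the raw Unix timestamp
# and expose it as a timezone-aware datetime under `created_at`.
stripe_customer = Table(
    fields={
        "id": DatabaseField(name="id"),
        "__created": DatabaseField(name="created", hidden=True),
        "created_at": ExpressionField(
            expr=lambda row: datetime.fromtimestamp(row["created"], tz=timezone.utc)
        ),
    }
)

row = {"id": "cus_123", "created": 1710892800}
print(stripe_customer.asterisk_fields())          # ['id', 'created_at']
print(stripe_customer.resolve("__created", row))  # 1710892800
print(stripe_customer.resolve("created_at", row))  # 2024-03-20 00:00:00+00:00
```

Keeping "hidden" as a flag on the field, rather than deleting the field, is what lets direct queries keep working while `*` and autocomplete stop advertising the raw column.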

How did you test this code?

  • Unit tests

Collaborator

mariusandra left a comment


Had a look, and it seems good. I'm not sure about one part, though (replied inline).

Also, we now have hidden aliases and hidden fields, where the meaning of "hidden" is slightly different. I wonder if we should disambiguate? 🤔 🤷

"object": StringDatabaseField(name="object"),
"address": StringJSONDatabaseField(name="address"),
"balance": IntegerDatabaseField(name="balance"),
"__created": IntegerDatabaseField(name="created", hidden=True),
Collaborator


I can imagine users having some of their own tables with __-prefixed fields, and someone someday hitting a conflict if we expand this further or make it automatic. Should we be even stricter, with something like:

        "__x_created": IntegerDatabaseField(name="created", hidden=True),

No strong feelings, but thought it worth mentioning.
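To make the conflict concern concrete, here is a hypothetical check (not something the PR adds) that reports when a generated hidden name such as `__created` would collide with a column a user already has. The helper names and the example column list are illustrative assumptions. A more distinctive prefix like `__x_` lowers the odds of a clash but doesn't rule it out, so a check like this could still be worth having if hiding became automatic.

```python
from typing import List


def hidden_alias(column: str, prefix: str) -> str:
    # Build the internal name under which a raw column is kept hidden.
    return f"{prefix}{column}"


def find_conflicts(user_fields: List[str], columns_to_hide: List[str], prefix: str) -> List[str]:
    # Return the generated hidden names that clash with a field the user already has.
    existing = set(user_fields)
    return [
        hidden_alias(col, prefix)
        for col in columns_to_hide
        if hidden_alias(col, prefix) in existing
    ]


# Hypothetical user table that already contains a "__created" column.
user_fields = ["id", "created", "__created"]

print(find_conflicts(user_fields, ["created"], prefix="__"))    # ['__created']
print(find_conflicts(user_fields, ["created"], prefix="__x_"))  # []
```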

Comment on lines 250 to 252
def visit_expression_field_type(self, node: ast.ExpressionFieldType):
    self.visit(node.expr)

Collaborator


I wonder if this is correct. In the resolver, when we resolve expression fields, we clone this node.expr and swap out the node that has the ExpressionFieldType for the cloned expr.

The change here probably has some side effects, and/or might mutate the "reference expression" instead of cloning it and applying the clone. I'm not sure, but it might be worth double-checking.
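The mutation concern can be illustrated with a hypothetical, simplified AST (these `Field`/`Call` classes are stand-ins, not HogQL's `ast` module, and `copy.deepcopy` plays the role of `clone_expr`): if a mutating pass walks the table's shared reference expression instead of a clone, the change leaks into every later use of that field.

```python
import copy
from dataclasses import dataclass, field
from typing import List, Union


@dataclass
class Field:
    chain: List[str]


@dataclass
class Call:
    name: str
    args: List["Expr"] = field(default_factory=list)


Expr = Union[Field, Call]


def qualify_in_place(expr: Expr, alias: str) -> Expr:
    # A mutating pass: prefixes every field chain with a table alias.
    if isinstance(expr, Field):
        expr.chain.insert(0, alias)
    else:
        for arg in expr.args:
            qualify_in_place(arg, alias)
    return expr


def reference_expr() -> Expr:
    # Hypothetical expression-field definition, shared by every query on the table.
    return Call(name="toDateTime", args=[Field(chain=["__period_start"])])


# Safe: clone first, then mutate the clone.
shared = reference_expr()
resolved = qualify_in_place(copy.deepcopy(shared), "a")
print(shared.args[0].chain)    # ['__period_start']
print(resolved.args[0].chain)  # ['a', '__period_start']

# Unsafe: mutating the shared definition leaks into every later use.
qualify_in_place(shared, "a")
qualify_in_place(shared, "b")
print(shared.args[0].chain)    # ['b', 'a', '__period_start']
```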

Member Author

Gilbert09 · Mar 19, 2024


Interesting. Yeah, some tests were failing because TraversingVisitor didn't have a visit_expression_field_type function. Some other *_type functions just pass, which may be a better choice here than visiting the expression.
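A rough sketch of the trade-off, assuming a hypothetical `getattr`-based visitor that dispatches to `visit_<snake_case node name>` (loosely modeled on the method names in this thread, not copied from HogQL): a missing handler fails loudly, a `pass` handler treats the type as a leaf, and a descending handler walks the shared expression.

```python
import re
from dataclasses import dataclass


def to_snake_case(name: str) -> str:
    # "ExpressionFieldType" -> "expression_field_type"
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


@dataclass
class Constant:
    value: object


@dataclass
class ExpressionFieldType:
    name: str
    expr: object  # the table's shared reference expression


class Visitor:
    def visit(self, node):
        # Dispatch to visit_<snake_case class name>; fail loudly if unhandled.
        method_name = "visit_" + to_snake_case(type(node).__name__)
        method = getattr(self, method_name, None)
        if method is None:
            raise NotImplementedError(f"{type(self).__name__}.{method_name} is missing")
        return method(node)


class CountingVisitor(Visitor):
    # Counts Constant nodes reached; has no handler for ExpressionFieldType.
    def __init__(self):
        self.constants = 0

    def visit_constant(self, node):
        self.constants += 1


class PassingVisitor(CountingVisitor):
    def visit_expression_field_type(self, node):
        pass  # treat the type as a leaf; never touch node.expr


class DescendingVisitor(CountingVisitor):
    def visit_expression_field_type(self, node):
        self.visit(node.expr)  # walks the shared reference expression


node = ExpressionFieldType(name="created_at", expr=Constant(value=1710892800))

try:
    CountingVisitor().visit(node)
except NotImplementedError as err:
    print(err)  # CountingVisitor.visit_expression_field_type is missing

passing, descending = PassingVisitor(), DescendingVisitor()
passing.visit(node)
descending.visit(node)
print(passing.constants, descending.constants)  # 0 1
```

The `pass` variant avoids touching the reference expression at all, which sidesteps the mutation risk raised above at the cost of not traversing whatever the expression contains.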

Collaborator

mariusandra left a comment


Looks good! 👍

The only thing to keep in mind is that expression fields aren't perfect (or finished). The code that swaps the field out for the expression does a very simple clone_expr operation, which means that under certain conditions the expression field will "resolve" to garbage.

For example, this nonsense query:

select * 
from stripe_invoices a 
left join stripe_invoices b 
on a.amount_paid = b.amount_paid
where a.period_start_at < b.period_start_at

will be rewritten to:

select * 
from stripe_invoices a 
left join stripe_invoices b 
on a.amount_paid = b.amount_paid
where __period_start < __period_start

To solve it, I think we should update the cloning code here to push a fake ast.SelectQueryType (containing just that one table) onto self.scopes before visiting the cloned expression, and pop it afterwards. That should give everything inside the expression field the right type, so the fields should get their table alias (e.g. a.__period_start) added automatically.
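A toy model of the problem and the proposed fix, under loud assumptions: the `Field`/`CompareLt` classes, the `EXPRESSION_FIELDS` table, and the list of alias strings used as `scopes` are hypothetical stand-ins for HogQL's AST, expression-field definitions, and `ast.SelectQueryType` scopes. Without a pushed scope the cloned `__period_start` loses its table; pushing a one-table scope while resolving the clone restores the `a.` and `b.` qualifiers.

```python
import copy
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Field:
    chain: List[str]


@dataclass
class CompareLt:
    left: "Node"
    right: "Node"


Node = Union[Field, CompareLt]

# Hypothetical expression-field definition on stripe_invoices:
# period_start_at is defined in terms of the raw, hidden __period_start field.
EXPRESSION_FIELDS = {"period_start_at": Field(chain=["__period_start"])}


def to_sql(node: Node) -> str:
    if isinstance(node, Field):
        return ".".join(node.chain)
    return f"{to_sql(node.left)} < {to_sql(node.right)}"


def resolve_unqualified(expr: Field, scopes: List[str]) -> Field:
    # Stand-in for the resolver: an unqualified field binds to the innermost scope.
    if len(expr.chain) == 1 and scopes:
        expr.chain.insert(0, scopes[-1])
    return expr


def swap_expression_fields(node: Node, push_scope: bool, scopes: Optional[List[str]] = None) -> Node:
    # Replace alias.<expression field> with a clone of the field's definition.
    scopes = [] if scopes is None else scopes
    if isinstance(node, CompareLt):
        return CompareLt(
            swap_expression_fields(node.left, push_scope, scopes),
            swap_expression_fields(node.right, push_scope, scopes),
        )
    alias, name = node.chain  # this sketch assumes every field is alias-qualified
    if name not in EXPRESSION_FIELDS:
        return node
    cloned = copy.deepcopy(EXPRESSION_FIELDS[name])
    if not push_scope:
        return cloned  # naive clone: the table alias is lost
    scopes.append(alias)  # fake scope containing only this one table
    try:
        return resolve_unqualified(cloned, scopes)
    finally:
        scopes.pop()


# WHERE a.period_start_at < b.period_start_at
where = CompareLt(Field(["a", "period_start_at"]), Field(["b", "period_start_at"]))
print(to_sql(swap_expression_fields(where, push_scope=False)))  # __period_start < __period_start
print(to_sql(swap_expression_fields(where, push_scope=True)))   # a.__period_start < b.__period_start
```

Cloning with `copy.deepcopy` before resolving keeps the shared definition in `EXPRESSION_FIELDS` untouched, so each use of the field gets its own qualified copy.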

Gilbert09 merged commit de67c1c into master on Mar 20, 2024
125 checks passed
Gilbert09 deleted the tom/stripe-view-1 branch on March 20, 2024 07:13