Volatile Tables #31
@DEVi1wing, thank you for the feature request. I've discussed adding support for volatile tables with several users of this adapter.
The problem might be with how dbt manages sessions. As far as I can tell, dbt creates and closes sessions liberally; even if you run with a single thread, many sessions will be involved. Does your data indicate that a single-threaded dbt run with volatile tables would be more performant than many threads without volatile tables?
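For reference, thread count is set in the connection profile; a minimal sketch, assuming a `teradata`-type target (the profile name, host, and credentials below are hypothetical placeholders):

```yaml
# ~/.dbt/profiles.yml -- illustrative profile; only `threads` matters here
my_teradata_project:
  target: dev
  outputs:
    dev:
      type: teradata
      host: td.example.com            # placeholder host
      user: dbt_user                  # placeholder credentials
      password: "{{ env_var('DBT_TD_PASSWORD') }}"
      schema: dbt_dev
      threads: 1                      # single-threaded run
```

Note that even with `threads: 1`, dbt may still open and close connections between models, which is why a volatile table created in one model can be gone by the time the next model runs.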
The config you provided is helpful, but the behavior is exactly what you alluded to: the table is created, the session ends, and the table is gone before we can reference it in subsequent SQL files. Is there a way to force the session to stay open so the table does not evaporate? Alternatively, is there a way to force dbt to create the volatile table in the same SQL file in which the final table will be generated? As for your question: yes, we have adopted volatile tables as a standard practice in our development because they have been much more performant in many scenarios. We can always work around this by creating standard tables and dropping indices on them, but we still face the cleanup challenge afterwards.
We may attempt to cache the connection in the adapter code. I understand that volatile tables are better than materialized ones, but dbt gives you other options, like ephemeral materializations and multi-threaded DAG processing. So the question is whether the performance of volatile tables is better than parallel DAG processing combined with appropriate table/view/ephemeral/incremental materializations. Before we start exploring a solution that stretches the dbt interface, could you please share your DAG (e.g., the one generated by dbt docs)?
One example would be intermediate models that are scheduled at different times but feed into a target (incremental) model. Here it would benefit performance to create multiple volatile tables with indexes, without having to write the data to permanent tables and then clean them up. An example: intermediate_source1 --> intermediate_source1_transform --> my_target_model (incremental), run with tag:source1, where two such sources run 5 hours apart but need to be available for analysis as soon as possible. At the same time, it would be nice not to need cleanup post-hooks on the downstream models. This is why a volatile materialization would be nice.
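The workaround being described (a permanent intermediate table plus an explicit cleanup hook) can be sketched as a pair of dbt models; the model names and the `post_hook` below are hypothetical illustrations, not adapter-provided behavior:

```sql
-- models/intermediate_source1_transform.sql
-- Sketch: a permanent table stands in where a volatile table is wanted.
{{
  config(
    materialized='table',
    tags=['source1']
  )
}}
select *
from {{ ref('intermediate_source1') }}
```

```sql
-- models/my_target_model.sql
-- Sketch: the downstream incremental model has to carry a cleanup
-- post-hook to drop the intermediate table after it is consumed.
{{
  config(
    materialized='incremental',
    post_hook="drop table {{ ref('intermediate_source1_transform') }}"
  )
}}
select *
from {{ ref('intermediate_source1_transform') }}
```

A volatile materialization would make both the permanent table and the drop hook unnecessary, since the table would disappear with the session.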
Describe the feature
Volatile tables are a great asset to Teradata since they can be an efficient way to break queries into intermediate steps while still benefiting from things like indices and statistics that subqueries do not have. We would love to be able to define a table as volatile with indices and stat collections in dbt SQL models and then allow the system to drop them once the session ends.
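For context, the Teradata DDL this request has in mind looks roughly like the following; the table and column names are illustrative:

```sql
-- Illustrative Teradata volatile table: it exists only for the session,
-- ON COMMIT PRESERVE ROWS keeps its rows across statements within that
-- session, and it still supports a primary index and collected statistics.
CREATE VOLATILE TABLE stg_orders AS (
    SELECT order_id, customer_id, order_total
    FROM orders
) WITH DATA
PRIMARY INDEX (order_id)
ON COMMIT PRESERVE ROWS;

COLLECT STATISTICS ON stg_orders COLUMN (order_id);
```

When the session ends, Teradata drops the table automatically, which is exactly the cleanup behavior the feature request is after.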
Describe alternatives you've considered
Creating separate tables with indices on them. However, we cannot drop them through dbt when the work is complete, so they clutter up the database.
Additional context
Volatile tables will afford significant performance enhancements when working with large numbers of rows.
Who will this benefit?
What kind of use case will this feature be useful for? Please be specific and provide examples, this will help us prioritize properly.
This will be useful for anyone leveraging dbt with Teradata. One of our current barriers to adoption is that we can already write stored procedures that use volatile tables and are very performant, whereas the current dbt implementation achieves the same result but either takes much longer to process or leaves intermediate tables behind that clutter up the database and must be deleted outside of dbt.
Are you interested in contributing this feature?
Let us know if you want to write some code, and how we can help.