Add materialize hint to the QueryOperator #36

roveo · 2022-04-27T07:45:02Z

There are database-specific performance issues with totals and similar calculations. Some databases are better at e.g. window functions, others at joins with a GROUP BY subquery. To keep this logic out of the backend (where it would make creating new backends more difficult), there are multiple possible solutions:

1. Have the backend attached to the engine: def __init__(self, backend: Backend) for the Engine. Then the backend can give hints to the engine about its preferences. For example, a window_functions = True class attribute would mean that the backend prefers window functions, and False that it's better to use group by + join.
2. The engine provides a hint materialize=True to the QueryOperator, so the operator emits a DataFrame instead of a query and the rest of the computation is performed using Pandas. This should reduce the load on the database, but might hurt local performance, which is especially important for the cloud version (out-of-memory errors). Using dask might alleviate this.
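A minimal sketch of how the two hints could look, not the actual Dictum implementation. Only Engine.__init__(self, backend: Backend), the window_functions class attribute and the materialize flag on QueryOperator come from the description above; JoinPreferringBackend, execute, totals_strategy, operator and run are hypothetical names. To stay dependency-free, the sketch uses an in-memory SQLite database, and a plain list of rows stands in for the Pandas DataFrame.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Tuple, Union


class Backend:
    """Hypothetical backend base class."""

    # Hint from the issue: True means window functions are preferred,
    # False means group by + join is the better choice.
    window_functions: bool = True

    def __init__(self, connection: sqlite3.Connection):
        self.connection = connection

    def execute(self, sql: str) -> List[Tuple]:
        # Hypothetical helper: run the query and fetch all rows.
        return self.connection.execute(sql).fetchall()


class JoinPreferringBackend(Backend):
    """Hypothetical backend that does better with group by + join."""

    window_functions = False


@dataclass
class QueryOperator:
    sql: str
    backend: Backend
    materialize: bool = False  # hint provided by the engine

    def run(self) -> Union[str, List[Tuple]]:
        if self.materialize:
            # Emit data instead of a query; downstream steps run locally.
            # (A list of rows stands in for a DataFrame in this sketch.)
            return self.backend.execute(self.sql)
        # Otherwise stay lazy and pass the query on.
        return self.sql


class Engine:
    def __init__(self, backend: Backend):
        self.backend = backend

    def totals_strategy(self) -> str:
        # The engine reads the backend's hint instead of hard-coding
        # database-specific logic.
        return "window" if self.backend.window_functions else "groupby_join"

    def operator(self, sql: str, materialize: bool = False) -> QueryOperator:
        return QueryOperator(sql, self.backend, materialize)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("eu", 10), ("eu", 5), ("us", 7)]
)

engine = Engine(JoinPreferringBackend(conn))
query = "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"

strategy = engine.totals_strategy()
lazy = engine.operator(query).run()
rows = engine.operator(query, materialize=True).run()
```

Here the two options are combined only for illustration; either could be adopted on its own. With materialize=True the whole result set is pulled into local memory, which is where the out-of-memory concern for the cloud version comes from.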

roveo added the engine label Apr 27, 2022

roveo added this to the Dictum 0.1.0 milestone Apr 27, 2022

roveo removed this from the Dictum 0.1.0 milestone Jul 16, 2022
