We implemented some interim limits checking in #215, but we know it has some shortcomings, the most important being that it requires significant logic to be implemented in the node classes to estimate requirements and check the limits.
We will refactor/reimplement this feature, removing that logic from the nodes: ideally, to define a new type of analysis one only has to declare its parameters and provide SQL templates to generate its queries.
Nodes will not be required to provide functions to estimate requirements or check their limits; instead, they'll define their requirements declaratively, passing data that will be used by limit-checking functions defined elsewhere.
For example, when defining a node class for a new kind of analysis we could pass the limit-defining data to the Node factory as follows:
```js
LIMITS = {
    // enable a limit checker that limits the number of estimated output rows,
    // with a default value of 10000 rows (that can be overridden in the configuration)
    max_output_rows: 100000,
    // enable a limit on the product of the estimated number of rows of each input:
    max_input_rows_product: 10000,
    // define data services quota consumptions
    dataservices: {
        // this analysis will require a geocoding quota unit per output row:
        geocoding: 1
    }
};
var MyAnalysisNode = Node.create(TYPE, PARAMS, LIMITS);
```
In the configuration passed to Camshaft (e.g. from Windshaft-Cartodb) we could provide data to override the limits for specific analysis types, as well as functions to check the dataservices quotas.
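To make this concrete, the configuration could look something like the sketch below. This is only an illustration of the idea, not the actual Camshaft/Windshaft-Cartodb configuration format: the key names (`limits.analyses`, `dataservices.geocodingQuota`) and the quota-check callback signature are hypothetical.

```javascript
// Hypothetical shape of the limits section of the Camshaft configuration.
// All key names and the callback signature here are illustrative assumptions.
var camshaftConfig = {
    limits: {
        analyses: {
            // override the default max_output_rows for a specific analysis type:
            'my-analysis': { max_output_rows: 50000 }
        }
    },
    dataservices: {
        // function used to check remaining geocoding quota before running
        // an analysis that consumes geocoding units:
        geocodingQuota: function (user, requiredUnits, callback) {
            // e.g. query the quota service and call back with an error
            // when the required units exceed the remaining quota
            callback(null, true);
        }
    }
};
```

The generic limit-checking code would look up the per-type overrides here, falling back to the defaults declared by the node class.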
The limit-checking functions will need a way to obtain the estimated number of rows of a node (and its inputs). Two possible approaches are:

1. For each node, individually, execute `EXPLAIN this.sql()` in the user database. This does not require any implementation for each node class.
2. For each node, have a function that computes the estimation and stores it in the node (as in #215, a node could use SQL like `EXPLAIN` or `COUNT`, or it could use its inputs' estimations).
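As a rough sketch of the first approach: PostgreSQL reports its row estimate for the top plan node as `rows=N` in the first line of `EXPLAIN` output, so the generic checker only needs to issue the query and parse that figure. Only the parsing below is concrete; `runSql` is a hypothetical placeholder for however the query is executed against the user database.

```javascript
// Sketch of extracting the planner's row estimate from Postgres EXPLAIN text.
function parseEstimatedRows(explainOutput) {
    // The top plan node appears first, with its estimate in the
    // parenthesized cost section, e.g. "(cost=0.00..18.10 rows=810 width=36)".
    var match = /rows=(\d+)/.exec(explainOutput);
    return match ? parseInt(match[1], 10) : null;
}

// Hypothetical usage inside a limit checker (runSql is assumed to run the
// statement in the user database and call back with the raw output text):
//
// runSql('EXPLAIN ' + node.sql(), function (err, output) {
//     var estimatedRows = parseEstimatedRows(output);
//     if (estimatedRows !== null && estimatedRows > limits.max_output_rows) {
//         // reject the analysis before executing it
//     }
// });
```

Note that these are planner estimates, not exact counts, which is presumably acceptable for limit checking.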
We'll need to take examples from the most relevant kinds of analyses and compare the results we can obtain from each approach.