Analysis limits checking, second round #220

Open
jgoizueta opened this issue Nov 4, 2016 · 0 comments

jgoizueta commented Nov 4, 2016

We implemented some interim limits checking in #215, but we know it has some shortcomings (the most important being that it requires significant logic to be implemented in the node classes to estimate requirements and check the limits).

We will refactor/reimplement this feature, removing that logic from the nodes: ideally, to define a new type of analysis one should only have to declare its parameters and provide SQL templates to generate its queries.

Nodes will not be required to provide functions to estimate requirements or check their limits; instead, they'll define their requirements declaratively, passing data that will be used by limit-checking functions defined elsewhere.

For example, when defining a node class for a new kind of analysis we could pass limit-defining data to the Node factory as follows:

var LIMITS = {
  // enable a limit checker that limits the number of estimated output rows,
  // with a default value of 100000 rows (that can be overridden in the configuration)
  max_output_rows: 100000,
  // enable a limit on the product of the estimated number of rows of each input:
  max_input_rows_product: 10000,
  // define data services quota consumption
  dataservices: {
    // this analysis will require one geocoding quota unit per output row:
    geocoding: 1
  }
};

var MyAnalysisNode = Node.create(TYPE, PARAMS, LIMITS);
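
As a minimal sketch of the intended division of labor (the checkLimits name, the estimates argument and the error messages below are hypothetical, not existing Camshaft code), a centralized checker could consume the declared data like this:

// Generic, node-agnostic limit checker; `estimates` would carry the row
// estimations obtained elsewhere (see the approaches discussed below).
function checkLimits(limits, estimates, callback) {
    if (limits.max_output_rows !== undefined &&
        estimates.outputRows > limits.max_output_rows) {
        return callback(new Error('too many estimated output rows'));
    }
    if (limits.max_input_rows_product !== undefined) {
        var product = estimates.inputRows.reduce(function (p, rows) {
            return p * rows;
        }, 1);
        if (product > limits.max_input_rows_product) {
            return callback(new Error('input rows product over the limit'));
        }
    }
    return callback(null);
}

Note how the node class contributes only the declarative LIMITS data; all the checking logic lives in one place.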

In the configuration passed to Camshaft (e.g. from Windshaft-Cartodb) we could provide data to override the limits for specific analysis types and also functions to check the dataservices quotas.
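
The shape of that configuration is still open; one possibility (all the keys and the quota-checking callback signature below are assumptions, not an existing interface) would be something like:

var config = {
    limits: {
        analyses: {
            // override the default max_output_rows for a specific analysis type:
            'trade-area': {
                max_output_rows: 50000
            }
        }
    },
    dataservices: {
        // callback-style function to check a user's remaining geocoding quota:
        geocoding: function (user, requiredQuota, callback) {
            // e.g. query the quota service and compare against requiredQuota
            callback(null, true);
        }
    }
};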

The limit-checking functions will need a way to obtain the estimated number of rows of a node (and its inputs). Two possible approaches are:

  • For each node, individually, execute EXPLAIN this.sql() in the user database. This does not require any implementation for each node class (see the sketch after this list).
  • For each node, have a function that computes the estimation and stores it in the node
    (as in #215, a node could use SQL like EXPLAIN or COUNT, or it could use its inputs' estimations).
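
For the first approach, the planner's estimation can be read from PostgreSQL's JSON EXPLAIN output. A sketch, assuming a generic queryRunner with a query(sql, callback) interface (the runner itself is an assumption, not a concrete Camshaft object):

// Estimate a node's output rows from the planner's estimation.
function estimateOutputRows(node, queryRunner, callback) {
    queryRunner.query('EXPLAIN (FORMAT JSON) ' + node.sql(), function (err, result) {
        if (err) {
            return callback(err);
        }
        // the JSON plan carries the planner's row estimation for the root plan node
        var plan = result.rows[0]['QUERY PLAN'][0].Plan;
        return callback(null, plan['Plan Rows']);
    });
}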

We'll need to take examples from the most relevant kinds of analyses and compare the results we can obtain from each approach.

jgoizueta added a commit that referenced this issue Nov 10, 2016
This release introduces analysis pre-checking, see #215
This is the first attempt at this, see #220 for intended changes.