Analysis limits checking, second round #220

Open
jgoizueta opened this issue Nov 4, 2016 · 0 comments

jgoizueta commented Nov 4, 2016

We implemented some interim limits checking in #215, but we know it has some shortcomings (the most important being that it requires significant logic to be implemented in the node classes to estimate requirements and check the limits).

We will refactor/reimplement this feature, removing that logic from the nodes: ideally, to define a new type of analysis one should only have to declare its parameters and provide SQL templates to generate its queries.

Nodes will not be required to provide functions to estimate requirements or check their limits; instead, they'll define their requirements declaratively, passing data that will be used by limit-checking functions defined elsewhere.

For example, when defining a node class for a new kind of analysis we could pass limit-defining data to the Node factory as follows:

var LIMITS = {
  // enable a limit checker that limits the number of estimated output rows,
  // with a default value of 100000 rows (that can be overridden in the configuration)
  max_output_rows: 100000,
  // enable a limit on the product of the estimated number of rows of each input:
  max_input_rows_product: 10000,
  // define data services quota consumption
  dataservices: {
    // this analysis will require one geocoding quota unit per output row:
    geocoding: 1
  }
};

var MyAnalysisNode = Node.create(TYPE, PARAMS, LIMITS);
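
As a minimal sketch of the intended division of labor (the checkLimits name, the estimates argument and the error messages below are hypothetical, not existing Camshaft code), a centralized checker could consume the declared data like this:

// Generic, node-agnostic limit checker; `estimates` would carry the row
// estimations obtained elsewhere (see the approaches discussed below).
function checkLimits(limits, estimates, callback) {
    if (limits.max_output_rows !== undefined &&
        estimates.outputRows > limits.max_output_rows) {
        return callback(new Error('too many estimated output rows'));
    }
    if (limits.max_input_rows_product !== undefined) {
        var product = estimates.inputRows.reduce(function (p, rows) {
            return p * rows;
        }, 1);
        if (product > limits.max_input_rows_product) {
            return callback(new Error('input rows product over the limit'));
        }
    }
    return callback(null);
}

Note how the node class contributes only the declarative LIMITS data; all the checking logic lives in one place.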

In the configuration passed to Camshaft (e.g. from Windshaft-Cartodb) we could provide data to override the limits for specific analysis types and also functions to check the dataservices quotas.
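
The shape of that configuration is still open; one possibility (all the keys and the quota-checking callback signature below are assumptions, not an existing interface) would be something like:

var config = {
    limits: {
        analyses: {
            // override the default max_output_rows for a specific analysis type:
            'trade-area': {
                max_output_rows: 50000
            }
        }
    },
    dataservices: {
        // callback-style function to check a user's remaining geocoding quota:
        geocoding: function (user, requiredQuota, callback) {
            // e.g. query the quota service and compare against requiredQuota
            callback(null, true);
        }
    }
};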

The limit-checking functions will need a way to obtain the estimated number of rows of a node (and its inputs). Two possible approaches are:

  • For each node, individually, execute EXPLAIN this.sql() in the user database. This does not require any implementation for each node class (see the sketch after this list).
  • For each node, have a function that computes the estimation and stores it in the node
    (as in #215, a node could use SQL like EXPLAIN or COUNT, or it could use its inputs' estimations).
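
For the first approach, the planner's estimation can be read from PostgreSQL's JSON EXPLAIN output. A sketch, assuming a generic queryRunner with a query(sql, callback) interface (the runner itself is an assumption, not a concrete Camshaft object):

// Estimate a node's output rows from the planner's estimation.
function estimateOutputRows(node, queryRunner, callback) {
    queryRunner.query('EXPLAIN (FORMAT JSON) ' + node.sql(), function (err, result) {
        if (err) {
            return callback(err);
        }
        // the JSON plan carries the planner's row estimation for the root plan node
        var plan = result.rows[0]['QUERY PLAN'][0].Plan;
        return callback(null, plan['Plan Rows']);
    });
}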

We'll need to take examples from the most relevant kinds of analyses and compare the results we can obtain from each approach.

jgoizueta added a commit that referenced this issue Nov 10, 2016
This release introduces analysis pre-checking, see #215
This is the first attempt at this, see #220 for intended changes.