Add mechanism to check analysis before it's executed #215
Conversation
Implement basic checking for source and filter-category nodes for testing purposes.
Rename some variables to camelCase.
Use configuration limits for computeRequirements.
jshint doesn't like ==. The semantics change a little, but for our purposes === will work.
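For instance (an illustrative aside, not code from the PR), the semantic difference only shows up in mixed-type comparisons, where == coerces before comparing:

console.log(0 == '');            // true  (both coerce to 0)
console.log(0 === '');           // false (different types)
console.log(null == undefined);  // true
console.log(null === undefined); // false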
A note about aliased nodes

Let's consider this analysis definition:

{
id: 'a2',
type: 'point-in-polygon',
params: {
points_source: {
id: 'airbnb_rooms',
type: 'source',
params: {
query: 'SELECT * FROM airbnb_rooms'
}
},
polygons_source: {
id: 'a1',
type: 'trade-area',
params: {
source: {
id: 'a0',
type: 'source',
params: {
query: 'SELECT * FROM airbnb_rooms'
}
},
kind: 'car',
time: 100,
isolines: 1,
dissolved: false
}
}
}
};

In this case we have two source nodes with the same query (SELECT * FROM airbnb_rooms): airbnb_rooms and a0, so both definitions resolve to nodes sharing the same id. But if we get the topologically sorted nodes, only one of them will be visited. If we only assign requirements to the visited nodes, a node aliased to an already-visited one won't get any requirements. This is fixed in 9808d6c by replicating the requirements to all nodes having the same id. Note that the naive solution in that commit introduces O(n²) time complexity on the total number of nodes n.
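A minimal sketch of the replication idea (the helper and method names here are assumptions, not the actual code in 9808d6c):

// Naive O(n^2) replication sketch: for every node, copy requirements from a
// visited node that shares its id. Indexing visitedNodes by id in a plain
// object would bring this down to O(n).
function replicateRequirements(allNodes, visitedNodes) {
    allNodes.forEach(function (node) {
        visitedNodes.forEach(function (visitedNode) {
            if (visitedNode.id() === node.id() && visitedNode !== node) {
                node.estimatedRequirements = visitedNode.estimatedRequirements;
                node.limits = visitedNode.limits;
            }
        });
    });
}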
Force-pushed from 173a8d9 to 3b40450.
How to define per-node limits to be checked before performing an analysis.

Currently we have introduced only one limit:

this.estimatedRequirements = { numberOfRows: 1000 };
this.limits = { maximumNumberOfRows: 1000000 };

We have a default computeRequirements on the base Node. In the limits object, global limits to apply to all analyses can be passed, and also limits for specific analysis types. For example:

limits: {
    analyses: {
        maximumNumberOfRows: 100000, // general limit
        'trade-area': {
            maximumNumberOfRows: 10000 // specific limit
        }
    }
}

The computeRequirements functions should use the passed limits object and have a default value for any limit; this can be fetched using the getNodeLimit helper. For an example, take a look at how checking has been added for line-creating analyses: 700a116
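A sketch of how that lookup could work (the helper name getNodeLimit appears later in the diff; this body is an assumption): a type-specific limit wins over the general one, and the hardcoded default applies when neither is present.

// Assumed implementation sketch of getNodeLimit: per-type limit first,
// then the general analyses limit, then the caller-provided default.
function getNodeLimit(limits, nodeType, limitName, defaultValue) {
    var analysesLimits = (limits && limits.analyses) || {};
    var typeLimits = analysesLimits[nodeType] || {};
    if (typeLimits.hasOwnProperty(limitName)) {
        return typeLimits[limitName];
    }
    if (analysesLimits.hasOwnProperty(limitName)) {
        return analysesLimits[limitName];
    }
    return defaultValue;
}

// With the limits object above:
// getNodeLimit(limits, 'trade-area', 'maximumNumberOfRows', 1000000); // 10000
// getNodeLimit(limits, 'buffer', 'maximumNumberOfRows', 1000000);     // 100000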
Hey @dgaubert can you look at this? cc/ @rafatower @ethervoid
AggregateIntersection.prototype.computeRequirements = function(databaseService, limits, callback) {
    // we estimate the maximum possible number of rows of the result
    var product = this.source.estimatedRequirements.numberOfRows *
I guess this is not considered good practice; should I define an accessor and use this.source.getEstimatedRequirements().numberOfRows instead? (your opinion will be welcome, @dgaubert)
We have several places where we don't follow the getter/setter pattern. IMHO in JavaScript it is not useful; you can always get a member property directly with `.`. My advice is to avoid the Cannot read property 'wadus' of null by using default values and so on.
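For instance (purely illustrative of that advice, not code from this PR):

// Hypothetical illustration: a default object keeps the property access safe
// without introducing a getter.
var sourceRequirements = this.source.estimatedRequirements || { numberOfRows: 0 };
var numberOfRows = sourceRequirements.numberOfRows; // never throws on null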
    numberOfRows: maxRows
};
this.limits = {
    maximumNumberOfRows: getNodeLimit(limits, this.getType(), 'maximumNumberOfRows', 1000000)
Since limit values should be definable in Redis, maybe we should use snake_case instead of camelCase?
you mean maximumNumberOfRows?
Yes, having maximum_number_of_rows instead.
Force-pushed from dcd7825 to 700a116.
If a timeout occurs while estimating requirements, consider the query too complex and reject it.
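A sketch of that behaviour (the databaseService call shape and the error detection are assumptions, not the actual commit):

// Assumed sketch: a timeout while estimating rows means the query is too
// complex, so fail the check with a descriptive error instead of a raw
// database error.
databaseService.run(estimationQuery, function (err, resultSet) {
    if (err) {
        if (/timeout/i.test(err.message)) {
            return callback(new Error('Query too complex: requirements estimation timed out'));
        }
        return callback(err);
    }
    // otherwise use resultSet to fill this.estimatedRequirements
});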
@dgaubert did you have a chance to take a further look? I'm more or less familiar with the code so I can walk you through the changes if you want.
@rafatower I'm dealing with some weird test stuff in SQL-API. Let's take a look at it this afternoon. Sorry!
sure, no problem... and thanks a lot :)
We have decided to release this ASAP, so I'll take care of the most relevant issues and will leave the rest for a second iteration.
Let me know when you feel ready to review again!
Hey @dgaubert, I'll be testing this in staging, but you can already review the changes whenever it suits you.
So that errors are properly shown in the UI
Also change the default maximum number of rows to unlimited for source nodes, but keep a limit of 1M as a default for other kinds of nodes.
General limits are removed; only analyses with specific limits will be limited.
Avoid the number-of-categories estimation and go for a conservative one (potentially all input points could end up on the same line).
I think it is ready to be deployed, @dgaubert can you take a last look at it? The current kinds of analyses are limited; the default limits can be overridden in Redis:
I have placed some limits on other analyses not initially considered:
@@ -80,6 +82,11 @@ AnalysisFactory.prototype.create = function(configuration, definition, callback)
            return done(err, analysis);
        });
    },
    function analysis$checkLimits(analysis, done) {
Why do we check limits after registering the analysis? We are going to have a wrong status in cdb_analysis_catalog if the analysis reaches some limit.
oops, I meant to do it before registration... I'm fixing it
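Something like this ordering (a sketch; apart from analysis$checkLimits, which appears in the diff, the step and helper names are hypothetical):

var async = require('async');

// Sketch: limits are checked before registration, so cdb_analysis_catalog
// never gets an entry for an analysis that was rejected over a limit.
function create(databaseService, limits, definition, callback) {
    async.waterfall([
        function analysis$create(done) {
            done(null, buildAnalysis(definition)); // node creation, elided
        },
        function analysis$checkLimits(analysis, done) {
            analysis.getRoot().computeRequirements(databaseService, limits, function (err) {
                done(err, analysis); // an error here aborts before registration
            });
        },
        function analysis$register(analysis, done) {
            registerInCatalog(analysis, done); // writes to cdb_analysis_catalog
        }
    ], callback);
}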
@rochoa please take a look at this
@@ -468,3 +471,44 @@ function validate(validator, params, expectedParamName) {
    return param;
}

Node.prototype.computeRequirements = function(databaseService, limits, callback) {
    var default_limit = null; // no limit by default
Why don't we use something like Infinity here?
var messages = [];

// Note that any falsy value (null, 0, '', ...) means no limits:
if (this.requirements.getLimit('maximumNumberOfRows')) {
I would hide these details in requirements.
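For instance (a hypothetical method, sketching that suggestion):

// Hypothetical sketch: move the "falsy limit means unlimited" rule into
// Requirements so call sites don't repeat it.
Requirements.prototype.isAboveLimit = function (limitName, value) {
    var limit = this.getLimit(limitName);
    // Any falsy value (null, 0, '', ...) means no limit applies.
    return Boolean(limit) && value > limit;
};

// The check in the node then reads:
// if (this.requirements.isAboveLimit('maximumNumberOfRows', estimatedRows)) { ... }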
@@ -1,3 +1,4 @@
🙈
Hey, don't consider my small comments as a blocker for this. As we will work on #220, I'm pretty much OK with this first iteration. The limitations and caveats are well defined, so let's go ahead. BTW, good catch. Thanks a lot for working on this one!
Could we add a "limits" section in the camshaft reference so the UI can use them to show warnings? Like:

"aggregate-intersection": {
    "params": {
        "source": {
            "type": "node",
            "geometry": [
                "*"
            ],
            limits: {
                max_input_rows: 100000
            }
        },
        ...
    }
}

cc @rochoa
The problem with integrating it directly in the reference is that this is something dynamic: you might want to limit an analysis to 100k rows in some cases but it's OK in others, depending on your hardware, for instance.
See #214

Open questions:

- Should we pass a QueryRunner instead of a DatabaseService to the Requirements constructor?
- We use ANALYZE for a fast estimation of the rows in a query, and we're doing nothing if this fails (e.g. for lack of stats). We should handle that case or return to using COUNT (see a6aae8d).

Pending tasks:

- Decide whether to check a fixed set of requirements, or if we instead let each node check whatever it decides to check (we'll start with number of rows only).
- Implement computeRequirements for the rest of the analyses.

Some things to look at at a later time: