Add mechanism to check analysis before it's executed #215

Merged
merged 29 commits into from
Nov 10, 2016
29 commits
17925d1
Add mechanism to check analysis before it's executed
jgoizueta Oct 27, 2016
ba97e6e
Fix tests
jgoizueta Oct 27, 2016
023a5d5
:lipstick:
jgoizueta Oct 27, 2016
a6aae8d
Replace count by faster estimation of the number of rows
jgoizueta Oct 27, 2016
74b1fad
Keep precheck failed nodes status in the nodes themselves
jgoizueta Oct 27, 2016
6fc38a4
Fix base Node computeRequirements
jgoizueta Oct 27, 2016
f38729e
Compute requirements for aggregate-intersection anaylises
jgoizueta Oct 27, 2016
9808d6c
Fix requirements problem with aliased nodes
jgoizueta Oct 27, 2016
a2e4520
:lipstick:
jgoizueta Oct 27, 2016
f91a33d
Replace comparison operator
jgoizueta Oct 27, 2016
86961cf
Syntax fixes
jgoizueta Oct 27, 2016
b5d2bbd
Fix Node limit computation
jgoizueta Oct 28, 2016
880fa16
Fix requirements & limits names
jgoizueta Oct 28, 2016
3b40450
Integration tests for requirements/limits
jgoizueta Oct 28, 2016
700a116
Add prechecking for line-creation analyses
jgoizueta Oct 28, 2016
fa211d9
Add prechecking for geocoding returning polygons
jgoizueta Oct 28, 2016
e36669e
Avoid using internal details of DatabaseService
jgoizueta Oct 28, 2016
6d6d71c
Handle SQL timeouts during pre-checks
jgoizueta Oct 28, 2016
3d26bfe
Remove testing remnant
jgoizueta Nov 4, 2016
e2e6e59
Estimate requirements and check limits in sigle step before registration
jgoizueta Nov 4, 2016
baf428c
Expose node id in limit-rejected analyses
jgoizueta Nov 4, 2016
722de54
:lipstick:
jgoizueta Nov 4, 2016
1309a75
Allow unlimited number of rows for a node's output.
jgoizueta Nov 7, 2016
27782e6
Move some limit-checking functionality to Requirements Class
jgoizueta Nov 8, 2016
4d97ee2
Don't limit analyses by number of output rows in general
jgoizueta Nov 8, 2016
e88b88e
Simplify sequential line requirement estimation.
jgoizueta Nov 8, 2016
aa12fef
Fix tests
jgoizueta Nov 8, 2016
e931566
Perform limit-checking before registering the analysis
jgoizueta Nov 8, 2016
b247f61
Raise limit for points per sequential line
jgoizueta Nov 10, 2016
9 changes: 7 additions & 2 deletions lib/analysis.js
@@ -9,6 +9,7 @@ var toposort = require('../lib/dag/toposort');
var validator = require('../lib/dag/validator');

var DatabaseService = require('./service/database');
var Requirements = require('./service/requirements');

var AnalysisLogger = require('./logging/logger');

@@ -60,6 +61,7 @@ AnalysisFactory.prototype.create = function(configuration, definition, callback)
configuration.batch,
configuration.limits
);
var requirements = new Requirements(databaseService, configuration.limits);
var logger = configuration.logger ? new AnalysisLogger(configuration.logger.stream, configuration.user) : undefined;

async.waterfall(
@@ -75,6 +77,11 @@ AnalysisFactory.prototype.create = function(configuration, definition, callback)

return done(null, new Analysis(rootNode));
},
function analysis$checkLimits(analysis, done) {
requirements.checkLimits(analysis, function(err) {
return done(err, analysis);
});
},
function analysis$register(analysis, done) {
databaseService.registerAnalysisInCatalog(analysis, function(err) {
return done(err, analysis);
@@ -100,10 +107,8 @@ AnalysisFactory.prototype.create = function(configuration, definition, callback)
if (err && err.message && err.message.match(/permission denied/i)) {
err = new Error('Analysis requires authentication with API key: permission denied.');
}

return callback(err);
}

return callback(null, analysis);
}
);
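
A minimal sketch of the ordering this hunk introduces: the limit check sits between building the analysis and registering it, so a rejected analysis never reaches the catalog. The `waterfall` helper is a simplified stand-in for `async.waterfall`, and `create`, `estimatedRows`, and the array-based `catalog` are hypothetical names for illustration, not this project's API.

```javascript
'use strict';

// Simplified stand-in for async.waterfall: runs the steps in order, passes
// each step's results to the next one, and stops at the first error.
function waterfall(steps, finalCallback) {
    var index = 0;
    function next(err) {
        if (err || index === steps.length) {
            return finalCallback.apply(null, arguments);
        }
        var args = Array.prototype.slice.call(arguments, 1);
        var step = steps[index++];
        step.apply(null, args.concat(next));
    }
    next(null);
}

// Hypothetical factory: build, check limits, then register.
function create(analysis, limits, catalog, callback) {
    waterfall([
        function build(done) {
            return done(null, analysis);
        },
        function checkLimits(built, done) {
            var err = null;
            // A falsy limit means "no limit", as in the node code of this PR.
            if (limits.maximumNumberOfRows && built.estimatedRows > limits.maximumNumberOfRows) {
                err = new Error('too many result rows');
            }
            return done(err, built);
        },
        function register(checked, done) {
            catalog.push(checked.id);
            return done(null, checked);
        }
    ], callback);
}

var catalog = [];
var results = [];
function record(err) {
    results.push(err ? err.message : 'ok');
}

create({ id: 'a0', estimatedRows: 10 }, { maximumNumberOfRows: 100 }, catalog, record);
create({ id: 'a1', estimatedRows: 1000 }, { maximumNumberOfRows: 100 }, catalog, record);

console.log(results); // [ 'ok', 'too many result rows' ]
console.log(catalog); // [ 'a0' ]
```

Only the analysis that passes the check ends up in `catalog`, which is the point of running the check before registration.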
46 changes: 45 additions & 1 deletion lib/node/node.js
@@ -1,3 +1,4 @@

Contributor

🙈

'use strict';

var util = require('util');
@@ -9,6 +10,7 @@ dot.templateSettings.strip = false;
var id = require('../util/id');
var QueryBuilder = require('../filter/query-builder');
var Validator = require('./validator');
var NodeRequirements = require('./requirements');

var TYPE = require('./type');
module.exports.TYPE = TYPE;
@@ -28,7 +30,6 @@ var STATUS = {

module.exports.STATUS = STATUS;


var NODE_RESERVED_KEYWORDS = {
type: 1,
typeSignature: 1,
@@ -90,6 +91,8 @@ function Node() {
this.typeSignature = null;

this.version = 0;

this.requirements = new NodeRequirements(this);
}

Node.prototype.id = function(skipFilters) {
@@ -468,3 +471,44 @@ function validate(validator, params, expectedParamName) {

return param;
}

Node.prototype.computeRequirements = function(databaseService, limits, callback) {
    var default_limit = null; // no limit by default
Contributor

Why don't we use something like Infinity here?


    // The default computeRequirements method computes the estimated number of output rows of the node.
    // Two possible approaches to do this are:
    // 1. use the node's query, e.g.
    //       this.requirements.setEstimatedNumberOfRowsFromQuery(databaseService, default_limit, limits, callback);
    //    (we could also use setNumberOfRowsFromQuery for the exact number of rows)
    // 2. reduce overhead by basing the estimation on the previously computed estimations of the input nodes;
    //    here we would use some simple, common heuristic, such as:
    //       this.requirements.setMaxInputNumberOfRows(limits, default_limit);
    //       return callback(null);
    //    And we'd need to override this for node classes for which this is not acceptable

    this.requirements.setMaxInputNumberOfRows(limits, default_limit);
    return callback(null);
};

Node.prototype.requirementMessages = function() {
    var messages = [];

    // Note that any falsy value (null, 0, '', ...) means no limits:
    if (this.requirements.getLimit('maximumNumberOfRows')) {
Contributor

I would hide these details in requirements,

        if (this.requirements.getEstimatedRequirement('numberOfRows') >
            this.requirements.getLimit('maximumNumberOfRows')) {
            messages.push('too many result rows');
        }
    }

    return messages;
};

Node.prototype.validateRequirements = function() {
    var messages = this.requirementMessages();
    var err = null;
    if (messages.length > 0) {
        err = new Error(messages.join('\n'));
    }
    return err;
};
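
On the reviewer's question about `Infinity`: a small sketch of why the falsy-limit guard in `requirementMessages` is needed when `null` means "no limit", and how `Infinity` would make the bare comparison safe. The `requirementMessages` function here is a hypothetical free-standing version for illustration, not the PR's `NodeRequirements` class.

```javascript
'use strict';

// Hypothetical, guard-free version of the check.
function requirementMessages(estimatedRows, maximumRows) {
    var messages = [];
    if (estimatedRows > maximumRows) {
        messages.push('too many result rows');
    }
    return messages;
}

// With null as "no limit", a bare comparison misfires, because null is
// coerced to 0 in relational comparisons.
console.log(5 > null);     // true

// With Infinity, no finite estimate can exceed the limit.
console.log(5 > Infinity); // false

var NO_LIMIT = Infinity;
var unlimited = requirementMessages(5000000, NO_LIMIT);
var limited = requirementMessages(5000000, 1000000);

console.log(unlimited); // []
console.log(limited);   // [ 'too many result rows' ]
```

One caveat with this alternative: `JSON.stringify(Infinity)` produces `null`, so limits that pass through JSON would need special handling.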
6 changes: 6 additions & 0 deletions lib/node/nodes/aggregate-intersection.js
@@ -77,3 +77,9 @@ var queryAggregateTemplate = Node.template([
'WHERE ST_Intersects(_cdb_analysis_source.the_geom, _cdb_analysis_target.the_geom)',
'GROUP BY {{=it.groupByColumns}}'
].join('\n'));

AggregateIntersection.prototype.computeRequirements = function(databaseService, limits, callback) {
    // we estimate the maximum possible number of rows of the result
    this.requirements.setProductInputNumberOfRows(limits, 1000000);
    return callback(null);
Contributor

Same here: don't use callbacks with synchronous code.

Contributor Author
jgoizueta Nov 4, 2016

In this case we do need the callback parameter, so we have a common interface for all nodes, because some node classes need to perform asynchronous operations in the computeRequirements function (executing SQL code in the database). This particular specialization of computeRequirements is synchronous, but we need to provide for the other cases.

Contributor

OK, but be careful with CPU-intensive tasks and consider using process.nextTick() if required.

};
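
To illustrate the exchange above: a sketch contrasting a callback-taking function that completes synchronously (as `computeRequirements` does here) with a variant that defers its callback through `process.nextTick()`. `computeSync` and `computeDeferred` are hypothetical names. Deferring changes when the callback runs, not how much work is done.

```javascript
'use strict';

var order = [];

// Synchronous body behind a callback interface, as in the diff above.
function computeSync(callback) {
    return callback(null);
}

// Variant that always invokes the callback asynchronously.
function computeDeferred(callback) {
    process.nextTick(function() {
        callback(null);
    });
}

computeSync(function() {
    order.push('sync callback');
});
order.push('after computeSync');

computeDeferred(function() {
    order.push('deferred callback');
});
order.push('after computeDeferred');

// Snapshot taken before the event loop gets a chance to run the deferred callback.
var snapshot = order.slice();
console.log(snapshot);
// [ 'sync callback', 'after computeSync', 'after computeDeferred' ]
```

The synchronous version runs its callback before the caller's next statement, while the deferred one runs it only after the current call stack has unwound. A common interface works either way, but callers should not rely on one ordering if some node classes are asynchronous.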
12 changes: 12 additions & 0 deletions lib/node/nodes/buffer.js
@@ -108,3 +108,15 @@ Buffer.prototype.isolinesDissolvedQuery = function() {
_query: this.source.getQuery()
});
};

Buffer.prototype.computeRequirements = function(databaseService, limits, callback) {
    var default_limit;
    if (this.dissolved) {
        default_limit = 10000;
    }
    else {
        default_limit = 1000000;
    }
    this.requirements.setSingleInputNumberOfRows('source', limits, default_limit);
    return callback(null);
};
5 changes: 5 additions & 0 deletions lib/node/nodes/georeference-admin-region.js
@@ -56,3 +56,8 @@ var queryTemplate = Node.template([
' ) AS the_geom',
'FROM ({{=it.source}}) AS _camshaft_georeference_admin_region_analysis'
].join('\n'));

GeoreferenceAdminRegion.prototype.computeRequirements = function(databaseService, limits, callback) {
    this.requirements.setSingleInputNumberOfRows('source', limits, 1000);
    return callback(null);
Contributor

No callback is necessary here

};
7 changes: 7 additions & 0 deletions lib/node/nodes/georeference-country.js
@@ -33,3 +33,10 @@ var queryTemplate = Node.template([
' cdb_dataservices_client.cdb_geocode_admin0_polygon({{=it.country}}) AS the_geom',
'FROM ({{=it.source}}) AS _camshaft_georeference_country_analysis'
].join('\n'));

GeoreferenceCountry.prototype.computeRequirements = function(databaseService, limits, callback) {
    // given that country polygons are large and there are fewer than 200 countries in the world
    // we'll set a modest default limit for the number of resulting rows
    this.requirements.setSingleInputNumberOfRows('source', limits, 500);
    return callback(null);
Contributor

Same.

};
12 changes: 12 additions & 0 deletions lib/node/nodes/georeference-postal-code.js
@@ -77,3 +77,15 @@ var queryTemplate = Node.template([
' ) AS the_geom',
'FROM ({{=it.source}}) AS _camshaft_georeference_postal_code_analysis'
].join('\n'));

GeoreferencePostalCode.prototype.computeRequirements = function(databaseService, limits, callback) {
    var default_limit;
    if (getGecoderFunction(this.output_geometry_type) === 'cdb_geocode_postalcode_polygon') {
        default_limit = 10000;
    }
    else {
        default_limit = 1000000;
    }
    this.requirements.setSingleInputNumberOfRows('source', limits, default_limit);
    return callback(null);
Contributor

Same. This makes code more complex than needed

Contributor Author

I'm sorry that we need this at the moment 😬 , but we'll get rid of it in the next generation 😁

};
31 changes: 31 additions & 0 deletions lib/node/nodes/line-sequential.js
@@ -1,6 +1,7 @@
'use strict';

var Node = require('../node');
var NodeRequirements = require('../requirements');
var debug = require('../../util/debug')('analysis:line-sequential');

var TYPE = 'line-sequential';
@@ -43,3 +44,33 @@ var routingSequentialQueryTemplate = Node.template([
' {{? it.category_column }}GROUP BY {{=it.category_column}}{{?}}',
') _cdb_analysis_line_sequential'
].join('\n'));

LineSequential.prototype.computeRequirements = function(databaseService, limits, callback) {
    var inputRows = this.source.requirements.getEstimatedRequirement('numberOfRows');
    var numCategories = 1;
Contributor

I'm not a big fan of closure variables; I'd pass them as parameters in function calls instead.

Contributor

And receive them from callbacks (asynchronous) or from returned values of functions.

Contributor Author

Since: a) this code will be eventually replaced b) it seems to work c) the closure vars that bother you are encapsulated in a single function, I'll keep this as is, for fear of introducing new bugs. Are you OK with it?

Contributor

go ahead!

    var self = this;
    var rowsLimit = NodeRequirements.getNodeLimit(limits, TYPE, 'maximumNumberOfRows', 1000000);
    var pointsLimit = NodeRequirements.getNodeLimit(limits, TYPE, 'maximumNumberOfPointsPerLine', 100000);
    function storeRequirements(err) {
Contributor

I'd try to avoid closure functions. Extract it at module level or use private methods (prototype._storeRequirements) instead

Contributor Author

For the same reasons mentioned above, I'd prefer to keep this as is (I consider this throw-away code to keep our limits working until we implement proper limit-checking).

        // we make a rough estimate of numCategories result rows and inputRows/numCategories points per line
        self.requirements.setEstimatedRequirement('numberOfRows', numCategories);
        self.requirements.setEstimatedRequirement('numberOfPointsPerLine', inputRows/(numCategories || 1));
        self.requirements.setLimit('maximumNumberOfRows', rowsLimit);
        self.requirements.setLimit('maximumNumberOfPointsPerLine', pointsLimit);
        return callback(err);
    }
    storeRequirements(null);
};

LineSequential.prototype.requirementMessages = function() {
    var messages = [];
    if (this.requirements.getEstimatedRequirement('numberOfRows') >
        this.requirements.getLimit('maximumNumberOfRows')) {
        messages.push('too many result rows');
    }
    if (this.requirements.getEstimatedRequirement('numberOfPointsPerLine') >
        this.requirements.getLimit('maximumNumberOfPointsPerLine')) {
        messages.push('too many points per line');
    }
    return messages;
};
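
The line-sequential estimate above can be sketched as a pure function: one result row per category, and `inputRows / numCategories` points per line, each compared against its own limit. The `getNodeLimit` below is a hypothetical stand-in (the real `NodeRequirements.getNodeLimit` is not shown in this diff, and the per-node-type shape of `limits` is an assumption). `checkLineSequential` is likewise an illustrative name. The defaults match those in the diff.

```javascript
'use strict';

// Hypothetical lookup: assumes limits are keyed by node type, then by limit name.
function getNodeLimit(limits, nodeType, limitName, defaultValue) {
    var nodeLimits = (limits && limits[nodeType]) || {};
    return Object.prototype.hasOwnProperty.call(nodeLimits, limitName) ?
        nodeLimits[limitName] :
        defaultValue;
}

function checkLineSequential(inputRows, numCategories, limits) {
    var rowsLimit = getNodeLimit(limits, 'line-sequential', 'maximumNumberOfRows', 1000000);
    var pointsLimit = getNodeLimit(limits, 'line-sequential', 'maximumNumberOfPointsPerLine', 100000);

    // Rough estimate: one line per category, input points spread evenly among lines.
    var estimatedRows = numCategories;
    var estimatedPointsPerLine = inputRows / (numCategories || 1);

    var messages = [];
    if (estimatedRows > rowsLimit) {
        messages.push('too many result rows');
    }
    if (estimatedPointsPerLine > pointsLimit) {
        messages.push('too many points per line');
    }
    return messages;
}

var withinDefaults = checkLineSequential(50000, 1, {});
var tooManyPoints = checkLineSequential(250000, 1, {});
var splitByCategory = checkLineSequential(250000, 5, {});
var customLimit = checkLineSequential(1000, 1, {
    'line-sequential': { maximumNumberOfPointsPerLine: 500 }
});

console.log(withinDefaults);  // []
console.log(tooManyPoints);   // [ 'too many points per line' ]
console.log(splitByCategory); // []
console.log(customLimit);     // [ 'too many points per line' ]
```

Note that in the merged code `numCategories` is fixed at 1, so the points-per-line estimate equals the number of input rows.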
12 changes: 12 additions & 0 deletions lib/node/nodes/line-source-to-target.js
@@ -59,3 +59,15 @@ LineSourceToTarget.prototype.sql = function() {

return sql;
};

LineSourceToTarget.prototype.computeRequirements = function(databaseService, limits, callback) {
    var default_limit = 1000000;
    if (this.closest) {
        this.requirements.setSingleInputNumberOfRows('source', limits, default_limit);
    } else {
        // this is the absolute maximum number of lines; it could be greatly reduced
        // if this.node.source_column && this.node.target_column
        this.requirements.setProductInputNumberOfRows(limits, default_limit);
    }
    return callback(null);
Contributor

Don't use callbacks for synchronous functions.

};
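
A short sketch of the two estimates chosen above, assuming (from the method names, since their bodies are not in this diff) that `setSingleInputNumberOfRows` takes the estimate from one input and `setProductInputNumberOfRows` multiplies the estimates of all inputs. `estimateLineCount` is a hypothetical helper.

```javascript
'use strict';

// Hypothetical helper: upper bound on the number of lines produced.
// closest: one line per source row; otherwise one line per (source, target) pair.
function estimateLineCount(sourceRows, targetRows, closest) {
    return closest ? sourceRows : sourceRows * targetRows;
}

var DEFAULT_LIMIT = 1000000; // same default as in the diff above

var closestEstimate = estimateLineCount(2000, 1000, true);
var allPairsEstimate = estimateLineCount(2000, 1000, false);

console.log(closestEstimate, closestEstimate > DEFAULT_LIMIT);   // 2000 false
console.log(allPairsEstimate, allPairsEstimate > DEFAULT_LIMIT); // 2000000 true
```

With the same inputs, the all-pairs case exceeds the default limit while the closest-target case does not, which is why the code comment calls the product an absolute maximum.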
7 changes: 6 additions & 1 deletion lib/node/nodes/source.js
@@ -30,5 +30,10 @@ Source.prototype.sql = function() {
* @returns {Node.STATUS}
*/
Source.prototype.getStatus = function() {
-    return Node.STATUS.READY;
+    return Node.STATUS.READY; // TODO: this ignores the possibility of requirements exceeding the limits
};

Source.prototype.computeRequirements = function(databaseService, limits, callback) {
    // By default we don't limit the number of rows of source nodes
    this.requirements.setEstimatedNumberOfRowsFromQuery(databaseService, null, limits, callback);
};
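
The body of `setEstimatedNumberOfRowsFromQuery` is not part of this diff. One common way to estimate a query's row count without running `COUNT(*)` is to read the planner's estimate from PostgreSQL's `EXPLAIN (FORMAT JSON)` output. The sketch below assumes that approach purely for illustration. `estimatedRowsFromExplain` is a hypothetical helper and the sample plan values are made up.

```javascript
'use strict';

// Shape of the parsed output of `EXPLAIN (FORMAT JSON) <query>` in PostgreSQL,
// trimmed to the fields used here. The numbers are made up for illustration.
var sampleExplainOutput = [
    {
        Plan: {
            'Node Type': 'Seq Scan',
            'Plan Rows': 12345
        }
    }
];

// Hypothetical helper: returns the planner's row estimate for the top-level
// plan node, or null if the output does not have the expected shape.
function estimatedRowsFromExplain(explainOutput) {
    var plan = explainOutput && explainOutput[0] && explainOutput[0].Plan;
    if (!plan || typeof plan['Plan Rows'] !== 'number') {
        return null;
    }
    return plan['Plan Rows'];
}

var estimate = estimatedRowsFromExplain(sampleExplainOutput);
var missing = estimatedRowsFromExplain([]);

console.log(estimate); // 12345
console.log(missing);  // null
```

The planner's figure depends on table statistics, so it is an estimate rather than an exact count, which matches the "estimated" wording used throughout this PR.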