Question about columns names assignment #213

jgoizueta · 2016-10-21T13:45:29Z

I have a question: I'm thinking about ways to have nodes generate their requirements from their input nodes' requirements, and I was looking at how the column names for each node are computed.

I see here that columns are obtained for each node, and that the queries to obtain the columns will be executed asynchronously.

Since the columns of a node may depend on the columns of its input nodes (imagine the node generates a query that involves its input column names), I would have thought that the columns need to be computed sequentially.

I'm probably missing something (or many things!), but couldn't this cause a problem if the columns for a node are requested before the nodes it depends on have had their columns assigned?

jgoizueta · 2016-10-25T13:41:30Z

I'm not sure if my concerns were valid, but I guess e4be2ab would solve it anyway.

/cc @dgaubert

dgaubert · 2016-10-27T07:51:44Z

Actually, nodes that belong to the same graph are processed sequentially (see the create function) using recursion (asynchronous recursion?).
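A minimal sketch of what such asynchronous recursion could look like, not the project's actual code. The node shape (`id`, `inputs`, `columns`), the `fetchColumns` and `assignColumns` helpers, and the returned column names are all hypothetical; the node ids are borrowed from the example further down in this thread. The point is only that each input's columns are resolved before the node that depends on them:

```javascript
'use strict';

// Hypothetical node shape: { id, inputs: [...], columns: null }.
// fetchColumns stands in for the asynchronous query that asks the database
// for a node's column names; it is not the project's real API.
async function fetchColumns(node, log) {
  // A node's query may mention its inputs' columns, so they must already be set.
  for (const input of node.inputs) {
    if (input.columns === null) {
      throw new Error(`columns of ${input.id} are not assigned yet (needed by ${node.id})`);
    }
  }
  await new Promise((resolve) => setImmediate(resolve)); // simulated query latency
  log.push(node.id);
  return ['cartodb_id', 'the_geom']; // illustrative column names
}

// Asynchronous recursion: resolve every input first (depth-first, one at a
// time), then resolve the node itself.
async function assignColumns(node, log = []) {
  for (const input of node.inputs) {
    await assignColumns(input, log);
  }
  if (node.columns === null) {
    node.columns = await fetchColumns(node, log);
  }
  return log;
}

const c0 = { id: 'C0', inputs: [], columns: null };
const c1 = { id: 'C1', inputs: [c0], columns: null };
const a2 = { id: 'A2', inputs: [c1], columns: null };

assignColumns(a2).then((order) => console.log(order.join(' -> ')));
// prints "C0 -> C1 -> A2"
```

Because every `await` finishes before the next step starts, a node never asks for its columns while one of its inputs is still unresolved.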

e4be2ab solves a different problem (regenerating a cached node) that appears when that node is used in two graphs (a map with two layers whose analyses share the same node).

dgaubert · 2016-10-27T11:57:44Z

Hey @jgoizueta! I was totally wrong in my previous explanation of e4be2ab:

The problem arose when an analysis graph needs two input nodes and those child nodes share the same input node(s), which need an intermediate (cached) table. The child nodes were processed in parallel, and this sometimes caused a race condition while the intermediate table was being recreated. For instance:

                                 (child-1: source) <- [ C1: kmeans ] <- [ C0: source ] 
[ B2: line-source-to-target ] <-
                                 (child-2: target) <- [ A2: weighted-centroid ] <- [ C1: kmeans ] <- [ C0: source ]

Both child-1 and child-2 were processed in parallel using async.map(), and C1 is a cached node. When C0 changes, the intermediate table needs to be recreated, and in some scenarios child-1 was getting its input columns while child-2 was recreating the cached table, which raised a DB error. So I've used async.mapSeries() to process a node's children in series, so that by the time child-2 reaches C1 it has already been recalculated, which avoids the race condition.
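To illustrate the difference, here is a toy model of the race, not the code from e4be2ab. Everything in it is an assumption made for the sketch: the `makeCachedTable`, `refresh`, `getColumns` and `visitChild` helpers, the `upToDate` flag, and the error message. `Promise.all` and a `for`/`await` loop stand in for async.map() and async.mapSeries() so the example runs without dependencies. Which child hits the missing table depends on timing; in this model it happens to be child-2.

```javascript
'use strict';

const pause = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Toy stand-in for the intermediate table that caches node C1.
function makeCachedTable() {
  return { exists: true, upToDate: false };
}

// The first visitor recreates the table; the table is missing while that runs.
async function refresh(table) {
  if (table.upToDate) return;
  table.upToDate = true;  // assumption: later visitors consider the node handled
  table.exists = false;   // simulated DROP TABLE
  await pause(10);        // simulated CREATE TABLE AS ...
  table.exists = true;
}

async function getColumns(table) {
  if (!table.exists) throw new Error('relation does not exist');
  return ['cartodb_id', 'the_geom']; // illustrative column names
}

// Each child branch reaches C1, refreshes it if needed, then reads its columns.
async function visitChild(table, name) {
  try {
    await refresh(table);
    await getColumns(table);
    return { name, ok: true };
  } catch (err) {
    return { name, ok: false, error: err.message };
  }
}

// Rough analogue of async.map(): start every child at once.
function inParallel(names, worker) {
  return Promise.all(names.map((name) => worker(name)));
}

// Rough analogue of async.mapSeries(): start a child only after the
// previous one has finished.
async function inSeries(names, worker) {
  const results = [];
  for (const name of names) {
    results.push(await worker(name));
  }
  return results;
}

async function compare() {
  const children = ['child-1', 'child-2'];
  const tableA = makeCachedTable();
  const parallel = await inParallel(children, (name) => visitChild(tableA, name));
  const tableB = makeCachedTable();
  const series = await inSeries(children, (name) => visitChild(tableB, name));
  return { parallel, series };
}

compare().then((result) => console.log(JSON.stringify(result, null, 2)));
```

In the parallel run one child reads the columns while the other is still recreating the table, so it fails with the simulated DB error. In the series run the table is already back in place when the second child reaches it, and both succeed.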

Hope this helps.

jgoizueta added the question label Oct 21, 2016
