Understanding TSP2D #8

jasontlam · 2018-07-10T16:27:52Z

Hello,

I was going through the code for s2v_tsp2d and was wondering if you could clarify something. When you call the predict function you use arg_max to select the best node even though it is a minimization problem. I also tried setting int sign = -1; in tsp2d_env and still got minimum tour lengths. Could you explain where the minimization is occurring ?

Thank you for your time !

The text was updated successfully, but these errors were encountered:

yywe · 2018-07-10T17:01:41Z

I guess part of reason is because of the insert strategy. The arg_max function will choose the next vertex to grow, then the selected vertex will be inserted into the sequence in somewhere lead to the minimal increase of the total tour length.

jasontlam · 2018-07-10T17:34:27Z

But since we get the same results by flipping the sign of the reward does it not mean we are not really learning the policy ? When the nodes are inserted in a place where it always minimizes the distance, it seems like the node we are picking does not really matter.

yywe · 2018-07-10T17:42:05Z

the arriving order of the next vertex matters according to my experiments. Say you have k nodes, each time one node arrives and you insert it. Different arriving order will lead to different order after the insert operation. The Q network here just learning which vertex to pick next, as for the inner order after picking the vertex, it is maintained by the insert function.

Hanjun-Dai · 2018-11-04T12:39:48Z

thanks for @stweiyy 's reply. Yes we have external maintainer for constructing the tour. Some heuristics like random insertion, closest insertion are all using this maintainer. So we try to see whether we can learn better than those heuristics.

But with this maintainer, the RL becomes a bit weird, since it suggest more on the long term reward -- and that's why we find flipping the reward would also help (since RL typically is biased towards short-term rewards).

fantasy-fish · 2018-11-16T02:37:42Z

But in the sense of minimizing the expected sum of rewards in the future(which is what Q-value means), it should be negative？ right?

Hanjun-Dai · 2018-11-16T17:07:43Z

No, you are maximizing the expected reward. So flipping the sign is a trick which doesn't quite make sense, but it just works well in this particular case...

fantasy-fish · 2018-11-17T00:08:49Z

By the way, is it possible to learn a mechanism which predicts the position of the newly added node?

Hanjun-Dai · 2018-11-17T11:40:56Z

Then it is equivalent to choose the node and append to the end of tour (i.e., without the post-processing helper). It also learns up to some degree, but is not as good as the one with helper.

Amirinezhad · 2019-08-04T14:21:16Z

Hello,

I have a question, where do you have implemented helper function in code?

Thanks.

Hanjun-Dai · 2019-08-05T16:58:57Z

What do you mean by 'helper function?'.

For example for tsp2d, all the cpp implementations can be found under:
https://github.com/Hanjun-Dai/graph_comb_opt/tree/master/code/s2v_tsp2d/tsp2d_lib

And the python code is just a wrapper for it.

Amirinezhad · 2019-08-05T17:33:35Z

In the paper (page 6, table 1), there is a helper function in the tsp2d problem that is for inserting operations, is it right? I want to know how you have implemented this. Is it implemented as an explicit c++ function or is it written on other functions?

Hanjun-Dai · 2019-08-06T19:17:16Z

It is implemented in the environment. The RL agent is not aware of it.

https://github.com/Hanjun-Dai/graph_comb_opt/blob/master/code/s2v_tsp2d/tsp2d_lib/src/lib/tsp2d_env.cpp#L65

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Understanding TSP2D #8

Understanding TSP2D #8

jasontlam commented Jul 10, 2018

yywe commented Jul 10, 2018

jasontlam commented Jul 10, 2018

yywe commented Jul 10, 2018 •

edited

Loading

Hanjun-Dai commented Nov 4, 2018

fantasy-fish commented Nov 16, 2018

Hanjun-Dai commented Nov 16, 2018

fantasy-fish commented Nov 17, 2018

Hanjun-Dai commented Nov 17, 2018

Amirinezhad commented Aug 4, 2019

Hanjun-Dai commented Aug 5, 2019

Amirinezhad commented Aug 5, 2019 •

edited

Loading

Hanjun-Dai commented Aug 6, 2019

Understanding TSP2D #8

Understanding TSP2D #8

Comments

jasontlam commented Jul 10, 2018

yywe commented Jul 10, 2018

jasontlam commented Jul 10, 2018

yywe commented Jul 10, 2018 • edited Loading

Hanjun-Dai commented Nov 4, 2018

fantasy-fish commented Nov 16, 2018

Hanjun-Dai commented Nov 16, 2018

fantasy-fish commented Nov 17, 2018

Hanjun-Dai commented Nov 17, 2018

Amirinezhad commented Aug 4, 2019

Hanjun-Dai commented Aug 5, 2019

Amirinezhad commented Aug 5, 2019 • edited Loading

Hanjun-Dai commented Aug 6, 2019

yywe commented Jul 10, 2018 •

edited

Loading

Amirinezhad commented Aug 5, 2019 •

edited

Loading