Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Understanding TSP2D #8

Open
jasontlam opened this issue Jul 10, 2018 · 12 comments
Open

Understanding TSP2D #8

jasontlam opened this issue Jul 10, 2018 · 12 comments

Comments

@jasontlam
Copy link

Hello,

I was going through the code for s2v_tsp2d and was wondering if you could clarify something. When you call the predict function you use arg_max to select the best node even though it is a minimization problem. I also tried setting int sign = -1; in tsp2d_env and still got minimum tour lengths. Could you explain where the minimization is occurring ?

Thank you for your time !

@yywe
Copy link

yywe commented Jul 10, 2018

I guess part of reason is because of the insert strategy. The arg_max function will choose the next vertex to grow, then the selected vertex will be inserted into the sequence in somewhere lead to the minimal increase of the total tour length.

@jasontlam
Copy link
Author

But since we get the same results by flipping the sign of the reward does it not mean we are not really learning the policy ? When the nodes are inserted in a place where it always minimizes the distance, it seems like the node we are picking does not really matter.

@yywe
Copy link

yywe commented Jul 10, 2018

the arriving order of the next vertex matters according to my experiments. Say you have k nodes, each time one node arrives and you insert it. Different arriving order will lead to different order after the insert operation. The Q network here just learning which vertex to pick next, as for the inner order after picking the vertex, it is maintained by the insert function.

@Hanjun-Dai
Copy link
Owner

thanks for @stweiyy 's reply. Yes we have external maintainer for constructing the tour. Some heuristics like random insertion, closest insertion are all using this maintainer. So we try to see whether we can learn better than those heuristics.

But with this maintainer, the RL becomes a bit weird, since it suggest more on the long term reward -- and that's why we find flipping the reward would also help (since RL typically is biased towards short-term rewards).

@fantasy-fish
Copy link

But in the sense of minimizing the expected sum of rewards in the future(which is what Q-value means), it should be negative? right?

@Hanjun-Dai
Copy link
Owner

No, you are maximizing the expected reward. So flipping the sign is a trick which doesn't quite make sense, but it just works well in this particular case...

@fantasy-fish
Copy link

By the way, is it possible to learn a mechanism which predicts the position of the newly added node?

@Hanjun-Dai
Copy link
Owner

Then it is equivalent to choose the node and append to the end of tour (i.e., without the post-processing helper). It also learns up to some degree, but is not as good as the one with helper.

@Amirinezhad
Copy link

Hello,

I have a question, where do you have implemented helper function in code?

Thanks.

@Hanjun-Dai
Copy link
Owner

What do you mean by 'helper function?'.

For example for tsp2d, all the cpp implementations can be found under:
https://github.com/Hanjun-Dai/graph_comb_opt/tree/master/code/s2v_tsp2d/tsp2d_lib

And the python code is just a wrapper for it.

@Amirinezhad
Copy link

Amirinezhad commented Aug 5, 2019

In the paper (page 6, table 1), there is a helper function in the tsp2d problem that is for inserting operations, is it right? I want to know how you have implemented this. Is it implemented as an explicit c++ function or is it written on other functions?

@Hanjun-Dai
Copy link
Owner

It is implemented in the environment. The RL agent is not aware of it.

https://github.com/Hanjun-Dai/graph_comb_opt/blob/master/code/s2v_tsp2d/tsp2d_lib/src/lib/tsp2d_env.cpp#L65

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants